diff --git "a/test_generations.txt" "b/test_generations.txt" deleted file mode 100755--- "a/test_generations.txt" +++ /dev/null @@ -1,2345 +0,0 @@ -we run mecab on hadoop 11, an open source software that implemented the map-reduce framework, for word segmenting and pos tagging. -to minimize such confusion, a system must separately represent noun phrases, the underlying concepts to which they can refer, and the many-to-many ° can refer to relation between them. -we used a 5-gram language model with modified kneser-ney smoothing and a 5-gram lm with modified kneser-ney smoothing. -for our experiments we use the unlexicalised berkeley parser and the lexicalised form of the stanford parser. -wedekind achieves the reordering by first generating nodes that are connected, that is, whose semantics is instantiated. -our method is based on a theoretically clear statistical model that integrates linguistic, acoustic and situational information. -a key aspect of our approach is the representation of content by phrases rather than entire sentences. -using a similar corpus, tang et al induced sentiment specific word embeddings, for the twitter domain. -in this paper, we propose a case study of analyzing annoying behaviors. -to predict labels, we train conditional random fields, which are directly optimized for splitting. -this paper describes the largest scale annotation project involving the enron email corpus to date. -figure 1 shows the sequence structured lstm of hochreiter and schmidhuber and the treestructured lstm of, illustrating the input, cell and hidden nodes at a certain time step t. -in our work, we focus on supervised domain adaptation. -we then looked at the argument made by shimoyama and her e-type analysis of ihrc. -the verb choice is highly dependent on its usage context which is not consistently captured by local features. -davidov et al introduced the use of term frequency patterns for relationship discovery. -in addition to the joint optimization framework using ilp, we explore pool-based active learning to further reduce the required feedback. -second, we cluster the extracted patterns to identify the semantically related patterns. -in addition, we constructed a chinese dataset to evaluate the generality of the method performance on humor recognition against different languages. -galley et al describe an algorithm for inducing a string-to-tree grammar using a parallel corpus with syntax trees on target side. -the problem of polarity classification has been studied in detail by wilson, wiebe, and hoffmann, who used a set of carefully devised linguistic features. -previous work proved that conditional random fields can outperform other sequence labeling models like memms in abbreviation generation tasks. -predicate vectors are learned from the contexts of preceding arguments, and are required to contribute to the prediction of upcoming arguments. -processing long, complex sentences is challenging. -previous work on adapting temporal taggers focused on scaling up to more languages. -in this paper, we propose a simple but novel approach to automatically generate large-scale pseudo training data for zero pronoun resolution. -we use an extension of the lexrank algorithm to rank sentences. -details on the computation of this code length are given in. -our work presents a method to automatically construct a large corpus of text pairs describing the same underlying events. 
-for example, cut can be used in the sense of ° cutting costs, which carries with it restrictions on instruments, locations, and so on that somewhat overlap with eliminate. -we adopt this method as well but with no use of manually labeled data in training. -focusing on the adaptability to user and domain changes, we report the results of comparative experiments with two online algorithms and the standard batch approach. -we propose udl, a model for estimating sentence pair semantic similarity. -the taxonomy kernel was trained using the svm package. -named entity recognition ( ner ) is a fundamental problem in natural language processing ( nlp ). -as an alternative, we apply latent semantic analysis to compute a reduced-rank representation. -in standard ptb evaluation, our parser achieved a 1.8 % accuracy improvement over the parser of cite-p-24-1-1, which shows the effect of combining search and learning. -we formulate the problem as a classification task using various linguistic features including tense, mood, aspect, modality, experiencer, and verb classes. -we describe a highly efficient monotone search algorithm. -in this paper we describe the three approaches we submitted to the semantic textual similarity task of semeval 2012. -we also show how this architecture can be used for domain adaptation. -in the first stage, we propose the siamese hierarchical convolutional neural network ( shcnn ) to estimate conversation-level similarity between pairs of closely posted messages. -vogel and tresner-kirsch ( 2012 ) use the logarithm of the frequency for some experimental runs, reporting that it improved accuracy in some cases. -in addition, we illustrate with an example that our method can generate coherent topics even based on only one document. -identifying whether the subject has a disease / symptom. -xiong and zhang attempted to improve lexical coherence via a topic-based model, using a hidden topic markov model to determine the topic in the source sentence. -analysis of social media content for health has been a topic of wide interest. -we incorporate adversarial training into shared space to guarantee that specific features of tasks do not exist in shared space. -in particular, we carefully studied the fastus system of hobbs et al, who have clearly and eloquently set forth the advantages of this approach. -we use the output of the unsupervised pos tagger as a direct replacement for the output of a fully supervised pos tagger for the task of shallow parsing. -punyakanok et al, 2005a, typically involves multiple stages to 1 ) parse the input, 2 ) identify arguments, 3 ) classify those arguments, and then 4 ) run inference to make sure the final labeling for the full sentence does not violate any linguistic constraints. -it also causes little increase in the translation time, and compares favorably to another alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation. -the user and advisor in this exchange share one belief that we have not represented. -the task of semantic textual similarity is aimed at measuring the degree of semantic equivalence between a pair of texts. -ucca s representation is guided by conceptual notions and has its roots in the cognitive linguistics tradition and specifically in cognitive grammar ( cite-p-11-3-6 ). -in this paper, we expand our translation options by desegmenting n-best lists or lattices. -we present a unifying framework of ° violation-fixing perceptron which guarantees convergence with inexact search. 
-cite-p-12-5-7 reported a pos tagger based on cyclic dependency network. -we use the term ° word generalization to refer to this problem of associating a word with the meaning at an appropriate category level, given some sample of experiences with the word. -variables tend to be complex rather than atomic entities and expressed as noun phrases containing multiple modifiers, e. g. oxygen depletion in the upper 500 m of the ocean or timing and magnitude of surface temperature evolution in the southern hemisphere in deglacial proxy records. -huang, harper, and wang and huang, eidelman, and harper mainly focused on the generative hmm models. -we compare our method with both competitive neural and non-neural models, including rnnoie, openie4, 5 clausie, and props. -in this work, we propose an alternative way to address the word ambiguity and word mismatch problems by taking advantage of potentially rich semantic information drawn from other languages. -vector-based semantic models can explain a significant portion of systematic variance in the observed neural activity. -we use the substring based approach and obtain this local tagging information by labeling on the substring of the full character sequence. -stance classification is the task of automatically determining from text whether the author of the text is in favor of, against, or neutral towards a target of interest. -we conducted an experimental evaluation on the test collection for single document summarization evaluation contained in the rst discourse treebank distributed by the linguistic data consortium 3. -in order to evaluate the performance of our new co-compositional model with prototype projection and word representation learning algorithm, we make use of the disambiguation task of transitive sentences developed by grefenstette and sadrzadeh. -in this paper we investigate distributed training strategies for the structured perceptron as a means to reduce training times when computing clusters are available. -following chiang, we describe our algorithms in a deductive system. -performance is measured by bleu and ter using the multeval script. -to determine the word classes, one can use the algorithm of brown et al for finding the classes. -we show that these analyses can be obtained without requiring power beyond mildly context-sensitive grammars. -experiments were conducted with four publicly available datasets of conversations from reddit and irc channels. -in this section, we summarize the main ideas of dsms that were proposed in for building semantic networks, which are extended here for the creation of affective networks. -in this paper, we have studied and compared how the web content reacts to bursty events in multiple contexts of web search and online media. -gaustad showed that evaluations using pseudowords can over-estimate the accuracy of a word sense disambiguation system on real data. -the generation of referring expressions is an integral part of most natural language generation systems. -our evaluation demonstrates that scisumm achieves higher quality summaries than a state-of-the-art multidocument summarization system ( cite-p-15-3-4 ). -huang et al use a bilstm with a crf layer in addition to making use of explicit spelling and context features along with word embeddings. -for the english-german experiments, the translation system was trained and tested using the europarl corpus. -for word embeddings, we use averaged word embeddings. 
-in some sense, our model can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, specifically,. -his work has been followed by schwenk, who has shown that neural network language models actually work very well in the state-of-the-art speech recognition systems. -brockett et al consider error correction as a machine translation problem. -but a number of augmentations and changes become necessary when dealing with highly inflected or agglutinative languages, as well as analytic languages, of which chinese is the focus of this article. -experimental results show that the deebrnn model outperforms both feature-based and representation-based state-of-the-art methods in terms of recall and f1-measure. -traditional label propagation is a graph-based semi-supervised learning approach with a single view. -we consider review spam detection for multiple domains ( e. g., hotel and restaurant ) as a multi-task learning problem. -birke and sarkar propose a minimally supervised algorithm for distinguishing between literal and non-literal usages of verbs in context. -our model is also easier to understand than ibm model 4. -to set the model parameters , we used the minimum error rate training algorithm to maximize the f-measure of the 1-best alignment of the model on a development set consisting of sentence pairs with manually generated alignments. -jiang and zhai proposed an instance reweighting framework to take domain shift into account. -our experience with a critiquing system shows that when the system detects problems with the user's performance, multiple critiques are often produced. -in this paper, we study the problem of interpreting visual scenes and rendering their content using natural language. -in this paper, we argue that the extra aspect ( opinion ) information extracted using these previous works can effectively improve the quality of generated reviews. -the system outperforms a comparable publicly available system, as well as a previously published form of our system. -we use a maximum entropy classifier which allows an efficient combination of many overlapping features. -there is a method to automatically learn the interpolation weights but it requires reference phrase pairs which are not easily available. -the word embeddings are pre-trained using the word2vec toolkit on the xinhua portion of the gigaword corpus. -to this end, we propose bridge correlational neural networks ( bridge corrnets ) which learn aligned representations across multiple views using a pivot view. -large language models have been shown to improve quality, especially in machine translation. -following work has been described in the first shared task on language identification in code-switched data held at emnlp 2014. -we calculated the semantic similarity between sense vectors of each target word in the sentence to obtain its sentence vector and score each target word. -charniak 2000 ) describes a different method which achieves very similar performance to. -in this paper, we present an overview of our participation in the timeline generation task of semeval-2015. -we propose an iterative reinforcement framework, and under this framework, review feature words and opinion words are organized into categories in a simultaneous and iterative manner. -chang et al stated that one reason is that the objective function of topic models does not always correlate well with human judgments. 
-djuric et al were the first to propose a self-taught learning strategy in the context of hateful speech detection, where they simultaneously learn low-dimension representations of documents and words in a common vector space. -sen modeled the topic coherence as the groups of co-occurring entities. -we used the liblinear scikit-learn implementation of support vector machines with ovr, one vs. -in this paper, we present a neural keyphrase extraction framework that exploits conversation context, which is represented by neural encoders for capturing salient content. -all languages use brahmidescended scripts. -our method is based on a decision list proposed by yarowsky. -recently, many accurate statistical parsers have been proposed for english, for japanese ). -current smt systems typically decode with single translation models and can not benefit from the strengths of other models in decoding phase. -goldwater et al used context as a means of avoiding undersegmentation, through a method based on hierarchical dirichlet processes. -our experiment focuses on investigating aspects of predictive opinions by learning lexical patterns and comparing them with judgment opinions. -the three systems are unsupervised and relies on dictionary-based similarity measures. -named entity recognition is the task of identifying named entities in text. -paul s. jacobs : a generator for natural language interfaces. -in this work, we use the rnn abstraction as a building block, and recursively combine several rnns to obtain our tree representation. -we compute the translation probabilities according to the estimated co-occurrence counts, using the standard training method in phrase-based smt. -in this paper, 4 word boundary tags are employed : b ( beginning of a word ), m ( middle part of a word ), e ( end of a word ). -in the first approach, heuristic rules are used to find the dependencies or penalties for label inconsistency are required to handset ad-hoc. -data sparsity is a fundamental problem in natural language processing ( nlp ). -on the larger oceanic data, our model can achieve cluster purity scores of 91. 8 %, while maintaining pairwise recall of 62. 1 %. -we train a phrase-based smt system over the entire parallel corpus. -sgaard and goldberg showed that a higher-level task can benefit from making use of a shared representation learned by training on a lower-level task. -results suggest that the tensor-based methods we propose are more robust than the basic hal model in some respects. -for this task, we use the simplified factual statement extraction toolkit. -experimental results show that the character-level dependency parsing models outperform the word-based methods on all the data sets. -users also subjectively rate the rl-based policy on average 10 % higher. -in this paper, we propose a new approach to detecting erroneous sentences by integrating pattern discovery with supervised learning models. -in this paper, we propose a bi-lstm model with fewer features. -english tweets are automatically identified using a compression-based language identification tool. -zhang et al explore a shallow convolutional neural network and achieve competitive performance. -named entity recognition on mixed case text is easier than on upper case text, where case information is unavailable. -topic correlations in weakly-related collections typically lie in the tail of the topic distribution, where they would be overlooked by models unable to fit large numbers of topics. 
-universal dependencies is a framework for cross-linguistically consistent treebank annotation. -skadia et al and skadia et al argue that the advantages of comparable corpora in machine translation are considerable and more beneficial than those of parallel corpora. -we use an ensemble technique inspired by bagging. -convolution tree kernel defines a feature space consisting of all subtree types of parse trees and counts the number of common subtrees to express the respective distance in the feature space. -one can then match the peco elements in the query to the elements detected in documents. -stochastic models have been widely used in pos tagging for simplicity and language independence of the models. -the selectional preference distribution is defined in terms of selectional association measure introduced by resnik over the noun classes automatically produced by sun and korhonen. -we used a partitioning algorithm of the cluto library for clustering. -the results demonstrate the superiority of a clustered approach over both traditional prototype and exemplar-based vector space models. -word embeddings have been shown to be useful in nlp tasks. -to this end, we develop a novel summarization system called priorsum to automatically exploit all possible semantic aspects latent in the summary prior nature. -we use svm for sentiment classification. -on the other hand, a deletion-based method does not face such a problem in a cross-domain setting. -as far as we know, our work is the first of its kind. -a bare-bones statistical model is still useful in that it allows us to quantify precise improvements in performance upon the integration of each specific cue into the model. -we propose to exploit entailment relationships holding among re patterns by structuring the candidate set in an entailment graph. -we consider the domain adversarial training network on the user factor adaptation task. -in addition, we extend the sick dataset to include unscored fluency-focused sentence comparisons and we propose a toy metric for evaluation. -we used pre-trained word embeddings from the conll 2017 shared task. -demberg applies a fourth-order hmm to the syllabification task, as a component of a larger german text-tospeech system. -lapata et al demonstrated that the cooccurrence frequency of an adjective-noun combination is the best predictor of its rated plausibility. -this paper discusses semeval-2018 task 5 : a referential quantification task of counting events and participants in local, long-tail news documents with high ambiguity. -words or expressions are aligned using a word similarity model based on a combination of latent semantic analysis and semantic distance in the wordnet knowledge graph. -we plan to explore this possibility in future work. -in this paper, we propose a novel lifelong learning approach to sentiment classification. -contrary to cite-p-21-3-0, we proved that these models can be exactly collapsed into a single backoff language model. -stability is measured across ten randomized embedding spaces trained on the training portion of the ptb ( determined using language modeling splits ( cite-p-16-1-18 ). -experimental results show that our method consistently outperforms various baselines across languages. -to address this drawback, ranking models were proved to be useful solutions. -using multi-word phrases instead of individual words as the basic translation unit has been shown to increase translation performance. 
-in addition, our system substantially improves upon the baseline presented by silfverberg and hulden. -a pattern is defined as a path between a verb node and any other node in the dependency tree passing through zero or more intermediate nodes. -snow et al showed that crowdsourced annotations can produce similar results to annotations made by experts. -our analysis shows that the high-performance of the acm lies in the asymmetry of the model. -in the second phase, it selects an optimal substitute for each given word from the synonyms according to the context in math-w-3-4-0-74. -reviews depict sentiments of customers towards various aspects of a product or service. -with large amounts of data, phrase-based translation systems achieve state-of-the-art results in many typologically diverse language pairs. -this approach was pioneered by galley et al, and there has been a lot of research since, usually referred to as tree-to-tree, treeto-string and string-to-tree, depending on where the analyses are found in the training data. -in this paper, we propose a simplified assumption of one-tag-per-word. -our hypothesis is that script knowledge may be a significant factor in human anticipation of discourse referents. -we find that topic-sensitive propagation can largely help boost the performance. -related work benamara and dizier present the cooperative question answering approach which generates natural language responses for given questions. -we plan to incorporate such signals in future work. -semantic textual similarity is a core problem in the computational linguistic field. -based on this semi-supervised boosting algorithm, we investigate two boosting methods for word alignment. -phrase structure trees in ctb have been semiautomatically converted to deep derivations in the ccg, lfg, tag and hpsg formalisms. -cahill et al present a method to automatically obtain approximations of ldd resolution for lfg resources acquired from a treebank. -the obtained average observations are set as constraints, and the improved iterative scaling algorithm is employed to evaluate the weights. -the basic idea behind topic models is that documents are mixtures of topics, where a topic is a probability distribution over words. -an analysis of the results has shown that the other approaches appear to be overgeneralizing, at least for this task. -our method works by modifying the attention mechanism of a pointer-generator neural network to make it focus on text relevant to a topic. -noise-contrastive estimation has been successfully used for training neural language models with large vocabularies. -we compute statistical significance using the approximate randomization test. -in this work, we present, at word level, the correlation between perplexity and word frequency. -kurokawa et al showed that french-to-english smt systems whose translation models were constructed from human translations from french to english yielded better translation quality than ones created from translations in the other direction. -following the common practice of domain adaptation research on this dataset, we use news as the source domain and bc, cts, wl as three different target domains. -all the experiments are built using the scikit-learn machine learning library. -users do not know in which terms the categories are expressed, they might query the same concept by a paraphrase. -recnns need a given external topological structure, like syntactic tree. 
-we develop an iterative distillation method that transfers the structured information of logic rules into the weights of neural networks. -on the one hand, we do not expect such pairs to occur in any systematic pattern, so they could obscure an otherwise more systematic pattern in the high pmi bins. -to evaluate the quality of the spatial representations learned in the previous task, we introduce a task consisting in a set of 1,016 human ratings of spatial similarity between object pairs. -evaluation on a standard data set shows that our method consistently outperforms the supervised state-of-the-art method for the task. -we find that entice is able to significantly increase nell ’ s knowledge density by a factor of 7. 7 at 75. 5 % accuracy. -for retrieving the discussion pages, we use the java wikipedia library, which offers efficient, database-driven access to the contents of wikipedia. -bohnet et al presented a joint approach for morphological and syntactic analysis for morphologically rich languages, integrating additional features that encode whether a tag is in the dictionary or not. -on the base of zhao and ng, chen and ng further investigate their model, introducing two extensions to the resolver, namely, novel features and zero pronoun links. -in the same space, riezler et al develop smt-based query expansion methods and use them for retrieval from faq pages. -hyp is available for download at github.com / sdl-research / hyp. -we used the nave bayes implementation in the weka machine learning toolkit, a support vector machine, and the crf implementation in mallet. -in this paper, we present a statistical analysis model for coordination disambiguation that uses the dual decomposition as a framework. -the dependencies were included in the crf model using a relatively straightforward feature expansion scheme. -phrase table pruning is the task of removing phrase pairs from a phrase table to make it smaller, ideally removing the least useful phrases first. -identifying long-span dependencies between discourse units is crucial to improve discourse parsing performance. -in this paper, we explore the estimation of sense priors by first calibrating the probabilities from naive bayes. -morphologically rich languages ( mrl ) are languages in which much of the structural information is contained at the word-level, leading to high level word-form variation. -in our experiments, we used the kyoto university text corpus 11 and the kyoto university web document leads corpus 12. -a document is represented as a nested tree where each node of the outer tree corresponds to an inner tree. -word embeddings, as a low-dimensional continuous vectors of words, are regarded as an efficient representation of word semantics. -jokinen et al combine a manually built tree for main topics with an n-gram model for topic shifts. -as there is no closed form solution for the maximum likelihood estimate, we resort to iterative training via the em algorithm. -snow et al use syntactic path patterns as features for supervised hyponymy and synonymy classifiers, whose training examples are derived automatically from wordnet. -in, the authors first cluster sentences into topic-specific scenarios, and then focus on building a dataset of causal text spans, where each span is headed by a verb. -a drawback of the previous annotation works is the limitation that only links between expressions in the same or in succeeding sentences are annotated. -in our experiments, we use the english portion of the conll-2012 dataset. 
-however, the current recursive architecture is limited by its dependence on syntactic tree. -although novelty mining studies have mainly been conducted on the english language, studies on the chinese language have been performed on topic detection and tracking. -our aim is to improve the relation extraction task by considering both the plain text and the layout. -the promoted instances are either added to the initial seed set or used to replace it. -the detection of subjects and objects from japanese sentences is more difficult than that from english, while it is the key process to generate correct english word orders. -for each word math-w-2-6-2-17, we construct a vector math-w-2-6-2-24 of size math-w-2-6-2-27, where math-w-2-6-2-30 is the size of the lexicon. -even though they are related tasks, multilingual skip-gram and cross-lingual sentence similarity models are always in a conflict to modify the shared word embeddings according to their objectives. -alternative expressions of the same meaning, and the degree of their semantic similarity has proven useful for a wide variety of natural language processing applications. -to address the issue of lack of data, zeng et al incorporate multi-instance learning with a piece-wise convolutional neural network to extract relations in distantly supervised data. -word ordering is a fundamental problem in nlp and has been shown to be np-complete in discourse ordering ( cite-p-16-1-1 ) and in smt with arbitrary word reordering ( cite-p-16-3-6 ). -balamurali et al show that senses are better features than words for in-domain sa. -this domain is a simplification of the miniature language acquisition task proposed by feldman et al. -results demonstrate the effectiveness and generality of our approach. -with the extended itg constraints, the coverage improves significantly on both tasks. -pgf is a simple “ machine language ”, to which the much richer gf source language is compiled by the gf grammar compiler. -elsner and charniak, elsner and charniak, elsner and charniak, are presenting a combination of local coherence models initially provided for monologues showing that those models can satisfactorily model local coherence in chat dialogues. -we enabled such large-scale clustering by parallelizing the clustering algorithm, and we demonstrate the usefulness of the gazetteer constructed. -we build a ranking model which successfully mimics human judgments using previously proposed automatic measures. -briscoe et al and copestake illustrate some lexical entries with the qualia structure following pustejovsky and aniek, pustejovsky, 1989 pustejovsky, 1991 pustejovsky. -le and mikolov presented the paragraph vector algorithm to learn a fixed-size feature representation for documents. -we introduce a polylingual topic model that discovers topics aligned across multiple languages. -in the parliament domain, this means ( and is translated as ) “ report. -hpsg is a syntactic theory based on lexicalized grammar formalism. -stolcke et al apply a somewhat more complicated hmm method to the switchboard corpus, one that exploits both the order of words within utterances and the order of dialogue acts over utterances. -table 1 : different types of disfluencies. -experiment results show that our weighted evaluation metrics give more reasonable and distinguishable scores and correlate well with human judgement. -rst tells us that sentences with discourse relations are related to each other and can help us answer certain kinds of questions. 
-we design our model for ssl as a natural semisupervised extension of conventional supervised conditional random fields. -categorial grammar provides a functional approach to lexicalised grammar, and so can be thought of as defining a syntactic calculus. -several studies have shown encouraging results for wsd based on parallel corpora. -we use the maximum entropy model to train a classifier for wsd and tc tasks. -in recent years, neural lms have become the prominent class of language modeling and have established state-of-the-art results on almost all sufficiently large benchmarks. -one such work is proposed by scaiella et al which uses wikipedia articles to develop a bipartite graph and employs spectral clustering over it to discover relevant clusters. -math word problems form a natural abstraction to a lot of these quantitative reasoning problems. -cite-p-19-5-7 proposed a dynamic distance-margin model to learn term embeddings that capture properties of hypernymy. -it was shown to correlate significantly with human judgments and behave similarly to bleu. -string regeneration can also be viewed as a natural language realization problem. -we show that a “ cluster and label ” strategy relying on these two proposed components generates training data of good purity. -contractor et al, 2010, used an mt model as well but the focus of his work is to generate an unsupervised method to clean noisy text in this domain. -lord et al analyzed the language style synchrony between counselors and clients. -in this paper, we propose a weakly supervised learning framework to mine fine-grained and multiple-typed relations from chinese ugcs. -bethard et al and kim and hovy explore the usefulness of semantic roles provided by framenet for both opinion holder and opinion target extraction. -in this paper we propose an algorithm that utilizes transitivity constraints to learn a globally-optimal set of entailment rules for typed predicates. -in this paper, we propose a novel neural belief tracking ( nbt ) framework to overcome current obstacles to deploying dialogue systems in real-world dialogue domains. -we use an extension of the lexrank algorithm to rank sentences. -domain labels ( such as medicine, architecture and sport ) provide a natural and powerful way to establish semantic relations among word senses, which can be profitably used during the disambiguation process. -to begin, all state sets are initialized to empty and the initial state math-w-2-3-9-140 is put into so ; here _1_ is the end-of-input marker. -it is standard practice to write english language specifications for input formats. -zeng et al proposed a piecewise convolutional neural network architecture, which can build an extractor based on distant supervision. -popescu and etzioni proposed a relaxed labeling approach to utilize linguistic rules for opinion polarity detection. -the fisher kernel is one of the best known kernels belonging to the class of probability model based kernels. -the feature weights are tuned using the pairwise ranking optimization algorithm. -this task usually requires aspect segmentation, followed by prediction or summarization. -latent semantic analysis ( lsa ) is a natural language processing ( nlp ) task. -neubig et al present a bottom-up method for inducing a preorder for smt by training a discriminative model to minimize the loss function on the hand-aligned corpus. -we run parfda smt experiments using moses in all language pairs in wmt15 and obtain smt performance close to the top constrained moses systems. 
-this work was then extended by to create an unsupervised noisy channel approach using probabilistic models for common abbreviation types and choosing the english word with the highest probability after combining the models. -our model converts the decoding order problem into a sequence labeling problem, i. e. a tagging task. -we made use of examples from the dso corpus and semcor as part of our training data. -on sentences of length 40, our system achieves an f-score of 89. 0 %, a 36 % relative reduction in error over a generative baseline. -bleu is a widely accepted baseline measure of mt quality at the system level and, as such, is an obvious choice for a baseline adequacy metric. -on a data set composed of 1.5 million citations extracted with pubmed, our best model obtains an increase of 28 % for map and nearly 50 % for p @ 5 over the classical language modeling approach. -it is used to support semantic analyses in hpsg english grammar -erg, but also in other grammar formalisms like lfg. -there is a large body of work in the linguistics literature that argues that paraphrases are not restricted to strict synonymy. -as illustrated in figure 1, source and target word embeddings are at the two ends of a long information processing procedure. -uszkoreit et al proposed a distributed system that reliably mines parallel text from large corpora. -in this paper, we present the details of training a global lexical selection model using classification techniques and sentence reconstruction models using permutation automata. -several methods have been proposed, mainly in the context of product review mining. -we studied open ie s output compared with other dominant structures, highlighting their main differences. -zero-extension is known to preserve positive definiteness. -such methods are highly scalable and have been applied in information retrieval, large-scale taxonomy induction, and knowledge acquisition. -articles from current week are clustered monolingually several times a day. -our results show significant improvement over a majority class baseline as well as a more difficult baseline consisting of lexical n-grams. -a similar method is presented in andreevskaia and bergler, where wordnet synonyms, antonyms, and glosses are used to iteratively expand a list of seeds. -in this paper, we explore the use of personalization in the context of voice searches rather than web queries. -we described tweetingjay, a supervised model for detecting twitter paraphrases with which we participated in task 1 of semeval 2015. -evaluation results demonstrate the effectiveness of the proposed methods. -in this setting, where we use both word-level and character-level representations, it is beneficial to use a smaller lstm than in the character-level only setting. -the proposed method is based on a deep learning architecture named long short term memory. -in this paper we present the crotal semantic role labelling system, which has been used in the conll 2009 shared task. -amr parsing is a fundamental task in natural language processing ( nlp ). -as shown in similar to the first step, we use a sequence labelling approach with a crf model. -this paper focuses on unsupervised discovery of intra-sentence discourse relations for sentence level polarity classification. -in recent years, many accurate phrase-structure parsers have been developed. -in this paper, we propose an algorithm for transductive semi-supervised learning. 
-similarly, riedel et al learn universal schemas by matrix factorization without pre-defined relations. -the research that comes closest to ours is the work of schwenk et al on continuous space ngram models, where a neural network is employed to smooth translation probabilities. -the document-level information and the sentenceto-document relationship are incorporated into the graph-based ranking algorithm. -we hope that these findings can serve as a guide for future research in the field. -in this paper, we focus on the problem of decoding given a trained neural machine translation model. -the conll 2008 shared task was intended to be about joint dependency parsing and semantic role labeling, but the top performing systems decoupled the tasks and outperformed the systems which attempted to learn them jointly. -our second model takes a more conservative approach by additionally penalizing data instances similar to the out-domain data. -in this paper, we demonstrate how computational systems designed to recognize textual entailment can be used to enhance the accuracy of current open-domain automatic question answering ( q/a ) systems. -examples of such neural networks are linear networks, deeper feed-forward neural networks, or recurrent neural networks. -the idea is to perform inference via a linear programming formulation with the features of narratives adopted as soft constraints. -domestic abuse is the 12 th leading cause of years of life lost ( cite-p-17-1-15 ), and it contributes to health issues including frequent headaches, chronic pain, difficulty sleeping, anxiety, and depression ( cite-p-17-1-1 ). -for example, in figure 1, mutating the to no induces the f relation ; mutating cat to carnivore induces the math-w-4-2-0-46 relation. -we demonstrated the value of errant by performing a detailed evaluation of system error type performance for all teams in the conll2014 shared task on grammatical error correction. -we use the wapiti toolkit to train a 5-gram language model on the xinhua portion of the gigaword corpus. -in this paper, we use the wordim353 dataset. -they showed results only on the narrow domain of cooking videos with a small set of predefined objects and actors. -the translations are generated by an in-house phrase-based translations system. -we adopted the second release of the american national corpus frequency data 3, which provides the number of occurrences of a word in the written and spoken anc. -in this paper, we present and make publicly available 1 a new dataset for darknet active domains, which we call it “ darknet usage text addresses ” ( duta ). -we incorporate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation. -we evaluated the model using the wmt data set, computing the ter and bleu scores on the decoded output. -we also computed the inter-annotator agreement via kappa. -the accuracy of the first-stage parser on the standard parseval metric matches that of the ( cite-p-16-3-5 ) parser on which it is based, despite the data fragmentation caused by the greatly enriched space of possible node labels. -the statistical part implements an entropy based decision tree ( c4.5 ). -however, to go beyond tuning weights in the loglinear smt model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the mt training pipeline. 
-tang et al design user and product preference matrices to tune word representations, based on which convolutional neural networks are used to model the whole document. -as a first step, to test our first hypothesis, we remove the pos blocks with a low probability of occurrence from each query, on the assumption that these blocks are content-poor. -in this paper, we investigate the effect of discriminative reranking to semantic parsing. -when combining with content-related features, most persuasive argumentation features give superior performance compared to the baselines. -the hyp toolkit provides a c++ library and a command line executable. -we use the system described in the literature to compute the lexical and string similarity between two sentences by using a logistic regression model with eighteen features based on n-grams. -some machine learning approaches have been applied to coreference resolution. -it is based on 5-grams with extended kneser-ney smoothing. -we present a pro, a new tuning method for machine translation. -experimental results show that our model achieves significant and consistent improvements as compared with baselines. -the rnn encoder cdecoder model suffers from poor performance when the length of the input sequence is long. -our work is based on the dual supervision framework using constrained non-negative tri-factorization proposed in ( cite-p-17-1-10 ). -however, large parallel corpora are only available for a few language pairs and for limited domains. -in this paper, we investigate whether similarity should be measured on the sense level. -some researchers have applied the rule of transliteration to automatically translate proper names. -morphological disambiguation is the process of assigning a set of morphological features to each word in a text. -mikolov et al extended this model to two languages by introducing bilingual embeddings where word embeddings for two languages are simultaneously represented in the same vector space. -this paper presents a novel model for japanese predicate argument structure ( pas ) analysis. -we propose a divide-and-conquer strategy by decomposing a hypergraph into a set of independent subhypergraphs. -bunescu and mooney, 2007 ) connects weak supervision with multi-instance learning and extends it to relation extraction. -v-measure assesses the quality of a clustering solution against reference clusters in terms of clustering homogeneity and completeness. -liu et al propose two models that capture the interdependencies between two parallel lstms encoding the two sentences for the tasks of recognizing textual entailment and matching questions and answers. -however, they attribute responsibility for non-arbitrariness differently. -the former approach involves adding self-labelled data from the target domain produced by a model trained in-domain. -the temporal relation is dictated by the causal relation. -some systems and zhang et al exploit kinds of extra information such as unlabeled data or other knowledge. -we investigate whether argumentation features derived from a coarse-grained argumentative structure of essays can help predict essays scores. -sentence similarity computation plays an important role in text summarization and social network applications. -the taxonomy kernel was trained using the svm package. -with this approach, we reduce the error rate for english by 33 %, relative to the best existing system. -in particular, we consider conditional random fields and a variation of autoslog. 
-we employ a random forest classifier, an ensemble of decision tree classifiers learned from many independent subsamples of the training data. -they usually start by selecting the logical facts to express. -chinese is a pro-drop language ( cite-p-21-3-1 ) that allows the subject to be dropped in more contexts than english does. -in the second experiment we show that classification results improve when information on definition structure is included. -on microblogs are short, noisy and informal texts with little context, and often contain phrases with ambiguous meanings. -our baseline system was a phrase-based smt system built with moses using default settings. -we investigate prototype-driven learning for primarily unsupervised sequence modeling. -for parsing, we use the berkeley parser. -in this work, we propose allvec that uses batch gradient learning to generate word representations from all training samples. -we have shown by experiments that large number of deterministic constraints can be learned from training examples, as long as the proper representation is used. -gao et al and moore and lewis apply this method to language modeling, while foster et al and axelrod et al use it on the translation model. -relevant applications deal with numerous domains such as news stories and product reviews. -we use a linear classifier trained with a regularized perceptron update rule as implemented in snow. -all classifiers and kernels have been implemented within the kernel-based learning platform called kelp. -our experiments demonstrate that r ealm outperforms these approaches on sparse data. -in recent years, the development of large-scale knowledge bases, such as freebase, provides a rich resource to answer open-domain questions. -approaches to dependency parsing either generate such trees by considering all possible spanning trees, or build a single tree by means of shift-reduce parsing actions. -in this paper, we present the methods we used while participating in the 2016 clinical tempeval task. -we perform an analysis of humans perceptions of formality in four different genres. -in this paper, we present a system that automatically extracts the pros and cons from online reviews. -we use the svm light implementation of the svm toolkit. -this paper presents a novel approach to semantic grounding of noun phrases within tutorial dialogue for computer programming. -this process continues until the two base rankers can not learn from each other. -choi et al used an integer linear programming approach to jointly extract entities and relations in the context of opinion oriented information extraction. -pereira et al use an information-theoretic based clustering approach, clustering nouns according to their distribution as direct objects among verbs. -in addition, the average accuracy of the classifier is 81. 5 % on the sentences the judges tagged with certainty. -in order to evaluate the method, we applied the results of topic detection to extractive multi-document summarization. -data-to-text generation is the task of automatically generating text from non-linguistic data. -we show that simple, unsupervised models using web counts can be devised for a variety of nlp tasks. -in particular, we define the task of classifying the purchase stage of each tweet in a user s tweet sequence. -cohn et al, the annotators were instructed to distinguish between sure and possible alignments, depending on how certainly, in their opinion, two predicates describe verbalizations of the same event. 
-it requires a high-risk strategy combining heightened learning rate and greedy processing of the context. -distinguishing between antonyms and synonyms is a key task to achieve high performance in nlp systems. -most existing vector space models are based on the traditional vector space model. -for the language model, we use the gaussian prior smoothing method. -several recent studies use high-level information to aid local event extraction systems. -li et al rank a set of candidate points of interest using language and temporal models. -brown clustering is a commonly used unsupervised method for grouping words into a hierarchy of clusters. -dinu and lapata propose a probabilistic framework that models the meaning of words as a probability distribution over latent factors. -we use the penn discourse treebank, the largest available manually annotated corpora of discourse on top of one million word tokens from the wall street journal. -then, the variable size boews are aggregated into fixed-length vectors by using fk. -5 ) transfer the semantic difference vector to the probability distribution over similarity scores by fully-connected neural network. -we used the svm-light-tk toolkit to train the reranker with kneser-ney smoothing. -maas et al presented a probabilistic model that combined unsupervised and supervised techniques to learn word vectors, capturing semantic information as well as sentiment information. -we evaluate the proposed triangulation method through pivot translation experiments on the europarl corpus, which is a multilingual corpus including 21 european languages widely used in pivot translation work. -in later work, this idea was applied to the disambiguation of translations in a bilingual dictionary. -in this paper, we study the problem of topic modeling for hypertexts. -zhang and clark proposed a graph-based scoring model, with features based on complete words and word sequences. -the current release of the odin ( online database of interlinear text ) database contains over 150,000 linguistic examples in the form of interlinear glossed text ( igt ). -probabilistic context-free grammars are commonly used in parsing and grammar induction systems. -chambers and jurafsky model narrative flow in the style of schankian scripts. -in all the above models, the word embeddings and the weights of the compositional layers are optimized against a task-specific objective function. -zelenko et al and culotta and sorensen used tree kernels for relation extraction. -gao et al modeled interestingness between two documents with deep nns. -this paper describes our deep learning system for sentiment analysis of tweets. -in this paper, we propose a novel inter-weighted layer to measure the importance of each word. -the tweets are tokenized using the cmu pos tagger. -vuli et al and sun et al apply a clustering algorithm to the input words and measure how well the clusters correspond to the word groupings in verbnet via purity and collocation. -although there are many language resources on the internet, most intercultural collaboration activities still lack multilingual support. -semantic relatedness is the task of determining which words in a text refer to the same real-world entity. -we report bleu as the main evaluation metric of the question generation systems. -we use negative sampling to approximate softmax in the objective function. -as for english, we used a pretrained google news word embeddings 2, which has shown high performance in several word similarity tasks. 
-distributional semantics is based on the hypothesis that words co-occurring in similar contexts tend to have similar meaning. -this representation is the basis for the lexical-semantic level that is included in the kr component. -tag is a class of tree rewriting systems, and a derivation relation can be defined on strings in the following way. -details about svm and kfd can be found in. -we used the pre-trained glove word embeddings. -this paper presents novel methods to improve neural entity recognition tasks. -the pinchak and lin system is unable to assign individual weights to different question contexts, even though not all question contexts are equally important. -here, °clothed is the pun and °closed is the target. -however, due to the incremental nature of shift-reduce parsing, the right-hand side constituents of the current word can not be used to guide the action at each step. -lebret et al used a conditional neural language model to generate the first sentence of a biography. -we use the movie reviews dataset from zaidan et al that was originally released by pang and lee. -on simlex999, our model is superior to six strong baselines, including the state-of-the-art word2vec skip-gram model by as much as 5. 5–16 % in spearman ’ s score. -instead, our source and target domains were taken from specifications in, which we assumed to ensure a more stratified and generally applicable set of domains involved in meaning shifts. -we use the maximum entropy segmenter of to segment the chinese part of the fbis corpus. -this is, in part, inspired by the recent conll shared task, which was the first evaluation of syntactic and semantic dependency parsing to include unmarkable nominals. -in this paper, we study a parsing technique whose purpose is to improve the practical efficiency of rcl parsers. -next, we use tensor factorization to perform tensor decomposition, and the representations of reviewers and products are embedded in a latent vector space by collective learning. -in the context of this discussion, we will refer to the target partitions, or clusters, as classes, referring only to hypothesized clusters. -the pdtb is the largest corpus annotated for discourse relations, formed by newspaper articles from the wall street journal. -topic models alone can not model the dynamics of a conversation. -recently, new reordering strategies have been proposed such as the reordering of each source sentence to match the word order in the corresponding target sentence, see kanthak et al and. -we study which factors contribute to the uptake of ( hip hop-related ) anglicisms in an online community of german hip hop fans over a span of 11 years. -we introduce a novel graph that incorporates three fine-grained relations. -experimental results show that our model outperforms state-of-the-art methods in both the supervised and semi-supervised settings. -paraphrase identification is a fundamental task in natural language processing ( nlp ). -reiter and frank exploit linguistically-motivated features in a supervised approach to distinguish between generic and specific nps. -below, i show how this can be done by extending a k-dnf 4 learner of to a paradigm-learner. -to reduce the search space, we add a transition to an existing non-projective parsing algorithm. -we have extracted paraphrase rules from our annotations using the grammar induction algorithm from cohn and lapata. -cite-p-18-1-11 proved that leveraging topics at multiple granularity can model short texts more precisely. 
-the remaining passages are clustered using a combination of hierarchical clustering and n-bin classification. -with the aid of this tool, a domain expert reduced her model building time from months to two days. -in this paper, we adopt partial-label learning with conditional random fields to make use of this valuable knowledge for semi-supervised chinese word segmentation. -word embeddings are based on low-dimension vectors representing the features of the words, captured in context. -zelenko et al proposed a kernel between two parse trees, which recursively matches nodes from roots to leaves in a top-down manner. -in addition, we use the pre-trained word embeddings available from google 5 as input features for our convolutional neural network. -random forest algorithm is a decision tree algorithm which uses multiple random trees to vote for an overall classification of the given input. -rationales are never given during training. -these include the karma system and the att-meta project. -we extract hierarchical rules from the aligned parallel texts using the constraints developed by chiang. -in this paper, we incorporate the mers model into a stateof-the-art linguistically syntax-based smt model, the tree-to-string alignment template model. -for the evaluation, we used the same measures as brent, venkataraman and goldwater, namely token precision, recall and f-score. -in this demo paper, we present need4tweet, a twitterbot for named entity extraction ( nee ) and disambiguation ( ned ) for tweets. -for example, in they proposed a corpus-based sentence similarity measure as a function of string similarity, word similarity and common word order similarity. -this representation is the basis for the lexical-semantic level that is included in the kr component. -luong and manning, 2015 ) adapts an already existing nmt system to a new domain by further training on the in-domain data only. -the results show that 98. 3 % of distractors generated by our methods are reliable. -we explore whether coreference can improve the learning process. -similar to their work, we further integrate the multi-word phrasal lexical disambiguation model to the n-gram prediction model, paraphrase model and translation model of our system. -in this paper, we propose a supervised srl system. -our experimental results show that our proposed approach performs well for sentence dependency tagging. -in addition, we have compared the results with a system which translates selected document features. -cohn and lapata formulated sentence compression as a tree-to-tree rewrite problem. -in particular, the vector-space word representations learned by a neural network have been shown to successfully improve various nlp tasks. -bethard et al and kim and hovy explore the usefulness of semantic roles provided by framenet for both opinion holder and opinion target extraction. -our decoder is implemented as a cascade of weighted finite-state transducers using the functionalities of the openfst library. -fung and cheung present the first exploration of very non-parallel corpora, using a document similarity measure based on bilingual lexical matching defined over mutual information scores on word pairs. -under the nist measure, we achieve results in the range of the state-of-the-art phrase-based system of koehn et al for in-coverage examples of the lfgbased system. -these make docchat as a general response generation solution to chatbots, with high adaptation capability. 
-autotutor eschews the pattern-based approach entirely in favor of a bag-of-words lsa approach. -le and mikolov introduced paragraph-level vectors, a fixed-length feature representation for variable-length texts. -through extensive experiments on real-world datasets, we demonstrate the effectiveness of neuraldater over existing state-of-the-art approaches. -a wide variety of language problems can be treated as or cast into a tree annotating problem. -in the no context, partial profile and full profile conditions, annotators often selected the “ neutral ” option ( x-axis ) when the model inferred the true label was “ clinton ” or “ trump ” ( y-axis ). -alternation is a pattern in which a number of words share the same relationship between a pair of senses. -barzilay and mckeown obtained paraphrases from a monolingual parallel corpus using a co-training algorithm. -in this paper, we propose to improve the robustness of nmt models with adversarial stability training. -in this paper, we propose computable measures to capture genre-specific text quality. -cite-p-20-1-22 used a crf sequence modeling approach for deletion-based abbreviations. -therefore, the size of the corpora used in some previous approaches leads to data sparseness, and the extraction procedure can therefore require extensive smoothing. -in this paper, we propose a temporal orientation measure based on language in social media. -in section 4, we present a tool to efficiently access wikipedia ’ s edit history. -we implement a hierarchical phrase-based system similar to the hiero and evaluate our method on the chinese-to-english translation task. -early works primarily assumed a large parallel corpus and focused on exploiting them to project information from high- to low-resource languages. -in the most general case, initial anchors are only the first and final sentence pairs of both texts. -beaufort et al combined a noisy channel model with a rule-based finite-state transducer and got reasonable results on french sms, but have not tried their method on english text. -blitzer et al introduced an extension to a structural correspondence learning algorithm, which was specifically designed to address the task of domain adaptation. -we show that directly driving itg induction with a crosslingual semantic frame objective function helps to further sharpen the itg constraints, but still avoids excising relevant portions of the search space, and leads to better performance than either conventional itg or giza++ based approaches. -the words appearing in vocabulary are indexed and associated with high-dimensional vectors. -for example, “ reserate ” is correctly included in crown as a hypernym of unlock%2:35:00:: ( to open the lock of ) and “ awesometastic ” as a synonym of fantastic%3:00:00:extraordinary:00 ( extraordinarily good or great ). -bidirectional long short-term memory ( blstm ) recurrent neural network ( rnn ) has been successfully applied in many tagging tasks. -recent approaches try to minimize the amount of supervision needed ( cite-p-20-3-15, cite-p-20-1-1, cite-p-20-3-12 ). -previous results suggest that some degree of tokenization is helpful when translating from arabic. -in this paper, we propose a novel microblog search task called microblog event retrieval. -dependency path is the shortest path between the two entities in a dependency parse graph and has been shown to be important for relation extraction. -for the decoder, we use a recurrent neural network language model, which is widely used in language generation tasks.
-our logistic regression model improves f1-scores by over 80 % in comparison to state-of-the-art approaches. -f is the non-linear activation function and we use relu in this paper. -lui and baldwin showed that it is relatively easy to attain high accuracy for language iden-, and later shown to be effective for feature selection in text categorization. -the learning technique follows other representation learning algorithms in using negative sampling. -we use the word2vec tool to train word embeddings on the target side of the parallel corpus. -in addition, combining the relevance feedback and pseudo-relevance feedback, the induction process can be guided to induce more relevant semantic patterns. -we used word2vec to pretrain word embeddings at token level. -grefenstette and nioche and jones and ghani use the web to generate corpora for languages where electronic resources are scarce, while resnik describes a method for mining the web for bilingual texts. -resolving coordination ambiguity is a difficult task. -gedigian et al used hand-annotated corpora to train an automatic metaphor classifier. -by removing the tensor s surplus parameters, our methods learn better and faster. -gp is a non-parametric model which allows for powerful modelling of the underlying intensity function. -in the initial formulation of velldal, an svm classifier was trained using simple n-gram features over words, both full forms and lemmas, to the left and right of the candidate cues. -in this paper, we describe a cross-domain sentiment classification method using an automatically created sentiment sensitive thesaurus. -ganin et al proposed an adversarial network for domain adaptation. -experimental results show that our model achieves the state-of-the-art performance on the benchmark dataset. -experimental results show that the composite kernel outperforms the previously best-reported methods. -pado and lapata proposed a semantic space based on dependency paths. -more recently, alkanhal et al wrote a paper about a stochastic approach used for word spelling correction and attia et al created a dictionary of 9 million entries fully inflected arabic words using a morphological transducer. -we show that the modality attention based model outperforms other state-of-the-art baselines when text was the only modality available, by better combining word and character level information. -galley and manning propose a shift-reduce style method to allow hieararchical non-local reorderings in a phrase-based decoder. -we compare the entity and relation extraction performance of our model with other systems. -this is also the strategy pursued in recent work on deep learning approaches to nlp tasks. -adaptor grammars is a non-parametric bayesian framework for performing grammatical inference over parse trees. -we show that standard intrinsic metrics such as f-score alone do not predict the outcomes well. -shi and mihalcea propose the integration of verbnet, wordnet and framenet into a knowledge base and use it in the building of a semantic parser. -the results point to ways in which dialogue systems can effectively leverage affective channels to improve dialogue act classification. -in this paper, we focus on learning the plan elements and the ordering constraints between them. -we apply sixteen feature templates, motivated by ratnaparkhi. -coreference trees are not given in the training data, we assume that these structures are latent and use the latent structured perceptron as the learning algorithm. 
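A few of the generations above refer to training word embeddings with the word2vec tool on the target side of a parallel corpus. The sketch below uses gensim's reimplementation of word2vec rather than the original C tool, and the input file target.tok.en (one tokenised sentence per line) is hypothetical; it is meant only to illustrate the training call, not any particular cited setup.

    # Hedged sketch: skip-gram embeddings with negative sampling via gensim.
    from gensim.models import Word2Vec

    # One tokenised sentence per line; the file name is an assumption.
    sentences = [line.split() for line in open("target.tok.en", encoding="utf-8")]

    model = Word2Vec(sentences, vector_size=300, window=5, min_count=5,
                     sg=1, negative=10, workers=4, epochs=5)

    model.wv.save_word2vec_format("target.emb.txt")  # plain-text embedding table
    print(len(model.wv))                             # vocabulary size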
-one of the first studies on acquisition of hyponymy relations was made by hearst. -an analysis of the experimental results showed that the extrinsic evaluation captured a different dimension of translation quality than that captured by manual and automatic intrinsic evaluation. -we use the mstparser to generate k-best lists, and optimize k and on the development set. -to overcome the independence assumptions imposed by the bilstm and exploit these kind of labeling constraints in our arabic segmentation system, we model label sequence logic jointly using conditional random fields. -we evaluate the reliability of these candidates using simple metrics based on co-occurrence frequencies, similar to those used in associative approaches to word alignment. -in this paper, we propose a practical technique that addresses this issue in a web-scale language understanding system : microsoft ’ s personal digital assistant cortana. -we believe this work is useful for a variety of applications. -higher-order dependency features are known to improve dependency parser accuracy. -different smt systems for subtitles were developed in the framework of the sumat project, 6 including serbian and slovenian. -this paper presents an empirically motivated theory of the discourse focusing function of accent. -bouchard-ct et al employ a graphical model to reconstruct the proto-word forms from the synchronic word-forms for the austronesian language family. -we also provide interesting future directions, which we believe are fruitful in advancing this field by building high-quality tweet representation learning models. -to address this issue, we propose a hierarchical neural network to incorporate global user and product information into sentiment classification. -vector space word representations learned using unsupervised algorithms are often effective features in supervised learning methods. -louis and nenkova define genre-specific and general features to predict the article quality in science journalism domain. -jindal and liu used pattern mining to identify comparative sentences in a supervised learning setting. -such grammars are, however, typically created manually, which is time-consuming and error-prone. -in conclusion, we presented a sequence of ‘ negative ’ results culminating in a ‘ positive ’ one – showing that while most invented languages are effective ( i. e. achieve near-perfect rewards ). -we obtained bleu scores for e2f direction as shown in table 2. -in this paper we propose a statistical model for measure word generation for englishto-chinese smt systems, in which contextual knowledge from both source and target sentences is involved. -anderson et al found no advantage in decoding neural activity patterns associated with concrete words for image-based models. -when the large-scale bilingual corpus is unavailable, some researchers acquired class-based alignment rules with existing dictionaries to improve word alignment. -the dependency structure is according to stanford dependency. -in this paper, we address the influence of text type and domain differences on text prediction quality. -in this paper, we present the first completely data-driven approach for generating high level summaries of source code. -hearst extracted information from lexico-syntactic expressions that explicitly indicate hyponymic relationships. -experimental results show that our model is able to extract a wide variety of major life events. 
-in this paper, we approach the problem of verb alternations. -our baseline system is a phrase-based system using btgs, which includes a content-dependent reordering model discriminatively trained using reordering examples. -we describe tweetingjay, a system for detecting paraphrases and semantic similarity of tweets, with which we participated in task 1 of semeval 2015. -unlike a conventional cnn which considers a whole text as input, the proposed regional cnn uses an individual sentence as a region, dividing an input text into several regions such that the useful affective information in each region can be extracted and weighted according to their contribution to the va prediction. -this feature is produced using nltk to generate the lemma of each word according to its tagged pos. -in this work, we focus on tree-structured representations for semantics. -kyoto-nmt implements the sequence-to-sequence model with attention mechanism first proposed in as well as some more recent improvements. -co-training methods exploit predicted labels on the unlabeled data and select samples based on prediction confidence to augment the training. -we demonstrate how it can be used to improve existing applications in information retrieval and summarization. -we use the specialist ensemble learning framework to combine component similarities into the relation strength for clustering. -graph-based models and transition-based models are two dominant paradigms in the dependency parsing community. -style transfer is the task of automatically transforming a piece of text into another. -the goal of semantic parsing is to map text into a complete and detailed meaning representation. -johnson showed that word segmentation accuracy improves if the model can learn different consonant sequences for word-initial onsets and word-final codas. -this paper presents a dependency parsing scheme using an extended finite state approach. -finkel and manning propose a hierarchical bayesian extension of this idea. -as mentioned above, the baseline model is a char-lstm-lstm-crf model. -language identification is a natural language processing ( nlp ) task ( cite-p-12-1-8 ), which aims at determining the correct language of a text ( cite-p-12-3-6 ). -results also indicate that learning character-level representations from the data is beneficial as the char-lstm joint model significantly outperforms the baselines used in prior work. -the vocabulary size of the participants was measured using a japanese language vocabulary evaluation test. -we propose a new algorithm for semi-supervised text categorization. -the standard way to handle this problem is to handcraft a finite set of features which provides a sufficient summary of the unbounded history. -feature function scaling factors m are optimized based on a maximum likelihood approach or on a direct error minimization approach. -however, one disadvantage of their models, as in, is that their time complexity is cubic in the number of tokens in the sentence. -for word embeddings, we use averaged word embeddings. -entrainment is a natural language processing ( nlp ) task ( cite-p-12-1-8 ), which aims at determining the correct entrainment of a sentence ( cite-p-12-1-8 ).
-we introduce significance-based selection, which reduces model size, but also improves perplexity for several smoothing methods, including katz back-off and absolute discounting. -in this paper, we introduce a randomized greedy algorithm that can be easily used with any rich scoring function. -we present plato, a simple and scalable entity resolution system that leverages unlabeled data to produce state-of-the-art results. -recent applications of tree-adjoining grammar ( tag ) to the domain of semantics as well as new attention to syntactic phenomena have given rise to increased interest in more expressive and complex multicomponent tag formalisms ( mctag ). -we extend a recurrent neural network language model so that its output can be conditioned on a featurized melody. -the questions generated by our model help to improve a strong extractive qa system. -in order to acquire class attributes in particular, a common strategy is to first acquire attributes of instances, then aggregate or propagate attributes, from instances to the classes to which the instances belong. -experimental results show that our proposed framework outperforms the state-of-the-art baseline by over 7 % in f-measure. -the transferable knowledge was assimilated in terms of selective labeled instances from different source domain to form a k-class auxiliary training set. -we have crowdsourced a dataset of more than 14k comparison paragraphs comparing entities from a variety of categories such as fruits and animals. -we look for further alternatives in wordnet, which has previously been widely used to find semantically related words. -this paper presents a new method for systematically organizing a large set of such phrases. -this hypothesis is the foundation for distributional semantics, in which words are represented by context vectors. -the word embeddings are pre-trained using the fasttext toolkit. -xue et al adopted the noisy-channel framework for normalisation of microtext and proved that it is an effective method for performing normalisation. -then we automatically induce the english word sense correspondence to l2. -multi-layer attention aims to capture multiple word dependencies in partial trees for action prediction. -as shown in similar to the first step, we use a sequence labelling approach with a crf model. -finding linguistic transformations which can be applied reliably and often is a challenging problem for linguistic steganography. -our main corpus is europarl, specifically portions collected over years 1996 to 365 1999 and 2001 to 2009. -we use the word2vec tool to train the word embeddings on the xinhua portion of the gigaword corpus. -propbank encodes propositional information by adding a layer of argument structure annotation to the syntactic structures of verbs in the penn treebank. -in a previous study, we used this paradigm for collecting data on how humans elicit feedback in humancomputer dialogue. -as for experiments, state-of-the-art svm and knn algorithms are employed for topic classification. -in this section, we will discuss these four conversational agents briefly. -since words in any language are grounded to the english wikipedia, the corresponding wikipedia categories and freebase types can be used as language-independent features. -the key assumption is that redundancy provides a reliable way of generating grammatical sentences. -using the cluster-pair representations, our network learns when combining two coreference clusters is desirable. 
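One generation above mentions pre-training word embeddings with the fasttext toolkit. The following sketch uses gensim's FastText implementation as a stand-in for the original C++ toolkit; corpus.txt is a hypothetical tokenised training file, and the hyperparameters are arbitrary illustrations rather than any cited configuration.

    # Hedged sketch: subword-aware embeddings with gensim's FastText.
    from gensim.models import FastText

    sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]

    model = FastText(sentences, vector_size=300, window=5, min_count=3,
                     min_n=3, max_n=6, sg=1, epochs=5)

    # Out-of-vocabulary words still receive a vector built from character n-grams.
    print(model.wv["untranslatable"][:5])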
-experimental results show that our approach produces sentences that are both relevant and readable. -many attempts have been made along these lines, as for example brill and goto et al, with some claiming performance equivalent to lexicon-based methods, while kwok reports good results with only a small lexicon and simple segmentor. -subtasks a and b should give participants enough tools to create a cqa system to solve the main task. -in the training phase, a sample is then selected from the system outputs and provided with the correct interpretation by a human expert. -we use the maximum entropy model to train a classifier for wsd and tc tasks. -section 5 compares our method with the previous work from the viewpoint of feature exploration. -we represent entities and relations in a knowledge base ( kb ) by jointly embedding the union of all available schema types, not only types from multiple structured databases ( such as freebase or wikipedia infoboxes ), but also types expressed as textual patterns from raw text. -in this paper, we investigate binary polarity classification ( positive vs. negative ). -nlp annotation projects employ guidelines to maximize inter-annotator agreement. -f-structures and udrss are underspecified syntactic and semantic representations, respectively. -however, some phrases have two or more idiomatic meanings without context. -we use the stanford corenlp package for tokenization and sentence splitting. -this framework facilitates detailed research into evaluation metrics and will therefore provide a productive research tool in addition to the immediate practical benefit of improving the fluency and readability of generated texts. -they used babelnet synsets to identify semantic concepts and disambiguate words using the word sense disambiguation system babelfy. -in this study, two classes of metrics were adopted for evaluating rc datasets : prerequisite skills and readability. -in this paper, we propose a nonlinear model for the quality of translation hypotheses based on neural networks, which allows more complex interaction between features. -opinionfinder ( cite-p-8-1-11 ) is a system for mining opinions from text. -categories are generated using a novel graph clique-set algorithm. -this is mainly caused by the flexible word ordering and the existence of a large number of synonyms for words. -the oov tokens that should be considered for normalization are referred to as ill-formed words. -hearst handcrafted a set of lexico-syntactic paths that connect the joint occurrences of x and y which indicate hypernymy in a large corpus. -this paper proposes a knowledge representation model and a logic proving setting with axioms on demand successfully used for recognizing textual entailments. -named-entity recognition ( ner ) is a fundamental task in natural language processing ( nlp ). -for benchmarking the progress, we filter a collection of these paragraphs to create a test set, on which humans perform with an accuracy of 94. 2 %. -current research in summarization focuses on processing short articles, primarily in the news domain. -the growth of online social networks provides the opportunity to analyse user text in a broader context. -we first initialize the model parameters by sampling from a glorot-uniform distribution. -entity linking is the task of mapping an entity mention in a text to an entity in a knowledge base. -current smt systems typically decode with single translation models and can not benefit from the strengths of other models in the decoding phase.
-this limitation is already discussed in and in, in which bilingual extensions of the word2vec architecture are proposed. -zhou et al proposed attention-based, bidirectional lstm networks for relation classification. -this is partly because the description of an individual event can spread across several sentences. -it turned out that using only adjectives as features actually results in much worse performance than using the same number of most frequent unigrams. -experimental results show that the proposed top-down parser achieves competitive results with other data-driven parsing algorithms. -recently, there has been increasing awareness of the need for appropriate handling of multiword expressions in nlp tasks. -a similar effort was also made in the eurowordnet project. -to minimize such confusion, a system must separately represent noun phrases, the underlying concepts to which they can refer, and the many-to-many “ can refer to ” relation between them. -we use bleu and ter to show the statistical decisions in eqn. -we illustrate that it is possible to measure diachronic semantic drifts within social media and within the span of a few years. -su et al propose a system which can not only detect, but also rephrase abusive language in chinese. -within each feature subspace, using only the basic unit features can already give reasonably good performance. -in the future, we plan to work towards our long-term goal, i. e., including more linguistic information in the skl framework. -we tested our model on a semantic role labeling benchmark, using propbank annotations and automatic charniak parse trees as provided for the conll 2005 evaluation campaign. -bert is a language model trained on the xinhua portion of the gigaword corpus using the transformer network. -as new instructions are given, the instruction history expands, and as the agent acts the world state changes. -small et al showed that paraphrased repetition is just as effective as verbatim repetition. -experiments were run with a variety of machine learning algorithms using scikit-learn. -al-onaizan and knight compare a grapheme-based approach, a phoneme-based approach and a linear combination of both for transliteration. -katiyar and cardie presented a standard lstm-based sequence labeling model to learn the nested entity hypergraph structure for an input sentence. -zhao and ng use the learning-based model to locate and resolve zero anaphoras. -the goal of morphological analysis is to reduce the sparse data problem in under-resourced languages. -however, these models require a large corpus of dialogues to learn effectively. -after the contest, we tuned the parameter used in the simple bayes method, and it obtained higher precision. -for instance, “ seq-kd + seq-inter + word-kd ” in table 1 means that the model was trained on seq-kd data and fine-tuned towards seq-inter data with the mixture cross-entropy loss at the word-level. -in this paper, we propose a new method for automatically creating datasets for the offline evaluation of job posting similarities. -we also show that conditioning the generation on topic models makes generated responses more relevant to the document content. -we compare the system summaries with the manual summaries using the rouge-1 metric. -the incorrectly predicted tags are shown with the symbol. -in this paper, we study the relative merits of these approaches. -grosz and sidner classify cue phrases based on changes to the attentional stack and intentional structure found in their theory of discourse.
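One generation above compares system summaries to manual summaries with the rouge-1 metric. The sketch below scores a toy system summary against a toy reference using the rouge-score package, one of several reimplementations of the original ROUGE Perl script; the two strings are invented examples.

    # Hedged sketch: ROUGE-1 precision / recall / F1 with the rouge-score package.
    from rouge_score import rouge_scorer

    reference = "the committee approved the new budget on friday"
    system = "the new budget was approved on friday"

    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    scores = scorer.score(reference, system)
    r1 = scores["rouge1"]
    print(r1.precision, r1.recall, r1.fmeasure)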
-the performance of many natural language processing tasks, such as shallow parsing and named entity recognition, has been shown to depend on integrating many sources of information. -sadamitsu et al proposed a bootstrapping method that uses unsupervised topic information estimated by latent dirichlet allocation to alleviate semantic drift. -in this paper, we present a simple approach to unsupervised semantic role labeling. -we present the first computational study of word generalization integrated within a word-learning model. -other than bengali, the works on hindi can be found in li and mccallum with crf and cucerzan and yarowsky with a language independent method. -on wmt ’ 16 english-romanian translation we achieve accuracy that is very competitive to the current state-of-the-art result. -the framenet database, fillmore et al, is an english lexical resource based on the description of some prototypical situations, the frames, and the frame-evoking words or expressions associated to them, the lexical units. -there has also been efforts to automatically solve school level math word problems. -in this paper, we report the effect of corpus size on case frame acquisition for discourse analysis in japanese. -we use an lstm-based neural language model with class-based input rather than words. -wilson et al present an approach to classify contextual polarity building on a two-step process. -goyal et al generate a lexicon of patient polarity verbs that imparts positive or negative states on their patients. -for the simulation, dps will be autonomous conversational agents with a cognitive state consisting of goals, a notion of their expected behaviour in a political interview, priorities, and some knowledge of the world. -our results show that self-training is of substantial benefit for the problem. -section 4 describes our main contribution, a new approach to cross-language text classification based on structural correspondence learning. -in this paper, we introduce word appearance in context. -culotta and sorensen proposed a tree kernel for dependency trees. -in this paper, we evaluate various features and the domain effect on sentimental polarity classification. -we use the morfessor baseline and the morfessor categories-map algorithms. -the parsing model used is essentially that of chiang, which is based on a highly restricted version of tree-adjoining grammar. -these results were corroborated by lembersky et al, 2012a lembersky et al, 2013, who showed that translation models can be adapted to translationese, thereby improving the quality of smt even further. -we investigate style accommodation in online discussions, in particular its interplay with content agreement and disagreement. -in this paper, we propose a simple generative model for distributional induction of hierarchical linguistic structure. -in nmt, bahdanau et al first proposed to use an attention mechanism in the decoder. -we tested our wsd program, named lexas, on both a common data set used in previous work, as well as on a large sense-tagged corpus that we separately constructed. -we evaluated the performance of the composition models on the test split of the dataset, using the rank evaluation proposed by baroni and zamparelli. -cucerzan and yarowsky learn a postagger from existing linguistic resources, namely a dictionary and a reference grammar, but these resources are not available, much less digitized, for most under-studied languages. 
-the feature set used in assert is a combination of features described in gildea and jurafsky as well as those introduced in pradhan et al, surdeanu et al, and the syntactic-frame feature proposed in. -most related work in the field of abusive language detection has focused on detecting profanity using list-based methods to identify offensive words. -latent dirichlet allocation is a widely used type of topic model in which documents can be viewed as probability distributions over topics. -we applied the algorithm of galley et al to extract tree-to-string translation rules. -at competition time, we achieved the sixth best result on the task from a set of twelve systems. -for each node n, state is assigned a state of ag as specified above. -graph unification remains the most expensive component of unification-based grammar parsing. -sangati et al proposed a k-best generative reranking algorithm for dependency parsing. -the maximum entropy model estimates the probability of a time-bin given the observed medical event. -our parsing model is the discriminatively trained, conditional random field-based context-free grammar parser of. -labeledlda outperforms co-training, increasing f 1 by 25 % over ten common entity types. -we present a novel active learning framework for smt which utilizes both labeled and unlabeled data. -stochastic optimality theory ( cite-p-17-1-2 ) is a widely-used model in linguistics that did not have a theoretically sound learning method previously. -this method allows us to exploit the dependency between different unsupervised annotations to further improve the accuracy of the entire set of annotations. -whitehill et al proposed a probabilistic model to combine labels from both human labelers and automatic classifiers in image classification. -experiment results show that the proposed structural topic model can effectively discover topical structures in text, and the identified structures significantly improve the performance of tasks such as sentence annotation and sentence ordering. -it was first implemented in chinese word segmentation by using the maximum entropy method. -in section 4, we propose a naïve rule-based approach to detect thwarting. -in this work, we deal with the problem of detecting a textual review as spam or not, i. e., non-spam. -on the macro-averaged f 1 -measure, our lexical classifier outperformed the majority-class baseline by 0.33 ( on beetle ) and 0.18 ( on scientsbank ) and by 13 % and 3 % on accuracy. -the performance is always evaluated on a test set from the same domain as the training set. -we use the penn arabic treebank for labeling the data. -morfessor is a family of methods for unsupervised morphological segmentation. -this paper presents results on part-of-speech tagging spanish-english code-switched discourse. -in this paper, we propose a method to select the appropriate insertion position. -yu et al ( 2002 ) used pattern recognition techniques to summarize interesting features of automatically generated graphs of time-series data from a gas turbine engine. -in german, subject-object ambiguities are frequent. -after the frequently used verbs were identified, the usage notes of dictionaries and thesauri demonstrating the fine differences of the verbs were employed in a two-part representation for lexical differentiation. -as a consequence, word senses occurring in a coherent portion of text tend to maximize domain similarity. -the third one is a tweet collection, which is gathered by.
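Latent dirichlet allocation comes up in the generations above; as an illustration of the "documents as distributions over topics" view, the sketch below fits a two-topic LDA model with gensim on an invented toy corpus. It is a generic illustration of the technique, not the model from any cited paper.

    # Hedged sketch: LDA with gensim on a toy four-document corpus.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    docs = [["stock", "market", "rises", "trading"],
            ["match", "goal", "team", "league"],
            ["market", "shares", "trading", "profit"],
            ["team", "coach", "season", "league"]]

    dictionary = Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]
    lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=2,
                   passes=20, random_state=0)

    for topic_id, words in lda.print_topics(num_words=4):
        print(topic_id, words)
    print(lda.get_document_topics(bows[0]))  # per-document topic distribution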
-this hypothesis is the foundation for distributional semantics, in which words are represented by context vectors. -in this paper, we present spot, a sentence planner, and a new methodology for automatically training spot on the basis of feedback provided by human judges. -our model significantly outperforms the previously mentioned hypergraph model of lu and roth and muis and lu on entity mention recognition for the ace2004 and ace2005 corpora. -we present an algorithm for incremental statistical parsing with parallel multiple context-free grammars ( pmcfg ). -we present a novel approach to predict reader s rating of texts. -we used the frame guidelines developed by boydstun et al. -lai et al proposed recurrent cnn while johnson and zhang proposed semi-supervised cnn for solving text classification task. -we used the bidirectional lstm architecture introduced by for named entity recognition. -we show that knowing multiple scores for each example instead of a single score results in a more reliable estimation of a system quality. -all the experiments are built using the scikit-learn machine learning library. -automatic annotation adaptation for sequence labeling aims to enhance a tagger with one annotation standard by transferring knowledge from a source corpus annotated in another standard. -without any loss of generality, we propose a simple classification model using gated recurrent unit coupled with attention. -our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. -rules are typically defined by creating patterns around the entities, such as lexico-syntactic surface word patterns and dependency tree patterns. -the main innovation of w asp is its use of state-of-the-art statistical machine translation techniques. -word alignment is a fundamental task in natural language processing ( nlp ). -in this paper we presented the word prediction system soothsayer. -we focus on textual structures which correspond to a well defined discourse structure and which often bear hypernymy relations. -we use the penn arabic treebank for labeling the data. -huck et al proposed a phrase orientation model for hpb translation. -sarawgi et al attempted to remove topic bias for identifying gender-specific stylistic markers. -cite-p-25-3-10 and cite-p-25-3-8 employed attention-based sequenceto-sequence ( seq2seq ) framework only for sentence summarization. -in this paper, we propose a novel approach to determine textual similarity. -in this paper, we propose an endto-end attention-based neural network. -in the following example, the first occurrence of aluminum is only considered to be markable because it corefers with the occurrence of this noun as a bare np. -in this work, we extend this hypothesis to multilingual data and joint-space embeddings. -in this approach can be interpreted as a conditional language model, it is suitable for nlg tasks. -we present a new flexible and efficient kernel-based framework for classification with relational similarity. -this approach is very similar to the one used successfully by nivre et al, but we use a maximum entropy classifier to determine parser actions, which makes parsing extremely fast. -the language model is used to capture the term dependence. -a system combination implementation which has been developed at rwth aachen university is used to combine the outputs of different engines. 
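Several generations above mention feature-based logistic regression classifiers and experiments built with the scikit-learn machine learning library. The sketch below shows a typical bag-of-ngrams baseline of that kind; the four toy examples and labels stand in for a real labelled dataset and are not drawn from any cited corpus.

    # Hedged sketch: a tf-idf + logistic regression baseline in scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["great movie , loved it", "terrible plot and acting",
             "a wonderful , moving film", "boring and far too long"]
    labels = [1, 0, 1, 0]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                        LogisticRegression(max_iter=1000))
    clf.fit(texts, labels)
    print(clf.predict(["what a wonderful film", "boring acting"]))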
-vikner and jensen type-shift the possessor noun using one of the qualia roles to explain the meaning of the genitive phrases following partee. -we propose a novel semi-supervised machine learning objective for estimating a crf model integrated with ve. -in contrast to previous methods, we are able to select chains based on their cohesive strength. -to our knowledge, we are the first to apply seq2seq model to the task of math word problem solving. -negated event is the shortest group of words that is actually affected by the negation cue. -there are a number of excellent textbook presentations of hidden markov models, so we do not present them in detail here. -in this section we provide experiments comparing the performance of algorithm 2 with algorithm 1 as well as a baseline algorithm based on the approach of ( cite-p-13-1-15 ). -the analysis in ( partee, 1984 ) of quantified sentences, introduced by a temporal connective, gives the wrong truth-conditions when the temporal connective in the subordinate clause is before or after. -for instance, the frequency distributions of most commonly-used words in a native and seven eastern european learner corpora are compared on various parts-of-speech categories. -the parsing strategies differ in terms of the order in which they recognize productions in the derivation tree. -figure 2 : a comparison between the performance of baseline hmm and hmm+type+gen model for two test alignment. -the first confusion network decoding method was based on multiple string alignment borrowed from biological sequence analysis. -we use reservoir sampling to reduce the storage complexity of a previously-studied online algorithm, namely the particle filter, to constant. -blanco and moldovan annotate focus on the negations marked with argm-neg role in propbank. -in this paper, we address one aspect of this problem c inferring predictive models to structure task-oriented dialogs. -our solution, “ hcs, ” is a convolutional neural network to classify sentiment scores. -this is consistent with observations made in previous work that subjectivity is a property associated not with words, but with word meanings. -the switchboard corpus and the british national corpus are used in this study. -recent work by baroni et al shows that word embeddings trained by predict models outperform the count-based models in various lexical semantic tasks. -reasoning is a fundamental problem in natural language processing. -medline, medie, and a gui-based efficient medline search tool, info-pubmed. -in this paper, we report on an experiment that consisted in adapting the english data of tempeval-1 to portuguese. -the set of relation types is not pre-specified but induced from observed unlabeled data. -this task has not been well studied in microblogs yet. -we present a bipartite graph model for drawing comparisons among large groups of documents. -complexity of this task challenges systems to establish the meaning, reference and identity across documents. -it is worth noting that this method only relies on the hierarchies in roget ’ s and wordnet. -the focus has mostly been on detecting insincere reviews or arguments. -the problem of correct identification of nes is specifically addressed and benchmarked by the developers of information extraction system, such as the gate system. -particularly with the premise and supportrel types appear to be better predictors of a speaker s influence rank. 
-our monolingual objective follows the glove model, which learns from global word co-occurrence statistics. -this allows in turn to compute by intersection the occurrences of discontinuous treelets, much like what is done in for discontinuous strings. -in this paper, we explore a novel learning framework, posterior regularization, for incorporating rich constraints over the posterior distributions of word alignments. -despite the joint approach, our system is still efficient. -perplexity is often used as a quality measure for language models built with n-grams extracted from text corpora. -the optimization problem is solved by using a linear programming model. -chambers and jurafsky proposed unsupervised induction of narrative event chains from raw newswire texts, with narrative cloze as the evaluation metric. -the evaluation metric is case-sensitive bleu-4. -this paper describes the pku_hit system on event detection in the semeval-2010 task. -the method by gedigian et al discriminates between literal and metaphorical use. -for work on l-pcfgs using the em algorithm, see petrov et al, matsuzaki et al, pereira and schabes. -aso is a recently proposed linear multi-task learning algorithm, which extracts the common structures of multiple tasks to improve accuracy, via the use of auxiliary problems. -in this paper, we present a novel query expansion approach for image captioning, in which we utilize a distributional model of meaning for sentences. -we first establish a state-of-the-art baseline with a rich feature set. -in this paper, we propose koosho, an integrated environment for japanese text input. -we show that the model succeeds in this task and, furthermore, that it is capable of predicting correct spatial arrangements for unseen objects if either cnn features or word embeddings of the objects are provided. -in this paper, we propose a generative model – called entity-topic model, to effectively join the above two complementary directions together. -dropout is a technique that involves randomly dropping units during training to prevent overfitting and co-adaptation of neurons. -for tasks c c f, they operated on raw text while all other systems used tagged events and temporal expressions in the corpus. -mikolov et al have shown that distributed vector representations over large corpora in a continuous space model capture many linguistic regularities and key aspects of words. -we built a type signature for the xtag english grammar, an existing broad-coverage grammar of english. -the adversarial examples in neural image captioning highlight the inconsistency in visual language grounding between humans and machines, suggesting a possible weakness of current machine vision and perception machinery. -as discussed in section 4, these findings shed new light on why “ syntactic ” constraints have not yet helped to improve the accuracy of statistical machine translation. -in this paper, we describe a method of using document similarity measures to describe differences in behavior between native and non-native speakers of english. -in this paper, we present a bayesian model of category acquisition. -in this paper, we present different models for sentence realisation. -on the other hand, mem2seq is able to produce the correct responses in this two examples. -we present an aggregation approach that learns a regression model from crowdsourced annotations to predict aggregated labels for instances that have no expert adjudications. 
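Dropout is defined in one of the generations above; the short PyTorch sketch below shows the standard usage, with units zeroed at random in training mode and the layer acting deterministically at evaluation time. The layer sizes are arbitrary and purely illustrative.

    # Hedged sketch: dropout behaves differently in train vs. eval mode.
    import torch
    import torch.nn as nn

    layer = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Dropout(p=0.5))
    x = torch.randn(2, 10)

    layer.train()   # dropout active: roughly half the activations are zeroed
    print(layer(x))
    layer.eval()    # dropout disabled: deterministic forward pass
    print(layer(x))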
-cui et al learned transformations of dependency paths from questions to answers to improve passage ranking. -this kind of one-shot approach is useful but it does not usually perform well to various datasets or tasks. -zhu et al also used the bioscope corpus and employed techniques developed for shallow semantic parsing for detecting scope. -the extracted mwes are integrated into the urdu pargram grammar, a computational grammar for urdu running with xle and based on the syntax formalism of lfg. -most of the following works focused on feature engineering and machine learning models. -this process enables the system to understand user utterances based on the context of a dialogue. -for the representation of textual data, we use tfidf and the word embedding representation of the data. -as pointed out in section 3.5, the majority of sentences require zero or few corrections. -summarization systems that directly optimize for more topic signatures during content selection have fared very well in evaluation. -our nmt systems are trained on 1m parallel sentences of the europarl corpus for en-fr and en-de. -many complex emotions are ignored by current automatic emotion detectors because they are not programmed to seek out these “ undefined ” emotions. -in this paper we propose a model which incorporates coreferential information of candidates to improve pronoun resolution. -we used the rules for reordering german constituent parses of collins et al together with the additional rules described by fraser. -these energy functions are encoded from interior design guidelines or learned from input scene data. -te is the task of determining if the truth of a text entails the truth of another text. -one is to acquire unknown words from corpora and put them into a dictionary, and the other is to estimate a model that can identify unknown words correctly. -we use the voted perceptron algorithm as the kernel machine. -we described mineral, a system for extraction and normalization of disorder mentions in clinical text, with which we participated in task 14 of semeval 2015. -our experiment results demonstrate the effectiveness of our nsw detection method and the benefit of nsw detection for ner. -automatically learning representations of book plots, as structured summaries of their content, has attracted much attention ( cf, cite-p-15-1-16 for a review ). -to train the feature weights, we made use of a novel two-phase training algorithm that incorporates a probabilistic training objective and standard minimum error training. -the results in the unsupervised setting are comparable to the best reported values. -a synset is a set of synonyms that are interchangeable in some context. -in this paper, we propose a novel framework that learns the term-weighting function. -in subtask b, participants must determine which type of irony a tweet contains. -we describe the tagging strategies that can be found in the literature and evaluate their relative performance. -experiments on pku, msra and ctb6 benchmark datasets show that our model outperforms the previous neural network models and state-of-the-art methods. -generation quality is evaluated with bleu, using sacrebleu. -cite-p-18-1-4 combine pattern matching and machine learning. -in this paper, we describe a novel approach to cascaded learning and inference on sequences. -explicit semantic analysis is a variation on the standard vector space model in which the dimensions of the vector are directly equivalent to abstract concepts. 
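One generation above reports BLEU computed with sacrebleu. The sketch below shows a corpus-level sacrebleu call on invented toy hypotheses and references; note that sacrebleu expects one or more reference streams, each aligned with the hypothesis list.

    # Hedged sketch: corpus-level BLEU with sacrebleu on toy data.
    import sacrebleu

    hyps = ["the cat sat on the mat", "he read the book yesterday"]
    refs = [["the cat was sitting on the mat", "yesterday he read the book"]]  # one reference stream

    bleu = sacrebleu.corpus_bleu(hyps, refs)
    print(bleu.score)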
-these systems have been created for english, portuguese, italian and german. -glorot et al proposed a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. -in addition to simplifying the task, k & m s noisy-channel formulation is also appealing. -the encoder units are bidirectional lstms while the decoder unit incorporates an lstm with dot product attention. -in this perspective, kernel methods are a viable approach to implicitly and easily explore feature spaces encoding dependencies. -the model parameters, u, are estimated using numerical optimization methods to maximize the log-likelihood of the training data. -our hdp-based method outperforms all methods over the semeval-2010 wsi dataset, and is also superior to other topic modelling-based approaches to wsi based on the semeval-2007 dataset. -we used the stanford parser to create parse trees that modeled the language structure of patient and therapist utterances. -we train and test unsupervised pos induction on the conll 2007 splits of the penn treebank using the hyperparameter settings from ontonotes. -szarvas et al also applied a method based on conditional random fields. -in addition, the task has generated considerable information for further examination of preposition behavior. -motivation is that it is beneficial to have access to more than one source form since different source forms can provide complementary information, e. g., different stems. -in this paper, we perform nnjm adaptation using l1-specific learner text with a kl divergence regularized objective function. -it will be made freely available to other researchers. -in this paper, we propose a linguistically grounded algorithm for alias detection. -we define sense annotation as a synonymy judgment task, following al-sabbagh et al, 2013, 2014b. -we used conditional random fields to conduct the automatic annotation experiments using our annotated corpus. -by coupling different relations, cpra takes into account relation associations and enables implicit data sharing among them. -in this paper, we extend mert and mbr decoding to work on hypergraphs produced by scfg-based mt systems. -cite-p-17-1-18 proposed a multi-level feature-based framework for spelling error correction including a modification of brill and moore s model ( 2000 ). -document clustering and document classification results show that our models improve the document-topic assignments compared to the baseline models, especially on datasets with few or short documents. -in these respects it is quite similar to the lkb parser-generator system. -in this paper, we focus on the extraction of temporal relations between medical events ( event ), temporal expressions ( timex3 ) and document creation time ( dct ). -cahill et al developed a method for automatic annotation of lfg f-structure on the penn-ii treebank. -despite being a natural comparison and addition, previous work on attentive neural architectures do not consider hand-crafted features. -we presented a host of neural models and a novel semantic-driven approach for tackling the task of guesstwo. -the tagger uses a bigram hidden markov model augmented with a statistical unknown word guesser. -the french treebank is a treebank of 21 564 sentences annotated with constituency annotation. -as a sequence labeler, we use conditional random fields. -we use the svm light implementation of the svm toolkit. 
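One generation above describes an lstm decoder with dot product attention over bidirectional lstm encoder states. The sketch below implements only the attention step (Luong-style dot product) in PyTorch; the tensor shapes are arbitrary assumptions chosen for illustration, not the dimensions of any cited system.

    # Hedged sketch: dot-product attention over a sequence of encoder states.
    import torch
    import torch.nn.functional as F

    enc_states = torch.randn(1, 7, 256)   # (batch, source_len, hidden)
    dec_state = torch.randn(1, 256)       # (batch, hidden) at the current decoder step

    scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)  # (batch, source_len)
    weights = F.softmax(scores, dim=1)                                 # attention distribution
    context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)   # weighted sum of states
    print(weights.shape, context.shape)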
-stroppa et al added source-side contextual features to a state-of-the-art log-linear pb-smt system by incorporating context-dependent phrasal translation probabilities learned using decision trees. -in such models, the target character can only influence the prediction as features. -for each node p/, compute thickness hij of each subrf & ant sij in the following way : math-p-20-7-0. -in the training data, we found that 50. 98 % sentences labeled as “ should be extracted ” belongs to the first 5 sentences, which may cause the trained model tends to select more leading sentences. -our parser performs a weighted deductive parsing, based on this deduction system. -chiang et al added thousands of linguistically-motivated features to hierarchical and syntax systems, however, the source syntax features are derived from the research above. -in this paper, we apply this new method to text chunking. -in this subtask, lin et al and rutherford and xue explored rich features such as word-pairs, dependency rules, production rules and brown cluster pairs. -the topic of large-scale distributed language models is relatively new, and existing work is restricted to n-grams only. -the stanford parser was used to generate constituent structure trees. -to address the above-mentioned issues, we present wikikreator c, a system that can automatically generate content for wikipedia stubs. -the smt systems were trained using the moses toolkit with modified kneser-ney smoothing. -this paper explores document-level smt from the tense perspective. -the ibm translation models ( cite-p-14-3-1 ) have been widely used in statistical machine translation ( smt ). -we used the kernel version of the large-margin ranking approach from which solves the optimization problem in figure 2. -neural networks have achieved promising results in sentiment classification. -evodag 4 is a genetic programming system specifically tailored to tackle classification and regression problems on very high dimensional vector spaces and large datasets. -chapman et al created a simple regular expression algorithm called negex that can detect phrases indicating negation and identify medical terms falling within the negative scope. -re-search in cognitive science suggests that human meaning representations are not merely a product of our linguistic exposure, but are also grounded in our perceptual system and sensorimotor experience. -these features are present in many spoken dialogue systems and do not require additional computation, which makes this a very cheap method to detect problems. -we treat candidate extraction as a latent variable and train these two stages jointly with reinforcement learning ( rl ). -property norms have the potential to aid a wide range of semantic tasks, provided that they can be obtained for large numbers of concepts. -we use the performance measure optimization framework proposed by joachims for optimizing these metrics. -the feature weights were tuned by using pairwise ranking optimization on the wmt12 benchmark. -in this work, we focus on the coherence and readability aspects of the problem. -recent studies show that character sequence labeling is an effective method of chinese word segmentation for machine learning. -this paper describes our deep learning-based approach to sentiment analysis in twitter as part of semeval-2016 task 4. -as an area of great linguistic and cultural diversity, asian language resources have received much less attention than their western counterparts. 
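The negex algorithm mentioned above is, at its core, a list of negation triggers plus a scope heuristic. The sketch below is a heavily simplified, hypothetical variant rather than the published algorithm: a handful of triggers and a fixed token window after each trigger; the real algorithm uses a much larger trigger list and also terminates the scope at conjunctions and other boundary terms.

    # Hedged sketch: NegEx-style negation scoping with a tiny trigger list.
    import re

    TRIGGERS = re.compile(r"\b(no|denies|without|absence of|negative for)\b", re.I)
    WINDOW = 3  # tokens after a trigger treated as negated scope (real NegEx also stops at "but" etc.)

    def negated_terms(sentence, terms):
        hits = []
        for m in TRIGGERS.finditer(sentence):
            scope = sentence[m.end():].split()[:WINDOW]
            for term in terms:
                if term.lower() in (t.lower().strip(".,;") for t in scope):
                    hits.append((m.group(0), term))
        return hits

    print(negated_terms("patient denies chest pain but reports fever",
                        ["chest", "pain", "fever"]))
    # -> [('denies', 'chest'), ('denies', 'pain')]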
-in general, we could get the optimized parameters through minimum error rate training on the development set. -our architecture can also capture multiple granular interactions by several stacked coupled-lstms layers. -in order to be applicable as an slu model, semantic information must be added manually, since only syntactic structures can be induced automatically. -to our knowledge, our work is the first to perform both identification and resolution of chinese anaphoric zero pronouns using a machine learning approach. -bollen et al explored the notion that public mood can be correlated to and even predictive of economic indicators. -word2vec has become a standard method that builds dense vector representations, which are the weights of a neural network layer predicting neighboring words. -the uniform information density hypothesis holds that speakers tend to maintain a relatively constant rate of information transfer during speech production. -we use the sockeye implementation of a transformer for all of our experiments. -centering and other discourse theories argue that topical entities are likely to appear in prominent syntactic positions such as subject or object. -we propose a selectional preference feature for string-to-tree statistical machine translation based on the information theoretic measure of resnik. -then, we will compare the translation results when restricting the search to either of these constraints. -we used the labeled bracketing metric parseval. -in this paper, we introduce the multi-column convolutional neural networks ( mccnns ) to automatically analyze questions from multiple aspects. -a major aspect of the bild project is that a specific parametrization of the deduction process is represented in the lexicon as well as in the grammar to obtain efficient structures of control. -framenet is a database based on frame semantics. -we present a novel beam-search decoder for grammatical error correction. -in this paper, we address a major challenge in paraphrase research — the lack of parallel corpora. -experiments show that knowledge about multiword expressions leads to an increase of between 7.5 % and 9.5 % in accuracy of shallow parsing in sentences containing these multiword expressions. -our system gave the highest scores reported for the nlg 2011 shared task on deep input linearization ( cite-p-25-1-3 ). -in order to avoid over-fitting, dropout regularization was also used. -chart parsing is the task of building a parse tree that systematically explores combinations based on a set of grammatical rules, while using a chart to store partial results. -we observe noticeable improvements over the baselines on machine translation and summarization tasks by using pointer softmax. -the task of semantic textual similarity is aimed at measuring the degree of semantic equivalence between a pair of texts. -little is known on their true ability to reveal the underlying morphological structure of a word and their semantic capabilities. -for why-questions, we also expect to gain improvement from the addition of structural information. -we compare the feature-based logistic regression classifier to different convolutional neural network architectures. -unsupervised pos tagging is a fundamental problem in unsupervised learning. -we participated in the semeval-2007 coarse-grained english all-words task and fine-grained english all-words task. -in this paper, we propose a method to reduce the number of wrong labels. 
-such architectures have been extended to jointly model intent detection and slot filling in multiple domains. -secondly, we propose a log-linear model for computing the paraphrase likelihood. -in this paper, we present a simple semi-supervised approach to learning the meta features from the auto-parsed data for dependency parsing. -coreference resolution systems are typically trained with heuristic loss functions that require careful tuning. -type theory with records is an extension of standard type theory shown to be useful in semantics and dialogue modelling. -the underlying seq2seq model consists of an lstm encoder and an lstm decoder. -in this work, we aim to take advantage of both the classification and the smt approaches. -from a theoretical perspective, it is accepted that negation has scope and focus, and that the focus yields positive interpretations. -bastings et al relied on graph-convolutional networks primarily developed for modelling graph-structured data. -johnson and charniak proposed a tag-based noisy channel model, which showed great improvement over a boosting-based classifier. -quality estimation is the task of predicting the quality of a machine translation system without human intervention or reference translations. -cite-p-18-1-7 argue that the key to success lies in hyperparameter tuning rather than in the model s architecture. -mann encoded specific inference rules to improve extraction of information about ceos. -incorporation ( cgi ), based on the technique developed by cgi. -in this paper, we describe a new multimodal dataset that consists of gaze measurements and spoken descriptions collected in parallel during an image inspection task. -in this paper, we describe a sequenceto-sequence model for amr parsing and present different ways to tackle the data sparsity problem. -in this paper, we aim to incorporate word sememes into word representation learning ( wrl ) and learn improved word embeddings in a low-dimensional semantic space. -among them, the machine learning-based techniques showed excellent performance in many recent research studies. -experimental results demonstrate promising and reasonable performance of our approach. -brown clustering is a commonly used unsupervised method for grouping words into a hierarchy of clusters. -in this paper, we propose a new hybrid kernel for re. -chu et al have demonstrated that many standard machine learning algorithms can be phrased as mapreduce tasks, thus illuminating the versatility of this framework. -these representations are fed into a classifier to detect the review spam. -in this paper, we describe a text chunking system using regularized winnow. -for string-to-tree translation, we parse the german target side with bitpar. -we consider the problem of parsing non-recursive context-free grammars, i. e. context-free grammars that generate finite languages. -gaze behaviour is more reliable when the reader has understood the text. -this is precisely the relative frequency estimate we seek. -our performance comparison shows how effective our voting strategies can be : they top the rankings in the semeval task, outperforming even elaborate ensemble strategies. -while we report bleu, the primary goal in our work is to achieve highest possible f1 score. -in this paper, we propose a new probabilistic model for word alignment where word alignments are associated with linguistically motivated alignment types. -we present a novel method for aligning a sequence of instructions to a video of someone carrying out a task. 
-one method that has been quite successful in many applications is the snow architecture. -idf weighting and part-of-speech tagging are applied on the examined sentences to support the identification of words that are highly descriptive in each sentence. -we used the srilm toolkit to train a 5-gram language model with modified kneser-ney smoothing on the target-side training corpus. -corston-oliver et al treated the evaluation of mt outputs as classification problem between human translation and machine translation. -we use an implementation based on blocks and theano for evaluation. -we introduced marian, a self-contained neural machine translation toolkit written in c++ with focus on efficiency and research. -we explore the fact that many poorly resourced languages are closely related to well equipped languages, which enables low-level techniques such as character-based translation. -in this paper, we study how to incorporate extrinsic cues into the network, beyond just generic word embeddings. -best-worst scaling ( bws ) is a less-known, and more recently introduced, variant of comparative annotation. -to integrate multiple tk representations into a single model, we apply a classifier stacking approach. -semantic role labeling is the task of assigning semantic roles to predicate-argument structures. -kalchbrenner and blunsom use a simple convolution model to generate phrase embeddings from word embeddings. -in section 6, we consider the implications of our experimental results. -to enable other researchers to use this new notion of s-relevance, we have published the annotated s-relevance corpus used in this paper. -it has been proposed to combine chat-oriented dialogue systems with task-oriented dialogue systems. -we describe and evaluate two approaches to this compilation problem. -word sense disambiguation is the task of determining the meaning of a word in a given context. -word representations have proven useful for many nlp tasks, e. g., brown clusters as features in dependency parsing ( cite-p-15-3-5 ). -our confidence based approach can be used to improve these tasks. -zheng et al proposed a gated attention neural network model to address the contextual relevance and diversity of comments. -stroppa et al added source-side contextual features to a state-of-the-art log-linear pb-smt system by incorporating context-dependent phrasal translation probabilities learned using decision trees. -a pcfg math-w-3-1-3-146 is reduced if math-w-3-1-3-154 is reduced. -kalchbrenner and blunsom introduced recurrent continuous translation models that comprise a class for purely continuous sentence-level translation models. -so far, they have been quite successfully applied to 56 part-of-speech tagging, syntactic parsing, semantic role labeling, opinion mining, etc. -we use the stacked denoising autoencoder to build the corpus-based model. -we use glove vectors for word embeddings. -we used the pre-trained glove word embeddings. -in this paper, we have shown that this extra formal power can be used in nl processing. -deep learning models in various forms have been the standard for solving vqa. -blei and mcauliffe proposed supervised lda that can handle sentiments as observed labels. -we used the hindmono corpus which contains roughly 45 million sentences to build our language model in hindi. -the salt behind the corn flakes on the shelf above the fridge is in this context preferable to the white powder. 
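several sentences above mention 5-gram language models with modified kneser-ney smoothing trained with srilm; as a rough stand-in, the sketch below builds a small kneser-ney interpolated n-gram model with nltk (the toy corpus, the order, and the choice of nltk are assumptions of this illustration, not the original srilm setup):

```python
# an illustrative kneser-ney interpolated n-gram language model with nltk
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

order = 3  # use 5 for a 5-gram model; 3 keeps the toy example small
corpus = [
    ["we", "train", "a", "language", "model"],
    ["we", "train", "a", "translation", "model"],
]

train_ngrams, vocab = padded_everygram_pipeline(order, corpus)
lm = KneserNeyInterpolated(order)
lm.fit(train_ngrams, vocab)

# score a word given its (order-1)-word history, and a toy perplexity
print(lm.score("language", ["train", "a"]))
print(lm.perplexity([("we", "train", "a")]))
```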
-bleu and nist are calculated as the geometric mean of n-grams multiplied by a brevity penalty, comparing a machine translation and a reference text. -we used two available re tools for extracting semantic relations from scientific publications. -in this paper, we discuss inter-dialect mt in general and cantonese-mandarin mt in particular. -in order to amplify the contribution of important words in the final representation, we use a context-aware attention mechanism, which aggregates all intermediate hidden states using their relative importance. -one of the important open questions in natural language generation is how the common, rulebased approaches to generation can be combined with recent insights from statistical nlp. -in this paper, we propose a deep architecture to model the strong interaction of sentence pair with two coupled-lstms. -the experiments show that our framework is effective ; it achieves higher f1-measure than three state-of-the-art systems. -it has been shown that the skip-gram with negative sampling algorithm in word2vec corresponds to implicit factorization of the pmi matrix. -we compare our method with the template-based method and the verb-categorization method. -the algorithm takes word vectors and uses them and the network structure to induce the sense vectors. -extensive experiments have leveraged word embeddings to find general semantic relations. -previous work has shown that unlabeled text can be used to induce unsupervised word clusters that can improve performance of many supervised nlp tasks. -the key component is the alignment model. -bykh and meurers systematically explored non-lexicalized and lexicalized context-free grammar production rules. -we use weight tying between target and output embeddings. -we use minimum error rate training to train a 5-gram language model with modified kneser-ney smoothing on the target-side training corpus. -the vhred model ( serban et al, 2017 ) integrates the vae with the hred to model twitter and ubuntu irc conversations by introducing an utterance latent variable. -a comparison between models shows that rnns outperform crfs, even when they use word embeddings as the only features. -bastings et al relied on graph-convolutional networks primarily developed for modelling graph-structured data. -yasuda et al and foster et al ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models. -for image features, we use the precomputed features provided by faghri et al, which are extracted from the fc7 layer of vgg-19. -in previous work on relation extraction, it has been shown that the shortest dependency path between any two entities captures the information required to assert a relationship between them. -to maximize size and heterogeneity, we here refer to the argument web ( cite-p-17-1-4 ), which is to our knowledge the largest ground-truth argument database available so far. -we show that such cues have predictive power even when extracted from the first 20 seconds of the conversations. -each sentence is linguistically analyzed by a pcfg-la parser trained on the penn treebank. -we use the symmetric kl-divergence metric to measure the tag distribution divergence. -syntax-based models either use linguistic annotation on the source language side, target language side or are syntactic in a structural sense only. -blacoe and lapata compare several types of vector representations for semantic composition tasks. 
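the first sentence above states the bleu recipe: a geometric mean of modified n-gram precisions multiplied by a brevity penalty; the sketch below spells that formula out for a single reference, without smoothing, and is meant only to illustrate the computation (an established tool such as sacrebleu should be used for real evaluation):

```python
# a small from-scratch sketch of bleu: clipped n-gram precision, geometric mean, brevity penalty
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in hyp.items())  # clipped counts
        total = max(sum(hyp.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(hypothesis) > len(reference) else math.exp(1 - len(reference) / len(hypothesis))
    return bp * geo_mean

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(hyp, ref, max_n=2), 4))  # bigram-level bleu on a toy pair
```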
-while world knowledge has been shown to improve learning-based coreference resolvers, the improvements were typically obtained by incorporating world knowledge into a fairly weak baseline resolver. -barzilay and mckeown presented an unsupervised learning approach to extract paraphrases of words and phrases from different english translations of the identical source language sentences. -this paper describes the method hitz-icrc system used for qa tempeval challenge. -the desired output is a mapping from terms to their corresponding hypernyms, which can naturally be represented as a weighted bipartite graph ( term-label graph ). -dong et al used dependency parsing for twitter sentiment classification to find the words syntactically connected to the target of interest. -supervised approaches to dependency parsing have been successful for languages where relatively large treebanks are available. -we describe the tagging strategies that can be found in the literature and evaluate their relative performance. -some of the very effective ml approaches used in ner are me, crf and svm. -to remedy the above mentioned effects, we extended the normalized frequency of cite-p-12-1-0 to a normalized correlation criterion to spot translation equivalents. -inui et al propose a rule-based system for text simplification aimed at deaf people. -in addition, we used word category information of a chinese thesaurus for verb disambiguation. -experimental results show the effectiveness of the proposed approach. -the model is based on the idea that missing or corrupted values for one field can be inferred from values in other fields of the record. -standard lm benchmarks in english include the penn treebank, the 1 billion word benchmark, and the hutter prize data. -this phenomenon suggests that grammatical features may play a more important role in predicting and measuring l2 readability. -reliably resolving these references is critical for dialogue success. -in this work, we present the first application of nli to non-english data. -researchers have shown that the target-side monolingual data can greatly enhance the decoder model of nmt. -in this paper we present an in-depth analysis of the state of the art in order to clarify this issue. -bo? rschinger et al introduced an approach to grounded language learning based on unsupervised pcfg induction. -the derivations licenced by a lexical category sequence were created using the ccg parser described in clark and curran. -school of thought analysis is an important yet not-well-elaborated scientific knowledge discovery task. -we present b aye s um ( for °bayesian summarization ), a model for sentence extraction in query-focused summarization. -we do this by compiling the rules resulting from an adaboost classifier into a finite-state transducer. -we evaluate our approach on word similarity and relational similarity frameworks, reporting state-of-the-art performance on multiple datasets. -shimbo and hara and hara et al considered many features for coordination disambiguation and automatically optimized their weights, which were heuristically determined in kurohashi and nagao, by using a discriminative learning model. -the main advantage for the proposed new method for nlg is that the complexity of the grammatical decision making process during nlg can be vastly reduced, because the ebl method supports the adaption of a nlg system to a particular use of a language. 
-moore et al introduced a discriminative model of 1-to-n and m-to-1 alignments, and similarly to the best results were obtained using hmms trained to agree and intersected model 4. -domain-free rules aim to help the human annotator in scoring semantic equivalence of sentence pair. -we use the penndiscourse treebank and penn treebank data. -as each edge in the confusion network only has a single word, it is possible to produce inappropriate translations such as ° he is like of apples . -in this approach, tree structures for the source, target, or both are used for model training. -a critical problem for the task-specific ranking is training data insufficiency, which may be solved by using the data extracted from click log. -the system was optimized on the wmt08 french-english development data using minimum error rate training and tested on the wmt08 test data. -we used weighted textual matrix factorization to model the semantics of the sentences. -there exists a variety of different metrics, eg, word error rate, position-independent word error rate, bleu score, nist score, meteor, gtm. -lin and hovy introduced topic signatures which are topic relevant terms for summarization. -sennrich et al proposed a method to control the level of politeness in target sentence in english-to-german translation. -costa and branco explore the usefulness of a wider range of explicitly aspectual features for temporal relation classification. -in this paper, we address the semantic modeling of relational patterns. -our baseline system is a state-of-the-art smt system which adapts bracketing transduction grammars to phrasal translation and equips itself with a maximum entropy based reordering model. -bhargava and kondrak present a method for applying transliterations to grapheme-to-phoneme conversion. -various sequence labeling models have been proposed, like hidden markov models, structured perceptron, conditional random fields and svm-hmm. -the work by nepveu et al constitutes a domain adaptation technique and not an online learning technique, since the proposed cache components require pre-existent models estimated in batch mode. -in this paper, we apply syntactic substitution for generating sentences, which corresponds to transfer-based machine translation. -in particular, many neural networks have been proposed and shown better performance in relation classification and relation extraction. -the tasks are organized based on some research works. -the current state-of-the-art in smt are phrase-based systems. -using lexical sets improves the model ’ s performance on three of the most challenging verbs. -socher et al proposed a feature learning algorithm to discover explanatory factors in sentiment classification. -we also find that the rules used in our model are more suitable for long-distance reordering and translating long sentences. -as a separate task, there has been extensive work on utterance segmentation. -central to our approach is the construction of high-accuracy, high-coverage multilingual wikipedia entity type mappings. -by contrast, the construction specific transformations appear to be more sensitive to parsing strategy but have a constant positive effect over several languages. -this year, we were unable to follow the methodology outlined in graham et al for evaluation of segment-level metrics because the sampling of sentences did not provide sufficient number of assessments of the same segment. 
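word error rate, listed among the metrics above, is edit distance over word sequences normalized by reference length; a minimal dynamic-programming sketch (not a replacement for standard scoring tools such as sclite) follows:

```python
# word error rate as levenshtein distance over words, divided by reference length
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
```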
-we use the penn discourse treebank, the largest available manually annotated corpus of discourse on top of one million word tokens from the wall street journal. -for english-to-arabic translation, our model yields a +1.04 bleu average improvement over a state-of-the-art baseline. -in this paper, we focus on the problem of decoding given a trained neural machine translation model. -all our smt systems are built with the moses toolkit, and word alignments are generated by the berkeley aligner. -we follow the inference approach in prior work and formalize this process as an integer linear program. -in this paper, we study and design models for learning to detect ancillary information in the context of pi. -based on bleu, it computes n-gram precision of the system output against reference sentences. -one such feature is the constraint that two case elements with the same case do not modify a verb. -the argument comprehension reasoning task aims to reconstruct and analyze the argument reasoning. -then, from dependency trees in the data, we extract different types of subtrees. -finkel and manning propose a crf-based constituency parser which takes each named entity as a constituent in the parsing tree. -we use glove embeddings for the english and fasttext embeddings for all newswire tasks. -a typical query spelling correction system employs a noisy channel model. -we present a lwfg parser as a deductive system. -we used the machine translation quality metric bleu to measure the similarity between machine generated tweets and the held-out test sets. -in contrast, we successfully utilize implicit task-based feedback collected in a cross-lingual search task to improve task-specific and machine translation quality metrics. -in this paper we propose a shift in focus from constraining locality and complexity through tree- and set-locality to constraining locality and complexity through restrictions on the derivational distance between trees in the same tree set in a valid derivation. -we then consider approaches drawing on word2vec, paragraph vectors, and skip-thoughts. -for instance, klein and manning introduced an approach where the objective function is the product of the probabilities of a generative phrase-structure and a dependency parser. -the additional baseline, bigram baseline, is a bigram-based language model trained on the bnc with srilm, using the standard language model settings for computing log-probabilities of bigrams. -in multi-instance learning, the uncertainty of instance labels can be taken into account. -this paper describes our system submission to the semeval 2016 sts shared task. -versions of the limsi broadcast news transcription system have been developed in american english, french, german, mandarin and portuguese. -here we review the parameters of the standard phrase-based translation model ( cite-p-17-1-20 ). -this paper presents an approach to the problem of taxonomy construction from texts focusing on the hyponym-hypernym relation between two terms. -this model is implemented using a crf sequence tagger. -for word embeddings, we use word2vec and fisher-encoded word embeddings. -we use glove vectors for word embeddings. -it is widely acknowledged that good mwe processing strategies are necessary for nlp systems to work effectively, since these kinds of word combinations are very frequent in both text and speech.
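the noisy channel view of query spelling correction mentioned above picks the candidate c maximizing p(c) * p(w|c); the toy sketch below uses a unigram count table as the language model and a crude edit-distance-based channel model, both of which are illustrative assumptions:

```python
# a toy noisy-channel speller: argmax over candidates of prior * channel probability
from collections import Counter

counts = Counter({"apple": 50, "apply": 30, "ample": 5})   # toy unigram "language model"
total = sum(counts.values())

def edit_distance(a, b):
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[len(a)][len(b)]

def correct(word):
    def score(candidate):
        prior = counts[candidate] / total                 # P(c)
        channel = 0.1 ** edit_distance(word, candidate)   # crude P(w | c)
        return prior * channel
    return max(counts, key=score)

print(correct("aple"))  # -> "apple"
```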
-based on this intuition, we proposed an event-based time label propagation model called confidence boosting in which time label information can be propagated between documents and events on a bipartite graph. -in this paper, we describe a new approach for the collection of image annotation datasets. -goldstein-stewart et al also carried out some cross-topic experiments by concatenating the texts of an author from different genres. -the factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. -lemmatization is the process of determining the root/dictionary form of a word. -to address the latter problem, we avoid feature engineering and instead adopt convolutional architecture with piecewise max pooling to automatically learn relevant features. -szpektor et al proposed a fully unsupervised learning algorithm for web-based extraction of entailment relations. -a popular approach is phrase-based models which translate short sequences of words together. -but this kind of memory is known to have a severely constrained storage capacity a possibly constrained to as few as three or four distinct elements. -rasooli and collins proposed a method to induce dependency parser in the target language 100 using a dependency parser in the source language and a parallel corpus. -through extensive experiments on real datasets, we demonstrated effectiveness of kgeval. -we train a 5-gram language model on the xinhua portion of the gigaword corpus using the srilm toolkit with modified kneser-ney smoothing. -tan et al used a local feature selection method to ensure the performance of trigger classification and applied multiple levels of patterns to improve their coverage in argument classification. -following on the instructional corpus, we use 26 relations, and treat the reversals of non-commutative relations as separate relations. -to avoid those given in cite-p-3-15-8 and used in all of the previous repeated evaluations based on the testing cor-work. -rishabh iyer acknowledges support from the microsoft research ph.d fellowship. -this paper presents a predicate-argument structure analysis that simultaneously conducts zero-anaphora resolution. -in this task, we used the trec question dataset 10 which contains 5952 questions. -discriminative models in syntactic and semantic parsers typically use millions of features. -however, adding confusion matrix features improves the predictive model ( section 4 ). -the data collection methods used to compile the dataset provided in offenseval are described in zampieri et al. -in this paper, we propose a novel method to detect asymmetric entailment relations between verbs. -pitler and nenkova showed that syntactic features extracted from constituent parse trees are very useful in disambiguating discourse connectives. -we trained a statistical model using data derived from the chinese treebank and reported promising preliminary results. -escudero et al tested the supervised adaptation scenario on the dso corpus, which had examples from the brown corpus and wall street journal corpus. -moldovan et al propose a sense collocation method based on the pair of word senses of nc constituents. -kalchbrenner et al introduced a dynamic k-max pooling to handle variable length sequences. -the svm is based on discriminative approach and makes use of both positive and negative examples to learn the distinction between the two classes. -we propose to formalize a scene ( i. 
e., a domain of objects and their properties and relations ) as a labeled directed graph and describe the content selection problem ( which properties and relations to include in a description for an object? ). -it forms a hierarchy of nested subgraphs whose cohesiveness and size respectively increase and decrease with k. -we used julius as the lvcsr and julian as the dssr. -zelenko et al described a recursive kernel based on shallow parse trees to detect person-affiliation and organization-location relations, in which a relation example is the least common subtree containing two entity nodes. -mohammad et al leverage a large sentiment lexicon in an svm model, achieving the best results in the semeval 2013 benchmark on sentence-level sentiment analysis. -pantel and ravichandran extended this approach by making use of all syntactic dependency features for each noun. -issue framing is related to the broader challenges of biased language analysis and subjectivity. -first, we show that the ensemble can be unfolded into a single large neural network which imitates the output of the ensemble system. -we apply an hmm method in conjunction with a local classification model to predict a global phoneme sequence given a word. -lexical features are a major source of information in state-of-the-art coreference resolvers. -le and mikolov introduced paragraph-level vectors, a fixed-length feature representation for variable-length texts. -a seed-expansion approach is proposed to extract the arguments correctly. -as a key property of our tool, we store all intermediate annotation results and record the user-system interaction data. -the data structure is a list which is mainly accessed with a typical lifo stack policy. -since multilinguality is a key need in today ’ s information society, and because wcls have been tested overwhelmingly only with the english language, we provide experiments for three different languages, namely english, french and italian. -for word embeddings, we use word2vec and fisher-encoded word embeddings. -marcu and echihabi presented an unsupervised method to recognize discourse relations held between arbitrary spans of text. -this model extends the phrase-based model by using the formal synchronous grammar to better capture the recursiveness of language during translation. -in previous work using the propbank corpus, gildea and palmer developed a system to predict semantic roles from sentences and their parse trees as determined by the statistical parser of collins. -by using the teaching process, we can reduce the performance gap between mixed and upper case ner by as much as 39 % for muc-6 and 22 % for muc-7. -in this paper, we investigate other methods for converting a system-generated bit string into a memorable sequence of english words. -as there is no standard chinese corpus, no chinese experimental results are reported. -experimental results show that our proposed framework outperforms the state-of-the-art baseline by over 7 % in f-measure. -the works most relevant to this paper are those of kazama and torisawa, toral and munoz, cucerzan, and richman and schone. -with its growing size and coverage, the internet has become an attractive source of material for linguistic resources, used both for linguistics and natural language processing applications. -we find that the precision of our system and that of morante and daelemans drops by an equal amount for the cross-text testing.
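paragraph vectors (le and mikolov), mentioned above, assign a fixed-length vector to a variable-length text; a minimal gensim doc2vec sketch follows (gensim >= 4.0 api, toy documents, and illustrative hyperparameters are assumed):

```python
# paragraph vectors: one fixed-length vector per document, inferable for unseen texts
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["the", "movie", "was", "great"], tags=["d0"]),
    TaggedDocument(words=["the", "film", "was", "terrible"], tags=["d1"]),
]

model = Doc2Vec(documents=docs, vector_size=50, window=3, min_count=1, epochs=40)

print(model.dv["d0"][:5])                               # fixed-length vector for document d0
print(model.infer_vector(["a", "great", "movie"])[:5])  # vector for an unseen text
```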
-in this paper, we extend ongoing research into multi-sense embeddings by first proposing a new version based on chinese restaurant processes that achieves state of the art performance on simple word similarity matching tasks. -in this paper, we propose a framework for multi-target smt. -cite-p-16-3-9 did suggest using syntactic errors in their work but did not investigate them in any detail. -semeval-2016 task 4 comprises five subtasks, three of which represent a significant departure from previous editions. -semantic textual similarity measures the degree of equivalence between the meanings of two text sequences. -the bleu score, or bilingual evaluation understudy, is a method to measure the difference between machine and human translations. -logical derivations were used to combine clauses and remove easily inferable clauses. -the second one is a reimplementation of a phrase-based decoder with a lexicalized reordering model based on the maximum entropy principle proposed by xiong et al. -a hybrid method called tribayes is then introduced that combines the best of the previous two methods. -hacioglu et al showed that tagging phrase by phrase is better than word by word. -the entropy is the logarithm of the local perplexity at a given point in the word string. -in this paper, we show how oe adverbials and dongan adverbials contribute to constructing the temporal interpretation of korean sentences. -under this model, we incorporate various constraints to improve the linguistic quality of the compressed sentences. -state-of-the-art statistical parsers and pos taggers perform very well when trained with large amounts of in-domain data. -we tried to follow the underlying idea of the task, that is, evaluating the gap of full-fledged recognizing textual entailment systems with respect to compositional distributional semantic models ( cdsms ) applied to this task. -a solution to this problem relies on the use of expectation maximization. -wang and jiang build question-aware passage representation with match-lstm, and predict answer boundaries in the passage with pointer networks. -on the other hand, the majority of corpus statistics approaches to noun-noun compound interpretation collect statistics on the occurrence frequency of the noun constituents and use them in a probabilistic model. -section 3 shows that a generative lm built with our classifier is competitive with modified kneser-ney smoothing and can outperform it if sufficiently rich features are incorporated. -we evaluated our approach with experiments on three multimodal tasks using public datasets and compared its performance with state-of-the-art models. -in this paper, we introduce word appearance in context. -wei and gao derived external features based on the relevant tweet collection to assist the ranking of the original sentences for extractive summarization in a supervised machine learning fashion. -in this paper, we focus on an inference technique called amortized inference ( cite-p-13-3-0 ), where previous solutions to inference problems are used to speed up new instances. -although the system performs well within a limited textual domain, further research is needed to make it effective for open-domain question answering and text summarisation. -models based on the current scheme performed appreciably better than the baseline. -the number of feature structures is no longer finite as originally defined, and therefore the generative capacity of the formalism is extended.
-generally, we may think of math-w-2-6-1-123 as arbitrary strings over arbitrary alphabets math-w-2-6-1-142. -since most available caption datasets have been constructed for the english language, there are few datasets for japanese. -we use maege to mimic a setting of ranking against precision-oriented outputs. -the grammar matrix is couched within the head-driven phrase structure grammar framework. -we report results in terms of case-insensitive bleu scores. -to perform qa, we used the framework of berant et al, as implemented in sempre. -wikification is the task of identifying and linking expressions in text to their referent wikipedia pages. -because word frequencies are zipf-distributed, this often means that there is little relevant training data for a substantial fraction of parameters, especially in new domains. -word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words. -large scale knowledge bases like dbpedia and freebase provide structured information in diverse domains. -the regressor used is a random forest regressor in the implementation provided by scikit-learn. -our model regards associative anaphora as a kind of zero anaphora and resolves it in the same manner as zero anaphora resolution that uses automatically acquired case frames. -after topics are discovered by topic modeling techniques, these topics are conventionally represented by their top n words or terms. -predictions-as-features methods suffer from the drawback that they cannot model dependencies between the current label and later labels. -in this paper we focus on choosing useful bigrams and estimating accurate weights to use in the concept-based ilp methods. -marcu and echihabi presented an unsupervised method to recognize discourse relations held between arbitrary spans of text. -we used an hmm method for pos tagging and a morpheme analysis-based method to predict pos tags for new words. -summarization is the task of condensing a piece of text into a shorter version that contains the main information from the original. -second, we develop a novel integer linear programming ( ilp ) based abstractive summarization technique to generate text from the classified content. -since similarity is only one particular type of relatedness, comparison to similarity norms fails to give a complete view of a relatedness measure ’ s efficacy. -in section 2 we discuss related work, section 3 details the algorithm, section 4 describes the evaluation protocol and section 5 presents our results. -kim and hovy and bethard et al explore the usefulness of semantic roles provided by framenet for both opinion holder and opinion target extraction. -however, it is well-known that k-means has the major drawback of not being able to separate data points that are not linearly separable in the given feature space. -the system was optimized on the wmt08 french-english development data using minimum error rate training and tested on the wmt08 test data. -semantic roles are obtained by using the parser by zhang et al. -in this paper, we experiment with three different methods of pos error detection using the ifd corpus. -the experiments were done on the english penn treebank using standard head-percolation rules to convert the phrase structure into dependency trees. -we evaluate and compare both approaches on two lexical substitution datasets, one english and one german. -we introduce non-lexical rules using the same approach as for the hierarchical rules of chiang.
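the scikit-learn random forest regressor referred to above can be used roughly as follows; the features and targets here are random placeholders rather than the data of the quoted work:

```python
# a minimal scikit-learn random forest regression sketch on synthetic data
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(200, 10)                                   # 200 instances, 10 features
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)     # toy target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
regressor = RandomForestRegressor(n_estimators=100, random_state=0)
regressor.fit(X_train, y_train)

print(regressor.score(X_test, y_test))   # R^2 on held-out data
print(regressor.predict(X_test[:3]))
```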
-this method is an entropy-based cutoff method, and can be considered an extension of the work of seymore and rosenfeld. -recaps help the audience absorb the essence of previous episodes, but also grab people's attention with upcoming plots. -we use the opennmt-pytorch toolkit to train our models. -we define a new task, argument facet similarity ( afs ), and show that we can predict afs with a .54 correlation score, versus an ngram system baseline of .39 and a semantic textual similarity system baseline of .45. -in this paper, we propose to find a balance between availability and restrictedness by making use of discourse markers. -each target word occurs in a sentence and it may be the case that those words surrounding the target give extra information as to its complexity. -in this paper, we propose to incorporate the supervised method into the concept-based ilp framework. -aspect extraction is a fundamental task in natural language processing ( nlp ). -garfield was the first to define a classification scheme, while finney was the first to suggest that a citation classifier could be automated. -in this paper, we describe our approach to intermediate semantic representations in the interpretation of temporal expressions. -we evaluate our method using the europarl corpus. -the recently suggested idea of partial textual entailment may remedy this problem. -our approach can therefore be adapted to languages with dependency treebanks, since ccg lexical categories can be easily extracted from dependency treebanks. -many knowledge graph entities lack such textual descriptions. -this paper presents an open-domain textual question-answering system that uses several feedback loops to enhance its performance. -recent studies on review helpfulness prediction have shown handcrafted features to be effective. -minimum error rate training is a crucial component for many state-of-the-art nlp applications, such as machine translation and speech recognition. -uccaapp supports annotation with a variety of formal properties, including discontiguous units, inter-sentence annotation, reentrancy and multi-layered annotation, making it suitable for other syntactic and semantic annotation schemes that use these properties. -part-of-speech tags are obtained using the treetagger. -lapata and brew and li and brew proposed probabilistic models for calculating prior probabilities of verb classes for a verb. -as there are many other nlp problems in which there is an interesting minority class, the brf method might be applied to those problems also. -wang et al proposed a topical n-gram model that adds a layer of complexity to allow the formation of bigrams to be determined by the context. -applying our methods to the task of compound noun interpretation, we have shown that combining lexical and relational similarity is a very effective approach that surpasses either similarity model taken individually. -for chinese, the concatenated trigram model in shao et al is applied. -to determine the word classes, one can use the algorithm of brown et al for finding the classes. -the translation model was smoothed in both directions with kn smoothing. -in this article we survey past and current work on question answering in restricted domains. -these curves demonstrate that parameter averaging helps to stabilize the learning and improve generalization capacity.
-this system uses discriminative large-margin learning techniques coupled with a decoding algorithm that searches the space of all compressions. -summarization is a fundamental task in natural language processing ( nlp ). -a challenge set consists of a small set of sentences, each hand-designed to probe a system ’ s capacity to bridge a particular structural divergence between languages. -to the best of our knowledge, there is no attempt in the literature to build a resource that associates words with senses. -this weight vector is learned using a simple perceptron like algorithm similar to the one used in. -intrinsic evaluation of the resulting vectors shows that geographic context alone does provide useful information about semantic relatedness. -this kernel has shown very promising results in srl. -in this paper, we propose an approach that favors the use of normalized dictionaries by generating virtual/materialized personalized views. -in the context of ir, decompounding has an analogous effect to stemming, and it significantly improves retrieval results ( cite-p-26-3-1 ). -examples of such neural networks are linear networks, deeper feed-forward neural networks, or recurrent neural networks. -in linguistics various subtypes of elliptical constructions are studied. -experimental results indicate that this method can consistently and significantly improve translation quality over individual translation outputs. -in this paper, we are concerned with the reasons that cause the errors. -the neural networks are trained using the rwthlm toolkit. -as a sequence labeler, we use conditional random fields. -as mentioned in previous sections, we apply our measure word generation module into smt output. -one of the earliest works is an extension of word2vec to learn a distributed representation of text. -we applied topic modeling in order to get topic distributions over set of sentences. -pang et al examined the effectiveness of using supervised learning methods to identify document level sentiments. -all linear svm models were implemented with scikit-learn and trained and tested using liblinear backend. -for more details on the original definition of tags, we refer the reader to kroch and joshi, or vijayshanker. -to pursue a better method to predict the order between two neighboring blocks 1, xiong et al present an enhanced btg with a maximum entropy based reordering model. -bunescu and mooney use shortest path dependency kernel for relation extraction. -examples include the widely known discourse parsing work of. -this makes our approach applicable to different nmt architectures. -these articles are then used to learn a vector of word frequencies, wherewith answer candidates are rated afterwards. -we use glove embeddings for the english and fasttext embeddings for all newswire tasks. -our monolingual objective follows the glove model, which learns from global word co-occurrence statistics. -we propose a new model that jointly learns multilingual multimodal representations using the image as a pivot between languages. -in our framework, each classifier learns to focus on the cases where the other classifiers are less confident. -liao and grishman use cross-event inference to help with the extraction of role fillers shared across events. -in this paper, we present an evaluation and error analysis of a cross-lingual application that we developed for a government-sponsored evaluation, the 5w task. 
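for the linear svms with a liblinear backend mentioned above, a scikit-learn sketch might look like this (the toy texts, labels, and tf-idf features are assumptions of the illustration):

```python
# linear svm text classification: LinearSVC is backed by liblinear
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["great acting and plot", "terrible and boring film",
         "wonderful performance", "awful script"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
clf.fit(texts, labels)
print(clf.predict(["boring plot", "wonderful film"]))
```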
-on wmt 16 english-romanian translation we achieve accuracy that is very competitive to the current state-of-the-art result. -we explain how other more efficient variants of the basic parser can be obtained by determinizing portions of the basic non-deterministic pushdown machine while still using the same pseudo-parallel driver. -in short texts, these methods yield the same poor performance as traditional topic models. -neural machine translation is a new paradigm in machine translation, powered by recent advances in sequence to sequence learning frameworks. -for the unsupervised mapping, we used the source and target language monolingual spaces. -this paper presents a novel approach using recurrent neural networks for estimating the quality of machine translation output. -we used a k-best version of the mira algorithm. -the first two are from semeval 2014, containing reviews of restaurant and laptop domains, which are widely used in previous works. -word embeddings have shown promising results in various nlp applications, such as named entity recognition, sentiment analysis and parsing. -the technique is generally applicable to natural language generation systems, which perform hierarchical text structuring based on a theory of coherence relations with certain additional assumptions. -kleinberg proposed a state machine to model the arrival times of documents in a stream. -in this example, the score of translating “ dos ” to “ make ” was higher than the score of translating “ dos ” to “ both ”. -our attentional models yield a boost of up to 5.0 bleu over non-attentional systems which already incorporate known techniques such as dropout. -information about lexical category probabilities ( cite-p-19-1-1 ) assigned by the supertagger can be useful during parsing. -a semantic bias is used to associate collocations with the appropriate meaning relation, if one exists. -a major challenge of semantic parsing is the vocabulary mismatch problem between natural language and target ontology. -xu et al picked up heterogeneous information along the left and right sub-path of the sdp respectively, leveraging recurrent neural networks with long short term memory units. -we explicitly model missing words to alleviate the sparsity problem in modeling short texts. -the roark parser is an incremental syntactic parser based language model that uses rich lexical and syntactic contexts as features to predict its next moves. -glaysher and moldovan demonstrated an efficiency gain by explicitly disallowing constituents that cross chunk boundaries. -the pdtb is the largest corpus annotated for discourse relations, formed by newspaper articles from the wall street journal. -xu et al represent heterogeneous features as embeddings and propose a multichannel lstm based recurrent neural network for picking up information along the sdp. -for efficiently solving the tsp, the model is restricted to pairwise features which examine only a pair of words and their neighborhood. -this also corresponds to a syntactic similarity : all the verbs of this group share the same preferred syntactic subcategorization frames. -we adopt a previously-proposed wsi methodology for the task, which is based on a hierarchical dirichlet process ( hdp ), a non-parametric topic model. -clarke and lapata used a trigram language model with modified kneser-ney smoothing. -the marked systems produce statistically significant improvements as measured by bootstrap resampling method on bleu over the baseline system. 
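paired bootstrap resampling, cited above as the significance test, can be sketched as below; for simplicity this version resamples per-sentence scores rather than recomputing corpus-level bleu on each resample, which is an approximation of the usual procedure:

```python
# paired bootstrap resampling over per-sentence scores of two systems
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Return the fraction of resamples in which system A does NOT beat system B."""
    rng = random.Random(seed)
    n, losses = len(scores_a), 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]          # resample sentence indices
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            losses += 1
    return losses / n_resamples   # an approximate p-value for "A is better than B"

system_a = [0.31, 0.42, 0.28, 0.35, 0.40, 0.33]   # toy per-sentence scores
system_b = [0.29, 0.40, 0.27, 0.30, 0.38, 0.31]
print(paired_bootstrap(system_a, system_b))
```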
-improvements demonstrate the importance of combining complementary objectives in a joint model for robust disambiguation. -it has been shown that domain information is fundamental for wsd. -in this paper, we propose a supervised model for keyphrase extraction from research papers, which are embedded in citation networks. -in order for these techniques to be more broadly applicable, they need to be extended to apply on weighted packed representations of ambiguous input. -we propose an algorithm based on the lesk wsd algorithm in order to perform unsupervised visual sense disambiguation on our dataset. -the implementation is done using the tensorflow library. -dzikovska et al showed that a statistical classifier trained on this data set can be used in combination with a semantic interpreter to significantly improve the overall quality of natural language interpretation in a dialogue-based its. -in this paper, we propose a simple and efficient model for using retrieved sentence pairs to guide an existing nmt model at test time. -we describe the enhanced models that incorporate temporal and semantic information about speech and eye gaze for word acquisition. -in this work, we tackle the problem of mapping trending twitter topics to entities from wikipedia. -we used the continuous bag-of-words model of mikolov et al with a window size of eight by training the model with wikipedia text corpus, we obtained word embeddings for most of the lemmas and words contained in the vuamc. -in section 4, we present an active learning method using the learning with rationales framework and present relevant results. -we adopt the sentence-level evaluation metric used in pado et al. -the performance of the different systems is evaluated in terms of translation error rate, bleu, and precision. -however, handcrafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. -first, a statistical parser is used to generate a semantically-augmented parse tree ( sapt ), where each internal node includes both a syntactic and semantic label. -we will also try to further exploit the factorized representation with discriminative learning. -yu and hatzivassiloglou, kim and hovy, hu and liu, and grefenstette et al all begin by creating prior-polarity lexicons. -shi and mihalcea propose the integration of verbnet, wordnet and framenet into a knowledge base and use it in the building of a semantic parser. -su et al also apply htmm to monolingual data and apply the results to machine translation. -cite-p-24-3-9 trained a multi-speaker speech recognizer using permutation-free training without explicit objective function for separation. -chambers and jurafsky presented an unsupervised learning system for narrative schemas based on coreferent arguments in chains of verbs. -paraphrase database contains millions of english paraphrases automatically extracted from bilingual parallel corpora. -named entity recognizer ( ner ) trained on an english corpus does not have the same performance when applied to machine-translated text. -the nmt architecture is an attentional encoder-decoder model similar to and uses a long short-term memory as the recurrent cell. -we start by formulating a general hypothesis testing framework for a comparison between two algorithms. -we encode domain knowledge as first order logic rules and automatically integrate them with a topic model to produce clusters shaped by the data and the constraints at hand. 
-explicit discourse connectives can potentially be exploited to collect more training data and boost the performance. -yet further research work is still expected to make it effective with complicated relation extraction tasks such as the one defined in ace. -as a second data set we use a noun compound data set of 54,571 nouns from germanet, which has been constructed by henrich and hinrichs. -in this paper, we extend the work on using latent cross-language topic models for identifying word translations across comparable corpora. -word representations, especially brown clusters, have been extensively used for named entity recognition, parsing and pos tagging. -words in time expressions demonstrate similar syntactic behaviour. -sundermeyer et al proposed word- and phrase-based rnn models and applied them to rescore n-best lists, reporting major improvements. -mesgar and strube modeled these coherence patterns by subgraphs of the graph representation of documents. -all the parameters are initialized using the xavier method. -schwartz and hearst proposed an algorithm for identifying acronyms by using parenthetical expressions as a marker of a short form. -hierarchical neural models have been successfully used in document-level language modeling and document classification. -top-down cues, on the other hand, were found to be effective only on a subset of the data, which corresponds to the interesting contrasts that cause lexical variation. -labeledlda is applied, utilizing constraints based on an open-domain database ( freebase ) as a source of supervision. -niessen and ney describe an approach for translation from german to english that combines verbs with associated particles, and also reorders questions. -we adopt the feature set of the best-performing sts system at semeval-2015, sultan et al, 2015. -as for recurrent models, our model outperforms rnns but falls below state-of-the-art lstm models. -in this paper, we introduce a neural network approach to learn continuous document representation for sentiment classification. -previous research on document sentiment classification has shown that machine learning based classifiers perform much better compared to rule-based systems. -we evaluate our approach on the english portion of the conll-2012 dataset. -first, we will consider the itg constraints. -it is impossible to construct rules to identify humor. -previous research indicates that automated communication systems are more effective if they take into account the affective and mental states of the user. -grammar induction is a fundamental task in natural language processing ( nlp ). -pennell and liu proposed a character-level mt model for text normalization. -wan used machine translation to translate the source language to the target language to bridge the gap and applied the co-training approach. -storyline detection from news articles aims at summarizing events described under a certain news topic and revealing how those events evolve over time. -there are several studies about grammatical error correction using phrase-based statistical machine translation. -an alternative approach to training structured linear classifiers is based on maximum-margin markov networks. -scqa learns the shared model parameters and the similarity metric by minimizing the energy function connecting the twin networks. -we show that this approach outperforms several baseline methods when judged against goal-acts identified by human annotators.
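xavier initialization, mentioned above, scales initial weights according to fan-in and fan-out; a minimal pytorch sketch with arbitrary layer sizes (the sizes are illustrative only):

```python
# xavier (glorot) initialization with pytorch's built-in initializers
import torch
import torch.nn as nn

layer = nn.Linear(256, 128)
nn.init.xavier_uniform_(layer.weight)   # bounds scaled by fan-in and fan-out
nn.init.zeros_(layer.bias)

# std is approximately sqrt(2 / (fan_in + fan_out)), about 0.07 for these sizes
print(layer.weight.std().item())
```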
-the gunning fog index uses average sentence length and the percentage of words with at least three syllables. -experimental results show that our models can rank the ground-truth error position toward the top of the candidate list. -the classifier ( step 3 ) expands its training data using distributed vector representations of words. -we perform chinese word segmentation, pos tagging, and dependency parsing for the chinese sentences using stanford corenlp. -we train a ranking svm model to identify ( structured ) problem answers from unstructured answer text. -animacy is an inherent property of the referents of nouns which has been claimed to figure as an influencing factor in a range of different grammatical phenomena in various languages and it is correlated with central linguistic concepts such as agentivity and discourse salience. -approximate inference can be done by loopy belief propagation. -the brown corpus was used through the interface provided by nltk. -the other potential problem is the so-called “ bag-of-sentences ” assumption implicitly made by most of these summarizers. -in this paper, we present a simple approach to unsupervised semantic role labeling. -we use the mpqa subjectivity lexicon. -we propose two inexpensive methods for training alignment models solely using free text, by generating artificial question-answer pairs from discourse structures. -we model entities and relations in a knowledge base ( kb ) by jointly embedding the union of all available schema types — not only types from multiple structured databases ( such as freebase or wikipedia infoboxes ), but also types expressed as textual patterns from raw text. -we investigate the use of speech-gaze temporal information and word-entity semantic relatedness to facilitate word acquisition. -stance detection is the task of determining whether the attitude expressed in a text towards a given topic is ‘ in favour ’, ‘ against ’, or ‘ neutral ’. -we use case-insensitive bleu-4 and rouge-l as evaluation metrics for question decomposition. -the semeval 2012 competition initiated a task focused on semantic textual similarity between sentence pairs. -our set cover-based method guarantees that all bursty n-grams including irregularly-formed ones must be covered by extracted bursty phrases. -as opposed to the two dominant techniques of computing statistics or writing specialized grammars, our document-centered approach works by considering suggestive local contexts and repetitions of individual words within a document. -i have described an implemented system based on the theoretical treatment which determines whether a specified sequence of trajectory-of-motion events is or is not possible under varying situationally specified constraints. -cheng et al and wu et al used neighboring dependency attachment taggers to improve the performance of the deterministic parser. -in this paper, we extend the standard hmms to learn distributed state representations and facilitate cross-domain sequence predictions. -empirical experiments on chinese-to-english and japanese-to-english tasks demonstrate that the proposed attention-based nmt delivers substantial gains in terms of both bleu and aer scores. -this paper presents a pronoun anaphora resolution system based on fhmms. -a different alternative, which however only delivers quasi-normalized scores, is to train the network using noise contrastive estimation, or nce for short. -we use relative position representations in the self-attention mechanism on both the encoder and decoder sides.
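the gunning fog index mentioned above combines average sentence length with the percentage of words of three or more syllables; the sketch below uses a rough vowel-group syllable counter, which is an assumption of this illustration rather than a standard component:

```python
# gunning fog index: 0.4 * (average sentence length + percentage of "complex" words)
import re

def count_syllables(word):
    # crude heuristic: each maximal run of vowels counts as one syllable
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z]+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    pct_complex = 100.0 * len(complex_words) / max(len(words), 1)
    return 0.4 * (avg_sentence_len + pct_complex)

print(gunning_fog("The analysis was straightforward. Readability metrics estimate difficulty."))
```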
-in this work, we first investigate label embeddings for text representations, and propose the label-embedding attentive models. -this paper presents the excitement open platform ( eop ), a generic architecture and a comprehensive implementation for textual inference in multiple languages. -we use an automatic topic segmentation method to segment the source articles in our test corpus. -we split the words into sub-words using joint bpe with 32,000 merge operations. -the model is used to evaluate the likelihood of various substitutes for a word in a given context. -in general, inference and learning for graph-based dependency parsing is np-hard when the score is factored over anything larger than arcs. -zens and ney exhaustively compare the ibm and itg constraints, concluding that although the itg constraints permit more flexible re-orderings, the ibm constraints result in higher bleu scores. -in this work, we describe a system we developed and submitted to semeval-2015. -collobert et al used convolution over word embeddings with a crf layer, benchmarking several nlp tasks including ner. -lexical, syntactic and semantic information from the reference and the two hypotheses is compacted into relatively small distributed vector representations and fed into the input layer, together with a set of individual real-valued features coming from simple pre-existing mt evaluation metrics. -the alignment template approach for pb-smt allows many-to-many relations between words. -in this paper, we present a novel method for learning the edges of entailment graphs. -the system of krishnakumaran and zhu uses wordnet and word bigram counts to predict verbal, nominal and adjectival metaphors at the sentence level. -on the multimodal emotion recognition task, our model achieves better results compared to the state-of-the-art models across all emotions on the f1 score. -xu et al described a bayesian semisupervised model by considering the segmentation as the hidden variable in machine translation. -in recent years, we have seen an increasing use of graph-based methods in nlp. -issue framing is related to the broader challenges of biased language analysis and subjectivity. -a tdnn convolves a sequence of inputs math-w-6-1-0-7 with a set of weights m. -in this paper, we propose a novel neural network model called rnn encoder-decoder that consists of two recurrent neural networks ( rnn ). -empirical measurements from large enough samples tend to be reliable for even larger sample sizes. -parsing scores or discourse-based scores. -as in previous work, we represent wordforms by their orthographic strings, and word-meanings by their semantic vector representations as produced by a distributional semantic vector space model. -our system improves over the state of the art in the full lexical substitution task in all three languages. -we use glove embeddings for the english and fasttext embeddings for all newswire tasks. -tomanek et al used eye-tracking data to evaluate a degree of difficulty in annotating named entities. -we first learn word embeddings for each language, then use a seed dictionary to train a mapping function between the two vector spaces. -in this paper, we describe an experiment on fully automatic derivation of the knowledge necessary for part-of-speech tagging. -we include pos tags and the top n-gram features as described in prior work. -in the former stage, a specially designed deep network is used to learn the unified representation using both textual and non-textual information.
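the seed-dictionary mapping between two embedding spaces mentioned above can be learned as a linear least-squares problem (mikolov-style); the vectors below are random stand-ins for real embeddings, so the setup is purely illustrative:

```python
# learn a linear map W so that W maps source-space vectors onto target-space vectors
import numpy as np

rng = np.random.RandomState(0)
dim_src, dim_tgt, n_pairs = 50, 50, 200

X_src = rng.randn(n_pairs, dim_src)                            # source vectors of seed-dictionary words
true_W = rng.randn(dim_src, dim_tgt)
X_tgt = X_src @ true_W + 0.01 * rng.randn(n_pairs, dim_tgt)    # aligned target vectors (synthetic)

# least-squares solution of min_W || X_src W - X_tgt ||^2
W, *_ = np.linalg.lstsq(X_src, X_tgt, rcond=None)

# translate a source vector by mapping it and taking the nearest target neighbour
query = X_src[0] @ W
similarities = X_tgt @ query / (np.linalg.norm(X_tgt, axis=1) * np.linalg.norm(query))
print(int(np.argmax(similarities)))   # should recover index 0
```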
-in section 4, we apply rd to recognize protein-protein interaction ( ppi ) sentences, using proteins as seeds for the entity discovery phase. -in another approach, cite-p-17-1-19 applies an svm to rank elements, deriving the input vector by subtraction of feature values. -cnns have been effectively employed in nlp tasks such as text classification, sentiment analysis, relation classification, and so on. -there is a growing interest in learning vector-space representations of words and phrases using large training corpora in the field of natural language processing. -on the other hand, we propose a new method for speeding up classification which is independent of the polynomial kernel degree. -in this paper, we propose a neural knowledge diffusion ( nkd ) model to introduce knowledge into dialogue generation. -recently, a chatbot for e-commerce sites known as superagent has been developed. -in this paper, we propose adversarial stability training for neural machine translation. -as a supervised upper-bound baseline, we use stanford collapsed dependencies for the english data and dependencies coming from the mate tools for the german corpus. -dialogue act classification is a fundamental task in natural language processing ( nlp ). -comparatively, the best reported english srl result drops from 91.2 ( cite-p-26-1-9 ) to 80.56 ( cite-p-26-1-12 ). -suggestion mining is the task of extracting suggestions from unstructured text. -for instance, reddy et al collected numerical scores for 90 english nominal compounds regarding their compositionality. -the survey by schmidt and wiegand points out that bag-of-words models are good features for hate speech detection, although they ignore word order and sentence syntax. -word alignment is a crucial early step in the training of most statistical machine translation ( smt ) systems, in which the estimated alignments are used for constraining the set of candidates in phrase/grammar extraction ( cite-p-9-3-5, cite-p-9-1-4, cite-p-9-3-0 ). -as a case study, we experimented on the language pair of japanese and korean. -this approach has previously been successfully used on english. -in this paper, we construct the discourse dependency corpus scidtb. -recently, tensor factorization-based methods have been proposed for binary relation schema induction ( cite-p-13-3-13 ), with gains in both speed and accuracy over previously proposed generative models. -we propose tree-based position features to encode the relative positions of words in a dependency tree. -we use pre-trained word embeddings trained on the google news corpus. -by varying the size of the training data and the dimensionality of the covariates, we have demonstrated that our proposed model is relatively robust across different parameter settings. -in this study, we focus on extractive summarization. -we show that acme yields a significant relative error reduction over the input alignment systems and heuristic-based combinations on three different language pairs. -we used glove vectors to initialize the word embeddings. -hand-built lexicons, such as cyc and wordnet, are the most useful resources for nlp applications. -the classic data set of rubenstein and goodenough consists of 65 noun pairs. -long short-term memory networks have been applied to machine translation and semantic processing. -empirical results show that our approach significantly outperforms existing neural and non-neural approaches on framenet data.
-only a few nlg systems generate personalized information from medical data for the patient, as opposed to health care personnel.
-gazdar discussed a restricted form of indexed grammars in which the stack associated with the nonterminal on the left of each production can only be associated with one of the occurrences of nonterminals on the right of the production.
-our algorithm is also applicable to other graph-structured representations, e.g. hpsg predicate-argument analysis ( cite-p-26-1-25 ).
-in their setting, lda merely serves the purpose of dimensionality reduction, whereas our particular motivation is to use topics as probabilistic indicators for the prediction of attributes.
-first, we add two sources of implicit linguistic information as features – eventuality type and modality of an event, which are also inferred automatically.
-named entity linking is a fundamental task in natural language processing ( nlp ).
-the goal of the conll-2014 shared task was to evaluate algorithms and systems for automatically correcting grammatical errors in english essays written by second language learners of english.
-matrix decomposition is computationally heavy and has not been proven to scale well when the number of words assigned to categories grows.
-by contrast, our approach adopts a twin-candidate learning model.
-mcclosky et al used an unlabeled corpus to reduce data sparsity.
-we present multigrancnn, a general deep learning architecture for matching text chunks.
-in this paper, we propose a fluency boost learning and inference mechanism.
-the complexity of this task challenges systems to establish meaning, reference and identity across documents.
-cross-narrative temporal ordering of medical events is essential to the task of generating a comprehensive timeline over a patient's history.
-as in zaidan et al, we assume that on average annotating an instance with feature feedback takes twice as much time as annotating an instance without feature feedback.
-rahman and ng used event-related information by looking at which semantic role the entity mentions can have and the verb pairs of their predicates.
-we then adopt the machine learning method proposed in and the bayesian network classifier for feature rating estimation.
-word embeddings are learned from a given text corpus without supervision by predicting the context of each word or predicting the current word given its context.
-we evaluate our proposed method on the tac 2008 and 2011 data sets using the standard rouge metric and human evaluation of the linguistic quality.
-shieber, schabes, and pereira and sikkel have shown how to specify parsers in a simple, interpretable, item-based format.
-their methods have the potential to drop arbitrary words from the original sentence without considering the boundary determined by the tree structures.
-usually, such methods need an intermediary machine translation system or a bilingual dictionary to bridge the language gap.
-the pattern matching capabilities of neural networks can be used to locate syntactic constituents of natural language.
-we assume that this property would fit with a word alignment task, and we propose an rnn-based word alignment model.
-to compare our model with the other systems, we evaluated the performance of our model when the entity boundaries were given.
-johnson thinks that re-annotating each node with the category of its parent in the treebank can improve parsing performance.
-in this paper, we propose a multilingual transliteration system for named entities.
-we describe our pmi-cool system for semeval-2016, task 3 on community question answering, subtask a, which asks to rerank the comments from the thread for a given forum question from good to bad.
-our experimental results show the effectiveness of our method.
-we use the easyccg parser of lewis and steedman as our parser.
-the smt systems were trained using the moses toolkit with modified kneser-ney smoothing.
-we evaluate kale with the link prediction and triple classification tasks on wordnet and freebase data.
-an additional translation set called the maximum bleu set is employed by the smt system to train the weights associated with the components of its log-linear model.
-is the word in the confusion set that occurred most often in the training corpus.
-the phrasal implementation uses the line search algorithm of cer et al, uniform initialization, and 20 random starting points.
-stoyanov et al used subjective vocabulary for their opinion qa system.
-while there is no overall best model, all models significantly outperform a single-sense skip baseline, thus demonstrating the need to distinguish between word senses in a distributional semantic model.
-in this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
-in the message polarity classification subtask, we focus on the influence of domain information on sentiment classification.
-tu et al incorporated a reconstructor module into nmt, which reconstructs the input source sentence from the hidden layer of the output target sentence to enhance source representation.
-brockett et al used phrasal statistical machine translation techniques to correct countability errors.
-in this article, we argue that kendall's math-w-11-1-0-8 can be used as an automatic evaluation method for information-ordering tasks.
-the lexicon consists of a strongly connected core, around which there is a kernel, an asymmetric grounding set and satellites.
-in this paper, we propose an attention-based rnn framework to generate multiple summaries of a single document tuned to different topics of interest.
-as described above, our base system is a phrase-based statistical mt system, similar to that of och and ney.
-in the social media context, a different uncertainty classification scheme is needed.
-stolcke proposed a criterion for pruning n-gram language models based on the relative entropy between the original and the pruned model.
-this paper addresses the automatic classification of preposition types in german, comparing various clustering approaches.
-previous work suggests that the unigram baseline can be difficult to beat for certain types of debates.
-on top of a distributed file system, the runtime transparently handles all other aspects of execution, on clusters ranging from a few to a few thousand nodes.
-corex and anchored corex consistently produce topics that are comparable to lda-based methods, despite only making use of binarized word counts.
-this logical form is evaluated against a learned probabilistic database that defines a distribution over denotations for each textual predicate.
-we used the l1-regularized logistic regression classifier implemented in liblinear.
-atc in our system is performed using a hierarchical clustering method in which clusters are merged based on average mutual information measuring how strongly terms are related to one another.
-the result holds both for seen bigrams and for unseen bigrams whose counts have been re-created using smoothing techniques.
-another device is the specification of a temporal or spatial parameter that is outside the normal range of a situation.
-empirical results show that our model outperforms state-of-the-art machine translation models, for both english and chinese, in terms of both automatic and human evaluation.
-smaller windows allow us to acquire more relevant contexts for a target, but increase the data sparseness problem.
-to the best of our knowledge, our method is the first to address this task in an al framework.
-recent research in abstractive summarization has focused on data-driven neural models based on the encode-attend-decode paradigm ( bahdanau et al, 2014 ).
-as a result, our proposed model trian achieves near state-of-the-art performance.
-dtm is a switching graphical model performing a switch between topics and ad-expressions similar to that in.
-learning the probability of n-grams, together with their representation in a continuous space, is an appropriate approximation for large vocabulary tasks.
-johnson showed that the performance of an unlexicalized pcfg over the penn treebank could be improved enormously simply by annotating each node by its parent category.
-this type of data has been found to yield the best correlation with eye-tracking data when different styles of presentation were compared for english.
-when a source sentence is to be translated, its domain is first predicted.
-mihalcea and moldovan and lytinen et al used wordnet to obtain the sense of a word.
-our results show that this regularization technique is critical for obtaining a state-of-the-art result.
-coordination is a common syntactic phenomenon, appearing in 38.8 % of the sentences in the penn treebank ( ptb ) ( cite-p-24-1-13 ), and in 60.71 % of the sentences in the genia treebank ( cite-p-24-1-15 ).
-in this paper, we extend methods from cite-p-12-1-11 for reducing the worst-case complexity of a context-free parsing pipeline via hard constraints derived from finite-state tagging preprocessing.
-our best system boosts precision by 44 % and recall by 70 %.
-we should incorporate non-local information into the model.
-the chinese system uses the berkeley parser.
-we use rouge, a recall-oriented evaluation package for automatic summarization.
-a key contribution of this paper is using relation temporality for determining relation equivalence.
-gildea presents a general algorithm to binarize an lcfrs while minimizing a given scoring function.
-compact translation models try to further improve the translation probabilities based on question-answer pairs by selecting the most important terms.
-second, beyond deterministic greedy search, principled dynamic programming strategies can be employed to explore more possible hypotheses.
-we have also provided new evaluation metrics inspired by research in ir, and guidelines for evaluating semantic representation models on the quantitative wa task.
-there has been extensive work on modeling conversational interactions on twitter.
-to compensate for this, we apply a strong recurrent neural network language model.
-in previous work, hatzivassiloglou and mckeown propose a method to identify the polarity of adjectives.
-to overcome the deficiencies of these two kinds of methods, we propose a novel semi-supervised key phrase extraction approach in this paper, which explores title phrases as the source of knowledge.
-we use conditional random fields and memory-based learning as ml methods for word-level qe.
-in this paper, we attempt to integrate prosodic information for asr using an n-best rescoring scheme.
-in this paper, we undertake such a comparative study by looking at selectional preferences of german verbs.
-these meetings have been transcribed, and annotated with extractive summaries.
-in name translation, only 0.79 % and 1.11 % of candidates for english person names and location names, respectively, have to be proposed.
-although it does not beat the hmm, the new convex model improves on the standard ibm model 2 significantly.
-we investigate this problem by learning domain-specific representations of input sentences using a neural network.
-we use the newstest 2011 data provided by the annual workshop on statistical machine translation.
-in this paper, we present the first empirical study that quantitatively measures the deception cues in the real-time writing process.
-turney and littman evaluated the semantic orientation of a target word t by comparing its association with two seed sets of manually crafted target words.
-we generalize this result to formalisms beyond cfg.
-in emerging areas, such as domain-oriented dialogues, the interaction with the system, typically modelled as a conversation with a virtual anthropomorphic character, can be the main motivation for the interaction.
-semantic role labeling is the task of labeling predicate-argument structures with semantic roles.
-learning is done using a monte carlo variant of the expectation-maximization algorithm.
-word-level dsms can be categorized into unstructured, which employ a bag-of-words model, and structured, which employ syntactic relationships between words.
-consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics.
-by being fictional, the answer typically can be found only in the story itself.
-yao et al applied linear chain crfs with features derived from ted to automatically learn associations between questions and candidate answers.
-as a key property of our tool, we store all intermediate annotation results and record the user-system interaction data.
-for example, tan et al and zhang et al have found that the language used in arguments and the patterns of interaction between debaters are important predictors of persuasiveness.
-following, we classified words into high-frequency words and content words.
-ikeda et al proposed a machine learning approach to handle sentiment polarity reversal.
-experiments show that our algorithm leads to a more effective and stable training of neural network-based detection models.
-we identify the subtopics ( which are closely related to the original topic ) in the given body of texts by using lda and calculate their similarity with the questions by applying essk ( with disambiguated word senses ).
-first, we present a new dataset of caption annotations, conceptual captions ( fig. 1 ), which has an order of magnitude more images than the coco dataset.
-we use the universal pos tagset proposed by petrov et al which has 12 pos tags that are applicable to both en and hi.
-the corpus is balanced with respect to genericity and about 10,000 clauses in size.
-we construct our representations using the skip-gram model of mikolov et al trained on textual data to obtain linguistic embeddings and a deep convolutional neural network trained on image data to obtain visual embeddings.
-the cross-entropy between the brown corpus and our model is 1.75 bits per character.
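The bits-per-character figure in the last sentence above is a standard character-level cross-entropy. The following minimal Python sketch shows how such a number is typically computed; the char_prob callback and the toy uniform model are illustrative assumptions, not code from any system mentioned in this file.

import math

def bits_per_character(text, char_prob):
    """Average negative log2 probability the model assigns to each character.

    char_prob(history, ch) is a hypothetical callback returning
    P(ch | history) under a character-level language model.
    """
    total_bits = 0.0
    for i, ch in enumerate(text):
        p = char_prob(text[:i], ch)
        total_bits += -math.log2(p)
    return total_bits / len(text)

# toy usage: a uniform model over 27 symbols gives log2(27) ~ 4.75 bits/char
uniform = lambda history, ch: 1.0 / 27
print(round(bits_per_character("example text", uniform), 2))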
-in this paper, we propose two statistical models to solve this seeded problem, which aim to discover exactly what the user wants.
-experiments show that our model outperforms previous state-of-the-art methods, including those relying on much richer forms of prior knowledge.
-clark et al used the results of one pos tagger on unannotated data to inform the training of another tagger in a semisupervised setting using a co-training routine with a markov model tagger and a maximum entropy tagger.
-this measure is a “ within-topic ” measure.
-they are a combination of features introduced by gildea and jurafsky, ones proposed in, surdeanu et al and the syntactic-frame feature proposed in.
-the source channel model has been widely used for spelling correction.
-we evaluate our methods using the benchmark test collection from the acl semeval-2007 web person search task.
-from 2007 on, a global crisis struck the financial markets and led to a severe slowdown of the real economy.
-the application is unusual because it requires text-to-speech synthesis of unedited, spontaneously generated conversational text.
-in this paper, we address one aspect of this problem – inferring predictive models to structure task-oriented dialogs.
-ji and grishman even consider topic-related documents, proposing a cross-document method.
-they describe this setting as unsupervised because they only use 14 seeds as paradigm words that define the semantic orientation rather than train the model.
-in this paper, we investigated the usefulness of directly summarizing citation texts ( sentences that cite other papers ) in the automatic creation of technical surveys.
-in related work, soricut and marcu describe a discourse parser, a system that uses penn treebank syntax to identify intra-sentential discourse relations in the rst treebank.
-mimus follows the information state update approach to dialogue management, and has been developed under the eu-funded talk project ( cite-p-14-3-9 ).
-for the “ predicted ” setting, first, we predicted the subject labels in a similar manner to five-fold cross validation, and we used the predicted labels as features for the episode classifier.
-leacock, towell and voorhees demonstrated that contextual representations consisting of both local and topical components are effective for resolving word senses and can be automatically extracted from sample texts.
-we used elmo embeddings, which are generated by training a bidirectional language model on a large corpus of unlabeled data.
-we propose a cluster-ranking approach to coreference resolution that combines the strengths of mention rankers and entity-mention models.
-we also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynamically create semantic features.
-in this paper, we propose a method for nsw detection.
-for automatic extraction of patterns, we followed the pattern definitions given in.
-in the figure, the titles are sorted left to right based on the maximum mean story grade among the titles in the libitum approach.
-we present a technique that improves the efficiency of word-lattice parsing as used in speech recognition language modeling.
-in the following example, “ will go ” is translated as ayg ( jaaenge ), with eg ( enge ) as the future tense marker :
-lu et al used shallow parsing to identify aspects for short comments.
-we propose a novel unsupervised approach for distinguishing literal and non-literal use of idiomatic expressions.
-earlier work on event coreference in the muc program was limited to several scenarios such as terrorist attacks and management succession.
-a channel is a communication medium associated with a particular encoding method.
-we used the stanford parser to generate parse trees.
-neg-finder significantly outperforms bootstrapping prior to the domain expert's negative categories.
-we describe our participation in the semeval 2007 web people search task.
-the parameters of the model are estimated using gibbs sampling.
-in sections 6 and 7, we present our experimental results and analyses, and finally conclude our work.
-recently, continuous bag-of-words and skip-gram models, which can alleviate the above issue, have received much attention.
-motivated by the directional scattering patterns of the gmm mean supervectors, we perform discriminant analysis on the unit hypersphere rather than in the euclidean space, leading to a novel dimensionality reduction technique, sda.
-then, a gated recurrent neural network is exploited to adaptively encode semantics of sentences and their inherent relations in document representations.
-following blitzer et al, we only use positive entries in the pivot predictors' weight vectors to compute the svd.
-to integrate local features with long distance dependencies, we propose a dependency-based gated recursive neural network.
-these parsers make use of the ccgbank, which is created by inducing a ccg grammar from the penn treebank.
-neelakantan et al and make use of context-based word sense disambiguation during corpus training to allow on-line learning of multiple senses of a word with modified versions of skip-gram.
-we describe the baseline phrase-based translation system and various refinements.
-to this end, we adapt a formalism known as unordered tree alignment to a probabilistic setting.
-since the ud annotation scheme is applied to all of the treebanks, this suggests that the training data of the same language from different domains could be combined.
-it is worth noting that there is a nombank-specific label in figure 1, sup, which helps introduce the arguments that occur outside the nominal predicate-headed noun phrase.
-our named entity recognition module uses the hmm approach of, which learns from a tagged corpus of named entities.
-distributed representations can inform an inductive bias to generalize in a bootstrapping system.
-we use the scfg decoder cdec and build grammars using its implementation of the suffix array extraction method described in lopez.
-the maximal marginal relevance algorithm is used to perform sentence reranking and selection.
-the aforementioned studies have shown that incorporating contextual information can improve sentiment analysis.
-in section 5, we outline the experiments used to evaluate the models and present their results.
-this is similar in spirit to hidden topic models such as latent dirichlet allocation, but rather than assigning a hidden topic to each word, we constrain the topics to yield a linear segmentation of the document.
-in this paper, we explore features representing the accuracy of the content of a spoken response.
-this paper describes the evaluator, concentrating on cases in which the system and user disagree.
-xiong et al incorporated lexical cohesion devices into document-level machine translation.
-given a word, the task of finding the semantic orientation of the word is to identify if the word is more likely to be used in a positive or negative sense.
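The semantic orientation task described in the last sentence above is often approached with the seed-set association idea attributed earlier in this block to turney and littman. The sketch below is only a rough illustration of that idea using pointwise mutual information; cooccur, count and total are hypothetical corpus-statistics interfaces, not code from the cited work.

import math

POSITIVE_SEEDS = ["good", "nice", "excellent"]
NEGATIVE_SEEDS = ["bad", "nasty", "poor"]

def semantic_orientation(word, cooccur, count, total):
    """Association with positive seeds minus association with negative seeds,
    measured here with PMI over assumed corpus statistics."""
    def pmi(a, b):
        joint = cooccur(a, b) / total
        if joint == 0:
            return 0.0
        return math.log2(joint / ((count(a) / total) * (count(b) / total)))

    pos = sum(pmi(word, s) for s in POSITIVE_SEEDS)
    neg = sum(pmi(word, s) for s in NEGATIVE_SEEDS)
    return pos - neg  # > 0 suggests positive orientation, < 0 negative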
-yu et al proposed the factor-based compositional embedding model, which uses syntactic dependency trees together with sentence-level embeddings.
-in pursuit of better translation, phrase-based models have significantly improved the quality over classical word-based models.
-carpuat and wu ( 2007 ) report an improvement in translation quality by incorporating a wsd system directly in a phrase-based translation system.
-brin proposed a bootstrapping-based method on top of a self-developed pattern matching-based classifier to exploit the duality between patterns and relations.
-reinforcement learning with user feedback after the imitation learning stage further improves the agent ’ s capability in successfully completing a task.
-the scripts were further post-processed with the stanford corenlp pipeline to perform tagging, parsing, named entity recognition and coreference resolution.
-miwa et al proposed a hybrid kernel, which is a composition of the all-dependency-paths kernel, bag-of-words kernel and sst kernel.
-on the other hand, math-w-6-1-0-93 and math-w-6-1-0-96 both happen in the interval math-w-6-1-0-103 but they form an overlap relation.
-in section 3, we describe each processing step of our approach.
-we propose a knowledge-lean method that relies on word association and requires no syntactic annotation.
-named entity ( ne ) tagging is a fundamental task in natural language processing.
-one of the main advantages of this approach is that it does not depend on bilingual or multilingual resources.
-chan and ng proposed a machine translation evaluation metric based on the optimal algorithm for bipartite graph matching, also known as the assignment problem.
-we use the wikipedia revision toolkit with modified kneser-ney smoothing, and the jwpl wikipedia api.
-to identify these terms, we use the log-likelihood statistic suggested by dunning and first used in summarization by lin and hovy.
-davidov et al used 50 hashtags and 15 emoticons as sentiment labels for classification to allow diverse sentiment types for the tweet.
-the task is to determine the degree of semantic equivalence between a pair of sentences.
-in this paper, however, we focus on the use of the context model to resolve deictic and anaphoric expressions keyed in by the user.
-the dependency-to-string model takes head-dependent relations as the elementary structures of dependency trees, and represents the translation rules with the source side as hdrs and the target side as strings.
-more sophisticated metrics, such as the rte metric, use higher-level syntactic or semantic analysis to determine the grammaticality of the output.
-semantic integration of these different but related types of medical knowledge that is present in disparate domain ontologies becomes necessary.
-we can write math-w-15-1-1-133, where math-w-15-1-1-162 is a r 1r vector that can again be computed offline.
-yu and siskind proposed a system that induces word-object mappings from features extracted from short videos paired with sentences.
-word embeddings were obtained using word2vec, which represents each word as a 300-dimensional vector.
-according to fox, dependency representations have the best inter-lingual phrasal cohesion properties.
-in the final two articles, by piotrovskij and maruk, the authors strongly advocate what they consider to be practical approaches to mt, while dismissing much of the work cited in the first three articles.
-the use of unsupervised word embeddings in various natural language processing tasks has received much attention.
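A few sentences above mention obtaining 300-dimensional word2vec embeddings. A minimal, hypothetical example of training such vectors with the gensim library (assuming gensim version 4 or later; the toy corpus is illustrative only and not related to any experiment described in this file) might look as follows.

from gensim.models import Word2Vec  # assumption: gensim >= 4.0

# toy corpus; the experiments referenced above would use a much larger one
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# skip-gram (sg=1) model with 300-dimensional vectors
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1, epochs=10)
vector = model.wv["cat"]   # 300-dimensional numpy array
print(vector.shape)        # (300,)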
-rnns ( graves, 2012 ) and convolutional nns are the two most popular neural networks in this regard.
-we use negative sampling as a speed-up technique.
-this software is an implementation of the algorithm presented by, which extracts frequent ordered subtrees from a set of ordered trees.
-the corpus is based on the dataset introduced by pang and lee and consists of 11,855 single sentences extracted from movie reviews.
-the complexity of the tasks, however, makes it difficult to infer what kind of information is present in the representations.
-recent advances in dependency parsing have been made by introducing non-linear, neural network-based models.
-in this paper we describe our participation in semeval-2015 task 12 ( absa ).
-due to the superior performance of fasttext, the system highlights high-risk sentences in those reports using fasttext.
-in this paper, we examine topological field parsing, a shallow form of parsing which identifies the major sections of a sentence in relation to the clausal main verb and the subordinating heads.
-gupta and ji use cross-event information to extract implicit time information.
-one of the touted advantages of neural network language models is their ability to model sparse data.
-lexical cohesion analysis has been used in such nlp applications as determining the structure of text and automatic text summarization.
-for example, smith et al mine parallel sentences from comparable documents in wikipedia, demonstrating substantial gains on open domain translation.
-we used wordnet as a source of synonyms and hypernyms for linking english words in the word relatedness graph.
-luong and manning presented a neural machine translation system using character rnns only for oov words, dropping the rnn output into a conventional word-based nmt system.
-for example, a0 is commonly mapped onto subject ( sbj ), whereas a1 is often realized as object ( obj ).
-our model consists of a linear classifier based on support vector machines, which have proved to provide competitive results in text categorization since their conception.
-galley et al propose a method for extracting tree transducer rules from a parallel corpus.
-in this work, we focus on gaokao history multiple choice questions, which are denoted as gkhmc.
-in this paper, we propose a companion learning framework to unify rule-based policy and rl-based policy.
-it adapts to the user ’ s preferences and situation.
-our framework is general and applicable to various types of neural architectures.
-erk and padó introduce the concept of a structured vector space in which each word is associated with a set of selectional preference vectors corresponding to different syntactic dependencies.
-in ( cite-p-23-3-10 ), the authors proposed a method that tackles online multi-task learning in the lifelong learning setting.
-in this paper, we propose two algorithms for automatically ontologizing ( attaching ) semantic relations into wordnet.
-duh and kirchhoff adopted a minimally supervised approach that requires raw data from several das, and an msa morphological analyzer.
-zesch and gurevych created a third dataset from domain-specific corpora using a semi-automatic process.
-for the laptops domain, we used only one cnn classifier that predicts the aspects based on a probability threshold.
-every time a sentence is analyzed, it detects unknown morphemes, enumerates candidates and selects the best candidates by comparing multiple examples kept in the storage.
-rangrej et al compared the performance of three document clustering techniques on twitter data, and found that the graph-based approach using affinity propagation performs best in clustering tweets.
-we constrain the translation of an input sentence using the most similar translation example retrieved from the tm.
-tomanek et al used eye-tracking data to evaluate a degree of difficulty in annotating named entities.
-shoufan and alameri and al-ayyoub et al present a survey on nlp and deep learning methods for processing arabic dialectal data with an overview on arabic did of text and speech.
-ambiguity is a common feature of weps and wsd.
-the system employs simple partial parsing techniques as described by abney.
-it was first used for unlabeled dependency parsing by kudo and matsumoto and yamada and matsumoto.
-we set up a web experiment using the evaluation toolkit by belz and kow to collect ratings of local coherence for implicit and explicit arguments.
-dbpedia spotlight is a tool for automatically annotating mentions of dbpedia resources in text.
-in creating the summary, instantiating the content model, we identify independent categories and dependent categories in order to preserve the cohesion of the text.
-opinion lexicons have been obtained for english and also for spanish.
-the phrase-based model is much simpler than other phrase-based statistical models.
-it may be useful for discourse relation projection and discourse parsing.
-this is due to the possibility of boosting similarity to human reference translations by the additional use of a cost function in our approach.
-third and finally, the baselines reported for resnik ’ s test set were higher than those for the all-words task.
-in this paper, we have attempted to reproduce a study by nilsson et al that has shown that making auxiliaries heads in verb groups improves parsing but failed to show that those results port to parsing with universal dependencies.
-this paper presents such a method, exploiting machine learning in an innovative way.
-for example, lavie et al, liu et al, and chiang noted that translation quality tends to decrease in tree-to-tree systems because the rules become too restrictive.
-it also performs well on a number of natural language processing problems, including text categorization ( sebastiani et al ) and word sense disambiguation.
-experiments in two domains showed that contextual role knowledge improved coreference performance, especially on pronouns.
-asahara et al extended the original hmms by 1 ) position-wise grouping of pos tags, 2 ) word-level statistics, and 3 ) smoothing of word and pos level statistics.
-the back-end is a modular, expandable, scalable and flexible architecture with parallel and distributed processing capabilities.
-in order to do this, we adopt a multi-task learning approach.
-given a sentence pair and a corresponding word alignment, phrases are extracted following the criterion in och and ney.
-we develop a novel smooth version of the multi-focus attention function, which generalizes the single-focus softmax function.
-for more details see the overview paper by the organizers.
-och developed a training procedure that incorporates various mt evaluation criteria in the training procedure of log-linear mt models.
-boyd-graber et al propose a topic model with wordnet and use it to carry out disambiguation and learn topics simultaneously.
-non-compositional multiword expressions ( mwes ) still pose serious issues for a variety of natural language processing ( nlp ) tasks.
-this paper follows the pair-wise learning-to-rank paradigm outlined in.
-in addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods.
-we use the webclopedia question set by.
-it has been widely adopted in the generic summarization task.
-the insensitivity of bleu and nist to perfectly legitimate variation has been raised, among others, in, but the criticism is widespread.
-our main claim is that we use visual and audio information to achieve robust topic identification.
-in addition, we report the bleu score that was computed on the word level.
-han and baldwin use a classifier to detect ill-formed words, and then generate correction candidates based on morphophonemic similarity.
-to address the above-mentioned issues, we present wikikreator, a system that can automatically generate content for wikipedia stubs.
-we incorporate these learned word senses as translation evidence into maximum entropy classifiers, which form the foundation of the proposed sense-based translation model.
-we built a type signature for the xtag english grammar, an existing broad-coverage grammar of english.
-we present svm-based classifiers which use two sets of features : n-gram and stylistic features.
-in this paper, we explore a flexible application of dependency paths that overcomes this difficulty.
-unlike grconv and adasent, our model uses a full binary tree as the topological structure.
-this new approach, without the need for constrained re-decoding as an intermediate step, provides a direct means to learn the knowledge in the partial labels.
-developments of this approach have been proposed that improve cluster quality and retrieval performance.
-for this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus.
-xiong et al extend the treelet approach to allow dependency fragments with gaps.
-the method does not require labeling sentences with logical forms.
-hanks and church pointed out the usefulness of mutual information for identifying monolingual collocations in lexicography.
-in this paper, we examine how well the rouge scores correlate with human evaluation for extractive meeting summarization.
-if arbitrary word-reorderings are permitted, the search problem is np-hard.
-to evaluate the performance of our model, we conducted our experiments on the stanford natural language inference corpus.
-mohammad and hirst show that their approach performs better than other strictly corpus-based approaches that they experimented with.
-the models admit a rich set of linguistic features, and are trained to learn feature weights automatically by optimizing a regression objective.
-two attempts to overcome this drawback are presented in nerbonne and nerbonne.
-our unidirectional-rm sets a new state of the art for the sentence completion challenge with 69.2 % accuracy.
-we apply a state-of-the-art language-independent entity linker to link each transliteration hypothesis to an english kb.
-as a refinement ( relabeling ) model, it achieves the best las on 5 out of 7 datasets.
-the corpus has been converted into an xml format conforming to tei standards.
-we used the nltk python classifier to train the model.
-in this paper, we present litner, an ner system targeted specifically at fiction.
-for twitter, we obtain a median error of 479 km and a mean error of 967 km.
-different filters of the same 3 × 3 shape are operated over the input matrix to output feature map tensors.
-we use support vector machines, a maximum-margin classifier that realizes a linear discriminative model.
-given no linguistic resources between the source language and the target language, transfer learning methods can be used instead.
-for example, resources such as wordnet may be used to aid in the classification of geo-political entities.
-sugiyama et al extract features from the sentences based on the verbs and nouns in the sentences such as the verbal forms, and the part of speech tags of the 20 words surrounding the verb.
-in this paper, we focus on translating into mrls and issues associated with word formation.
-document-level sentiment classification remains a challenge : encoding the intrinsic relations between sentences in the semantic meaning of a document.
-in this study, we analyze the relationship between an individual ’ s traits and his/her aspect framing decisions.
-this paper presents an approach to incrementally generating locative expressions.
-in the decoding stage, the best-first strategy is used to predict bridging links.
-images are ranked using a graph-based method that makes use of both textual and visual information.
-we use the scikit-learn toolkit to build word embeddings.
-it is common practice to optimize the coefficients of the log-linear combination of feature functions by maximizing the bleu score on the development data.
-the syntax-augmented translation model of zollmann and venugopal annotates nonterminals in hierarchical rules with thousands of extended syntactic categories in order to capture the syntactic variations of phrase pairs.
-we use the moses scripts to tokenize the english sentences and perform truecasing.
-in this paper, we focus on the application of machine translation via neural sequence-to-sequence learning.
-this produces multiple paths between terms, allowing sash to shape itself to the data set.
-we use the bag-of-words model in conjunction with word embeddings.
-wang et al used a single-domain translation model and generalized a single-domain decoder to deal with different domains.
-krulwich and burkey use heuristics based on syntactic clues to extract keyphrases from a document.
-the current paper describes a new method for query selection and its applications in lm augmentation and adaptation using web data.
-zeng et al exploit a convolutional neural network to extract lexical and sentence-level features for relation classification.
-resnik measures the similarity between two concepts by finding the ic of the lcs of the two concepts.
-we will explore the effectiveness of sememe information for wrl in other languages.
-socher et al assign a vector and a matrix to each word for the purpose of semantic composition, and build a recursive neural network along the constituency tree.
-the embeddings were pre-trained using glove vectors.
-hindle and rooth mention the interaction between the structural and the semantic factors in the disambiguation of a pp, indicating that verb complements are the most difficult.
-phelan et al used tweets to recommend news articles based on user preferences.
-opinion words ( oword ) and their semantic orientations ( otype ) are identified.
-similar to earlier work, we cast this problem as a variant of the textual entailment recognition task.
-in future work, we plan to explore more fully the semantics of modification, and to pursue the addition of a type system to the logic to treat quantifiers analogously to cite-p-9-4-3, cite-p-9-4-4.
-following previous work, we use generalized average precision to compare the ranking predicted by our model with the gold standard.
-conditional random fields are conditional models in the exponential family.
-cite-p-19-3-19, cite-p-19-3-20 showed through similar analyses of emotion words that the three primary independent dimensions of emotions are valence or pleasure ( positiveness-negativeness / pleasure-displeasure ), arousal ( active-passive ), and dominance ( dominant-submissive ).
-we introduce a symmetric pattern-based approach to word representation which is particularly suitable for capturing word similarity.
-on the other hand, our proposed method learns a single representation for a particular word for each domain in which it occurs.
-araki et al evaluated their model using the blanc evaluation metric, while evaluated their model using the standard f1 evaluation metric.
-in this paper, we propose an approach to temporal information extraction that identifies a single connected timeline for a text.
-chen et al show that the n-gram model outperforms a popular feed-forward language model on a one billion word benchmark.
-we show that this unsupervised system has better core performance than other learning approaches that do not use manually labeled data.
-we call a sequence of words which have a lexical cohesion relation with each other a lexical chain.
-to evaluate coherence, we did not use the rouge metric because a manual analysis found that the ordering of content within the summaries is an aspect which is not evaluated by rouge.
-our method handles noisy representations of questions in a source language to retrieve answers across target languages.
-in all the experiments described in this paper, we use snow as the learning environment, with winnow as the update rule.
-compound splitting is a fundamental task in natural language processing ( nlp ).
-as we will show later, recall is well below 50 % for all named entity types on the new test sets.
-to test our implementation, following sha and pereira, we performed an np chunking task using the conll-2000 text chunking task data.
-greedy-loglin closely resembles the learning model of lapata, except that it is a discriminative log-linear model, rather than a generative markovian model.
-the svm is based on a discriminative approach and makes use of both positive and negative examples to learn the distinction between the two classes.
-similarly, choi et al used a propbank-based semantic role labeler for opinion holder extraction.
-the first-order measures obtained a higher wu & palmer score than the second-order measure on the test data.
-our improved cube-pruned parser represents a significant improvement over the feature-rich transition-based parser of zhang and nivre with a large beam size.
-we present experiments using our syntactic-semantic parser on the conll-2009 shared task english benchmark.
-our baseline translation system is based on a string-to-dependency translation model similar to the implementation in.
-another example is the mpqa subjectivity lexicon, which was built manually by annotating the subjective expressions in the mpqa corpus.
-kim et al proposed a walk-weighted subsequence kernel using e-walks, partial matches, non-contiguous paths, and different weights for different sub-structures.
-we evaluated translation output using case-insensitive bleu.
-in this work, we train a recurrent neural tagger for a low-resource language jointly with a tagger for a related high-resource language.
-we are interested in addressing two types of data shift common in slu applications.
-neither source-language nor target-language analysis was able to circumvent problems in mt, although each approach had advantages relative to the other.
-in section 2, we describe the details of the syntactic decision tree lm.
-we currently achieve coverage of 95.26 %, a bleu score of 0.7227 and string accuracy of 0.776 on the penn-ii wsj section 23 sentences of length 20.
-entrainment in many of these dimensions has also been associated with measures of dialogue success.
-metonymy is defined as a figure of speech in which a speaker uses one entity to refer to another that is related to it.
-we used the version of string-edit distance of bangalore et al which normalises for length.
-in this method, punctuation marks are not associated with lexical heads, but are treated as properties of their neighbouring words.
-this justifies our attempt to model the continuity or shift of the discourse focus in pronoun resolution via centering-motivated features from the semantic perspective.
-by enforcing consistency constraints between their predictions, we show improvements in the performance of both tasks without retraining the individual models.
-in this article, we adopt their tagger for experiments.
-the linguistic structure of a discourse is composed of utterances that exhibit meaningful hierarchical relationships.
-quantum states are expressed as density operators rather than kets.
-in a different vein, cite-p-19-1-12 introduced three unsupervised methods drawn from visual properties of images to determine a concept ’ s generality in hypernymy tasks.
-central to our approach is the encoding of generation as a parsing problem.
-however, most work focuses on congressional debates or debates in online forums.
-the underlying model is an rnn encoder-decoder that explores possible binary tree structures and a reward mechanism that encourages structures that improve performance on downstream tasks.
-the thesaurus was acquired using the method described by lin.
-in the future, we would like to explore additional types of rules such as seed rules, which would assign tuples complying with the seed information to distinct relations.
-we describe a method for enriching the output of a parser with information available in a corpus.
-in these settings, we must compute the gradient of entropy or risk.
-creating lists of named entities is a critical problem at commercial engines such as yahoo! and google.
-the eisner algorithm can be modified trivially for second-order decoding.
-we extract the frequent noun terms from pros and cons reviews as features, then train a one-class svm to identify aspects from the candidates.
-collobert et al used a feed-forward neural network to effectively identify entities in a newswire corpus by classifying each word using contexts within a fixed number of surrounding words.
-our proposed method can be easily extended by using other types of submodular functions.
-mikolov et al applied an rnn for language modeling, and demonstrated that the word embeddings learned by the rnnlm capture both syntactic and semantic regularities.
-as a sequence labeler, we use conditional random fields.
-in this paper, we present the first extrinsic evaluations of simulated annealing and d-bees in a lexical substitution setting.
-zaidan and callison-burch developed an informal monolingual arabic online commentary annotated dataset with high dialectal content.
-we train a 4-gram language model on the xinhua portion of the english gigaword corpus.
-in our experiment, using glpk's branch-and-cut solver took 0.2 seconds to produce optimal ilp solutions for 1000 sentences on a machine with an intel core 2 duo cpu and 4gb ram.
-in all these cases, topic information was helpful in boosting retrieval performance above baseline vector space or n-gram models.
-we use four scales of adjectives ( cf. table 1 ).
-experimental results show improvements of our compressive solution over state-of-the-art systems.
-the german sentence is labeled using annotation projection.
-in this paper, we propose an end2end neural model based on the seq2seq learning framework with a copy mechanism for relational fact extraction.
-bordes et al further improved their work by proposing the concept of subgraph embeddings.
-named entity recognition is the task of identifying named entities in text.
-in addition, we improve the word alignment results by combining the results of the two semi-supervised boosting methods.
-in summary, phrase-based systems have relatively limited potential to model word-order differences between different languages.
-the experimental evaluation demonstrates the superior performance of the model on the benchmark datasets.
-conditional auto-encoders have been employed in that generate diverse replies by capturing discourse-level information in the encoder.
-we report an evaluation on all thirteen languages of the conll-x shared task, for comparison with the results by nivre and mcdonald.
-comparisons with english typos suggest that some language-specific properties result in a part of chinese input errors.
-we parse the text into typed dependency graphs with the stanford parser, recording all verbs with subject, object, or prepositional typed dependencies.
-baldwin took a statistical approach to automated lexical acquisition for deep grammars.
-in this paper, we describe our participating system in the semeval-2007 web people search task.
-support, and for providing me with lots of instruction computation in subset ( often dreadful ) automata generated by his construction.
-faruqui et al use synonym relations extracted from wordnet and other resources to construct an undirected graph.
-our approach is most closely related to the approach described in kakkonen and sutinen, where the experiments were conducted in finnish.
-the most recent semi-automatic lexicon is sentiwordnet, which assigns polarity to word senses in wordnet, known as synsets.
-in section 3, we present the methodology of parallel data selection and terminology identification to improve ontology label translation.
-in this paper, we are interested in extracting unknown words with high precision and recall.
-structural correspondence learning uses only unlabeled data to find a common feature representation for a source and a target domain.
-nlir is likely to be related to linguistic characteristics of the respective native languages.
-in this work, we propose adding information to the wsme model which is provided by the grammatical structure of the sentence.
-we cast the word alignment problem as maximizing a submodular function under matroid constraints.
-in this paper, we describe our approach using a modified svm-based classifier on short text.
-future work will include a further investigation of parser-derived features.
-the goal of this note is to point out inversion as an option for turning distributed language representations into classification rules.
-other approaches are based on external features allowing them to deal with various mt systems, e.g.
-crfs are particularly suitable for sequence labelling tasks.
-in, the problem of personalized, interactive tag recommendation was also studied based on the statistics of tag co-occurrence.
-since the work of pang, lee, and vaithyanathan, various classification models and linguistic features have been proposed to improve classification performance.
-in the current implementation, no acoustic information is used in disambiguating words ; only the pronunciations of words are used to verify the values of the semantic variables.
-in the parliament domain, this means ( and is translated as ) “ report ”.
-suggestion mining can be defined as the extraction of sentences that contain suggestions from unstructured text.
-we do not assume that a single standard linking is valid for all predicates.
-a paragraph associated with each topic is used as the source of relevant information about the topic.
-this paper focuses on translation of fully- and partially-assimilated foreign words, called “ borrowed words ”.
-recently, dsms based on neural networks have rapidly grown in popularity.
-we also show that constraints derived from the discourse context can be highly useful for disambiguating sentence-level sentiment.
-to implement this, the sklearn library is used.
-the measure relies on latent semantic analysis trained on the tasa corpus.
-in this paper, we propose a novel time-aware kb embedding approach taking advantage of the happening time of facts.
-word sense induction is performed using unsupervised clustering.
-we incorporate subword units into a lattice framework within the kws system.
-in this work, we attempt to model all three dimensions in developing a computational model for applause.
-anderson et al show that semantic models built from visual data correlate highly with fmri-based brain activation patterns.
-prior approaches to text simplification have addressed the task as a monolingual translation problem.
-the training data released by the task organizers comes from the nucle corpus, which contains essays written by learners of english as a foreign language, corrected by english teachers.
-in this paper, we introduce picturebook embeddings produced by image search using words as queries.
-sentiwordnet is a large lexicon for sentiment analysis and opinion mining applications.
-in this paper, we use a non-projective dependency tree crf ( cite-p-16-3-4 ).
-levy et al conducted a comprehensive set of experiments and comparisons that suggest that much of the improved results are due to the system design and parameter optimizations, rather than the selected method.
-in this paper, we present a novel sliding window-based text alignment algorithm for real-time crowd captioning.
-experiments show that by incorporating the mers model, the baseline system achieves a statistically significant improvement.
-in addition, an experiment was conducted to evaluate auto the advantage in terms of speed, the autosem.
-co-training uses both labeled and unlabeled data to train models that have two different views of the data.
-local analysis and co-occurrence-based user profile representation have also been adopted to expand the query ( cite-p-18-1-12, cite-p-18-1-3 ).
-pyramid is a summarization evaluation scheme designed to achieve consistent scores while taking into account human variation in content selection and formulation.
-an important advantage of our model is that it can be used to learn region representations for words, by using a quadratic kernel.
-specifically, we propose a target-specific transformation component to better integrate target information into the word representation.
-we formulate this task as a text-to-text natural language generation ( nlg ) problem.
-cite-p-22-1-5 investigate the idea of fusing disparate sentences with a supervised algorithm.
-emotions achieve a low agreement among raters ( see cite-p-8-3-0 ) and surprisingly emotion recognition is higher in a condition of modality deprivation ( only acoustic or only visual vs. bimodal ).
-the brown algorithm is a hierarchical agglomerative hard-clustering algorithm.
-the particular proposal is both precisely characterizable, through a compilation to linear indexed grammars, and computationally operational, by virtue of an efficient algorithm for recognition and parsing.
-to cope with this problem, we applied an efficient algorithm of maximum entropy estimation for feature forests.
-in section 5, we discuss the problem of segmenting and labeling dialog structure and building models for predicting labels.
-to parse the target text, one simply uses the mixture of parsing models with the highest predicted accuracy.
-the summary structure is planned, with sentences generated based on the semantic link network.
-we propose a method using svr to combine various features to evaluate the similarity between two sentences.
-in this paper, we propose to improve the robustness of nmt models with adversarial stability training.
-in this paper, we specifically address questions of polysemy with respect to verbs, and how regular extensions of meaning can be achieved through the adjunction of particular syntactic phrases.
-the data generated in the task provides ample opportunities for further investigations of preposition behavior.
-in this paper, we use the human-annotated data or machine alignments of the training set.
-in this paper, arabic was the target language but the approach is applicable to any language that needs affix removal.
-quickset is a distributed system consisting of a collection of agents that communicate through the open agent architecture ( cite-p-2-60-7 ).
-ratnaparkhi et al used 20,801 tuples for training and 3097 tuples for evaluation.
-we found that our method gave overall better rouge scores than four baseline methods, and that the new sentence clustering and compression algorithms are robust.
-this paper presents universal conceptual cognitive annotation ( ucca ), a novel framework for semantic representation.
-in this paper, we investigate the role of large amounts of noisily sense-annotated data obtained using an unsupervised approach in relieving the data acquisition bottleneck for the wsd task.
-in order to discover more general patterns, we map the tag set down after tagging, e.g.
-in this paper, we propose a knowledge-based answer selection system for arabic.
-tang et al explored the impact of three different types of word representations : clustering-based representation, distributional representation and word embedding.
-we introduce a novel method for grammatical error correction with a number of small corpora.
-they employed a domain-independent feature set along with features generated from the output of chemspot, an existing chemical named entity recognition tool, as well as a collection of domain-specific resources.
-the additional model is log-linearly interpolated with the in-domain model using the multidecoding method described in.
-yi et al, hu and liu, kobayashi et al, popescu and etzioni.
-we use a similarity-sensitive reranking method to get the final abbreviation.
-we compare our model with the baselines and state-of-the-art models for sentiment analysis, speaker traits recognition and emotion recognition.
-in transformation-based parsing, a finite sequence of tree rewriting rules are checked for application to an input structure.
-the corpus was converted from xml to raw text, various string normalization operations were then applied, and the corpus was lemmatized using treetagger.
-to test how cs, normalisation, and dimensionality reduction affect simple compositional vector operations, we use the test portion of the phrasal similarity dataset from mitchell and lapata.
-our methods extract one million contradiction pairs with over 70 % precision, and 500,000 causality pairs with about 70 % precision from a 600 million page web corpus.
-in order to have a more extensive database of affect-related terms, we used wordnet affect, sentiwordnet, micrownop.
-statistical topic models such as latent dirichlet allocation provide powerful tools for uncovering hidden thematic patterns in text and are useful for representing and summarizing the contents of large document collections.
-vlachos has used the classifier confidence score as a stopping criterion for the uncertainty sampling.
-here we identify the natural fragment of normal dominance constraints and show that its satisfiability problem is in deterministic polynomial time.
-in our development work, we found that the method of clark and weir overall gave better performance, and so we limit our discussion here to the results on their model.
-we adopt the greedy feature selection algorithm as described in jiang et al to pick up positive features incrementally according to their contributions.
-clark and curran demonstrate that this relatively small set has high coverage on unseen data and can be used to create a robust and accurate parser.
-we also look to lay the foundation for analysis based on implicit data collected from our application.
-in this paper, we propose a deep neural network diachronic distributional model.
-we use the lstm toolkit to train the encoder and decoder models.
-experiments are conducted on the semeval-2010 task 8 dataset.
-second, we integrate a simple lexical module which is jointly trained with the rest of the model.
-to predict labels, we train conditional random fields, which are directly optimized for splitting.
-in this paper, we describe the particularities of biochemical terminology.
-relations form part of the qualia structure assumed in generative lexicon theory.
-then we adopt a combination method to build a universal model to estimate semantic similarity, which consists of traditional natural language processing ( nlp ) methods and deep learning methods.
-xing et al pre-defined a set of topics from an external corpus to guide the generation of the seq2seq model.
-this paper describes a novel approach to semantic relation detection.
-the aspect term extraction method is based on a supervised learning algorithm, where we use different classifiers, and finally combine their outputs using a majority voting technique. -next on the continuum, we find work that focuses on defining morphological models with limited lexica that are then extended using raw text. -the heuristic strategy of grow-diag-final-and is used to combine the bidirectional alignments to extract phrase translation and reordering tables. -msa is the language used in education, scripted speech and official settings while da is the primarily spoken native vernacular. -data sparsity is a fundamental problem in natural language processing ( nlp ). -meanwhile, the fluency of the produced summaries has been mostly ignored. -these are not only problems in exploring multi-party dialogues. -the main challenge we tackle is to generate quality data for training the reordering model in spite of the machine alignments being noisy. -in this paper, we will show the efficacy of collaborative ranking on the entity linking task defined in the knowledge base population ( kbp ) track ( cite-p-25-3-2 ). -we empirically evaluate cpra on benchmark data created from freebase. -we use deep neural networks, which can easily share information with hidden shared layers. -our method is more accurate than the baseline methods in different settings such as large rule sets and large vocabulary sizes. -zhang et al use adversarial training to obtain cross-lingual word embeddings without any parallel data. -following the set-up of duan et al and zhang and clark, we split ctb5 into training. -in unsupervised methods, most approaches regarded opinion words as the important indicators for opinion targets ( cite-p-16-1-3, cite-p-16-3-2, cite-p-16-1-18, cite-p-16-3-5 ). -question-answer pairs are represented by concatenated distributed representation vectors and a multilayer perceptron is used to compute the score for an answer ( the probability of an answer being the best answer to the question ). -the corpus used in our experiments is the french treebank ( version from june 2010, hereafter ftb ). -we show how a bf containing n-grams can enable us to use much larger corpora and higher-order models complementing a conventional n-gram lm within an smt system. -in this paper, we propose a uima framework to manage the computation distribution of the complicated processing pipelines involved in cqa systems. -each of our systems uses the semeval 2012-2015 sts datasets to train a ridge regression model that combines different measures of similarity. -for regularization, we use dropout and l2 regularization. -the structure of the paper is as follows. -given the statistics we have aggregated, we have designed a new crowdsourcing scheme that creates a new sct dataset, which overcomes some of the biases. -our system assumes pos tags as input and uses the tagger of ratnaparkhi to provide tags for the development and evaluation sets. -the millions of parameters were tuned only on a small development set consisting of less than 1k sentences. -quirk et al used a source-side dependency parser and projected automatic parses across word alignments in order to model dependency syntax on phrase pairs. -after sentence splitting and tokenization, we applied the highly efficient treetagger for part-of-speech tagging and we extracted time and money entities with fast regular expressions. 
-when the selected sentence pairs are evaluated on an end-to-end mt task, our methods can increase the translation performance by 3 bleu points. -our analysis of naturally occurring dialog indicates that humans understand many utterances that would appear imperfect or incomplete to current natural language systems. -in this paper, we propose a model that jointly identifies the domain and tracks the belief states corresponding to that domain. -the second one is an approximation of the first algorithm. -in this paper, we propose to combine the advantages of source side constituency and dependency trees. -in this paper, we investigate noun phrases based on cross-linguistic evidence and present a domain independent model for their semantic interpretation. -this paper solves the inconsistency by normalizing the word vectors. -in this paper, we focus on learning the plan elements and the ordering constraints between them. -both parsers obtain state-of-the-art performance, are fast, and are easy to use through a simple api. -besides, we are interested in applying the method of combining topic models and deep learning to some traditional nlp tasks. -we propose a purely unsupervised d-topwords model to extract new domain-specific words. -klebanov et al evaluated the effect of concreteness as a feature for metaphor detection using mrcpd. -for example, for the oov mention ° lukebryanonline , our model can find similar mentions like ° thelukebryan . -based on the attributes, several statistical classifiers were used to select operands and determine the operator. -we also incorporate additional features such as pos tags and sentiment features extracted from sentiment lexicons. -in this paper, we propose learning sentiment-specific word embeddings ( sswe ) for sentiment analysis. -reichart and rappoport showed that one can self-train with only a generative parser if the seed size is small. -experimental results show that oversampling is a relatively good choice in active learning for wsd in highly imbalanced data. -in this paper, we propose a stacked framework for learning to predict dependency structures for natural language sentences. -vietools is widely used in vietnamese language processing. -this representation allows easy data sharing between kbs. -recently, there has been rising interest in modelling the interactions of two sentences with deep neural networks. -by drawing on the aggregated results of the task ’ s participants, we have extracted highly representative pairs for each relation to build an analogy set. -question answering ( qa ) is a fundamental task in natural language processing ( nlp ). -in this paper, we present a method of using the hierarchy of labels to improve the classification accuracy. -in ( cite-p-17-3-4 ), popescu and etzioni not only analyzed polarity of opinions regarding product features but also ranked opinions based on their strength. -this grammar is based on the framework of head-driven phrase structure grammar, one of the most prominent linguistic theories being used in natural language processing. 
-cite-p-21-3-10 proposed to learn a two-dimensional sentiment representation based on a simple neural network. -to combat the noisy training data produced by heuristic labeling in distant supervision, researchers exploit multi-instance learning models. -ideally, apart from strategies to prevent errors, error handling would consist of steps to immediately detect an error when it occurs and to interact with the user to correct the error in subsequent exchanges. -in this paper, we propose a multimodal translation-based approach that defines the energy of a kg triple as the sum of sub-energy functions that leverage both multimodal ( visual and linguistic ) and structural kg representations. -character-level nodes have special tags where position-of-character and pos tags are combined. -we propose a hybrid learning approach for such systems using endto-end trainable neural network model. -scarton and specia propose a number of discourse-informed features in order to predict bleu and ter at document level. -we present a context-sensitive chart pruning method for cky-style mt decoding. -let math-w-8-4-0-1 be two points classified into math-w-8-4-0-12. -since the latent meanings are included in the vocabulary, there is no extra embedding being generated. -this hierarchy includes the loss functions useful in both situations where we intend to apply mbr decoding. -cherry and lin show that introducing soft syntactic constraints through discriminative training can improve alignment quality. -for example, “ appetite on 10 ”, “ my appetite way up ” should be mapped to ‘ increased appetite ’, while “ suppressed appetite ” should be mapped to ‘ loss of appetite ’. -we used the lazy decoder program, which is based on the kenlm language model estimation and querying system. -we present hyp, an open-source toolkit for the representation, manipulation, and optimization of weighted directed hypergraphs. -in smt, maximum entropy-based reordering model is often introduced as a better alternative to the commonly used lexicalized one. -we use a stochastic gradient descent algorithm and adadelta to train each model. -we ran the alignment algorithm from on a chinese-english parallel corpus of 218 million english words, available from the linguistic data consortium. -we chose the three models that achieved at least one best score in the closed tests from emerson, as well as the sub-word-based model of zhang et al for comparison. -logic formulas are combined in a probabilistic framework to model soft constraints. -in english event detection task, our approach achieved 73. 4 % f-score with average 3. 0 % absolute improvement compared to state-of-the-art. -word embeddings are based on low-dimension vectors representing the features of the words, captured in context. -entity linking ( el ) has received considerable attention in recent years. -goldwater and mcclosky proposed a morpheme aware word alignment model for language pairs in which the source language words correspond to only one morpheme. -msc is a text-to-text generation process in which a novel sentence is produced as a result of summarizing a set of similar sentences originally called sentence fusion. -recently, several successful attempts have been made at using supervised machine learning for word alignment. -all mt systems are trained using the alignment template model of och and ney. -in addition, we have demonstrated a way to intuitively interpret the model. 
-we use label propagation to determine the relation and observation type expressed by each pattern. -the text corpus was lemmatized using the treetagger and parsed for syntactic dependency structures with parzu. -chapman et al created a simple regular expression algorithm called negex that can detect phrases indicating negation and identify medical terms falling within the negative scope. -similar work on solving domain adaptation for smt by mining unseen words has been presented by snover et al and daum and jagarlamudi. -the literature consists of a series of well-established frameworks to explore a deeper understanding of the semantic relationship between entities, ranging from ontological reasoning to compositional as well as distributional semantics ( cite-p-13-1-2 ). -yamada and matsumoto, 2003, make use of polynomial kernels of degree 2 which is equivalent to using even more conjunctive features. -we propose to use constraints as a way to guide semi-supervised learning. -kappa coefficient is commonly used as a standard to reflect inter-annotator agreement. -we present a neural network based shift-reduce ccg parser, the first neural network based parser for ccg. -in addition to the primary model, we propose an ensemble method to achieve a stable and credible accuracy. -cite-p-21-1-3 proposed to build a sentiment lexicon by a propagation method. -we parse the english side of our parallel corpus with the berkeley parser, and tune parameters of the mt system with mira. -the annotated hindi treebank is based on a dependency framework and has a very rich set of dependency labels. -we presented a method using the existing rbmt system as a black box to produce synthetic bilingual corpus, which was used as training data for the smt system. -these methods are shown to significantly reduce the training time and significantly improve performance, both in terms of perplexity and on a large-scale translation task. -as in the previous methods, we avoid the danger of aligning a token in one segment to excessive numbers of tokens in the other segment, by adopting a variant of competitive linking by melamed. -it does not rely on the availability of an adjective classification scheme and uses wordnet antonym and synonym lists instead. -in the above simulation, only a fraction of nodes were updated at each iteration in order to model a rapid change. -in our approach, parameters are calibrated for each relation by maximizing the likelihood of our generative model. -word co-occurrence frequencies are based on fixed windows spanning in both directions from the focus word. -word embeddings were obtained using word2vec, which represents each word as a 300-dimensional vector. -alikaniotis et al used score-specific word embeddings for word embeddings. -louds succinctly represents a trie with math-w-1-1-0-40 nodes as a 2m + 1 bit string. -we evaluate our parser on rst discourse treebank and thoroughly analyze different components of our method. -silberer and frank use an entity-based coreference resolution model to automatically extend the training set. -the problem of correct identification of nes is specifically addressed and benchmarked by the developers of information extraction system, such as the gate system. -as our decoder accounts for multiple derivations, we extend the mert algorithm to tune feature weights with respect to bleu score. -in this work, we present an exploration of automatic ner of code-mixed data. 
-upadhyay et al compared empirically some of the most recent development on cross-lingual models of word embeddings. -table 1 shows the comparison of srilm and randlm with respect to performance on bleu and ter. -we use the revised d-level sentence complexity scale as the basis of our syntactic complexity measure. -the corpus has been automatically annotated with full syntactic dependency trees by the alpino parser for dutch. -the enju parser is a deep parser based on the hpsg formalism. -jiang and zhou use a phrase-based smt approach to generate chinese couplets. -this algorithm can be used to extract topic hierarchies from large document collections. -further experiment shows that the obtained subtree alignment benefits both phrase and syntax based mt systems by delivering more weight on syntactic phrases. -to pre-order the chinese sentences using the syntax-based reordering method proposed by, we use the berkeley parser. -we present a component for incremental speech synthesis ( iss ) and a set of applications that demonstrate its capabilities. -it is based on 5-grams with extended kneser-ney smoothing. -in this paper, we present a novel endto-end neural network framework for extractive document summarization by jointly learning to score and select sentences. -1 a construct is a set of knowledge, skills, and abilities measured by a test. -we use the moses toolkit to train a phrase-based smt system on the xinhua portion of the gigaword corpus. -in this paper, we propose a cross-lingual mixture model ( clmm ) for cross-lingual sentiment classification. -machine learning techniques, and particularly reinforcement learning, have recently received great interest in research on dialogue management. -we hold a view of structuralist linguistics and study the impact of paradigmatic and syntagmatic lexical relations on chinese pos tagging. -by using a japanese grammar based on a monostratal theory of grammar we could simultaneously annotate syntactic and semantic structure without overburdening the annota-tor. -our test sets are the conll 2014 evaluation set and the jfleg test set. -ando and zhang presented a semi-supervised learning algorithm named alternating structure optimization for text chunking. -experimental results on two public data sets indicate that matching models get significant improvements when they are learned with the proposed method. -recently, deep learning-based sequential models of sentence, such as recurrent neural network, have proved to be effective in dealing with the non-sequential properties of human language. -at a final stage, the pre-trained parameters of the network are used to initialize the model which is then trained on the supervised training data from semeval-2015. -we then incorporate this model into a global, efficient branch-and-bound search through the space of permutations. -we think this is a significant contribution since students or professors can use features as a feedback for better understanding essays writing. -for creating our folds, we used stratified cross-validation, which aims to ensure that the proportion of classes within each partition is equal. -we learn a distance metric for each category node, and measure entity-context similarity under the aggregated metrics of all relevant categories. -jerl outperforms the state-of-art systems on both ner and linking tasks on the conll 03/aida data set. -the preprocessing phase comprises treatment of emoticon, slang terms, lemmatization and pos-tagging. 
-another important aspect of our approach is a two-pronged strategy that handles event narratives differently from other documents. -a. of course, it was situated behind a big neu but unobtrusive painting neu. -we also show that our approach outperforms the best performing fully corpus-based ble methods on these test sets. -our framework combines the strengths of 6 approaches that had previously been applied to 3 different tasks ( keyword extraction, multi-sentence compression, and summarization ) into a unified, fully unsupervised endto-end summarization framework, and introduces some novel components. -in this paper, we explore the syntactic features of convolution tree kernels for relation extraction. -applications of our method include topic detection, event tracking, story/topic monitoring, new-event detection, summarization, information filtering, etc. -we consider a semi-supervised setting for domain adaptation where only unlabeled data is available for the target domain. -the method could be used both in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where a handful of seeds is used to define the two polarity classes. -recently, in the nlp research field, an increasing amount of effort has been made on structural event detection in spontaneous speech. -left-corner ( lc ) parsing is a natural language processing ( nlp ) task. -in, the sfst-based model is compared with support vector machines and conditional random fields. -there is still a gap to the discriminative re-scoring methods. -sahami et al measure semantic similarity between two queries using the snippets returned for those queries by a search engine. -communication accommodation theory states that people use nonverbal feedback to establish social distance during conversation. -in this paper, we propose a model for relation extraction from qna data, which is capable of predicting relations between entities mentioned in question and answer sentences. -over the last few years, several large scale knowledge bases such as freebase, nell, and yago have been developed. -wordnet is limited for entailment rule generation. -in this paper, we adopt the same paradigm pursued in cite-p-12-1-11, but apply it to an exact inference cyk parser ( cite-p-12-1-2, cite-p-12-3-1, cite-p-12-1-8 ). -topic modeling algorithms such as latent dirichlet allocation and non negative matrix factorization are able to find the topics within a document collection. -we replicate a recent large-scale evaluation that relied on, what we now know to be, suboptimal rouge variants revealing distinct conclusions about the relative performance of state-of-the-art summarization systems. -the phonological processing of such as primary stress and higher pitch have been well noted in the literature, culicover and rochemont among others ). -there has been a growing awareness of japanese mwe problems ( cite-p-17-1-0 ). -a more flexible direction is grounded language acquisition : learning the meaning of sentences in the context of an observed world state. -we demonstrate the effectiveness of our approach in the context of one form of unbalanced task : annotation of transcribed human-human dialogues for presence/absence of uncertainty. -we present both theoretical and empirical results concerning the correctness and efficiency of these algorithms. -in this paper, we follow a different strategy, arguing that a much simpler inference strategy suffices. 
-we use the moses package, which uses a phrase-based approach by combining a translation model and a language model to generate paraphrases. -this paper proposes a knowledge-based method, called structural semantic relatedness ( ssr ), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. -in terms of active learning, lewis and gale discussed the use of virtual examples in text classification. -although the markov chains are efficient at encoding local word interactions, the n-gram model clearly ignores the rich syntactic and semantic structures that constrain natural languages. -in section 7, i argue that the context-dependent feature of the analysis does not add extra complexity to my treatment of time-dependent expressions, but is needed for purposes of discourse understanding. -the newer method of latent semantic indexing 1 is a variant of the vsm in which documents are represented in a lower dimensional space created from the input training dataset. -they extended a semi-supervised structured conditional model to the dependency parsing problem and combined their method with the approach of koo et al. -we describe a long-term crowdsourced effort to have the sentences labeled by arabic speakers for the level of dialect in each sentence and the dialect itself. -in this paper, we propose novel convolutional architectures to dynamically encode the relevant information in the source language. -all linear svm models were implemented with scikit-learn and trained and tested using liblinear backend. -neural language models based on recurrent neural networks and sequence-tosequence architectures have revolutionized the nlp world. -backtranslation was adapted to train a translation system in a true translation setting based on monolingual corpora. -in this paper, we propose a framework for working with predictive opinion. -our empirical results show that our decoding framework is effective, and can lead to substantial improvements in translations, especially in situations where greedy search and beam search are not feasible. -the hal model provides an informative infrastructure for the cip to induce semantic patterns from the unannotated psychiatry web corpora. -woodsend and lapata, 2012, used ilp to jointly optimize different aspects including content selection, surface realization, and rewrite rules in summarization. -coreference resolution is a common problem in the analysis of coreference resolution approaches. -in this paper, we propose a transfer learning-based cross-lingual knowledge extraction framework called wikicike. -rnn can model the whole sequence and capture long-term dependencies ( cite-p-18-1-3 ). -to solve this problem, we propose an approach to exploit non-local information. -recently, proposed two particular models, skipgram and cbow, to learn word representations in large amounts of text data. -inference rules for predicates have been identified as an important component in semantic applications, such as question answering and information extraction. -lmf scales linearly in the number of modalities. -we apply the stanford parser to the definition of a page in order to extract all the dependency relations of the sentence. -yamada and knight further extended the model to a syntax-to-string translation model. -we present a greedy document partitioning technique for the task. -prototype information is then propagated to other words based on distributional similarity. 
-for subtask b, we propose two novel methods to improve semantic similarity estimation between question-question pair by integrating the rank information of question-comment pair. -pugs extend unification grammars with an explicit control of the saturation of structures by attributing a polarity to each object. -unlike previous studies, we show that query expansion using only manually created lexical resources can significantly improve the retrieval performance. -we use maximum entropy model to design the basic classifier used in active learning for wsd and tc tasks. -this problem can be solved in polynomial time, using eg, the hungarian algorithm. -we propose to use a generative adversarial network that consists of a generator g and a discriminator d. -our results clearly indicate that training on the created webis-debate-16 corpus yields the most robust cross-domain classifier. -pang et al, turney, we are interested in fine-grained subjectivity analysis, which is concerned with subjectivity at the phrase or clause level. -this representation was used successfully for addressing the sts task with purely string-based approaches. -semantic role labeling was first defined in gildea and jurafsky. -although there are several well-known spectral clustering algorithms in the literature, meil and shi, kannan et al, we adopt the one proposed by ng et al, as it is arguably the most widely used. -for the evaluation, we used bleu, meteor and chrf metrics. -in this paper, we explore a joint multilingual semantic relatedness metric, which aggregates semantic relatedness scores measured on several different languages. -as content words, we considered nouns, adjectives, adverbs, and verbs, based on the part-of-speech output of the lets preprocess toolkit. -for both languages, english and spanish, we achieved the best results of all participants ( value f1 ). -the memory consumption is based on the word embedding layer. -conditional random fields is a discriminative model that estimates joint distribution pover the target sequence y, conditioned on the observed sequence x. -in this paper, we present an efficient method for detecting and disambiguating coordinate structures. -in the contextual polarity disambiguation subtask, we use a sentiment lexicon approach combined with polarity shift detection and tree kernel based classifiers. -veale and hao, however, did not evaluate to what extent their knowledge base of talking points and the associated reasoning framework are useful to interpret metaphorical expressions occurring in text. -in practice, we set all weights j to 1, and employ adam for optimization. -in this work, we adopt the max-margin objective. -abandah et al used a recurrent neural network to transcribe undiacritized arabic text with fully diacritized sentences. -applying our method in a setting where all labeled examples are available also shows improvements over state-of-the-art supervised methods. -recent progress in natural language understanding shows that pre-training transformer decoders on language modelling tasks leads to remarkable transferable knowledge which boosts performance on a wide range of nlp tasks. -we present experiments aiming at an automatic classification of spanish verbs into lexical semantic classes. -in human evaluation, hisan outperformed the baseline methods. -this paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. 
-to the best of our knowledge, there has been no exact measure for the optimization, and the usefulness of a given resource can only be assessed when it is finished and used in applications. -our models are also validated on the more difficult wmt ’ 14 englishto-german task. -the result of applying the no-covered-roots restriction alone is equivalent to the arc-eager parser by sagae and tsujii. -syllabic units, however, rival the performance of morphemes in the kws task. -in this paper, we present a comparison of both algorithms. -a novel agent-aware dropout deep q-network ( aad-dqn ) is proposed to address the problem of when to consult the teacher and how to learn from the teacher s experiences. -we use the svm-light-tk toolkit to build the hybrid kernels. -in this paper, we investigate unsupervised lm adaptation using clustering and lda based topic analysis. -we trained the rerankers using svm-light-tk 6, which enables the use of structural kernels in svm-light. -for example, nishigauchi and watanabe claimed that there were island constraints in japanese, but ishihara and sprouse et al mentioned that this language had no island constraint. -in this paper, we address the problem of prompt adaptation using multi-task learning. -conventional methods to measure the relevance between two arguments include bilinear model, and single layer neural networks, etc. -knight and graehl, 1997, describe a back transliteration system for japanese. -we used the data provided by the second sighan bakeoff segmentation model. -we use three data sampling approaches to solve the problem of data skewness. -koehn and knight tested this idea on a larger test set consisting of the 1000 most frequent words from a german-english lexicon. -we present h eady, an abstractive headline generation system based on the generalization of syntactic patterns by means of a noisy-or bayesian network. -a zero pronoun is a gap in the sentence, which refers to the component that is omitted because of the coherence of language. -we propose a sense-aware neural model to address this challenging task. -following koehn and knowles, we process all the data with byte-pair encoding to construct a vocabulary of 50k subwords. -since sud data are often expensive to obtain at a large scale, to maximize system performance, we focus on methods that employ unsupervised feature learning to take advantage of a large amount of unsupervised social media data. -in this paper, we present a method to extract implicit interpretations from modal constructions. -sac uses the attention scheme to automatically select appropriate senses for context words according to the target word. -to test this hypothesis, we combined an incremental tag parser with an incremental semantic role labeler. -shriberg and stolcke studied the location and distribution of repairs in the switchboard corpus, the primary corpus for speech disfluency research, but did not propose an actual model of repairs. -experimental results show that our method can significantly improve machine translation performance on both iwslt and nist data, compared with a state-of-the-art baseline. -we present a method for cross-formalism transfer in parsing. -collocation, context-words and neighboring sentence sentiment are effective in sentiment adjectives disambiguation. -ibm constraints, lexical word reordering model, and inversion transduction grammar constraints belong to this type of approach. -in this work, we aim to summarize the student responses. 
-we use the classifieds data provided by grenager et al and compare with results reported by crr07 and mm08 for both supervised and semi-supervised learning. -we also observe that our model beats the standard lstm in terms of accuracy. -ian interactively learns the coarse-grained attentions between the context and aspect, and concatenates the vectors for prediction. -the sentences were dependency-parsed with cabocha, and cooccurrence samples of event mentions were extracted. -we use minimum error rate training to train a 5-gram language model with modified kneser-ney smoothing on the target-side training corpus. -however, the search space in mt can be quite large. -by contrast, we incorporate source syntax into a string-to-tree model. -somasundaran et al developed a scheme for annotating sentiment and arguing expressions in meetings. -greedy transition-based dependency parsers incrementally process an input sentence from left to right. -on qa-tempeval ( semeval-2015 task 5 ), our proposed technique outperforms state-of-the-art methods by a large margin. -the csj is the largest spontaneous speech corpus in the world, consisting of roughly 7m words with a total speech length of 700 hours, and is a collection of monologues such as academic presentations and simulated public speeches. -in particular, when linked to wikipedia articles, the task is called wikification. -other approaches rely on statistical language models to determine the most likely substitutes to represent contexts. -however, there is a lack of training data annotated with fine-grained quality information. -wizard-of-oz frameworks have been used in several studies in order to collect human-computer dialogue data to help design dialogue systems. -emotion cause extraction aims to identify the reasons behind a certain emotion expressed in text. -the baseline is a phrase-based translation system whose training data comprises a bilingual dataset without preordering. -relations have been shown to be very effective knowledge sources for wsd and interpretation of noun sequences. -we calculate statistical significance of performance differences using stratified shuffling. -hierarchical phrase-based translation ( hiero ; chiang, 2005 ) has proven to be a very useful compromise between syntactically informed and purely corpus-driven translation. -the language model was trained on the target side of the parallel data using the srilm toolkit with modified kneser-ney smoothing. -we used the r igraph toolkit to train a graph building model with modified k-cores. -for example, sentences such as ° bake for 50 minutes do not explicitly mention what to bake. -glossy ’ s extractions have proven useful as seed definitions in an unsupervised wsd task. -in this paper, we propose a novel semantic framework for jointly capturing the meaning of comparison and ellipsis constructions. -user : i was cleaning out my account when i accidentally ... -in this paper, we propose a method to identify the target of a driu for conversational agents in action control dialogue. -we use explicit semantic analysis based on wikipedia to compute semantic relatedness between concepts. -yih et al used convolutional neural networks to answer single-relation questions. 
-swier and stevenson induce role labels with a bootstrapping scheme in which the set of labeled instances is iteratively expanded using a classifier trained on previously labeled instances. -lin et al defined a search goal as an action-entity pair and used a web trigram to identify fine-grained search goals. -the experiments illustrate the different effect of four feature types including direct lexical matching, idf-weighted lexical matching, modified bleu n-gram matching and named entities matching. -tree substitution grammar ( tsg ) is a promising formalism for modeling language data. -in this experiment, we only use sentiment related words as features to represent opinion documents, instead of using all words. -although this work represents the first formal study of relationship questions that we are aware of, by no means are we claiming a solution awe see this as merely the first step in addressing a complex problem. -it scored higher than a version of hobbs'algorithm that we implemented for slot grammar. -this metric uses the stanford dependency parser. -we used a categorical cross entropy loss function and adam optimizer and trained the model for 10 epochs. -affective text shared task on news headlines at semeval 2007 for emotion and valence level identification has drawn the focus to this field. -semantics plays indeed a role in coreference resolution. -we propose a two-step training method, which benefit from both large-scale pseudo training data and task-specific data, showing promising performance. -in, the authors romanized chinese nes and selected their english transliterations from english nes extracted from the web by comparing their phonetic similarities with chinese nes. -we use the stanford pos tagger. -a tree transformation is sensible if the size of each output tree is uniformly bounded by a linear function in the size of the corresponding input tree. -as a base parser we use desr, a shift-reduce parser described in. -for each task, we created labeled data from english, arabic, and spanish tweets. -a kg is a multi-relational directed graph composed of entities as nodes and relations as edges. -on a large-scale freebase+clueweb prediction task, we achieve 25 % error reduction, and a 53 % error reduction on sparse relations. -unlike grconv and adasent, our model uses full binary tree as the topological structure. -mistakes made by non-native speakers are systematic and depend on the first language of the writer. -katz and giesbrecht and baldwin et al use latent semantic analysis for this purpose. -donaway et al used exhaustive search to generate all three sentences extracts to evaluate different evaluation metrics. -the novelty of our work is the proposal of a method to automatically extract persuasive argumentation features from political debates by means of the use of semantic frames as pivoting features. -besides, our model can be easily generalized and applied to other sequence labeling tasks. -conditional random fields are undirected graphical models trained to maximize a conditional probability of random variables and y, and the concept is well established for sequential labeling problem. -our parser is trained by combining a syntactic parsing task with a distantly-supervised relation extraction task. -in this paper, we propose a new approach to language modeling which uses discriminative learning methods. -with both automatic and human evaluations, the results show that the proposed method effectively balances between adequacy and dissimilarity. 
-multitask learning models have been proven very useful for several nlp tasks and applications. -a comparable corpus consists of documents in two or more languages or varieties which are not translation of each other and deal with similar topics. -clarke and lapata used ilp in decoding, making it convenient to add constraints to preserve grammatical structure. -our experimental results showed that the graph cut based method achieved competitive performance compared to ilp, while about 100 times faster. -argument mining is a fundamental task in natural language processing ( nlp ). -machine comprehension of text is the central goal of nlp. -in the tagging scheme for such languages, a complete pos tag is formed by combining tags from multiple tag sets defined for each morphosyntactic category. -aue and gamon attempted to solve the problem of the absence of large amounts of labeled data by customizing sentiment classifiers to new domains using training data from other domains. -we use the wapiti toolkit to train a 5-gram language model on the xinhua portion of the gigaword corpus. -hashtags are widely used on twitter. -we used the openfst implementation of the wfsa model. -patty used sequence mining algorithms for gathering a general class of relational phrases, organizing them into synsets, and inducing lexical type signatures. -it is a specific kind of generalized linear model where its function is the logit function and the dependent variable y is a binary or dichotomic variable which has a bernoulli distribution. -we employ a method proposed by neubig et al, which uses parametric bayesian inference with the phrasal itgs. -topic signatures are weighted topical vectors that are associated with senses or concepts. -klein and manning presented another approach focusing on constituent sequences called the constituent-text model. -our system s best result ranked 35 among 73 submitted runs with 0.7189 average pearson correlations over five test sets. -the ef cambridge open language database is an english l2 corpus that was released in 2013 and used for nli in. -when training over 10,000 features on a modest amount of data, we, like watanabe et al, did observe overfitting, yet saw improvements on new data. -we used pre-trained glove word embeddings. -a comparable corpus consists of documents in two or more languages or varieties which are not translation of each other and deal with similar topics. -we use the same training, development and out-of-domain test set as provided in the conll 2009 shared task. -a good survey of the state of the art is available in ( cite-p-20-12-0 ). -we used the nave bayes implementation in the weka machine learning toolkit, a support vector machine, and the crf implementation in mallet. -in this study, we explore the feasibility of controlling human perception of traits using automated methods. -in this study, we focus on the problem of community-related event detection by community emotions. -we obtained better results by using case frames constructed from larger corpora ; the performance was not saturated even with a corpus size of 100 billion words. -grosz, joshi, and weinstein state that cf may be ordered using different factors, but they only use information about grammatical roles. -react achieves an accuracy of 92 % for the onand off-topic classification task and an f 1 -measure of 72 % for the semantic annotation. -we used the lexrank graph-based algorithm with the ner step. 
-both were initialised by uniformly sampling values from the symmetric interval suggested by glorot and bengio. -we split the words into sub-words using joint bpe with 32, 000 merge operations. -further insights may be available from the finer-grained data available in the preposition disambiguation task. -however, the research community is also aware of the deficiencies of these metrics. -the key component is the so-called alignment model, which makes sure the embeddings of entities, relations, and words are in the same space. -we implemented a version of mira from crammer and singer, which we used for regression. -in section 2, we provide some background and review previous work on graph-based dependency parsing for mono- and cross-lingual settings. -we use the mdl-based tree cut model to identify question topic and question focus automatically. -in our dataset, we additionally provide the most similar training questions for each challenge question. -in the following, we heavily rely on the work of clarke and lapata, who develop an approach based on ilp for monolingual sentence compression. -therefore, it is not suitable to exploit existing word-based models to translate this set of languages. -transliteration is a key building block for multilingual and cross-lingual nlp since it is essential for ( i ) handling of names in applications like machine translation ( mt ) and cross-lingual information retrieval ( clir ), and ( ii ) user-friendly input methods. -while antonymy is defined as the oppositeness between words, synonymy refers to words that are similar in meaning ( cite-p-11-1-2, cite-p-11-1-10 ). -learning is efficiently parallelized by splitting training data among shards and by merging parameters in each round ( cite-p-14-3-10 ). -we evaluate our methodology using intrinsic and extrinsic measures. -it is also observed that rcm can automatically measure the knowledge levels of words. -in this paper, we model discussions in online political blogs. -in the utterances were further processed with the porter stemmer in the nltk package. -to this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. -this paper describes our participation in the task denominated cross-lingual textual entailment ( clte ) for content synchronization. -we use the srilm toolkit to train a 5-gram language model with modified kneser-ney smoothing on the target-side training corpus. -supervised methods include hidden markov model, maximum entropy, conditional random fields, and support vector machines. -we use a conditional random field formalism to learn a model from labeled training data that can be applied to unseen data. -in this work, we represent document with convolutional-gated recurrent neural network, which adaptively encodes semantics of sentences and their relations. -our results experimentally confirm the theoretical assumption that a sufficiently detailed lexicon provides enough information to reliably predict the aspectual value of verbs across their readings. -it consists of two parts : multi-channel cnn, and lstm. -kendall ’ s math-w-11-5-2-1 can be easily used to evaluate the output of automatic systems, regardless of the domain or application at hand. -in contrast, we present an automatic approach that infers the general connotation of words. -we have shown that these paraphrases can be used to obtain high precision extraction patterns for information extraction. 
-one uses confusion networks formed along a skeleton sentence to combine translation systems as described in and. -a cnn is a feedforward network with convolution layers interleaved with pooling layers. -english-german, english-french and chinese-to-english translation tasks. -we achieved a macro f1 score of 32. 73 % for the english data and 17.98 % for the spanish data. -all models used the sbieon encoding to support the recognition of non-continuous entities. -such a model is extended from a graph-based model for dependency parsing. -we use the unigrams and bigrams to represent lexical features, and the stanford part-of-speech tagger to extract the lexicalized named entity and part-of-speech features. -not surprisingly, the ensemble system performs the best, obtaining a weighted pearson correlation of 0.738. -zhang and clark used both character and word-based decoding. -a 5-gram language model was trained on the target side of the parallel data using the srilm toolkit. -experimental results show that our model yields high quality poems compared to the state of the art. -mikolov et al use different dimensions of word embeddings for the source language and the target language to achieve the best translation quality. -as their published state-of-the-art result described in, their attention-based model is based on word-level embeddings. -wang et al 2016 built on this framework and introduced attention mechanism for generating these sentential features. -murphy, liu, sun, and wu, topkara, topkara, and atallah, meral et al, murphy and vogel, and meral et al all belong to this syntactic transformation category. -ccgbank was created by semiautomatically converting the penn treebank to ccg derivations. -previous work that generate surveys of scientific topics uses the text of citation sentences alone. -in our example, extracted following the head-finding strategy by yamada and matsumoto, while feature is a boolean feature that indicates for each token if it is the main verb in the sentence or not. -the proposed method can also be used in the traditional within-domain problem with some simplifications. -in experiments using the chinese treebank ( ctb ), we show that the accuracies of the three tasks can be improved significantly over the baseline models, particularly by 0. 6 % for pos tagging and 0. 4 % for dependency parsing. -we use the europarl parallel corpus as the basis for our small-scale cross-lingual experiments. -urdu is the national language of pakistan and is one of the official languages of india. -hasegawa et al tried to extract multiple relations by choosing entity types. -in this paper, we present an initial set of experiments on englishto-arabic smt. -our models are similar to several other approaches. -bwe of 512 dimensions were obtained using word embeddings trained with fasttext 9 and aligned in the same space using unsupervised vecmap 10 for this induction. -semi-supervised learning ( ssl ) is a machine learning ( ml ) approach that uses large amounts of unlabeled data, combined with a smaller amount of labeled data, to learn a target function ( cite-p-23-1-15, cite-p-23-1-2 ). \ No newline at end of file