<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 /home/scieditor/grobid-0.6.1/grobid-home/schemas/xsd/Grobid.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
<teiHeader xml:lang="en">
<fileDesc>
<titleStmt>
<title level="a" type="main">SPECTER: Document-level Representation Learning using Citation-informed Transformers</title>
</titleStmt>
<publicationStmt>
<publisher/>
<availability status="unknown"><licence/></availability>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Arman</forename><surname>Cohan</surname></persName>
<email>armanc@allenai.org</email>
<affiliation key="aff0">
<orgName type="institution" key="instit1">Allen Institute for Artificial Intelligence ‡ Paul G. Allen School of Computer Science &amp; Engineering</orgName>
<orgName type="institution" key="instit2">University of Washington</orgName>
</affiliation>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sergey</forename><surname>Feldman</surname></persName>
<email>sergey@allenai.org</email>
<affiliation key="aff0">
<orgName type="institution" key="instit1">Allen Institute for Artificial Intelligence ‡ Paul G. Allen School of Computer Science &amp; Engineering</orgName>
<orgName type="institution" key="instit2">University of Washington</orgName>
</affiliation>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Iz</forename><surname>Beltagy</surname></persName>
<email>beltagy@allenai.org</email>
<affiliation key="aff0">
<orgName type="institution" key="instit1">Allen Institute for Artificial Intelligence ‡ Paul G. Allen School of Computer Science &amp; Engineering</orgName>
<orgName type="institution" key="instit2">University of Washington</orgName>
</affiliation>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Doug</forename><surname>Downey</surname></persName>
<email>dougd@allenai.org</email>
<affiliation key="aff0">
<orgName type="institution" key="instit1">Allen Institute for Artificial Intelligence ‡ Paul G. Allen School of Computer Science &amp; Engineering</orgName>
<orgName type="institution" key="instit2">University of Washington</orgName>
</affiliation>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Daniel</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
<affiliation key="aff0">
<orgName type="institution" key="instit1">Allen Institute for Artificial Intelligence ‡ Paul G. Allen School of Computer Science &amp; Engineering</orgName>
<orgName type="institution" key="instit2">University of Washington</orgName>
</affiliation>
</author>
<title level="a" type="main">SPECTER: Document-level Representation Learning using Citation-informed Transformers</title>
</analytic>
<monogr>
<imprint>
<date/>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<encodingDesc>
<appInfo>
<application version="0.6.1" ident="GROBID" when="2022-06-22T18:48+0000">
<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
<ref target="https://github.com/kermitt2/grobid"/>
</application>
</appInfo>
</encodingDesc>
<profileDesc>
<abstract>
<p>Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level training objectives and do not leverage information on inter-document relatedness, which limits their document-level representation power. For applications on scientific documents, such as classification and recommendation, the embeddings power strong performance on end tasks. We propose SPECTER, a new method to generate document-level embeddings of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be easily applied to downstream applications without task-specific fine-tuning. Additionally, to encourage further research on document-level models, we introduce SCIDOCS, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation. We show that SPECTER outperforms a variety of competitive baselines on the benchmark.</p>
</abstract>
</profileDesc>
</teiHeader>
<text xml:lang="en">
<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>As the pace of scientific publication continues to increase, Natural Language Processing (NLP) tools that help users to search, discover and understand the scientific literature have become critical. In recent years, substantial improvements in NLP tools have been brought about by pretrained neural language models (LMs) <ref type="bibr" target="#b38">(Radford et al., 2018;</ref> <ref type="bibr" target="#b11">Devlin et al., 2019)</ref>. While such models are widely used for representing individual words or sentences, extensions to whole-document embeddings are relatively underexplored. Likewise, methods that do use inter-document signals to produce whole-document embeddings <ref type="bibr" target="#b46">(Tu et al., 2017)</ref> have yet to incorporate state-of-the-art pretrained LMs. Here, we study how to leverage the power of pretrained language models to learn embeddings for scientific documents.</p><p>A paper's title and abstract provide rich semantic content about the paper, but, as we show in this work, simply passing these textual fields to an "off-the-shelf" pretrained language model, even a state-of-the-art model tailored to scientific text like the recent SciBERT <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref>, does not result in accurate paper representations. The language modeling objectives used to pretrain the model do not lead it to output representations that are helpful for document-level tasks such as topic classification or recommendation.</p><p>In this paper, we introduce a new method for learning general-purpose vector representations of scientific documents. 
Our system, SPECTER, incorporates inter-document context into Transformer <ref type="bibr" target="#b47">(Vaswani et al., 2017)</ref> language models (e.g., SciBERT <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref>) to learn document representations that are effective across a wide variety of downstream tasks, without the need for any task-specific fine-tuning of the pretrained language model. We specifically use citations as a naturally occurring, inter-document incidental supervision signal indicating which documents are most related and formulate the signal into a triplet-loss pretraining objective. Unlike many prior works, at inference time, our model does not require any citation information. This is critical for embedding new papers that have not yet been cited. In experiments, we show that SPECTER's representations substantially outperform the state-of-the-art on a variety of document-level tasks, including topic classification, citation prediction, and recommendation.</p><p>As an additional contribution of this work, we introduce and release SCIDOCS, a novel collection of data sets and an evaluation suite for document-level embeddings in the scientific domain. SCIDOCS covers seven tasks, and includes tens of thousands of examples of anonymized user signals of document relatedness. We also release our training set (hundreds of thousands of paper titles, abstracts and citations), along with our trained embedding model and its associated code base.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Model</head><head n="2.1">Overview</head><p>Our goal is to learn task-independent representations of academic papers. Inspired by the recent success of pretrained Transformer language models across various NLP tasks, we use the Transformer model architecture as the basis for encoding the input paper. Existing LMs such as BERT, however, are primarily trained with a masked language modeling objective, which considers only intra-document context and does not use any inter-document information. This limits their ability to learn optimal document representations. To learn high-quality document-level representations we propose using citations as an inter-document relatedness signal and formulate it as a triplet loss learning objective. We then pretrain the model on a large corpus of citations using this objective, encouraging it to output representations that are more similar for papers that share a citation link than for those that do not. We call our model SPECTER, which learns Scientific Paper Embeddings using Citation-informed TransformERs. With respect to the terminology used by <ref type="bibr" target="#b11">Devlin et al. (2019)</ref>, unlike most existing LMs that are "fine-tuning based", our approach results in embeddings that can be applied to downstream tasks in a "feature-based" fashion, meaning the learned paper embeddings can be easily used as features, with no need for further task-specific fine-tuning. In the following, as background information, we briefly describe how pretrained LMs can be applied for document representation and then discuss the details of SPECTER.</p><p>SCIDOCS is available at https://github.com/allenai/scidocs.</p><formula xml:id="formula_0">\text{Triplet loss} = \max\left\{ d(P^Q, P^+) - d(P^Q, P^-) + m,\ 0 \right\}</formula><p>Figure 1: Overview of SPECTER. Panel labels: Transformer (initialized with SciBERT); query paper (P^Q); related paper (P^+); unrelated paper (P^-).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Background: Pretrained Transformers</head><p>Recently, pretrained Transformer networks have demonstrated success on various NLP tasks <ref type="bibr" target="#b38">(Radford et al., 2018;</ref><ref type="bibr" target="#b11">Devlin et al., 2019;</ref><ref type="bibr" target="#b33">Liu et al., 2019)</ref>; we use these models as the foundation for SPECTER. Specifically, we use SciBERT <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref> which is an adaptation of the original BERT <ref type="bibr" target="#b11">(Devlin et al., 2019)</ref> architecture to the scientific domain. The BERT model architecture <ref type="bibr" target="#b11">(Devlin et al., 2019)</ref> uses multiple layers of Transformers <ref type="bibr" target="#b47">(Vaswani et al., 2017)</ref> to encode the tokens in a given input sequence. Each layer consists of a self-attention sublayer followed by a feedforward sublayer. The final hidden state associated with the special [CLS] token is usually called the "pooled output", and is commonly used as an aggregate representation of the sequence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Document Representation</head><p>Our goal is to represent a given paper P as a dense vector v that best represents the paper and can be used in downstream tasks. SPECTER builds embeddings from the title and abstract of a paper. Intuitively, we would expect these fields to be sufficient to produce accurate embeddings, since they are written to provide a succinct and comprehensive summary of the paper. As such, we encode the concatenated title and abstract using a Transformer LM (e.g., SciBERT) and take the final representation of the [CLS] token as the output representation of the paper:</p><formula xml:id="formula_1">v = \mathrm{Transformer}(\mathrm{input})_{[\mathrm{CLS}]},<label>(1)</label></formula><p>where Transformer is the Transformer's forward function, and input is the concatenation of the [CLS] token and WordPieces <ref type="bibr" target="#b52">(Wu et al., 2016)</ref> of the title and abstract of a paper, separated by the [SEP] token. We use SciBERT as our model initialization as it is optimized for scientific text, though our formulation is general and any Transformer language model could be used instead of SciBERT. Using the above method with an "off-the-shelf" SciBERT does not take global inter-document information into account. This is because SciBERT, like other pretrained language models, is trained via language modeling objectives, which only predict words or sentences given their in-document, nearby textual context. In contrast, we propose to incorporate citations into the model as a signal of inter-document relatedness, while still leveraging the model's existing strength in modeling language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Citation-Based Pretraining Objective</head><p>A citation from one document to another suggests that the documents are related. To encode this relatedness signal into our representations, we design a loss function that trains the Transformer model to learn closer representations for papers when one cites the other, and more distant representations otherwise. The high-level overview of the model is shown in <ref type="figure">Figure 1</ref>. In particular, each training instance is a triplet of papers: a query paper P^Q, a positive paper P^+ and a negative paper P^-. The positive paper is a paper that the query paper cites, and the negative paper is a paper that is not cited by the query paper (but that may be cited by P^+). We then train the model using the following triplet margin loss function:</p><formula xml:id="formula_2">\mathcal{L} = \max\left\{ d(P^Q, P^+) - d(P^Q, P^-) + m,\ 0 \right\}<label>(2)</label></formula><p>where d is a distance function and m is the loss margin hyperparameter (we empirically choose m = 1). Here, we use the L2 norm distance:</p><formula xml:id="formula_3">d(P^A, P^B) = \lVert v_A - v_B \rVert_2,</formula><p>where v_A is the vector corresponding to the pooled output of the Transformer run on paper A (Equation 1). Starting from the trained SciBERT model, we pretrain the Transformer parameters on the citation objective to learn paper representations that capture document relatedness.</p></div>
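<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a concrete illustration of Equation 2, the following minimal Python sketch evaluates the triplet margin loss with the L2 distance on toy vectors. The function names and the example vectors are illustrative assumptions and are not taken from the released SPECTER code; in the model itself the vectors would be the Transformer's pooled outputs.</p><p>
```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(v_query, v_pos, v_neg, margin=1.0):
    """max(d(query, pos) - d(query, neg) + margin, 0), following Equation 2."""
    gap = l2_distance(v_query, v_pos) - l2_distance(v_query, v_neg)
    return max(gap + margin, 0.0)


# Toy 2-d embeddings (made-up values, not from the paper).
query = [0.0, 0.0]
positive = [3.0, 4.0]   # L2 distance 5 from the query
negative = [6.0, 8.0]   # L2 distance 10 from the query

# Positive is closer than the negative by more than the margin: no loss.
loss_satisfied = triplet_margin_loss(query, positive, negative)
# Roles swapped: the "positive" is farther away, so the loss is positive.
loss_violated = triplet_margin_loss(query, negative, positive)
```
</p><p>With m = 1, the first triplet incurs zero loss because the positive paper is already closer to the query than the negative paper by more than the margin; swapping the two papers violates the margin and yields a positive loss.</p></div>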
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Selecting Negative Distractors</head><p>The choice of negative example papers P^- is important when training the model. We consider two sets of negative examples: the first set simply consists of randomly selected papers from the corpus.</p><p>Given a query paper, intuitively we would expect the model to be able to distinguish between cited papers, and uncited papers sampled randomly from the entire corpus. This inductive bias has been also found to be effective in content-based citation recommendation applications. But random negatives may be easy for the model to distinguish from the positives. To provide a more nuanced training signal, we augment the randomly drawn negatives with a more challenging second set of negative examples. We denote as "hard negatives" the papers that are not cited by the query paper, but are cited by a paper cited by the query paper, i.e., if P_1 cites P_2 and P_2 cites P_3 but P_1 does not cite P_3, then P_3 is a candidate hard negative example for P_1. We expect the hard negatives to be somewhat related to the query paper, but typically less related than the cited papers. As we show in our experiments (§6), including hard negatives results in more accurate embeddings compared to using random negatives alone.</p></div>
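<div xmlns="http://www.tei-c.org/ns/1.0"><p>The definition of hard negatives can be sketched as a two-hop lookup in the citation graph. The snippet below is a minimal illustration under stated assumptions: the graph is a plain dictionary mapping a paper id to the ids it cites, the function name and toy graph are hypothetical, and excluding the query paper itself from the candidates is our own choice rather than something the text specifies.</p><p>
```python
def hard_negative_candidates(query, citations):
    """Papers cited by a paper that `query` cites, but not cited by `query`."""
    direct = set(citations.get(query, ()))
    two_hop = set()
    for cited in direct:
        two_hop.update(citations.get(cited, ()))
    # Drop direct citations (those are positives) and the query itself
    # (the latter is an assumption made for this sketch).
    return two_hop - direct - {query}


# Hypothetical toy graph: each key cites the papers in its list.
citations = {
    "P1": ["P2", "P4"],
    "P2": ["P3", "P4"],
    "P4": ["P5", "P1"],
}
hard_negatives = hard_negative_candidates("P1", citations)
```
</p><p>Here P3 and P5 are reachable from P1 only through a cited paper, so they are the hard negative candidates; P4 is excluded because P1 cites it directly.</p></div>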
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Inference</head><p>At inference time, the model receives one paper, P, and outputs SPECTER's Transformer pooled output activation as the paper representation for P (Equation 1). We note that for inference, SPECTER requires only the title and abstract of the given input paper; the model does not need any citation information about the input paper. This means that SPECTER can produce embeddings even for new papers that have yet to be cited, which is critical for applications that target recent scientific papers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">SCIDOCS Evaluation Framework</head><p>Previous evaluations of scientific document representations in the literature tend to focus on small datasets over a limited set of tasks, and extremely high (99%+) AUC scores are already possible on these data for English documents. New, larger and more diverse benchmark datasets are necessary. Here, we introduce a new comprehensive evaluation framework to measure the effectiveness of scientific paper embeddings, which we call SCIDOCS. The framework consists of diverse tasks, ranging from citation prediction, to prediction of user activity, to document classification and paper recommendation. Note that SPECTER is not fine-tuned further on any of the tasks; we simply plug in the embeddings as features for each task. Below, we describe each of the tasks in detail and the evaluation data associated with it. In addition to our training data, we release all the datasets associated with the evaluation tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Document Classification</head><p>An important test of a document-level embedding is whether it is predictive of the class of the document. Here, we consider two classification tasks in the scientific domain: MeSH Classification In this task, the goal is to classify scientific papers according to their Medical Subject Headings (MeSH) <ref type="bibr" target="#b30">(Lipscomb, 2000)</ref>. We construct a dataset consisting of 23K academic medical papers, where each paper is assigned one of 11 top-level disease classes, such as cardiovascular diseases, diabetes, and digestive diseases, derived from the MeSH vocabulary. The most populated category is Neoplasms (cancer) with 5.4K instances (23.3% of the total dataset) while the category with the fewest samples is Hepatitis (1.7% of the total dataset). We follow the approach of <ref type="bibr" target="#b13">Feldman et al. (2019)</ref> in mapping the MeSH vocabulary to the disease classes.</p><p>Paper Topic Classification This task is predicting the topic associated with a paper using the predefined topic categories of the Microsoft Academic Graph (MAG) <ref type="bibr" target="#b45">(Sinha et al., 2015)</ref>. MAG provides a database of papers, each tagged with a list of topics. The topics are organized in a hierarchy of 5 levels, where level 1 is the most general and level 5 is the most specific. For our evaluation, we derive a document classification dataset from the level 1 topics, where a paper is labeled by its corresponding level 1 MAG topic. We construct a dataset of 25K papers, almost evenly split over the 19 different classes of level 1 categories in MAG.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Citation Prediction</head><p>As argued above, citations are a key signal of relatedness between papers. We test how well different paper representations can reproduce this signal through citation prediction tasks. In particular, we focus on two sub-tasks: predicting direct citations, and predicting co-citations. We frame these as ranking tasks and evaluate performance using MAP and nDCG, standard ranking metrics. Direct Citations In this task, the model is asked to predict which papers are cited by a given query paper from a given set of candidate papers. The evaluation dataset includes approximately 30K total papers from a held-out pool of papers, consisting of 1K query papers, each with a candidate set of up to 5 cited papers and 25 (randomly selected) uncited papers. The task is to rank the cited papers higher than the uncited papers. For each embedding method, we rank the candidates solely by the L2 distance between the raw embeddings of the query and each candidate, without any additional trainable parameters.</p><p>Co-Citations This task is similar to the direct citations task, but instead of predicting a cited paper, the goal is to predict a paper that is highly co-cited with a given paper. Intuitively, if papers A and B are cited frequently together by several papers, this shows that the papers are likely highly related and a good paper representation model should be able to identify these papers from a given candidate set. The dataset consists of 30K total papers and is constructed similarly to the direct citations task.</p></div>
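<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ranking protocol described above (sort candidates by L2 distance to the query, then score the ranking) can be sketched as follows. This is a minimal illustration, not the SCIDOCS evaluation code: the function names, the toy vectors, and the choice to show average precision for a single query are assumptions; MAP would be the mean of this quantity over all queries.</p><p>
```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_candidates(query_vec, candidate_vecs):
    """Candidate ids sorted by increasing L2 distance to the query."""
    return sorted(
        candidate_vecs,
        key=lambda cid: l2_distance(query_vec, candidate_vecs[cid]),
    )


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k at which a relevant id appears."""
    hits = 0
    precisions = []
    for k, cid in enumerate(ranked_ids, start=1):
        if cid in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy embeddings (made-up values); "a" and "c" play the role of cited papers.
query_vec = [0.0, 0.0]
candidate_vecs = {
    "a": [1.0, 0.0],
    "b": [0.0, 2.0],
    "c": [3.0, 0.0],
    "d": [0.0, 4.0],
}
ranking = rank_candidates(query_vec, candidate_vecs)
ap = average_precision(ranking, {"a", "c"})
```
</p><p>In this toy case the cited papers land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6. No parameters are trained at any point, which matches the requirement that the raw embeddings alone determine the ranking.</p></div>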
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">User Activity</head><p>The embeddings for similar papers should be close to each other; we use user activity as a proxy for identifying similar papers and test the model's ability to recover this information. Multiple users consuming the same items as one another is a classic relatedness signal and forms the foundation for recommender systems and other applications <ref type="bibr" target="#b42">(Schafer et al., 2007)</ref>. In our case, we would expect that when users look for academic papers, the papers they view in a single browsing session tend to be related. Thus, accurate paper embeddings should, all else being equal, be relatively more similar for papers that are frequently viewed in the same session than for other papers. To build benchmark datasets to test embeddings on user activity, we obtained logs of user sessions from a major academic search engine. We define the following two tasks on which we build benchmark datasets to test embeddings:</p><p>Co-Views Our co-views dataset consists of approximately 30K papers. To construct it, we take 1K random papers that are not in our train or development set and associate with each one up to 5 frequently co-viewed papers and 25 randomly selected papers (similar to the approach for citations). Then, we require the embedding model to rank the co-viewed papers higher than the random papers by comparing the L2 distances of raw embeddings. We evaluate performance using standard ranking metrics, nDCG and MAP.</p><p>Co-Reads If the user clicks to access the PDF of a paper from the paper description page, this is a potentially stronger sign of interest in the paper. In such a case we assume the user will read at least parts of the paper and refer to this as a "read" action. Accordingly, we define a "co-reads" task and dataset analogous to the co-views dataset described above. This dataset is also approximately 30K papers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Recommendation</head><p>In the recommendation task, we evaluate the ability of paper embeddings to boost performance in a production recommendation system. Our recommendation task aims to help users navigate the scientific literature by ranking a set of "similar papers" for a given paper. We use a dataset of user clickthrough data for this task which consists of 22K clickthrough events from a public scholarly search engine. We partitioned the examples temporally into train (20K examples), validation (1K), and test (1K) sets. As is typical in clickthrough data on ranked lists, the clicks are biased toward the top of the original ranking presented to the user. To counteract this effect, we computed propensity scores using a swap experiment (Agarwal et al., 2019). The propensity scores give, for each position in the ranked list, the relative frequency that the position is over-represented in the data due to exposure bias. We can then compute de-biased evaluation metrics by dividing the score for each test example by the propensity score for the clicked position. We report propensity-adjusted versions of the standard ranking metrics Precision@1 (P@1) and Normalized Discounted Cumulative Gain (nDCG).</p><p>We test different embeddings on the recommendation task by including cosine embedding distance as a feature within an existing recommendation system that includes several other informative features (title/author similarity, reference and citation overlap, etc.). Thus, the recommendation experiments measure whether the embeddings can boost the performance of a strong baseline system on an end task. For SPECTER, we also perform an online A/B test to measure whether its advantages on the offline dataset translate into improvements on the online recommendation task (§5).</p></div>
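<div xmlns="http://www.tei-c.org/ns/1.0"><p>The de-biasing step described above, dividing each test example's score by the propensity of the clicked position, can be sketched for P@1 as follows. This is only an illustration under stated assumptions: the record fields, the propensity values, and the final plain average over examples are hypothetical, since the text does not specify how the adjusted scores are aggregated or normalized.</p><p>
```python
def propensity_adjusted_p_at_1(examples, propensity):
    """Average over examples of (P@1 for the example) / (propensity of the
    position at which the click was originally observed)."""
    total = 0.0
    for ex in examples:
        # P@1 for one example: 1 if the model's top item is the clicked item.
        hit = 1.0 if ex["model_top"] == ex["clicked"] else 0.0
        total += hit / propensity[ex["clicked_position"]]
    return total / len(examples)


# Hypothetical propensities: position 2 is seen half as often as position 1.
propensity = {1: 1.0, 2: 0.5}

# Hypothetical test examples (ids and positions are made up).
examples = [
    {"model_top": "x", "clicked": "x", "clicked_position": 1},  # hit, weight 1
    {"model_top": "y", "clicked": "y", "clicked_position": 2},  # hit, weight 2
    {"model_top": "z", "clicked": "w", "clicked_position": 1},  # miss
    {"model_top": "u", "clicked": "v", "clicked_position": 2},  # miss
]
score = propensity_adjusted_p_at_1(examples, propensity)
```
</p><p>A correct prediction for a click observed at a rarely exposed position counts for more than one observed at the top of the list, which is the intended correction for exposure bias.</p></div>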
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>Training Data To train our model, we use a subset of the Semantic Scholar corpus consisting of about 146K query papers (around 26.7M tokens) with their corresponding outgoing citations, and we use an additional 32K papers for validation. For each query paper we construct up to 5 training triples, each consisting of a query, a positive, and a negative paper. The positive papers are sampled from the direct citations of the query, while negative papers are chosen either randomly or from citations of citations (as discussed in §2.4). We empirically found it helpful to use 2 hard negatives (citations of citations) and 3 easy negatives (randomly selected papers) for each query paper. This process results in about 684K training triples and 145K validation triples.</p><p>Training and Implementation We implement our model in AllenNLP. We initialize the model from SciBERT pretrained weights <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref> since it is the state-of-the-art pretrained language model on scientific text. We continue training all model parameters on our training objective (Equation 2). We perform minimal tuning of our model's hyperparameters based on the performance on the validation set, while baselines are extensively tuned. Based on initial experiments, we use a margin m=1 for the triplet loss. For training, we use the Adam optimizer (Kingma and Ba, 2014) following the suggested hyperparameters in Devlin et al. (2019) (LR: 2e-5, Slanted Triangular LR scheduler (Howard and Ruder, 2018) with number of train steps equal to training instances and cut fraction of 0.1). We train the model on a single Titan V GPU (12G memory) for 2 epochs, with a batch size of 4 (the maximum that fit in our GPU memory), and use gradient accumulation for an effective batch size of 32. Each training epoch takes approximately 1-2 days to complete on the full dataset. 
We release our code and data to facilitate reproducibility.</p><p>Task-Specific Model Details For the classification tasks, we used a linear SVM where embedding vectors were the only features. The C hyperparameter was tuned via a held-out validation set.</p><p>For the recommendation tasks, we use a feedforward ranking neural network that takes as input ten features designed to capture the similarity between each query and candidate paper, including the cosine similarity between the query and candidate embeddings and manually-designed features computed from the papers' citations, titles, authors, and publication dates.</p><p>Baseline Methods Our work falls into the intersection of textual representation, citation mining, and graph learning, and we evaluate against state-of-the-art baselines from each of these areas. We compare with several strong textual models: SIF <ref type="bibr" target="#b2">(Arora et al., 2017)</ref>, a method for learning document representations by removing the first principal component of aggregated word-level embeddings, which we pretrain on scientific text; SciBERT <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref>, a state-of-the-art pretrained Transformer LM for scientific text; and Sent-BERT <ref type="bibr" target="#b40">(Reimers and Gurevych, 2019)</ref>, a model that uses negative sampling to tune BERT for producing optimal sentence embeddings. We also compare with Citeomatic, a closely related paper representation model for citation prediction which trains content-based representations with citation graph information via dynamically sampled triplets, and SGC <ref type="bibr" target="#b50">(Wu et al., 2019a)</ref>, a state-of-the-art graph-convolutional approach. For completeness, additional baselines are also included; due to space constraints we refer to Appendix A for detailed discussion of all baselines. We tune hyperparameters of baselines to maximize performance on a separate validation set. 
<ref type="table" target="#tab_1">Table 1</ref> presents the main results corresponding to our evaluation tasks (described in §3). Overall, we observe substantial improvements across all tasks, with an average performance of 80.0 across all metrics on all tasks, which is a 3.1 point absolute improvement over the next-best baseline. We now discuss the results in detail.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>For document classification, we report macro F1, a standard classification metric. We observe that the classifier performance when trained on our representations is better than when trained on any other baseline. Particularly, on the MeSH (MAG) dataset, we obtain an 86.4 (82.0) F1 score, which is about a ∆ = +2.3 (+1.5) point absolute increase over the best baseline on each dataset respectively. Our evaluation of the learned representations on predicting user activity is shown in the "User activity" columns of <ref type="table" target="#tab_1">Table 1</ref>. SPECTER achieves a MAP score of 83.8 on the co-view task, and 84.5 on co-read, improving over the best baseline (Citeomatic in this case) by 2.7 and 4.0 points, respectively. We observe similar trends for the "citation" and "co-citation" tasks, with our model outperforming virtually all other baselines except for SGC, which has access to the citation graph at training and test time. Note that methods like SGC cannot be used in a real-world setting to embed new papers that are not cited yet. On the other hand, on co-citation data our method is able to achieve the best results with nDCG of 94.8, improving over SGC by 2.3 points. Citeomatic also performs well on the citation tasks, as expected given that its primary design goal was citation prediction. Nevertheless, our method slightly outperforms Citeomatic on the direct citation task, while substantially outperforming it on co-citations (+2.0 nDCG). Finally, for the recommendation task, we observe that SPECTER outperforms all other models on this task as well, with nDCG of 53.9. On the recommendation task, as opposed to previous experiments, the differences in method scores are generally smaller. 
This is because for this task the embeddings are used along with several other informative features in the ranking model (described under task-specific models in §4), meaning that embedding variants have less opportunity for impact on overall performance.</p><p>We also performed an online study to evaluate whether SPECTER embeddings offer similar advantages in a live application. We performed an online A/B test comparing our SPECTER-based recommender to an existing production recommender system for similar papers that ranks papers by a textual similarity measure. In a dataset of 4,113 clicks, we found that the SPECTER ranker improved clickthrough rate over the baseline by 46.5%, demonstrating its superiority.</p><p>We emphasize that our citation-based pretraining objective is critical for the performance of SPECTER; removing this and using a vanilla SciBERT results in decreased performance on all tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Analysis</head><p>In this section, we analyze several design decisions in SPECTER, provide a visualization of its embedding space, and experimentally compare SPECTER's use of fixed embeddings against a fine-tuning approach.</p><p>Ablation Study We start by analyzing how adding or removing metadata fields from the input to SPECTER alters performance. The results are shown in the top four rows of <ref type="table" target="#tab_3">Table 2</ref> (for brevity, here we only report the average of the metrics from each task). We observe that removing the abstract from the textual input and relying only on the title results in a substantial decrease in performance. More surprisingly, adding authors as an input (along with title and abstract) hurts performance.<note place="foot" n="13">We experimented with both concatenating authors with the title and abstract and considering them as an additional field. Neither was helpful.</note> One possible explanation is that author names are sparse in the corpus, making it difficult for the model to infer document-level relatedness from them. Another possible reason for this behavior is that tokenization using WordPieces might be suboptimal for author names. Many author names are out-of-vocabulary for SciBERT and thus, they might be split into sub-words and shared across names that are not semantically related, leading to noisy correlation. Finally, we find that adding venues slightly decreases performance,<note place="foot" n="14">Venue information in our data came directly from publisher-provided metadata and thus was not normalized.</note> except on document classification (which makes sense, as we would expect venues to have high correlation with paper topics). 
The fact that SPECTER does not require inputs like authors or venues makes it applicable in situations where this metadata is not available, such as matching reviewers with anonymized submissions, or performing recommendations of anonymized preprints (e.g., on OpenReview).</p><p>One design decision in SPECTER is to use a set of hard negative distractors in the citation-based fine-tuning objective. The fifth row of <ref type="table" target="#tab_3">Table 2</ref> shows that this is important: using only easy negatives reduces performance on all tasks. While there could be other ways to include hard negatives in the model, our simple approach of including citations of citations is effective. The sixth row of the table shows that using a strong general-domain language model (BERT-Large) instead of SciBERT in SPECTER reduces performance considerably. This is reasonable because, unlike BERT-Large, SciBERT is pretrained on scientific text.</p><p>Visualization <ref type="figure">Figure 2</ref> shows t-SNE (van der Maaten, 2014) projections of our embeddings (SPECTER) compared with the SciBERT baseline for a random set of papers. When comparing SPECTER embeddings with SciBERT, we observe that our embeddings are better at encoding topical information, as the clusters seem to be more compact. Further, we see some examples of cross-topic relatedness reflected in the embedding space (e.g., Engineering, Mathematics and Computer Science are close to each other, while Business and Economics are also close to each other). To quantify the comparison of visualized embeddings in <ref type="figure">Figure 2</ref>, we use the DBSCAN clustering algorithm <ref type="bibr" target="#b12">(Ester et al., 1996)</ref> on this 2D projection. We use the completeness and homogeneity clustering quality measures introduced by <ref type="bibr" target="#b41">Rosenberg and Hirschberg (2007)</ref>. 
For the points corresponding to <ref type="figure">Figure 2</ref>, the homogeneity and completeness values for SPECTER are respectively 0.41 and 0.72 compared with SciBERT's 0.19 and 0.63, a clear improvement on separating topics using the projected embeddings.</p><p>Comparison with Task Specific Fine-Tuning While the fact that SPECTER does not require fine-tuning makes its paper embeddings less costly to use, often the best performance from pretrained Transformers is obtained when the models are fine-tuned directly on each end task. We experiment with fine-tuning SciBERT on our tasks, and find this to be generally inferior to using our fixed representations from SPECTER. Specifically, we fine-tune SciBERT directly on task-specific signals instead of citations. To fine-tune on task-specific data (e.g., user activity), we used a dataset of co-views with 65K query papers, co-reads with 14K query papers, and co-citations (instead of direct citations) with 83K query papers. As the end tasks are ranking tasks, for all datasets we construct up to 5 triplets and fine-tune the model using triplet ranking loss. The positive papers are sampled from the most co-viewed (co-read, or co-cited) papers corresponding to the query paper. We also include both easy and hard distractors, as when training SPECTER (for hard negatives we choose the least non-zero co-viewed (co-read, or co-cited) papers). We also consider training jointly on all task-specific training data sources in a multitask training process, where the model samples training triplets from a distribution over the sources. As illustrated in Table 3, without any additional final task-specific fine-tuning, SPECTER still outperforms a SciBERT model fine-tuned on the end tasks as well as their multitask combination, further demonstrating the effectiveness and versatility of SPECTER embeddings.<ref type="foot">15</ref></p></div>
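<div xmlns="http://www.tei-c.org/ns/1.0"><p>The triplet construction used for task-specific fine-tuning can be sketched as follows. This is a minimal standard-library illustration, not the authors' training code: the function names (l2_distance, triplet_margin_loss, build_triplets), the margin of 1.0, the half-and-half split of co-viewed papers into positives and hard negatives, and the alternation between hard and easy distractors are all assumptions made for the example.</p><p>
```python
import math
import random


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(query, positive, negative, margin=1.0):
    """Hinge loss: zero once the negative is at least `margin` farther than the positive.

    The margin value is an assumption for this sketch.
    """
    return max(l2_distance(query, positive) - l2_distance(query, negative) + margin, 0.0)


def build_triplets(co_counts, unrelated, max_triplets=5, seed=0):
    """Return up to `max_triplets` (positive, negative) id pairs for one query paper.

    co_counts: dict mapping paper id to its non-zero co-view count with the query.
    unrelated: list of paper ids with no co-views (the pool of easy negatives).
    Positives are the most co-viewed papers; hard negatives are the least
    co-viewed ones. The split point and mixing rule are hypothetical choices.
    """
    rng = random.Random(seed)
    ranked = sorted(co_counts, key=co_counts.get, reverse=True)
    half = len(ranked) // 2
    positives = ranked[:half][:max_triplets]
    hard_negatives = ranked[half:][::-1]  # least co-viewed first
    triplets = []
    for i, pos in enumerate(positives):
        # Alternate hard and easy distractors; fall back to hard if no easy pool.
        use_hard = i % 2 == 0 or not unrelated
        if use_hard:
            neg = hard_negatives[i % len(hard_negatives)]
        else:
            neg = rng.choice(unrelated)
        triplets.append((pos, neg))
    return triplets


if __name__ == "__main__":
    pairs = build_triplets({"a": 9, "b": 7, "c": 2, "d": 1}, ["x", "y"])
    print(pairs)
    # Toy 2-D embeddings: the positive is close to the query, the negative is far.
    print(triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
```
</p><p>In practice the vectors would come from the encoder being fine-tuned and the loss would be averaged over a batch; the toy vectors only show that the loss vanishes once the negative is at least the margin farther from the query than the positive.</p></div>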
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Related Work</head><p>Recent representation learning methods in NLP rely on training large neural language models on unsupervised data (<ref type="bibr" target="#b38">Radford et al., 2018;</ref><ref type="bibr" target="#b11">Devlin et al., 2019;</ref><ref type="bibr" target="#b3">Beltagy et al., 2019;</ref><ref type="bibr" target="#b33">Liu et al., 2019)</ref>. While these models are successful at many sentence- and token-level tasks, our focus is on using them for document-level representation learning, which has remained relatively under-explored.</p><p>There have been other efforts in document representation learning such as extensions of word vectors to documents <ref type="bibr" target="#b28">(Le and Mikolov, 2014;</ref><ref type="bibr" target="#b14">Ganesh et al., 2016;</ref><ref type="bibr" target="#b51">Wu et al., 2018;</ref><ref type="bibr" target="#b16">Gysel et al., 2017)</ref>, convolution-based methods (<ref type="bibr" target="#b55">Zamani et al., 2018)</ref>, and variational autoencoders <ref type="bibr" target="#b19">(Holmer and Marfurt, 2018)</ref>. Related to document embedding, sentence embedding is a relatively well-studied area of research. Successful approaches include seq2seq models <ref type="bibr">(Kiros et al., 2015)</ref>, BiLSTM Siamese networks <ref type="bibr" target="#b49">(Williams et al., 2018)</ref>, leveraging supervised data from other corpora <ref type="bibr" target="#b10">(Conneau et al., 2017)</ref>, using discourse relations <ref type="bibr" target="#b35">(Nie et al., 2019)</ref>, and BERT-based methods <ref type="bibr" target="#b40">(Reimers and Gurevych, 2019)</ref>. Unlike our proposed method, the majority of these approaches do not consider any notion of inter-document relatedness when embedding documents.</p><p>Other relevant work combines textual features with network structure <ref type="bibr" target="#b46">(Tu et al., 2017)</ref>. 
These works typically do not leverage recent pretrained contextual representations and, with a few recent exceptions, they cannot generalize to unseen documents as our SPECTER approach can. Context-based citation recommendation is another related application where models rely on citation contexts <ref type="bibr" target="#b21">(Jeong et al., 2019)</ref> to make predictions. These works are orthogonal to ours, as the input to our model is just the paper title and abstract. Another related line of work is graph-based representation learning methods <ref type="bibr" target="#b6">(Bruna et al., 2014;</ref><ref type="bibr" target="#b24">Kipf and Welling, 2017;</ref><ref type="bibr">Hamilton et al., 2017a,b;</ref><ref type="bibr">Wu et al., 2019a,b)</ref>. Here, we compare to a graph representation learning model, SGC (Simple Graph Convolution) <ref type="bibr" target="#b50">(Wu et al., 2019a)</ref>, which is a state-of-the-art graph convolution approach for representation learning. SPECTER uses pretrained language models in combination with graph-based citation signals, which enables it to outperform the graph-based approaches in our experiments.</p><p>SPECTER embeddings are based on only the title and abstract of the paper. Adding the full text of the paper would provide a more complete picture of the paper's content and could improve accuracy <ref type="bibr" target="#b9">(Cohen et al., 2010;</ref><ref type="bibr" target="#b29">Lin, 2008;</ref><ref type="bibr" target="#b43">Schuemie et al., 2004)</ref>. However, the full text of many academic papers is not freely available. Further, modern language models have strict memory limits on input size, which means new techniques would be required in order to leverage the entirety of the paper within the models. 
Exploring how to use the full paper text within SPECTER is an item of future work.</p><p>Finally, one pain point in academic paper recommendation research has been a lack of publicly available datasets <ref type="bibr" target="#b8">(Chen and Lee, 2018;</ref><ref type="bibr" target="#b22">Kanakia et al., 2019)</ref>. To address this challenge, we release SCIDOCS, our evaluation benchmark which includes an anonymized clickthrough dataset from an online recommendations system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">Conclusions and Future Work</head><p>We present SPECTER, a model for learning representations of scientific papers, based on a Transformer language model that is pretrained on citations. We achieve substantial improvements over the strongest of a wide variety of baselines, demonstrating the effectiveness of our model. We additionally introduce SCIDOCS, a new evaluation suite consisting of seven document-level tasks, and release the corresponding datasets to foster further research in this area.</p><p>The landscape of Transformer language models is rapidly changing, and newer and larger models are frequently introduced. It would be interesting to initialize our model weights from more recent Transformer models to investigate whether additional gains are possible. Another item of future work is to develop better multitask approaches to leverage multiple signals of relatedness information during training. We used citations to build triplets for our loss function; however, there are other metrics that have good support from the bibliometrics literature <ref type="bibr" target="#b26">(Klavans and Boyack, 2006)</ref> that warrant exploring as a way to create relatedness graphs. Including other information such as outgoing citations as additional input to the model would be yet another area to explore in the future.</p><p>Appendix A - Baseline Details</p><p>1. Random Zero-mean 25-dimensional vectors were used as representations for each document.</p><p>2. Doc2Vec Doc2Vec is one of the earlier neural document/paragraph representation methods <ref type="bibr" target="#b28">(Le and Mikolov, 2014)</ref>, and is a natural comparison. We trained Doc2Vec on our training subset using Gensim <ref type="bibr">(Řehůřek and Sojka, 2010)</ref>, and chose the hyperparameter grid using suggestions from Lau and Baldwin (2016). 
The hyperparameter grid used:</p><p>{'window': [5, 10, 15], 'sample': [0, 10 ** -6, 10 ** -5], 'epochs': [50, 100, 200]}, for a total of 27 models. The other parameters were set as follows: vector_size=300, min_count=3, alpha=0.025, min_alpha=0.0001, negative=5, dm=0, dbow=1, dbow_words=0.</p><p>3. Fasttext-Sum This simple baseline is a weighted sum of pretrained word vectors. We trained our own 300-dimensional fasttext embeddings <ref type="bibr" target="#b5">(Bojanowski et al., 2017)</ref> on a corpus of around 3.1B tokens from scientific papers, which is similar in size to the SciBERT corpus <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref>. We found that these pretrained embeddings substantially outperform alternative off-the-shelf embeddings. We also use these embeddings in other baselines that require pretrained word vectors (i.e., SIF and SGC, which are described below). The summed bag-of-words representation has a number of weighting options, which are extensively tuned on a validation set for best performance.</p><p>4. SIF The SIF method of <ref type="bibr" target="#b2">Arora et al. (2017)</ref> is a strong text representation baseline that takes a weighted sum of pretrained word vectors (we use the fasttext embeddings described above), then computes the first principal component of the document embedding matrix and subtracts out each document embedding's projection onto the first principal component.</p><p>We used a held-out validation set to choose the parameter a from the range [1.0e-5, 1.0e-3], spaced evenly on a log scale. The word probability p(w) was estimated on the training set only. When computing term-frequency values for SIF, we used scikit-learn's TfidfVectorizer with the same parameters as enumerated in the preceding section. sublinear_tf, binary, use_idf, and smooth_idf were all set to False. 
Since SIF is a sum of pretrained fasttext vectors, the resulting dimensionality is 300.</p><p>5. ELMo ELMo provides contextualized representations of tokens in a document. It can provide paragraph or document embeddings by averaging each token's representation for all 3 LSTM layers. We used the 768-dimensional pretrained ELMo model in AllenNLP.</p><p>6. Citeomatic The most relevant baseline is Citeomatic <ref type="bibr" target="#b4">(Bhagavatula et al., 2018)</ref>, which is an academic paper representation model that is trained on the citation graph via sampled triplets. Citeomatic representations are an L2-normalized weighted sum of title and abstract embeddings, which are trained on the citation graph with dynamic negative sampling. Citeomatic embeddings are 75-dimensional.</p><p>7. SGC Since our algorithm is trained on data from the citation graph, we also compare to a state-of-the-art graph representation learning model: SGC (Simple Graph Convolution) <ref type="bibr" target="#b50">(Wu et al., 2019a)</ref>, which is a graph convolution network. An alternative comparison would have been GraphSAGE <ref type="bibr" target="#b18">(Hamilton et al., 2017b)</ref>, but SGC (with no learning) outperformed an unsupervised variant of GraphSAGE on the Reddit dataset.<ref type="foot">16</ref> Note that SGC with no learning boils down to graph propagation on node features (in our case nodes are academic documents). Following Hamilton et al. (2017a), we used SIF features as node representations, and applied SGC with a range of values for the parameter k, which is the number of times the normalized adjacency matrix is multiplied by the SIF feature matrix. Our range of k was 1 through 8 (inclusive), and the value was chosen with a validation set. For the node features, we chose the SIF model with a = 0.0001, as this model was observed to be a high-performing one. This baseline is also 300-dimensional.</p><p>8. SciBERT To isolate the advantage of SPECTER's citation-based fine-tuning objective, we add a controlled comparison with SciBERT <ref type="bibr" target="#b3">(Beltagy et al., 2019)</ref>. Following <ref type="bibr" target="#b11">Devlin et al. (2019)</ref>, we take the last-layer hidden state corresponding to the [CLS] token as the aggregate document representation.<ref type="foot">17</ref></p><p>9. 
Sentence BERT Sentence BERT <ref type="bibr" target="#b40">(Reimers and Gurevych, 2019)</ref> is a general-domain pretrained model aimed at embedding sentences. The authors fine-tuned BERT using a triplet loss, where positive sentences were from the same document section as the seed sentence, and distractor sentences came from other document sections. The model is designed to encode sentences as opposed to paragraphs, so we embed the title and each sentence in the abstract separately, sum the embeddings, and L2 normalize the result to produce a final 768-dimensional paper embedding.<ref type="foot">18</ref></p><p>During hyperparameter optimization we chose how to compute TF and IDF weights by taking the following non-redundant combinations of scikit-learn's TfidfVectorizer <ref type="bibr" target="#b36">(Pedregosa et al., 2011)</ref> parameters: sublinear_tf, binary, use_idf, smooth_idf. There were a total of 9 parameter combinations. The IDF values were estimated on the training set. The other parameters were set as follows: min_df=3, max_df=0.75, strip_accents='ascii', stop_words='english', norm=None, lowercase=True. For training of fasttext, we used all default parameters, except that we set the dimension to 300 and minCount to 25 because of the large corpus.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>t-SNE visualization of paper embeddings and their corresponding MAG topics.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Results on the SCIDOCS evaluation suite consisting of 7 tasks.</figDesc><table /><note></note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc></figDesc><table /><note>Ablations: Numbers are averages of metrics for each evaluation task: CLS: classification, USR: User activity, CITE: Citation prediction, REC: Recommendation, Avg.: average over all tasks &amp; metrics.</note></figure>
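<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head></head><label></label><figDesc></figDesc><table><row><cell>Training signal</cell><cell>CLS</cell><cell>USR</cell><cell>CITE</cell><cell>REC</cell><cell>All</cell></row><row><cell>SPECTER</cell><cell>84.2</cell><cell>88.4</cell><cell>91.5</cell><cell>36.9</cell><cell>80.0</cell></row><row><cell>SciBERT fine-tune on co-view</cell><cell>83.0</cell><cell>84.2</cell><cell>84.1</cell><cell>36.4</cell><cell>76.0</cell></row><row><cell>SciBERT fine-tune on co-read</cell><cell>82.3</cell><cell>85.4</cell><cell>86.7</cell><cell>36.3</cell><cell>77.1</cell></row><row><cell>SciBERT fine-tune on co-citation</cell><cell>82.9</cell><cell>84.3</cell><cell>85.2</cell><cell>36.6</cell><cell>76.4</cell></row><row><cell>SciBERT fine-tune on multitask</cell><cell>83.3</cell><cell>86.1</cell><cell>88.2</cell><cell>36.0</cell><cell>78.0</cell></row></table><note></note></figure>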
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 3 :</head><label>3</label><figDesc>Comparison with task-specific fine-tuning.</figDesc><table /><note></note></figure>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2">SPECTER: Scientific Paper Embeddings using Citation-informed TransformERs</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4">We also experimented with additional fields such as venues and authors but did not find any empirical advantage in using those (see §6). See §7 for a discussion of using the full text of the paper as input.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5">It is also possible to encode title and abstracts individually and then concatenate or combine them to get the final embedding. However, in our experiments this resulted in sub-optimal performance.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6">We also experimented with other distance functions (e.g., normalized cosine), but they underperformed the L2 loss.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7">https://www.nlm.nih.gov/mesh/meshhome.html</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8">https://academic.microsoft.com/</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9">Embeddings are L2 normalized and in this case cosine distance is equivalent to L2 distance.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10">Learning rate linear warmup followed by linear decay.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="11">https://github.com/allenai/specter</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12">For SGC, we remove development and test set citations and co-citations during training. We also remove incoming citations from development and test set queries as these would not be available at test time in production.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="15">We also experimented with further task-specific finetuning of our SPECTER on the end tasks but we did not observe additional improvements.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="16">There were no other direct comparisons in <ref type="bibr" target="#b50">Wu et al. (2019a)</ref>.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="17">We also tried the alternative of averaging all token representations, but this resulted in a slight performance decrease compared with the [CLS] pooled token.</note>
<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="18">We used the 'bert-base-wikipedia-sections-mean-tokens' model released by the authors: https://github.com/UKPLab/sentence-transformers</note>
</body>
<back>
<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>We thank Kyle Lo, Daniel King and Oren Etzioni for helpful research discussions, Russel Reas for setting up the public API, Field Cady for help in initial data collection and the anonymous reviewers (especially Reviewer 1) for comments and suggestions. This work was supported in part by NSF Convergence Accelerator award 1936940, ONR grant N00014-18-1-2193, and the University of Washington WRF/Cable Professorship.</p></div>
</div>
<div type="references">
<listBibl>
<biblStruct xml:id="b0">
<analytic>
<title level="a" type="main">Estimating position bias without intrusive interventions</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Anant</forename><forename type="middle">K</forename><surname>Agarwal</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ivan</forename><surname>Zaitsev</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xuanhui</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Cheng</forename><forename type="middle">Yen</forename><surname>Li</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Marc</forename><surname>Najork</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Thorsten</forename><surname>Joachims</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">WSDM</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Anant K. Agarwal, Ivan Zaitsev, Xuanhui Wang, Cheng Yen Li, Marc Najork, and Thorsten Joachims. 2019. Estimating position bias without intrusive interventions. In WSDM.</note>
</biblStruct>
<biblStruct xml:id="b1">
<analytic>
<title level="a" type="main">Construction of the literature graph in semantic scholar</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Waleed</forename><surname>Ammar</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Dirk</forename><surname>Groeneveld</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chandra</forename><surname>Bhagavatula</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Iz</forename><surname>Beltagy</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Miles</forename><surname>Crawford</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Doug</forename><surname>Downey</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jason</forename><surname>Dunkelberger</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ahmed</forename><surname>Elgohary</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sergey</forename><surname>Feldman</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Vu</forename><surname>Ha</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Rodney</forename><surname>Kinney</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sebastian</forename><surname>Kohlmeier</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kyle</forename><surname>Lo</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tyler</forename><forename type="middle">C</forename><surname>Murray</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Hsu-Han</forename><surname>Ooi</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Matthew</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Joanna</forename><surname>Power</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sam</forename><surname>Skjonsberg</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lucy</forename><forename type="middle">Lu</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Christopher</forename><surname>Wilhelm</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zheng</forename><surname>Yuan</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Madeleine</forename><surname>van Zuylen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NAACL-HLT</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler C. Murray, Hsu-Han Ooi, Matthew E. Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Christopher Wilhelm, Zheng Yuan, Madeleine van Zuylen, and Oren Etzioni. 2018. Construction of the literature graph in semantic scholar. In NAACL-HLT.</note>
</biblStruct>
<biblStruct xml:id="b2">
<analytic>
<title level="a" type="main">A simple but tough-to-beat baseline for sentence embeddings</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sanjeev</forename><surname>Arora</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yingyu</forename><surname>Liang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tengyu</forename><surname>Ma</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ICLR</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In ICLR.</note>
</biblStruct>
<biblStruct xml:id="b3">
<analytic>
<title level="a" type="main">SciBERT: A Pretrained Language Model for Scientific Text</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Iz</forename><surname>Beltagy</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kyle</forename><surname>Lo</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Arman</forename><surname>Cohan</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b4">
<analytic>
<title level="a" type="main">Content-Based Citation Recommendation</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chandra</forename><surname>Bhagavatula</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sergey</forename><surname>Feldman</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Russell</forename><surname>Power</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Waleed</forename><surname>Ammar</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NAACL-HLT</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Chandra Bhagavatula, Sergey Feldman, Russell Power, and Waleed Ammar. 2018. Content-Based Citation Recommendation. In NAACL-HLT.</note>
</biblStruct>
<biblStruct xml:id="b5">
<analytic>
<title level="a" type="main">Enriching word vectors with subword information</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Piotr</forename><surname>Bojanowski</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Edouard</forename><surname>Grave</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Armand</forename><surname>Joulin</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
</author>
<idno type="DOI">10.1162/tacl_a_00051</idno>
</analytic>
<monogr>
<title level="j">TACL</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. TACL.</note>
</biblStruct>
<biblStruct xml:id="b6">
<analytic>
<title level="a" type="main">Spectral networks and locally connected networks on graphs</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Joan</forename><surname>Bruna</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Wojciech</forename><surname>Zaremba</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Arthur</forename><surname>Szlam</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yann</forename><surname>LeCun</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ICLR</title>
<imprint>
<date type="published" when="2014" />
</imprint>
</monogr>
<note type="raw_reference">Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. ICLR.</note>
</biblStruct>
<biblStruct xml:id="b7">
<analytic>
<title level="a" type="main">Improving textual network embedding with global attention via optimal transport</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Liqun</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Guoyin</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chenyang</forename><surname>Tao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Dinghan</forename><surname>Shen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Pengyu</forename><surname>Cheng</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xinyuan</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Wenlin</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yizhe</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lawrence</forename><surname>Carin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ACL</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Liqun Chen, Guoyin Wang, Chenyang Tao, Dinghan Shen, Pengyu Cheng, Xinyuan Zhang, Wenlin Wang, Yizhe Zhang, and Lawrence Carin. 2019. Improving textual network embedding with global attention via optimal transport. In ACL.</note>
</biblStruct>
<biblStruct xml:id="b8">
<analytic>
<title level="a" type="main">Research Paper Recommender Systems on Big Scholarly Data</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tsung</forename><forename type="middle">Teng</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Maria</forename><surname>Lee</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">Knowledge Management and Acquisition for Intelligent Systems</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Tsung Teng Chen and Maria Lee. 2018. Research Paper Recommender Systems on Big Scholarly Data. In Knowledge Management and Acquisition for Intelligent Systems.</note>
</biblStruct>
<biblStruct xml:id="b9">
<analytic>
<title level="a" type="main">The structural and content aspects of abstracts versus bodies of full text journal articles are different</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">K</forename><forename type="middle">Bretonnel</forename><surname>Cohen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Helen</forename><forename type="middle">L</forename><surname>Johnson</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Karin</forename><forename type="middle">M</forename><surname>Verspoor</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Christophe</forename><surname>Roeder</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lawrence</forename><surname>Hunter</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">BMC Bioinformatics</title>
<imprint>
<biblScope unit="volume">11</biblScope>
<biblScope unit="page" from="492" to="492" />
<date type="published" when="2010" />
</imprint>
</monogr>
<note type="raw_reference">K. Bretonnel Cohen, Helen L. Johnson, Karin M. Verspoor, Christophe Roeder, and Lawrence Hunter. 2010. The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics, 11:492-492.</note>
</biblStruct>
<biblStruct xml:id="b10">
<analytic>
<title level="a" type="main">Supervised Learning of Universal Sentence Representations from Natural Language Inference Data</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Alexis</forename><surname>Conneau</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Douwe</forename><surname>Kiela</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Holger</forename><surname>Schwenk</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Loïc</forename><surname>Barrault</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Antoine</forename><surname>Bordes</surname></persName>
</author>
<idno type="DOI">10.18653/v1/D17-1070</idno>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b11">
<analytic>
<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kenton</forename><surname>Lee</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NAACL-HLT</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT.</note>
</biblStruct>
<biblStruct xml:id="b12">
<analytic>
<title level="a" type="main">A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Martin</forename><surname>Ester</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Hans-Peter</forename><surname>Kriegel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jörg</forename><surname>Sander</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xiaowei</forename><surname>Xu</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">KDD</title>
<imprint>
<date type="published" when="1996" />
</imprint>
</monogr>
<note type="raw_reference">Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996. A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In KDD.</note>
</biblStruct>
<biblStruct xml:id="b13">
<analytic>
<title level="a" type="main">Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sergey</forename><surname>Feldman</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Waleed</forename><surname>Ammar</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kyle</forename><surname>Lo</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Elly</forename><surname>Trepman</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Madeleine</forename><surname>van Zuylen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Oren</forename><surname>Etzioni</surname></persName>
</author>
<idno type="DOI">10.1001/jamanetworkopen.2019.6700</idno>
</analytic>
<monogr>
<title level="j">JAMA</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Sergey Feldman, Waleed Ammar, Kyle Lo, Elly Trepman, Madeleine van Zuylen, and Oren Etzioni. 2019. Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction. JAMA.</note>
</biblStruct>
<biblStruct xml:id="b14">
<analytic>
<title level="a" type="main">Doc2sent2vec: A novel two-phase approach for learning document representation</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">J</forename><surname>Ganesh</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Manish</forename><surname>Gupta</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Vijay</forename><forename type="middle">K</forename><surname>Varma</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">SIGIR</title>
<imprint>
<date type="published" when="2016" />
</imprint>
</monogr>
<note type="raw_reference">J Ganesh, Manish Gupta, and Vijay K. Varma. 2016. Doc2sent2vec: A novel two-phase approach for learning document representation. In SIGIR.</note>
</biblStruct>
<biblStruct xml:id="b15">
<analytic>
<title level="a" type="main">AllenNLP: A Deep Semantic Natural Language Processing Platform</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Matt</forename><surname>Gardner</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Joel</forename><surname>Grus</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mark</forename><surname>Neumann</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Oyvind</forename><surname>Tafjord</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Pradeep</forename><surname>Dasigi</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Nelson</forename><forename type="middle">F</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Matthew</forename><surname>Peters</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Michael</forename><surname>Schmitz</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Luke</forename><surname>Zettlemoyer</surname></persName>
</author>
<idno type="DOI">10.18653/v1/W18-2501</idno>
</analytic>
<monogr>
<title level="m">Proceedings of Workshop for NLP Open Source Software</title>
<meeting>Workshop for NLP Open Source Software</meeting>
<imprint>
<date type="published" when="2018" />
</imprint>
<respStmt>
<orgName>NLP-OSS</orgName>
</respStmt>
</monogr>
<note type="raw_reference">Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew Peters, Michael Schmitz, and Luke Zettlemoyer. 2018. AllenNLP: A Deep Semantic Natural Language Processing Platform. In Proceedings of Workshop for NLP Open Source Software (NLP-OSS).</note>
</biblStruct>
<biblStruct xml:id="b16">
<analytic>
<title level="a" type="main">Neural Vector Spaces for Unsupervised Information Retrieval</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Christophe</forename><surname>Van Gysel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Maarten</forename><surname>de Rijke</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Evangelos</forename><surname>Kanoulas</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">ACM Trans. Inf. Syst</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Christophe Van Gysel, Maarten de Rijke, and Evangelos Kanoulas. 2017. Neural Vector Spaces for Unsupervised Information Retrieval. ACM Trans. Inf. Syst.</note>
</biblStruct>
<biblStruct xml:id="b17">
<analytic>
<title level="a" type="main">Inductive Representation Learning on Large Graphs</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Will</forename><surname>Hamilton</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhitao</forename><surname>Ying</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jure</forename><surname>Leskovec</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NIPS</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017a. Inductive Representation Learning on Large Graphs. In NIPS.</note>
</biblStruct>
<biblStruct xml:id="b18">
<analytic>
<title level="a" type="main">Inductive representation learning on large graphs</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">William</forename><forename type="middle">L</forename><surname>Hamilton</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhitao</forename><surname>Ying</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jure</forename><surname>Leskovec</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NIPS</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017b. Inductive representation learning on large graphs. In NIPS.</note>
</biblStruct>
<biblStruct xml:id="b19">
<analytic>
<title level="a" type="main">Explaining away syntactic structure in semantic document representations</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Erik</forename><surname>Holmer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Andreas</forename><surname>Marfurt</surname></persName>
</author>
<idno>abs/1806.01620</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Erik Holmer and Andreas Marfurt. 2018. Explaining away syntactic structure in semantic document representations. ArXiv, abs/1806.01620.</note>
</biblStruct>
<biblStruct xml:id="b20">
<analytic>
<title level="a" type="main">Universal Language Model Fine-tuning for Text Classification</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jeremy</forename><surname>Howard</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sebastian</forename><surname>Ruder</surname></persName>
</author>
<idno type="DOI">10.18653/v1/P18-1031</idno>
</analytic>
<monogr>
<title level="m">ACL</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Jeremy Howard and Sebastian Ruder. 2018. Universal Language Model Fine-tuning for Text Classification. In ACL.</note>
</biblStruct>
<biblStruct xml:id="b21">
<analytic>
<title level="a" type="main">A context-aware citation recommendation model with bert and graph convolutional networks</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chanwoo</forename><surname>Jeong</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sion</forename><surname>Jang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Hyuna</forename><surname>Shin</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Eunjeong</forename><forename type="middle">Lucy</forename><surname>Park</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sungchul</forename><surname>Choi</surname></persName>
</author>
<idno>abs/1903.06464</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Chanwoo Jeong, Sion Jang, Hyuna Shin, Eunjeong Lucy Park, and Sungchul Choi. 2019. A context-aware citation recommendation model with bert and graph convolutional networks. ArXiv, abs/1903.06464.</note>
</biblStruct>
<biblStruct xml:id="b22">
<analytic>
<title level="a" type="main">A Scalable Hybrid Research Paper Recommender System for Microsoft Academic</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Anshul</forename><surname>Kanakia</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhihong</forename><surname>Shen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Darrin</forename><surname>Eide</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kuansan</forename><surname>Wang</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">WWW</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Anshul Kanakia, Zhihong Shen, Darrin Eide, and Kuansan Wang. 2019. A Scalable Hybrid Research Paper Recommender System for Microsoft Academic. In WWW.</note>
</biblStruct>
<biblStruct xml:id="b23">
<analytic>
<title level="a" type="main">Adam: A Method for Stochastic Optimization</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Diederik</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jimmy</forename><surname>Ba</surname></persName>
</author>
<idno>abs/1412.6980</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2014" />
</imprint>
</monogr>
<note type="raw_reference">Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. ArXiv, abs/1412.6980.</note>
</biblStruct>
<biblStruct xml:id="b24">
<analytic>
<title level="a" type="main">Semi-supervised classification with graph convolutional networks</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Thomas</forename><forename type="middle">N</forename><surname>Kipf</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Max</forename><surname>Welling</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ICLR</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Thomas N Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. ICLR.</note>
</biblStruct>
<biblStruct xml:id="b25">
<analytic>
<title level="a" type="main">Skip-thought vectors</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ryan</forename><surname>Kiros</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yukun</forename><surname>Zhu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ruslan</forename><surname>Salakhutdinov</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Richard</forename><forename type="middle">S</forename><surname>Zemel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Antonio</forename><surname>Torralba</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Raquel</forename><surname>Urtasun</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Sanja</forename><surname>Fidler</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NIPS</title>
<imprint>
<date type="published" when="2015" />
</imprint>
</monogr>
<note type="raw_reference">Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In NIPS.</note>
</biblStruct>
<biblStruct xml:id="b26">
<analytic>
<title level="a" type="main">Identifying a better measure of relatedness for mapping science</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Richard</forename><surname>Klavans</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kevin</forename><forename type="middle">W</forename><surname>Boyack</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Journal of the Association for Information Science and Technology</title>
<imprint>
<biblScope unit="volume">57</biblScope>
<biblScope unit="page" from="251" to="263" />
<date type="published" when="2006" />
</imprint>
</monogr>
<note type="raw_reference">Richard Klavans and Kevin W. Boyack. 2006. Identifying a better measure of relatedness for mapping science. Journal of the Association for Information Science and Technology, 57:251-263.</note>
</biblStruct>
<biblStruct xml:id="b27">
<analytic>
<title level="a" type="main">An empirical evaluation of doc2vec with practical insights into document embedding generation</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jey</forename><forename type="middle">Han</forename><surname>Lau</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Timothy</forename><surname>Baldwin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">Rep4NLP@ACL</title>
<imprint>
<date type="published" when="2016" />
</imprint>
</monogr>
<note type="raw_reference">Jey Han Lau and Timothy Baldwin. 2016. An empirical evaluation of doc2vec with practical insights into document embedding generation. In Rep4NLP@ACL.</note>
</biblStruct>
<biblStruct xml:id="b28">
<analytic>
<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Quoc</forename><surname>Le</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ICML</title>
<imprint>
<date type="published" when="2014" />
</imprint>
</monogr>
<note type="raw_reference">Quoc Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In ICML.</note>
</biblStruct>
<biblStruct xml:id="b29">
<analytic>
<title level="a" type="main">Is searching full text more effective than searching abstracts?</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jimmy</forename><forename type="middle">J</forename><surname>Lin</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">BMC Bioinformatics</title>
<imprint>
<biblScope unit="volume">10</biblScope>
<biblScope unit="page" from="46" to="46" />
<date type="published" when="2008" />
</imprint>
</monogr>
<note type="raw_reference">Jimmy J. Lin. 2008. Is searching full text more effective than searching abstracts? BMC Bioinformatics, 10:46-46.</note>
</biblStruct>
<biblStruct xml:id="b30">
<analytic>
<title level="a" type="main">Medical Subject Headings (MeSH)</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Carolyn</forename><forename type="middle">E</forename><surname>Lipscomb</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Bulletin of the Medical Library Association</title>
<imprint>
<date type="published" when="2000" />
</imprint>
</monogr>
<note type="raw_reference">Carolyn E Lipscomb. 2000. Medical Subject Headings (MeSH). Bulletin of the Medical Library Association.</note>
</biblStruct>
<biblStruct xml:id="b31">
<analytic>
<title level="a" type="main">Unsupervised Document Embedding with CNNs</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chundi</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Shunan</forename><surname>Zhao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Maksims</forename><surname>Volkovs</surname></persName>
</author>
<idno>abs/1711.04168v3</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Chundi Liu, Shunan Zhao, and Maksims Volkovs. 2018. Unsupervised Document Embedding with CNNs. ArXiv, abs/1711.04168v3.</note>
</biblStruct>
<biblStruct xml:id="b32">
<analytic>
<title level="a" type="main">A Model of Extended Paragraph Vector for Document Categorization and Trend Analysis</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Pengfei</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">King</forename><forename type="middle">Keung</forename><surname>Wu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Helen</forename><forename type="middle">M</forename><surname>Meng</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">IJCNN</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Pengfei Liu, King Keung Wu, and Helen M. Meng. 2017. A Model of Extended Paragraph Vector for Document Categorization and Trend Analysis. IJCNN.</note>
</biblStruct>
<biblStruct xml:id="b33">
<analytic>
<title level="a" type="main">RoBERTa: A Robustly Optimized BERT Pretraining Approach</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yinhan</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Myle</forename><surname>Ott</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Naman</forename><surname>Goyal</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jingfei</forename><surname>Du</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mandar</forename><forename type="middle">S</forename><surname>Joshi</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Danqi</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Omer</forename><surname>Levy</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mike</forename><surname>Lewis</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Luke</forename><forename type="middle">S</forename><surname>Zettlemoyer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Veselin</forename><surname>Stoyanov</surname></persName>
</author>
<idno>abs/1907.11692</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar S. Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke S. Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv, abs/1907.11692.</note>
</biblStruct>
<biblStruct xml:id="b34">
<analytic>
<title level="a" type="main">Accelerating t-SNE Using Tree-based Algorithms</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Laurens</forename><surname>van der Maaten</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Journal of Machine Learning Research</title>
<imprint>
<date type="published" when="2014" />
</imprint>
</monogr>
<note type="raw_reference">Laurens van der Maaten. 2014. Accelerating t-SNE Using Tree-based Algorithms. Journal of Machine Learning Research.</note>
</biblStruct>
<biblStruct xml:id="b35">
<analytic>
<title level="a" type="main">DisSent: Learning Sentence Representations from Explicit Discourse Relations</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Allen</forename><surname>Nie</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Erin</forename><surname>Bennett</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Noah</forename><surname>Goodman</surname></persName>
</author>
<idno type="DOI">10.18653/v1/P19-1442</idno>
</analytic>
<monogr>
<title level="m">ACL</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Allen Nie, Erin Bennett, and Noah Goodman. 2019. DisSent: Learning Sentence Representations from Explicit Discourse Relations. In ACL.</note>
</biblStruct>
<biblStruct xml:id="b36">
<analytic>
<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">F</forename><surname>Pedregosa</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">G</forename><surname>Varoquaux</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">A</forename><surname>Gramfort</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">V</forename><surname>Michel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">B</forename><surname>Thirion</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">O</forename><surname>Grisel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">M</forename><surname>Blondel</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">R</forename><surname>Weiss</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">V</forename><surname>Dubourg</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">J</forename><surname>Vanderplas</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">A</forename><surname>Passos</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">D</forename><surname>Cournapeau</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">M</forename><surname>Brucher</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">M</forename><surname>Perrot</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">E</forename><surname>Duchesnay</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Journal of Machine Learning Research</title>
<imprint>
<biblScope unit="volume">12</biblScope>
<biblScope unit="page" from="2825" to="2830" />
<date type="published" when="2011" />
</imprint>
</monogr>
<note type="raw_reference">F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830.</note>
</biblStruct>
<biblStruct xml:id="b37">
<monogr>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Matthew</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mark</forename><surname>Neumann</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mohit</forename><surname>Iyyer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Matt</forename><surname>Gardner</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Christopher</forename><surname>Clark</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kenton</forename><surname>Lee</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Luke</forename><surname>Zettlemoyer</surname></persName>
</author>
<title level="m">Deep Contextualized Word Representations</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep Contextualized Word Representations.</note>
</biblStruct>
<biblStruct xml:id="b38">
<monogr>
<title level="m" type="main">Improving language understanding by generative pre-training</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Alec</forename><surname>Radford</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Karthik</forename><surname>Narasimhan</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tim</forename><surname>Salimans</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
</author>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="report_type">arXiv</note>
<note type="raw_reference">Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. arXiv.</note>
</biblStruct>
<biblStruct xml:id="b39">
<analytic>
<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Radim</forename><surname>Řehůřek</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Petr</forename><surname>Sojka</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">LREC</title>
<imprint>
<date type="published" when="2010" />
</imprint>
</monogr>
<note type="raw_reference">Radim Řehůřek and Petr Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In LREC.</note>
</biblStruct>
<biblStruct xml:id="b40">
<analytic>
<title level="a" type="main">Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Nils</forename><surname>Reimers</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Iryna</forename><surname>Gurevych</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b41">
<analytic>
<title level="a" type="main">V-measure: A Conditional Entropy-based External Cluster Evaluation Measure</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Andrew</forename><surname>Rosenberg</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Julia</forename><surname>Hirschberg</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2007" />
</imprint>
</monogr>
<note type="raw_reference">Andrew Rosenberg and Julia Hirschberg. 2007. V-measure: A Conditional Entropy-based External Cluster Evaluation Measure. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b42">
<analytic>
<title level="a" type="main">Collaborative filtering recommender systems</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">J</forename><forename type="middle">Ben</forename><surname>Schafer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Dan</forename><surname>Frankowski</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jon</forename><surname>Herlocker</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Shilad</forename><surname>Sen</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">The adaptive web</title>
<imprint>
<publisher>Springer</publisher>
<date type="published" when="2007" />
</imprint>
</monogr>
<note type="raw_reference">J Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. 2007. Collaborative filtering recommender systems. In The adaptive web. Springer.</note>
</biblStruct>
<biblStruct xml:id="b43">
<analytic>
<title level="a" type="main">Distribution of information in biomedical abstracts and full-text publications</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Martijn</forename><forename type="middle">J</forename><surname>Schuemie</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Marc</forename><surname>Weeber</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Bob</forename><forename type="middle">J A</forename><surname>Schijvenaars</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Erik</forename><forename type="middle">M</forename><surname>van Mulligen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">C</forename><forename type="middle">Christiaan</forename><surname>van der Eijk</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Rob</forename><surname>Jelier</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Barend</forename><surname>Mons</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jan</forename><forename type="middle">A</forename><surname>Kors</surname></persName>
</author>
</analytic>
<monogr>
<title level="j">Bioinformatics</title>
<imprint>
<date type="published" when="2004" />
<biblScope unit="volume">20</biblScope>
<biblScope unit="issue">16</biblScope>
<biblScope unit="page" from="2597" to="2604" />
</imprint>
</monogr>
<note type="raw_reference">Martijn J. Schuemie, Marc Weeber, Bob J. A. Schijvenaars, Erik M. van Mulligen, C. Christiaan van der Eijk, Rob Jelier, Barend Mons, and Jan A. Kors. 2004. Distribution of information in biomedical abstracts and full-text publications. Bioinformatics, 20(16):2597-604.</note>
</biblStruct>
<biblStruct xml:id="b44">
<analytic>
<title level="a" type="main">Improved semantic-aware network embedding with fine-grained word alignment</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Dinghan</forename><surname>Shen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xinyuan</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ricardo</forename><surname>Henao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lawrence</forename><surname>Carin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Dinghan Shen, Xinyuan Zhang, Ricardo Henao, and Lawrence Carin. 2018. Improved semantic-aware network embedding with fine-grained word alignment. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b45">
<analytic>
<title level="a" type="main">An Overview of Microsoft Academic Service (MAS) and Applications</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Arnab</forename><surname>Sinha</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhihong</forename><surname>Shen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yang</forename><surname>Song</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Hao</forename><surname>Ma</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Darrin</forename><surname>Eide</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Bo-June</forename><forename type="middle">Paul</forename><surname>Hsu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kuansan</forename><surname>Wang</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">WWW</title>
<imprint>
<date type="published" when="2015" />
</imprint>
</monogr>
<note type="raw_reference">Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June Paul Hsu, and Kuansan Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In WWW.</note>
</biblStruct>
<biblStruct xml:id="b46">
<analytic>
<title level="a" type="main">Cane: Context-aware network embedding for relation modeling</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Cunchao</forename><surname>Tu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Han</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhiyuan</forename><surname>Liu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Maosong</forename><surname>Sun</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ACL</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Cunchao Tu, Han Liu, Zhiyuan Liu, and Maosong Sun. 2017. Cane: Context-aware network embedding for relation modeling. In ACL.</note>
</biblStruct>
<biblStruct xml:id="b47">
<analytic>
<title level="a" type="main">Attention Is All You Need</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ashish</forename><surname>Vaswani</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Noam</forename><surname>Shazeer</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Niki</forename><surname>Parmar</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jakob</forename><surname>Uszkoreit</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Llion</forename><surname>Jones</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Aidan</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lukasz</forename><surname>Kaiser</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Illia</forename><surname>Polosukhin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NIPS</title>
<imprint>
<date type="published" when="2017" />
</imprint>
</monogr>
<note type="raw_reference">Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In NIPS.</note>
</biblStruct>
<biblStruct xml:id="b48">
<analytic>
<title level="a" type="main">Improving textual network learning with variational homophilic embeddings</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Wenlin</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chenyang</forename><surname>Tao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhe</forename><surname>Gan</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Guoyin</forename><surname>Wang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Liqun</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xinyuan</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ruiyi</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Qian</forename><surname>Yang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ricardo</forename><surname>Henao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lawrence</forename><surname>Carin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">Advances in Neural Information Processing Systems</title>
<imprint>
<date type="published" when="2019" />
<biblScope unit="page" from="2074" to="2085" />
</imprint>
</monogr>
<note type="raw_reference">Wenlin Wang, Chenyang Tao, Zhe Gan, Guoyin Wang, Liqun Chen, Xinyuan Zhang, Ruiyi Zhang, Qian Yang, Ricardo Henao, and Lawrence Carin. 2019. Improving textual network learning with variational homophilic embeddings. In Advances in Neural Information Processing Systems, pages 2074-2085.</note>
</biblStruct>
<biblStruct xml:id="b49">
<analytic>
<title level="a" type="main">A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Adina</forename><surname>Williams</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Nikita</forename><surname>Nangia</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Samuel</forename><surname>Bowman</surname></persName>
</author>
<idno type="DOI">10.18653/v1/N18-1101</idno>
</analytic>
<monogr>
<title level="m">NAACL-HLT</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In NAACL-HLT.</note>
</biblStruct>
<biblStruct xml:id="b50">
<analytic>
<title level="a" type="main">Simplifying graph convolutional networks</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Felix</forename><surname>Wu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Amauri</forename><forename type="middle">H</forename><surname>Souza</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tianyi</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Christopher</forename><surname>Fifty</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Tao</forename><surname>Yu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kilian</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">ICML</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Felix Wu, Amauri H. Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Q. Weinberger. 2019a. Simplifying graph convolutional networks. In ICML.</note>
</biblStruct>
<biblStruct xml:id="b51">
<analytic>
<title level="a" type="main">Word Mover&apos;s Embedding: From Word2Vec to Document Embedding</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lingfei</forename><surname>Wu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ian</forename><forename type="middle">En-Hsu</forename><surname>Yen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Kun</forename><surname>Xu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Fangli</forename><surname>Xu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Avinash</forename><surname>Balakrishnan</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Pin-Yu</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Pradeep</forename><surname>Ravikumar</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Michael</forename><forename type="middle">J</forename><surname>Witbrock</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">EMNLP</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Lingfei Wu, Ian En-Hsu Yen, Kun Xu, Fangli Xu, Avinash Balakrishnan, Pin-Yu Chen, Pradeep Ravikumar, and Michael J Witbrock. 2018. Word Mover&apos;s Embedding: From Word2Vec to Document Embedding. In EMNLP.</note>
</biblStruct>
<biblStruct xml:id="b52">
<analytic>
<title level="a" type="main">Google&apos;s neural machine translation system: Bridging the gap between human and machine translation</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yonghui</forename><surname>Wu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mike</forename><surname>Schuster</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhifeng</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mohammad</forename><surname>Norouzi</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Wolfgang</forename><surname>Macherey</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Maxim</forename><surname>Krikun</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yuan</forename><surname>Cao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Qin</forename><surname>Gao</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Klaus</forename><surname>Macherey</surname></persName>
</author>
<idno>abs/1609.08144</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2016" />
</imprint>
</monogr>
<note type="raw_reference">Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google&apos;s neural machine translation system: Bridging the gap between human and machine translation. ArXiv, abs/1609.08144.</note>
</biblStruct>
<biblStruct xml:id="b53">
<analytic>
<title level="a" type="main">A Comprehensive Survey on Graph Neural Networks</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zonghan</forename><surname>Wu</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Shirui</forename><surname>Pan</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Fengwen</forename><surname>Chen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Guodong</forename><surname>Long</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Chengqi</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Philip</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
</author>
<idno>abs/1901.00596</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S Yu. 2019b. A Comprehensive Survey on Graph Neural Networks. ArXiv, abs/1901.00596.</note>
</biblStruct>
<biblStruct xml:id="b54">
<analytic>
<title level="a" type="main">Xlnet: Generalized autoregressive pretraining for language understanding</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zhilin</forename><surname>Yang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Zihang</forename><surname>Dai</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yiming</forename><surname>Yang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jaime</forename><forename type="middle">G</forename><surname>Carbonell</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Ruslan</forename><surname>Salakhutdinov</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
</author>
<idno>abs/1906.08237</idno>
</analytic>
<monogr>
<title level="j">ArXiv</title>
<imprint>
<date type="published" when="2019" />
</imprint>
</monogr>
<note type="raw_reference">Zhilin Yang, Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. ArXiv, abs/1906.08237.</note>
</biblStruct>
<biblStruct xml:id="b55">
<analytic>
<title level="a" type="main">From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Hamed</forename><surname>Zamani</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Mostafa</forename><surname>Dehghani</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">W</forename><forename type="middle">Bruce</forename><surname>Croft</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Erik</forename><forename type="middle">G</forename><surname>Learned-Miller</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Jaap</forename><surname>Kamps</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">CIKM</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik G. Learned-Miller, and Jaap Kamps. 2018. From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing. In CIKM.</note>
</biblStruct>
<biblStruct xml:id="b56">
<analytic>
<title level="a" type="main">Diffusion maps for textual network embedding</title>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Xinyuan</forename><surname>Zhang</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Yitong</forename><surname>Li</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Dinghan</forename><surname>Shen</surname></persName>
</author>
<author>
<persName xmlns="http://www.tei-c.org/ns/1.0"><forename type="first">Lawrence</forename><surname>Carin</surname></persName>
</author>
</analytic>
<monogr>
<title level="m">NeurIPS</title>
<imprint>
<date type="published" when="2018" />
</imprint>
</monogr>
<note type="raw_reference">Xinyuan Zhang, Yitong Li, Dinghan Shen, and Lawrence Carin. 2018. Diffusion maps for textual network embedding. In NeurIPS.</note>
</biblStruct>
</listBibl>
</div>
</back>
</text>
</TEI>