Tl;DR: Definition detection is important for scholarly papers, which often use technical terminology that may be unfamiliar to readers. We develop a new definition detection system, HEDDEx, that combines syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark.
Tl;DR: To help navigate the collection of COVID-19 papers from different domains, we present a KB of mechanisms relating to COVID-19 that supports domain-agnostic search and exploration of general activities, functions, influences and associations in these papers.
Tl;DR: We extracted evidence of supplement-drug interactions from 22M scientific articles. Using transfer learning, we fine-tune the BERT language model on labeled evidence of drug-drug interactions and use the resulting model to detect evidence of supplement interactions. We surface these interactions on a demo website, SUPP.AI, and provide the dataset and model for use by other researchers.
Tl;DR: We introduce a new dataset called SciREX that requires understanding of the whole document to annotate entities, and their document-level relationships that usually span beyond sentences or even sections.
Tl;DR: A novel, unsupervised method for extracting scientific concepts from papers, based on the intuition that each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept.
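To make this intuition concrete, here is a minimal Python sketch. It is not the authors' method: the data layout (a dict mapping a hypothetical paper id to the set of phrases the paper mentions and the set of paper ids it cites) and both statistics are assumptions chosen for illustration. For a candidate concept, it finds the cited paper that the largest share of concept-mentioning papers cite, and reports how much that share exceeds the paper's citation rate across the whole corpus (its lift).

```python
from collections import Counter

def top_cited_for_concept(concept, papers):
    """Hypothetical heuristic: which paper is disproportionately cited by papers mentioning `concept`?

    `papers` maps paper id -> (set of phrases mentioned, set of cited paper ids).
    Returns (cited_id, share_within, lift), or None if nothing qualifies.
    """
    # Citation sets of the papers that mention the concept.
    mentioning = [cites for phrases, cites in papers.values() if concept in phrases]
    if not mentioning:
        return None
    # Citation counts across the whole corpus and within the concept-mentioning subset.
    overall = Counter(c for _, cites in papers.values() for c in cites)
    within = Counter(c for cites in mentioning for c in cites)
    if not within:
        return None

    def stats(cited):
        share_within = within[cited] / len(mentioning)   # fraction of mentioning papers citing it
        share_overall = overall[cited] / len(papers)      # fraction of all papers citing it
        return share_within, share_within / share_overall  # (share, lift)

    # Prefer the highest share; break ties by lift.
    best = max(within, key=stats)
    return (best, *stats(best))

# Toy corpus: P0 "introduces" the concept and is cited by the later papers that mention it.
papers = {
    "P0": ({"attention"}, set()),
    "P1": ({"attention"}, {"P0"}),
    "P2": ({"attention"}, {"P0"}),
    "P3": ({"attention"}, {"P0", "X"}),
    "P4": (set(), {"X"}),
    "P5": (set(), {"X"}),
}
print(top_cited_for_concept("attention", papers))  # ('P0', 0.75, 1.5)
```

Here P0 is cited by three of the four papers that mention "attention" (share 0.75), while only half of all papers cite it, giving a lift of 1.5. A real system would still need thresholds or a learned scorer on top of statistics like these.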
Tl;DR: We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact and demonstrate that these models benefit from training on a large dataset of claims about Wikipedia articles combined with the new SciFact data.
Tl;DR: We created a spaCy pipeline for biomedical and scientific text processing. The core models include dependency parsing, part-of-speech tagging, and named entity recognition models retrained on general biomedical text, along with custom tokenization. We also release four specific named entity recognition models for more focused biomedical entity recognition. Additionally, we include optional components for abbreviation resolution, simple entity linking to UMLS, and sentence splitting.
Tl;DR: We improve relation extraction models by combining distant supervision data with additional directly supervised data, which we use as supervision for the attention weights. We find that joint training on both types of supervision leads to a better model because it improves the model's ability to identify noisy sentences.
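The summary does not spell out the training objective, so the following is only a sketch under assumptions: a bag-level relation loss from distant supervision, plus a weighted cross-entropy term that pulls the attention distribution over a bag's sentences toward the sentences that direct supervision marks as expressing the relation. The function and parameter names (`joint_loss`, `weight`) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def joint_loss(sentence_scores, relation_log_prob, sentence_labels, weight=0.5):
    """Hypothetical joint objective for one bag of sentences.

    sentence_scores:   attention logits, one per sentence in the bag.
    relation_log_prob: log p(gold relation | bag) from the distant-supervision head.
    sentence_labels:   1/0 per sentence from direct supervision, or None if the bag has none.
    weight:            assumed coefficient on the attention-supervision term.
    """
    loss = -relation_log_prob  # distant-supervision (bag-level) loss
    if sentence_labels is not None and any(sentence_labels):
        attn = softmax(sentence_scores)
        # Assumed target: uniform over the directly labeled positive sentences.
        k = sum(sentence_labels)
        target = [y / k for y in sentence_labels]
        # Cross-entropy between the target distribution and the attention weights.
        loss += weight * -sum(t * math.log(a) for t, a in zip(target, attn) if t > 0)
    return loss
```

Bags without direct labels fall back to the plain distant-supervision loss, so both kinds of data can be mixed in one training run; attention that concentrates on the labeled sentences yields a lower loss than attention on a noisy one.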
Tl;DR: We propose a new scaffolding model for classifying citation intents that uses two auxiliary tasks to cope with low-resource training data. We additionally propose SciCite, a multi-domain dataset of citation intents.
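The scaffolding idea can be illustrated with a weighted multi-task loss. In this minimal sketch, which is an assumption rather than the paper's exact objective, each training example is tagged with the task it comes from (the main citation-intent task or one of the auxiliary tasks), each example contributes only to its own task's loss, and auxiliary losses are scaled by hypothetical coefficients in `aux_weights`. The task names are placeholders.

```python
import math

def nll(probs, gold):
    """Negative log-likelihood of the gold class under predicted probabilities."""
    return -math.log(probs[gold])

def scaffold_loss(batch, aux_weights):
    """Hypothetical scaffold objective.

    batch:       list of (task, predicted_probs, gold_index); task "main" is citation intent.
    aux_weights: dict mapping each auxiliary task name to its loss coefficient.
    """
    totals, counts = {}, {}
    for task, probs, gold in batch:
        totals[task] = totals.get(task, 0.0) + nll(probs, gold)
        counts[task] = counts.get(task, 0) + 1
    loss = 0.0
    for task, total in totals.items():
        # The main task has weight 1; auxiliary tasks use their assumed coefficients.
        weight = 1.0 if task == "main" else aux_weights[task]
        loss += weight * total / counts[task]  # per-task mean, then weighted sum
    return loss
```

Averaging within each task before weighting is one design choice that keeps a large auxiliary dataset from swamping the smaller labeled citation-intent data; the coefficients would normally be tuned on held-out data.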
Tl;DR: We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating scientific literature which satisfies a variety of use cases and information needs requested by researchers.
Tl;DR: This paper introduces the Semantic Scholar literature graph, consisting of more than 280M nodes, representing papers, authors, entities and various interactions between them. [acknowledgements: TAGME entity linker (https://tagme.d4science.org/)]
Tl;DR: In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention.
Tl;DR: Our submission to the SemEval 2017 Task 10 (ScienceIE) shared task placed 1st in end-to-end entity and relation extraction and 2nd in relation-only extraction. We find that pretraining forward and backward neural language models produces word representations that can drastically improve model performance. This finding later led to the development of ELMo contextualized embeddings.
Tl;DR: We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measure the quality of publications.