Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions

Dongyeop Kang, Andrew Head, Risham Sidhu, Kyle Lo, Daniel S. Weld, and Marti A. Hearst
EMNLP, Scholarly Document Processing (SDP) workshop   2020

Tl;DR: The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. We develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark.

Extracting a Knowledge Base of Mechanisms from COVID-19 Papers

Tom Hope*, Aida Amini*, David Wadden, Madeleine van Zuylen, Eric Horvitz, Roy Schwartz, and Hannaneh Hajishirzi
preprint  2020

Tl;DR: To navigate the collection of COVID-19 papers from different domains, we present a KB of mechanisms relating to COVID-19, to support domain-agnostic search and exploration of general activities, functions, influences and associations in these papers.

SUPP.AI: Finding Evidence for Supplement-Drug Interactions

Lucy Lu Wang, Oyvind Tafjord, Sarthak Jain, Arman Cohan, Sam Skjonsberg, Carissa Schoenick, Nick Botner, and Waleed Ammar
ACL Demo  2020

Tl;DR: We extracted evidence of supplement-drug interactions from 22M scientific articles. Using transfer learning approaches, we fine-tune the BERT language model using labeled evidence of drug-drug interactions, and use the resulting model to detect supplement interaction evidence. We surface these interactions on a demo website, SUPP.AI, and provide the dataset and model for use by other researchers.

SciREX: A Challenge Dataset for Document-Level Information Extraction

Sarthak Jain, Madeleine van Zuylen, Hanna Hajishirzi, and Iz Beltagy
ACL  2020

Tl;DR: We introduce a new dataset called SciREX that requires understanding of the whole document to annotate entities, and their document-level relationships that usually span beyond sentences or even sections.

High-Precision Extraction of Emerging Concepts from Scientific Literature

Daniel King, Doug Downey, and Daniel S. Weld
SIGIR  2020

Tl;DR: A novel, unsupervised method for extracting scientific concepts from papers, based on the intuition that each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept.

Fact or Fiction: Verifying Scientific Claims

David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi
preprint  2020

Tl;DR: We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales. We develop baseline models for SciFact, and demonstrate that these models benefit from combined training on a large dataset of claims about Wikipedia articles, together with the new SciFact data.

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

Mark Neumann, Daniel King, Iz Beltagy, and Waleed Ammar
BioNLP  2019

Tl;DR: We created a spaCy pipeline for biomedical and scientific text processing. The core models include dependency parsing, part of speech tagging, and named entity recognition models retrained on general biomedical text, and custom tokenization. We also release four specific named entity recognition models for more focused biomedical entity recognition. Additionally, we include optional components for abbreviation resolution, simple entity linking to UMLS, and sentence splitting.

Combining Distant and Direct Supervision for Neural Relation Extraction

Iz Beltagy, Kyle Lo, and Waleed Ammar
NAACL  2019

Tl;DR: We improve relation extraction models by combining distant supervision data with additional directly-supervised data, which we use as supervision for the attention weights. We find that joint training on both types of supervision leads to a better model because it improves the model's ability to identify noisy sentences.

Structural Scaffolds for Citation Intent Classification in Scientific Publications

Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady
NAACL  2019

Tl;DR: We propose a new scaffolding model for classifying citation intents using two auxiliary tasks to handle low-resource training data. We additionally propose SciCite, a multi-domain dataset of citation intents.

GrapAL: Querying Semantic Scholar's Literature Graph

Christine Betts, Joanna L. Power, and Waleed Ammar
NAACL, Demo   2019

Tl;DR: We introduce GrapAL (Graph database of Academic Literature), a versatile tool for exploring and investigating scientific literature which satisfies a variety of use cases and information needs requested by researchers.

Construction of the Literature Graph in Semantic Scholar

Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, et al.
NAACL  2018

Tl;DR: This paper introduces the Semantic Scholar literature graph, consisting of more than 280M nodes, representing papers, authors, entities and various interactions between them. [acknowledgements: TAGME entity linker]

Extracting Scientific Figures with Distantly Supervised Neural Networks

Noah Siegel, Nicholas Lourie, Russell Power, and Waleed Ammar
JCDL  2018

Tl;DR: In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention.

Semi-supervised End-to-End Entity and Relation Extraction

Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, and Russell Power
SemEval  2017

Tl;DR: Our submission to the SemEval 2017 Task 10 (ScienceIE) shared task placed 1st in end-to-end entity and relation extraction and 2nd in relation-only extraction. We find that pretraining neural forward and backward language models produces word representations that can drastically improve model performance. This finding resulted in the later development of ELMo contextualized embeddings.

Identifying Meaningful Citations

Marco Valenzuela, Vu Ha, and Oren Etzioni
AAAI, Workshop   2015

Tl;DR: We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measure the quality of publications.
