Many information retrieval tasks require viewing documents, whether to see information in context or to annotate it for a downstream task (e.g., evaluation or system training). Building a high-quality document viewer often exceeds the resources available to researchers, so in this paper we describe the design and architecture of our new open-source document viewer, Spectator. In particular, we detail the algorithms Spectator uses for tasks such as mapping annotations back to the canonical document. We also present a sampling of the use cases we envision for Spectator, outline potential future additions that depend on community need and support, and highlight situations where Spectator may not be a good fit. Finally, we briefly describe the sample application bundled with Spectator, which demonstrates how one might use it within a larger system.
Spectator: An Open Source Document Viewer