The Utility of Context When Extracting Entities From Legal Documents

Jonathan Donnelly and Adam Roegiest • October 2020 • CIKM 2020

Patent

US20110255784A1

When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Due to the extremely low prevalence of target information in legal documents, we find that the traditional approach of tagging all sentences in a document is inferior, in both effectiveness and the data required to train and predict, to a two-stage approach: a first-pass layer identifies sentences that are likely to contain the relevant information, and the more traditional sentence-level sequence tagging is then run on those sentences.
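As a rough illustration of the two-stage idea described above, the following minimal sketch filters sentences first and tags only the survivors. It is not the system from the paper: `keyword_filter`, `toy_tagger`, `extract`, the `START_DATE` label, and the sample sentences are all hypothetical stand-ins (a keyword match in place of a trained sentence classifier, a regular expression in place of a trained sequence tagger).

```python
import re
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (token, label) pairs for one sentence

# Hypothetical cue words; a real first pass would be a trained classifier.
CUES = ("commence", "start date", "effective")
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def keyword_filter(sentence: str) -> bool:
    """Stage 1 stand-in: keep sentences likely to mention a start date."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in CUES)


def toy_tagger(sentence: str) -> Tagged:
    """Stage 2 stand-in: label ISO-formatted date tokens, 'O' otherwise."""
    return [
        (tok, "START_DATE" if ISO_DATE.match(tok) else "O")
        for tok in sentence.split()
    ]


def extract(
    sentences: List[str],
    is_candidate: Callable[[str], bool] = keyword_filter,
    tag: Callable[[str], Tagged] = toy_tagger,
) -> List[Tagged]:
    """Run the tagger only on sentences that pass the first-pass filter."""
    return [tag(s) for s in sentences if is_candidate(s)]


doc = [
    "This Agreement is governed by the laws of Ontario.",
    "The lease shall commence on 2020-10-19 unless terminated earlier.",
    "Notices must be delivered in writing.",
]
tagged = extract(doc)
```

Because most sentences in a legal document contain no target entity, the tagger in this sketch sees only one of the three sentences; the same shape would apply with real models substituted for the two stand-in functions.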

Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost-inefficient, especially with respect to the time needed to train models.
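One way to build a balanced training sample of the kind mentioned above is to downsample the larger class. The sketch below assumes a 1:1 ratio and uniform random sampling, neither of which is stated in the abstract; `balanced_sample` and the example sentences are hypothetical.

```python
import random
from typing import List, Tuple


def balanced_sample(
    relevant: List[str], non_relevant: List[str], seed: int = 0
) -> List[Tuple[str, int]]:
    """Return equally many relevant (label 1) and non-relevant (label 0) sentences.

    Assumption: "balanced" means a 1:1 ratio, reached by downsampling
    whichever class is larger.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n = min(len(relevant), len(non_relevant))
    sample = [(s, 1) for s in rng.sample(relevant, n)]
    sample += [(s, 0) for s in rng.sample(non_relevant, n)]
    rng.shuffle(sample)
    return sample


# Toy data: relevant sentences are rare, as in legal documents.
relevant = [f"The term shall commence on day {i}." for i in range(3)]
non_relevant = [f"Boilerplate clause number {i}." for i in range(50)]
train = balanced_sample(relevant, non_relevant)
```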
