Related Document Linking

Noah Waisberg • August 11, 2022 • 4 minute read


Contracts are frequently part of a family of agreements: a base agreement, amendments to it, renewals, and sub-agreements, as well as schedules, addendums, and statements of work (SOWs) under it. At Kira Systems, we had one customer contract family that I suspect ran to 25 documents. Contract AI users can be well served by seeing all these documents together. Related document linking (aka related document grouping) attempts to automatically identify documents that are part of the same family.

Related document grouping is important with many types of agreements. It comes up especially often with:

  • Leases
  • Credit agreements
  • Indentures
  • Commercial contracts (license agreements, supply agreements, distribution agreements, services agreements)

Related document grouping is less common with:

  • NDAs
  • Employment agreements

Note that document clustering won’t necessarily help group related documents. Document clustering shows documents that are very similar to each other, and related documents are not necessarily similar to each other. For example, a supply agreement might be a long and complicated document, while an amendment to it might be very simple and fit on one or two pages. A similarity algorithm would not necessarily put these two documents together, but they are part of the same family.
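To make the contrast concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the sample texts, the word-overlap (Jaccard) score standing in for a clustering similarity measure, and the `references` rule are all assumptions made up for this example.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased set of words; a crude stand-in for a real similarity model."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Fraction of distinct words that the two texts share."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def references(doc: str, title: str, date: str) -> bool:
    """Hypothetical linking rule: the document names the base agreement's title and date."""
    lowered = doc.lower()
    return title.lower() in lowered and date.lower() in lowered


# Invented sample texts (assumptions for illustration only).
supply_agreement = (
    "Supply Agreement dated March 1, 2020 between Acme Corp and Widget Ltd. "
    "Supplier shall manufacture, package, and deliver the products listed in "
    "Schedule A. Pricing, forecasts, warranties, indemnities, insurance, "
    "confidentiality, termination, and governing law are set out below."
)
amendment = (
    "First Amendment to the Supply Agreement dated March 1, 2020. "
    "Section 4.2 is replaced in its entirety. All other terms remain unchanged."
)
other_supply_agreement = (
    "Supply Agreement dated June 5, 2021 between Acme Corp and Gadget Inc. "
    "Supplier shall manufacture, package, and deliver the products listed in "
    "Schedule A. Pricing, forecasts, warranties, indemnities, insurance, "
    "confidentiality, termination, and governing law are set out below."
)

# Similarity favors the unrelated look-alike over the actual family member.
print("agreement vs. its amendment:   ", round(jaccard(supply_agreement, amendment), 2))
print("agreement vs. unrelated twin:  ", round(jaccard(supply_agreement, other_supply_agreement), 2))

# The reference rule links the amendment to its base agreement instead.
print("amendment references agreement:", references(amendment, "Supply Agreement", "March 1, 2020"))
```

On these invented texts, the two unrelated supply agreements look far more alike than the amendment and its own base agreement, while the reference check ties the amendment to the right document. A real system would need much more robust signals than an exact title and date match.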

Grouping related documents together:

  • Enables organizing agreements in a logical way.
  • Allows end users to get a fuller picture of the arrangements under a series of related agreements.
  • Is a very important part of enabling reviewers to give one summary of a family of agreements. (Note that UI features are also a key part of enabling this. In our opinion, the UI features for this are most user friendly when related agreements are automatically grouped together.)

When evaluating related document grouping features, you will primarily be judging accuracy.

  • How accurate is the AI at putting related documents together?
    • More specifically, how does the AI put related documents together?
      • This is an interesting question, since how a system groups documents depends on the criteria its designers chose. Two systems might be broadly similar yet differ because of differing understandings of documents and of the criteria that indicate linkage (e.g., one system might only try to group contract families, while another might also link a set of shareholder meeting minutes that references a particular lease or amendment).
    • Can this behavior be changed if it is inconsistent with how you have historically grouped documents?
  • Does the AI perform as well as you need it to on different types of related documents? E.g., perhaps it is good at linking a main agreement together with amendments, but not so good at identifying SOWs.

With any accuracy evaluation, it is important to test using documents that approximate what you expect your system to face in the real world.
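One way to put a number on grouping accuracy is to compare the pairs of documents a system links against the pairs a human reviewer would link. The Python sketch below is a minimal version of that idea under stated assumptions: the document names, group labels, and the choice of pairwise precision and recall as the metric are all illustrative, not a method prescribed here or used by any specific product.

```python
from __future__ import annotations

from itertools import combinations


def linked_pairs(groups: dict[str, str]) -> set[frozenset[str]]:
    """Return every pair of documents that share a group label."""
    by_group: dict[str, list[str]] = {}
    for doc, label in groups.items():
        by_group.setdefault(label, []).append(doc)
    return {
        frozenset(pair)
        for docs in by_group.values()
        for pair in combinations(docs, 2)
    }


def pairwise_precision_recall(
    predicted: dict[str, str], gold: dict[str, str]
) -> tuple[float, float]:
    """Precision: share of predicted links that are correct.
    Recall: share of true links that the system found."""
    pred, true = linked_pairs(predicted), linked_pairs(gold)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 1.0
    recall = hits / len(true) if true else 1.0
    return precision, recall


# Hypothetical reviewer-assigned families (the "gold" answer).
gold = {
    "lease": "lease-family",
    "lease_amendment_1": "lease-family",
    "lease_renewal": "lease-family",
    "msa": "msa-family",
    "sow_1": "msa-family",
    "nda": "nda-family",
}

# Hypothetical system output: amendments are linked, but the SOW is missed.
predicted = {
    "lease": "g1",
    "lease_amendment_1": "g1",
    "lease_renewal": "g1",
    "msa": "g2",
    "sow_1": "g3",
    "nda": "g4",
}

precision, recall = pairwise_precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example every link the system proposes is correct, but it misses the link between the main agreement and its SOW, so recall falls below precision. Running the same comparison separately per document type (amendments, renewals, SOWs) would show where a system is weaker.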

Two limitations to note with related document grouping:

  • You need to store documents in the system for related document grouping to work, as with clustering. The technology underlying Contracts AI classification and extraction features can ingest documents and then delete them almost immediately after processing. In contrast, an AI needs to keep [some details of] documents around in order to identify related ones. Keeping documents in an additional system (like an Embeddable Contracts AI) is less attractive than deleting them immediately, since it entails additional storage as well as the security risk of having documents stored, but it is necessary for this feature to work.
    • That said, these negatives can be mitigated by loading potentially related documents into the Embeddable Contracts AI relatively close together in time, and then deleting them soon thereafter.
  • Wouldn’t it be great if software could take an agreement and a series of amendments to it, and create an amended and restated agreement out of them? Seriously! While this isn’t really related document grouping, this similar problem sometimes comes up when discussing the topic. Sadly, depending on how the amendments are drafted, this can be a very tricky task for software. Specifically, amendments that run something like “the interest rate in section 8(b)(ii) is hereby changed to SOFR plus 5%” are very hard for today’s AI to place correctly, and many parts have to work correctly for the result to be viable. So, while a feature that did this would be great, we are not optimistic that a good version of it will exist in the near future.