Contracts generally take the form of mostly unstructured text. The core function of an Embeddable Contracts AI is converting that unstructured text into semi-structured (e.g., provisions) or structured (e.g., entities, normalized values, “answers”) data. Other Contracts AI features derive from this core. Let’s walk through some of the different types of data produced.

Provision Extraction

Provision extraction is the AI finding clauses in a contract, e.g., an agreement’s term, termination, indemnification, exclusivity, tenant’s right to sublet, or default or cross-default clauses.
Clustering is the grouping of related information together. In general, contract information tends to be grouped at two levels: provision-level clustering and document clustering.

Provision-level Clustering

In provision-level clustering, similar provisions are grouped together. So, for example, clustering might enable an end user to see change of control clauses in other documents that are similar to one they have just reviewed. Grouping similar provisions together can be a useful feature because it:
In a contract, defined terms are words given a specific meaning. E.g.,

“Start Date” means September 1, 2019.

“Usage Data” means information reflecting the access or use of the Services by or on behalf of Customer or any User, provided that for clarity Usage Data shall not include any Customer Data or any information that identifies, or reasonably would permit identification of Customer Data.

Defined term detection is tech that finds defined terms in contracts.
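To make the idea concrete, here is a minimal sketch of one naive way to spot defined terms: look for a quoted phrase followed by “means” or “shall mean”. The pattern, the function name `find_defined_terms`, and the rule of cutting each definition at the first period are all assumptions made for illustration. Real detection has to cope with many more drafting styles (e.g., terms defined in parentheses mid-sentence), so treat this as a toy, not as how any particular product works.

```python
import re

# Assumed pattern: a quoted term (curly or straight quotes) followed by
# "means" or "shall mean", then the definition up to the first period.
DEFINED_TERM = re.compile(
    r'[“"](?P<term>[^”"]+)[”"]\s+(?:shall\s+mean|means)\s+(?P<definition>[^.]+)'
)


def find_defined_terms(text: str) -> dict[str, str]:
    """Map each detected defined term to its (truncated) definition text."""
    return {
        match.group("term"): match.group("definition").strip()
        for match in DEFINED_TERM.finditer(text)
    }


sample = (
    "“Start Date” means September 1, 2019. "
    "“Usage Data” means information reflecting the access or use of the Services."
)
terms = find_defined_terms(sample)
for term, definition in terms.items():
    print(f"{term}: {definition}")
```

A pattern like this finds the two sample terms above, but it would miss definitions that do not use the word “means” and would truncate definitions containing abbreviations with periods.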
Document comparison shows differences between one document and another (or others, in the case of multiple document comparison). This is otherwise known as redlining or blacklining. Think track changes, but (potentially) in situations where track changes wouldn’t work, like where one or more of the documents is not a Word document (e.g., a PDF or image file), or where you need to compare multiple documents against one base document. Document comparison is useful in multiple situations:
Contracts are frequently part of a family of agreements: a base agreement; amendments to it; renewals; sub-agreements; as well as schedules, addendums, and statements of work under it. At Kira Systems, we had one customer contract family that I suspect was 25 documents. Contract AI users can be well served by seeing all these documents together. Related document linking (aka related document grouping) attempts to automatically identify documents that are part of the same family.
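As a rough illustration of what linking can look like, the sketch below groups documents into families using two signals: the set of parties, and the title once an assumed “amendment to / statement of work under” style prefix is stripped off. The prefix list, the `family_key` and `link_related` names, and the document fields (`title`, `parties`) are hypothetical. A production system would likely draw on many more signals (dates, cross-references in the text, and so on).

```python
import re
from collections import defaultdict

# Assumed prefixes that mark a document as belonging under a base agreement.
CHILD_PREFIX = re.compile(
    r"^(?:(?:first|second|third)\s+)?"
    r"(?:amendment|addendum|renewal|schedule\s+\w+|statement\s+of\s+work)"
    r"\s+(?:to|of|under)\s+(?:the\s+)?",
    re.IGNORECASE,
)


def family_key(title: str, parties: list[str]) -> tuple[str, frozenset[str]]:
    """Build a grouping key from the base agreement title and the party set."""
    base_title = CHILD_PREFIX.sub("", title.strip()).lower()
    return base_title, frozenset(party.lower() for party in parties)


def link_related(docs: list[dict]) -> list[list[str]]:
    """Group document titles whose base title and parties match."""
    families = defaultdict(list)
    for doc in docs:
        families[family_key(doc["title"], doc["parties"])].append(doc["title"])
    return list(families.values())


docs = [
    {"title": "Master Services Agreement", "parties": ["Acme", "Globex"]},
    {"title": "First Amendment to the Master Services Agreement", "parties": ["Globex", "Acme"]},
    {"title": "Statement of Work under Master Services Agreement", "parties": ["Acme", "Globex"]},
    {"title": "Lease Agreement", "parties": ["Acme", "Initech"]},
]
families = link_related(docs)
for family in families:
    print(family)
```

Here the three Acme/Globex documents land in one family and the lease sits alone. Exact matching on titles is brittle, which is part of why this problem is usually handled with fuzzier techniques.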
Contract risk scoring assigns a risk grade to agreements (or clauses), based on some predetermined view of what constitutes contract risk. Risk definitions could be determined by a vendor, or they could be custom for an individual organization deploying the technology. Certain clauses - or wordings of clauses - have the potential to be high risk. These include restrictive covenants (exclusivity, non-competition, and, less so, non-solicitation), indemnification, and most favored customer treatment.
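One simple way to picture risk scoring is as a weighted checklist: each clause type found in an agreement contributes a weight, and the total maps to a grade. The sketch below does exactly that. The weights, thresholds, and function names are invented for illustration; as noted above, a real deployment would take its risk definitions from the vendor or the organization, and would likely look at clause wording, not just clause presence.

```python
# Hypothetical weights per clause type; non-solicitation is weighted lower
# than the other restrictive covenants.
RISK_WEIGHTS = {
    "exclusivity": 3,
    "non-competition": 3,
    "non-solicitation": 2,
    "indemnification": 3,
    "most favored customer": 3,
}


def risk_score(clause_types: list[str]) -> int:
    """Sum the weights of the distinct risky clause types found in a contract."""
    return sum(RISK_WEIGHTS.get(clause, 0) for clause in set(clause_types))


def risk_grade(score: int) -> str:
    """Map a numeric score to a grade using assumed thresholds."""
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


found = ["exclusivity", "indemnification", "term"]
score = risk_score(found)
grade = risk_grade(score)
print(score, grade)
```

Clause types with no assigned weight (like `"term"` here) contribute nothing, so the score only reflects the clauses an organization has chosen to treat as risky.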
Signature (and Other Handwritten Information) Detection and Recognition

Signature detection helps determine whether a document has been signed. A related feature identifies whether pages have any handwritten information on them. Recognition refers to actually converting that handwritten information to text (i.e., analogous to OCR on printed text). Signature detection is useful because: Often there can be many draft versions of an agreement floating around, and the signed version will be the one that matters.
Many contracts come in the form of image files - i.e., they are scanned documents. In order to review them, the images need to be converted into (correctly laid out) text. This is where Optical Character Recognition (OCR) comes in. OCR converts images into text. So you can upload a contract (or other document) in the form of an image file into the Embeddable Contracts AI application you’re using, and OCR will convert that file into text, which the Embeddable Contracts AI can process further (e.