As professional users interact with more AI-enabled tools, it is increasingly important to understand how their work and behaviour are affected by such tools. In this paper, we present insights gleaned from a qualitative user study conducted with nine of our software’s users, all of whom are legal professionals. We find that as our participants become more accustomed to the system, they begin to subtly alter their behaviours and interactions with it. Drawing on their shared experiences, we distill these observations into insights that may inform the design of similar systems.
Read more papers
Science
From Bubbles to Lists: Designing Clustering for Due Diligence
In due diligence, lawyers are tasked with reviewing a large set of legal documents to identify documents, and portions thereof, that may be problematic for a merger or acquisition. To help users review more efficiently, we sought to determine how document-level clustering may assist users of a due diligence system during their workflow.
Following an iterative design methodology, we conducted several user studies of different versions of a document-level clustering feature, spanning three distinct phases and involving 27 users. We found that the interface should adapt to a user’s understanding of what “similar documents” means so that trust in the feature can be established. Furthermore, once that trust is established, it facilitates the user’s ability to negotiate with the underlying algorithm. Finally, while a user’s role may influence how this feature is used, it remains primarily a project management tool.
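As a rough illustration of the kind of document-level clustering discussed above, the sketch below groups documents by text similarity using a TF-IDF representation and k-means. This is a minimal, assumed pipeline for illustration only; the paper does not specify the underlying clustering algorithm, and the example documents and cluster count are placeholders.

```python
# Minimal sketch of document-level clustering, assuming a TF-IDF + k-means
# pipeline. The documents and cluster count below are illustrative
# placeholders, not the system's actual data or configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "This lease agreement is between the landlord and the tenant...",
    "The employment contract sets out the duties of the employee...",
    "This lease covers the premises located at...",
]

# Represent each document as a TF-IDF vector, then cluster the vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Group documents by cluster so similar documents can be reviewed together.
for cluster_id in sorted(set(labels)):
    members = [i for i, label in enumerate(labels) if label == cluster_id]
    print(f"Cluster {cluster_id}: documents {members}")
```

In a due diligence interface, groups like these could then be surfaced as lists of similar documents for review.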
Science
Automatic and Semi-Automatic Document Selection for Technology-Assisted Review
In the TREC Total Recall Track (2015-2016), participating teams could employ either fully automatic or human-assisted (“semi-automatic”) methods to select documents for relevance assessment by a simulated human reviewer. According to the TREC 2016 evaluation, the fully automatic baseline method achieved a recall-precision breakeven (“R-precision”) score of 0.71, while the two semi-automatic efforts achieved scores of 0.67 and 0.51. In this work, we investigate the extent to which the observed effectiveness of the different methods may be confounded by chance, by inconsistent adherence to the Track guidelines, by selection bias in the evaluation method, or by discordant relevance assessments. We find no evidence that any of these factors could yield relative effectiveness scores inconsistent with the official TREC 2016 ranking.
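For readers unfamiliar with the metric cited above, the snippet below is a toy illustration of the recall-precision breakeven (“R-precision”) score: precision computed over the top R ranked documents, where R is the total number of relevant documents, the cutoff at which precision and recall coincide. The ranking and relevance judgments are invented for illustration and are not TREC data.

```python
# Toy illustration of the recall-precision breakeven ("R-precision") metric:
# precision at cutoff R, where R is the total number of relevant documents.
def r_precision(ranked_doc_ids, relevant_doc_ids):
    r = len(relevant_doc_ids)
    if r == 0:
        return 0.0
    top_r = ranked_doc_ids[:r]
    hits = sum(1 for doc_id in top_r if doc_id in relevant_doc_ids)
    return hits / r

ranking = ["d3", "d1", "d7", "d2", "d9", "d4"]   # a system's ranked output (made up)
relevant = {"d1", "d2", "d4", "d7"}              # R = 4 relevant documents (made up)
print(r_precision(ranking, relevant))            # 3 of the top 4 are relevant -> 0.75
```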