For years, litigators cited a lack of judicial guidance as their primary objection to using predictive coding technology. The objection is based on the notion that even though predictive coding technology promises to significantly reduce the time, cost, and error rate of pure human document review during discovery, few attorneys want to be the first to defend the use of technology they don’t understand. It is this fear of what some characterize as “black box technology” that has led many outside counsel to caution corporate clients to take a “wait and see” approach, in spite of continued pressure from those same clients to decrease document review costs.

In 2012, the wait for judicial guidance ended abruptly when not one, but three new predictive coding cases surfaced: Da Silva Moore v. Publicis Groupe; Kleen Products, LLC v. Packaging Corporation of America; and Global Aerospace Inc. v. Landow Aviation, LLP. In Da Silva Moore, Judge Andrew Peck even approved the use of predictive coding technology in “appropriate cases,” leaving some to believe the courthouse doors had been thrown open to unbridled use of the technology. Yet within weeks of the decision, the wheels of the predictive coding freight train locked up, leaving many wondering whether these new cases provided clarity or merely added more confusion.

This article explains how predictive coding technology works, explores recent predictive coding cases, and provides a roadmap for understanding what must happen for predictive coding to regain momentum and become mainstream in the legal field.

What is Predictive Coding?

Predictive coding is a type of machine-learning technology that enables a computer to automatically predict how documents should be classified based on limited human input. The technology is exciting for corporate legal departments attempting to manage skyrocketing litigation budgets, because the ability to automatically rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” has the potential to save companies millions in e-discovery costs. The savings are directly attributable to the fact that fewer dollars are spent paying lawyers to review every document before documents are produced to outside parties during discovery.

The main advantage for corporations is that attorneys review only a fraction of the documents, which results in a fraction of the review costs. The process begins by feeding “relevance” and “privilege” decisions made by attorneys about a small number of case documents, called a “seed set,” into a computer system. The computer then relies on these “training” decisions to build a model that ranks and codes the remaining documents automatically. The attorneys can then evaluate the accuracy of the computer’s automated decisions, typically by checking them against a sample of documents the attorneys have coded themselves. If the accuracy of the computer’s decisions is insufficient, the attorneys can conduct further training until the required accuracy levels are achieved.
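For readers who want to see the mechanics, the train, rank, and evaluate cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple Naive Bayes word-count classifier as a stand-in, because commercial predictive coding tools rely on their own, often proprietary, methods. The function names (`train`, `relevance_score`, `rank`, `accuracy`) and the sample documents are hypothetical and exist only for this example.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer for illustration: lowercase, split on whitespace.
    return text.lower().split()

def train(seed_set):
    """Build a model from attorney decisions: a list of (text, is_relevant)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Log-odds that a document is relevant (above 0 leans relevant)."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    # Prior odds from the seed set, with add-one smoothing.
    score = math.log((doc_counts[True] + 1) / (doc_counts[False] + 1))
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_not = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_not)
    return score

def rank(model, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda d: relevance_score(model, d), reverse=True)

def accuracy(model, validation_set):
    """Share of attorney-coded validation documents the model codes the same way."""
    hits = sum((relevance_score(model, text) > 0) == label
               for text, label in validation_set)
    return hits / len(validation_set)

# Hypothetical seed set: two attorney decisions.
seed_set = [
    ("price fixing agreement with competitor", True),
    ("lunch menu friday", False),
]
model = train(seed_set)
ranked = rank(model, ["friday lunch order", "competitor price agreement"])
validation_set = [
    ("competitor price agreement", True),
    ("friday lunch order", False),
]
model_accuracy = accuracy(model, validation_set)
```

If `model_accuracy` falls below the level the review team requires, the attorneys would code more documents, add them to `seed_set`, and call `train` again, which mirrors the iterative training loop described above.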

Contrary to industry legend, most organizations do not simply produce the top-ranked documents to opposing parties without further review. Typically, only the top-ranked documents identified by the computer as “relevant” are manually reviewed for accuracy prior to production, while lower-ranked documents may not be reviewed at all. This approach not only saves money; if performed correctly, predictive coding can also be more accurate than pure human review. Given these benefits, many wonder why adoption of predictive coding technology has been slow.
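The cost arithmetic behind this workflow can be illustrated with a short sketch. Everything here is an assumption made for the example: the document IDs, the scores, the 0.5 cutoff, and the cost of 2 units per document are invented, and a real matter would set its cutoff through validation rather than by picking a round number.

```python
def split_for_review(scored_docs, cutoff):
    """Split (doc_id, score) pairs into a manual-review queue and the rest.

    Documents scoring at or above the cutoff go to attorneys;
    lower-ranked documents are set aside without manual review.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    to_review = [doc_id for doc_id, score in ranked if score >= cutoff]
    not_reviewed = [doc_id for doc_id, score in ranked if score < cutoff]
    return to_review, not_reviewed

def review_cost(num_docs, cost_per_doc):
    """Total manual review cost in whatever unit cost_per_doc uses."""
    return num_docs * cost_per_doc

# Hypothetical scores produced by a predictive coding model.
scored = [
    ("DOC-001", 0.92),
    ("DOC-002", 0.15),
    ("DOC-003", 0.78),
    ("DOC-004", 0.05),
    ("DOC-005", 0.40),
]

to_review, not_reviewed = split_for_review(scored, cutoff=0.5)

COST_PER_DOC = 2  # hypothetical cost units per manually reviewed document
full_review_cost = review_cost(len(scored), COST_PER_DOC)
predictive_cost = review_cost(len(to_review), COST_PER_DOC)
```

In this toy example, two of five documents reach the manual-review queue, so the review cost drops from 10 units to 4. The savings in practice depend entirely on where the cutoff is set and how many documents score above it.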

Do Recent Predictive Coding Cases Provide Clarity or Add Confusion?