Earlier this year, the RAND Corp. published a paper titled “Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery” (Pace, N., & Zakaras, L. Santa Monica: RAND Corporation). As stated on its website, the RAND Corporation is a “nonprofit institution that helps improve policy and decision making through research and analysis.”
The researchers stated that the cost of discovery has steadily increased due to the deluge of electronically stored information (ESI). I am certainly not going to disagree with that statement. However, one could argue that the increasing cost of discovery more directly results from the availability of inexpensive and ubiquitous storage coupled with ineffective document management practices at the corporate level.
The paper uses case studies from eight large organizations in a quest to identify: 1) the costs associated with each phase of e-discovery, 2) how these costs are distributed across sources of labor, 3) how costs can be reduced without sacrificing quality, and 4) what litigants perceive to be the key challenges related to ESI preservation.
Each corporation chose a minimum of five cases in which they reviewed and produced documents to an opposing party. Additionally, the researchers interviewed key legal personnel from the corporations in an effort to understand how these organizations respond to and process electronic discovery requests. The results were informative and, for some of us, including me, who were already well aware of the most expensive part of the discovery process, confirmation.
Phases of e-discovery
Researchers separated the discovery process into three distinct parts: collection, processing and review.
Collection comprises identifying sources of ESI and the act of gathering the material for later processing and possible review.
Processing is culling the pool of ESI using technology and software. It is basically reducing 300,000 collected emails to a manageable number for review. That process may include using keywords, identifying and removing duplicate files or using more advanced analytic techniques to identify obviously non-relevant files.
Review is putting the documents through a process (usually involving human eyes) to determine relevance, confidentiality or privilege. You have a bunch of documents (paper and electronic) that may need to be produced in response to a document request, now go find ’em … that is review.
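To make the processing phase concrete, here is a toy sketch of one of its simplest culling techniques, removing exact duplicates by hashing each document's content. This is a hypothetical illustration, not any vendor's actual workflow; the document IDs and email text are invented for the example.

```python
import hashlib

def dedupe(documents):
    """Cull exact duplicates by hashing each document's content.

    `documents` maps a document ID to its raw text; only the first
    document seen with a given content hash survives the cull.
    """
    seen = set()
    survivors = {}
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            survivors[doc_id] = text
    return survivors

# Three collected "emails", two of which are identical copies.
collected = {
    "msg-001": "Q3 forecast attached. Please review.",
    "msg-002": "Lunch on Friday?",
    "msg-003": "Q3 forecast attached. Please review.",  # duplicate of msg-001
}
culled = dedupe(collected)
print(len(culled))  # 2 unique documents remain
```

Real processing tools go well beyond this, with near-duplicate detection, keyword filters and analytics, but the goal is the same: shrink the pool before the expensive eyes-on review begins.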
If asked what part is the most expensive, many litigators might impulsively blurt out collection or processing. In my opinion that is because those are the parts with which they are least familiar, compared to traditional review, which they know well.
The most expensive
Surprise — neither collection nor processing was the most expensive. Review was the winner, and by a wide margin. That is right: for every dollar spent on discovery, the researchers reported that 73 percent went toward review, while 19 percent and 8 percent went to processing and collection, respectively. The case studies also uncovered that 70 percent of total costs were attributable to outside counsel, compared to 26 percent from e-discovery vendors.
Some possible remedies
If you recall, the paper not only set out to identify the source of costs but how to contain them without sacrificing quality:
“With more than half of our cases reporting that review consumed at least 70 percent of the total costs of document production, this single area is an obvious target for reducing e-discovery expenditures.”
A document review has traditionally been an “eyes-on” process. Large reviews may have hundreds of attorneys scouring documents for months on end. The documents are usually stored in some sophisticated application in the mysterious “cloud.” OK, admittedly that is not your everyday case, but even a document review with a few attorneys poring over a few thousand emails adds up to many thousands of dollars very quickly.
How do we manage the “eyes-on” process? How do we get reviewers to review faster and more accurately? There is a limit to this approach even with advances in technology, bigger monitors, and better software. Review companies (companies that provide on-demand attorneys to review documents) claim that technologies such as near-duplicate detection, clustering and email threading allow for the rate of review to exceed 400 documents per hour.
Even if these claims are true there is a limit to what can be done at a per person level.
Is that it then? Can we, as humans, do no better than 400 documents per hour?
The researchers stated, “We believe that one way to achieve substantial savings in producing massive amounts of electronic information would be to let computers do the heavy lifting for review.” Thank you, Konrad Zuse.
The paper discusses how predictive coding, which is a type of “computer-categorized review application that classifies documents according to how well they match the concepts and terms in sample documents”, can potentially reduce the 73 percent spent on document review. Sounds like magic, right?
At a 30,000-foot level, the predictive coding process requires a subject matter expert to go through a sample set of documents. The sample documents are tagged either relevant or not relevant. The predictive coding software then uses the YES/NO decisions on those exemplar documents to rank the rest of the documents in priority order from most likely relevant to least likely.
For example, if you have a population of 500,000 emails one could possibly review 2,000 documents and the predictive coding software would then go through the other 498,000 and identify the other potentially relevant documents (documents similar to the ones tagged as relevant).
Assume the set of documents pulled from the larger population is 50,000 documents (10 percent of the population). One must still conduct an eyes-on review of those documents to check for privilege and confidentiality and to confirm relevance. Clearly it is much less expensive to go through 50,000 or even 200,000 documents than 500,000.
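The workflow described above, tag a small sample, then rank the rest by similarity to the relevant exemplars, can be sketched in miniature. This is a toy illustration only, assuming a crude bag-of-words cosine similarity; commercial predictive coding engines use far more sophisticated machine learning, and every document, ID and tag below is invented for the example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Crude term-frequency representation of a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_by_relevance(exemplars, unreviewed):
    """Rank unreviewed documents by similarity to relevant exemplars.

    `exemplars` are documents the subject matter expert tagged
    relevant; `unreviewed` maps document IDs to text. Returns IDs
    sorted from most likely relevant to least likely.
    """
    # Merge the relevant exemplars into one "profile" of relevance.
    profile = Counter()
    for text in exemplars:
        profile.update(bag_of_words(text))
    scores = {doc_id: cosine(bag_of_words(text), profile)
              for doc_id, text in unreviewed.items()}
    return sorted(scores, key=scores.get, reverse=True)

# The expert tags two emails about a disputed merger as relevant...
tagged_relevant = ["merger price negotiation terms",
                   "merger closing terms"]
# ...and the software ranks the unreviewed population.
population = {
    "e1": "merger negotiation schedule and terms",
    "e2": "office holiday party planning",
    "e3": "price terms for the merger",
}
ranked = rank_by_relevance(tagged_relevant, population)
print(ranked)  # the two merger emails rank ahead of the party email
```

In the 500,000-email scenario, the expert's 2,000 tagging decisions play the role of `tagged_relevant`, and the review team works down the ranked list, stopping long before the obviously irrelevant tail.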
Judicial acceptance and the future
Predictive coding is not mainstream (yet), but it is gaining acceptance as a legitimate method for plowing through volumes of ESI. The support is not only from those preaching its value; it is gaining judicial support as well. Look no further than the recent Da Silva Moore case in the SDNY. In that matter, the defendant proposed the use of predictive coding and Judge Andrew Peck ordered the parties to submit protocols for using predictive coding technology in an effort to reduce the discovery burdens.
It is only a matter of time until predictive coding is playing a role in nearly every ESI request, big or small. Software vendors are already starting to embed it and similar technologies into their products. So yes, we can do better than 400 documents per hour with a little help from our friend, PC. That’s predictive coding, not personal computer.
To learn more, the Monroe County Bar Association will be hosting a CLE focused on Predictive Coding on Dec. 12. You can register on their website at www.mcba.org.
Peter Coons is a senior vice president at D4, providing eDiscovery consulting services to clients. He is an EnCase Certified Examiner, an AccessData Certified Examiner, a Certified Computer Examiner (computer forensic certificates) and is a member of the High Technology Crime Investigation Association, the professional organization for people involved in computer forensics.