As Information Retrieval (IR) researchers, we implicitly aim to help people find what they want to see. In this talk, I argue that IR researchers must start to formalize their thinking about preventing people from finding things that they should not be able to see. One such application arises when searching for evidence during civil litigation. In 2006, the Federal Rules of Civil Procedure were amended to make it clear that all forms of electronically stored information, including emails, were within the scope of evidence that could be requested from either party in a lawsuit. Thus was born the multi-billion dollar industry that has come to be called electronic discovery, or e-discovery.
The high cost of e-discovery results from two main factors: (1) because the standard for relevance is expansive, large numbers of relevant documents may be found, and (2) producing parties can assert privilege on some relevant documents to withhold confidential content (privilege exists to foster socially desirable outcomes such as open communication between attorneys and their clients). Thus, in practice, electronic evidence that is found to be responsive to a production request is subjected to an exhaustive manual review for privilege, to ensure that material that has to be withheld is not inadvertently revealed. Although automation can help keep relevance review within budgetary constraints to some degree, attorneys have been hesitant to adopt technology for the high-stakes privilege review process.
This work-in-progress talk introduces a framework that encourages the use of automation during e-discovery, to support both relevance and privilege review. The framework has two objectives: (1) to maximize the benefits obtained from the manual review process, and (2) to optimize the cost of the e-discovery process. We propose to build our core framework around what we call a hybrid model, which is neither fully automated nor fully manual. We represent the hybrid model as a cost-sensitive, semi-automated classification problem. The cost sensitivity is reflected in the document ranking process, which depends on the costs of the different types of classification errors.
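To make the idea of cost-sensitive ranking concrete, the following is a minimal sketch and not the framework presented in the talk. It assumes a classifier has already estimated, for each document, the probability that it is privileged, and it uses a standard expected-cost decision rule to choose which documents go to manual review. The names `Costs`, `expected_auto_cost`, `rank_for_review`, and `triage`, and all cost values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Costs:
    # All three values are illustrative assumptions, not figures from the talk.
    false_negative: float  # cost of producing a document that was privileged
    false_positive: float  # cost of withholding a document that was not privileged
    manual_review: float   # cost of having an attorney review one document


def expected_auto_cost(p_privileged: float, costs: Costs) -> float:
    """Expected error cost of the cheaper automated decision for one document."""
    cost_if_produced = p_privileged * costs.false_negative
    cost_if_withheld = (1.0 - p_privileged) * costs.false_positive
    return min(cost_if_produced, cost_if_withheld)


def rank_for_review(probabilities: Dict[str, float], costs: Costs) -> List[Tuple[str, float]]:
    """Rank documents by the expected saving from manual review, largest first."""
    savings = {
        doc_id: expected_auto_cost(p, costs) - costs.manual_review
        for doc_id, p in probabilities.items()
    }
    return sorted(savings.items(), key=lambda item: item[1], reverse=True)


def triage(probabilities: Dict[str, float], costs: Costs) -> Tuple[List[str], List[str]]:
    """Split documents into a manual-review queue and an automated queue."""
    ranked = rank_for_review(probabilities, costs)
    manual = [doc_id for doc_id, saving in ranked if saving > 0]
    automated = [doc_id for doc_id, saving in ranked if saving <= 0]
    return manual, automated
```

Under these assumptions, a document is sent to an attorney only when the expected cost of an automated mistake exceeds the cost of reviewing it, so raising `false_negative` relative to `manual_review` pushes more uncertain documents into the manual queue.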
Jyothi Vinjumur is a PhD candidate at the University of Maryland, College Park, in the College of Information Studies. Her PhD advisor is Dr. Douglas Oard. Her PhD dissertation involves the use of information retrieval, information visualization, and applied machine learning techniques to help end users (e.g., lawyers) find the information they seek in a cost-effective manner. Additional information is available at http://terpconnect.umd.edu/~jyothikv/
1:00pm to 3:00pm