We address the e-rulemaking problem of categorizing public comments according to the issues that they address. In contrast to previous text categorization research in e-rulemaking [5, 6], and in an attempt to more closely duplicate the comment analysis process in federal agencies, we employ a set of rule-specific categories, each of which corresponds to a significant issue raised in the comments. We describe the creation of a corpus to support this text categorization task and report interannotator agreement results for a group of six annotators. We outline those features of the task and of the e-rulemaking context that engender both a non-traditional text categorization corpus and a correspondingly difficult machine learning problem. Finally, we investigate the application of standard and hierarchical text categorization techniques to the e-rulemaking data sets and find that automatic categorization methods show promise as a means of reducing the manual labor required to analyze large comment sets: the automatic annotation methods approach the performance of human annotators for both flat and hierarchical issue categorization.
Cardie, Claire; Farina, Cynthia R.; Aijaz, Adil; Rawding, Matt; and Purpura, Stephen, "A Study in Rule-Specific Issue Categorization for e-Rulemaking" (2008). Cornell e-Rulemaking Initiative Publications. Paper 4.