Jared Bernstein, Alexei V. Ivanov, and Elizabeth Rosenfeld

Benchmarking Automated Text Correction Services

Abstract: We compared the performance of two automated grammar checkers on a small random sample of student texts in order to select the better one for incorporation into a product that tutors American students, ages 12–16, in essay writing. The two competing error detection services locate and label errors, and both claim accuracy well above 90%. However, when measured against a small set of essays double-annotated for text errors, both services performed at similarly low accuracy (F1 values in the range of 0.25 to 0.35) on both sentence-level and word-level errors. Several findings emerged. ...
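For readers unfamiliar with the F1 measure cited above, the following is a minimal sketch of how precision, recall, and F1 can be computed when a checker's flagged errors are compared with a reference annotation. It is not the authors' evaluation procedure: the function name `f1_score`, the representation of an error as a (sentence index, label) pair, the exact-match criterion, and the sample data are all assumptions made for illustration.

```python
def f1_score(gold, predicted):
    """Return (precision, recall, F1) for two collections of error annotations.

    An annotation counts as correct only if it matches a gold annotation
    exactly (an assumption; real evaluations may allow partial matches).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: each error is a (sentence_index, error_label) pair.
gold = {(0, "agreement"), (2, "spelling"), (3, "run-on"), (5, "tense")}
predicted = {(0, "agreement"), (1, "spelling"), (4, "comma"),
             (5, "tense"), (6, "tense")}

precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a service can label most of what it flags correctly and still score low if it misses many annotated errors, or the reverse.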

Get Natural Language Processing and Cognitive Science now with the O’Reilly learning platform.