NORTHAMPTON, Mass. – A recent study by computer scientists at Smith College and the University of Massachusetts, Amherst, demonstrated a new electronic method to transcribe handwritten historical documents better than any other available method.
Currently, computer recognition programs tend to work well on machine-printed text and on limited handwriting tasks, such as reading postal addresses, for which there are a finite number of possibilities, according to Nicholas R. Howe, Smith associate professor. Only a few programs have demonstrated any success in translating the cursive script common to historical documents.
“Reliable recognition of texts from historical collections is often infeasible with current technology, and yet those texts hold the potential to open new worlds to scholarship,” noted Howe, who collaborated with Shaolei Feng, a graduate student, and R. Manmatha, research associate professor, at UMass.
“Our method improves the best previously reported recognition rates,” he said.
In their paper, “Finding words in alphabet soup,” the researchers documented an 85 percent success rate for the “flexible inference model” – a model that identifies the most probable sequence of letters in a portion of handwritten text.
The technology recognized letters using a form of object recognition software similar to that commonly installed in digital cameras. (In cameras, the software identifies faces that appear within a frame and pulls them into focus.)
Researchers tested the software on twenty pages of George Washington’s letters, correspondence written in longhand script by several secretaries, and a medieval Latin text, all of which have been tested using other handwriting software.
While the study breaks new ground in the development of software that translates handwriting, with all its imperfections, more research needs to be done, said Howe.
“Exploration of the possibilities has only begun,” he said. No rush; historical documents, by definition, will surely still be around.
The study was supported with funding from the Center for Intelligent Information Retrieval, Google, and the National Science Foundation.
-30-