TATTLER is the foundation for the forensically-tuned modules in ALIAS. TATTLER is our text analysis toolkit: it allows a text to tell us basic facts about its textual structure, meaning, and graphical representation. TATTLER includes both an industry-grade parser and specific algorithms for the language used in social media, email, and blogs, the kind of language that is unedited and natural.
When Carole Chaski started building ALIAS in 1994, using forensically realistic data, she immediately realized that the text analysis tools available in academic computational linguistics could not handle the data accurately. The academic projects had all focused on highly edited, grammatical, and clean data from sources such as the Wall Street Journal and other news outlets. These parsers were excellent at handling this one style of language use. But the textual data that Chaski was given in forensic linguistic cases was unedited, “uncorrected,” and messy. The style of language use that forensic linguistic evidence exhibited, even before the explosion of the Internet and social media, was much less formal and far more naturalistic than most academic and industry parsers could handle. Chaski developed her own parser to handle this kind of data, using and extending natural language processing techniques that she had learned as a computational linguist.
In fact, Chaski had been prepared for this task by two experiences in her life. First, teaching English composition at dialectally varied middle schools, high schools, and universities taught her about the wide range of dialect variation in English. Second, she had worked in computational linguistics as a graduate student consultant. As a teacher, Chaski was used to handling dialectal differences in English and “ungrammaticality” in written English. As a graduate student, she had helped industry linguists design a grammar check system, a spell check system, and a parser that relaxed rules to enable it to handle what she began to call “really natural natural language”: not the kind of edited language that parsers were traditionally built to handle, but the kind of language that people actually produce in a natural setting for informal or educational communication.
Currently, TATTLER uses Rosette, an industry-grade multi-lingual text analytics toolkit from Basis Technology. The Rosette analysis is supplemented by the original ALIAS parser. In this way, TATTLER can be both multi-lingual and responsive to really natural natural language. For forensic linguistics, this kind of partnership between ALIAS Technology and Basis Technology is essential. Now we can really hear what the linguistic data is telling us, in many languages and in many styles.