C-LARA


Technical issues

C-LARA is implemented in Python using the Django framework, with Django-Q handling asynchronous tasks. The code is available from the public GitHub repository.

The repository contains, in descending order of size: Python files, HTML templates, prompt templates and examples, documentation, JavaScript files and CSS files. It currently totals about 26K lines. All the material was created by ChatGPT-4 working in close collaboration with Manny Rayner, with the AI responsible for about 90% of the code and the greater part of the software design.

Further details about the structure of the repository are available in the README file.

Error rates for writing and annotation

Our evaluations show that C-LARA’s performance varies a great deal between languages. For well-resourced languages given high priority by OpenAI, such as English and Mandarin, C-LARA can use the underlying ChatGPT-4 functionality to write entertaining texts on a wide variety of subjects, with an error rate of well under 1%. Error rates for glossing and lemma tagging in these languages are typically in the mid single digits (percent), with most errors arising from incorrect treatment of multi-word expressions (phrases). Performance on smaller and less highly prioritised languages is substantially worse. The platform offers many options for tuning language-specific performance.
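To make the notion of an annotation error rate concrete, here is a minimal sketch of how a per-token error rate could be computed by comparing system output against human judgements. It is an illustration only and is not taken from the C-LARA codebase: the `AnnotatedToken` class, the `error_rate` function and the sample data are all hypothetical, and the actual evaluation procedure may count errors differently.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedToken:
    """Hypothetical record pairing a system annotation with a human reference."""
    word: str
    predicted: str  # lemma (or gloss) produced by the system
    reference: str  # lemma (or gloss) judged correct by a human


def error_rate(tokens):
    """Return the fraction of tokens whose predicted annotation differs from the reference."""
    if not tokens:
        return 0.0
    errors = sum(1 for t in tokens if t.predicted != t.reference)
    return errors / len(tokens)


# Invented sample: the multi-word expression "gave up" is tagged word by word,
# so both of its tokens count as lemma-tagging errors.
sample = [
    AnnotatedToken("She", "she", "she"),
    AnnotatedToken("gave", "give", "give up"),
    AnnotatedToken("up", "up", "give up"),
    AnnotatedToken("smoking", "smoke", "smoke"),
]

print(f"Lemma-tagging error rate: {error_rate(sample):.0%}")
```

In this toy sample a single mishandled phrase accounts for every error, which is consistent with the observation above that multi-word expressions are the most common source of annotation mistakes.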