
Alconost MQM Annotation Tool

Free Alconost MQM Annotation Tool for translation quality evaluation: categorize errors, score, and report from TSV/JSONL inputs.


What is Alconost MQM Annotation Tool?

Alconost MQM Annotation Tool is a web-based tool for translation quality evaluation using MQM (Multidimensional Quality Metrics), an error-based framework used in WMT shared tasks and industry benchmarks. It supports human-in-the-loop workflows for annotating translation errors against MQM guidelines, as well as system-level and segment-level analysis of those annotations.

The tool lets you upload translation outputs, mark and categorize errors by MQM taxonomy and severity, and export structured results for downstream evaluation. It also converts MQM annotations into a normalized Quality Score (%) intended to be comparable across languages by accounting for translation length using XLM-R SentencePiece tokens.

Key Features

  • MQM-guideline error annotation for translation outputs: annotate explicit error categories and severities instead of using only holistic scores.
  • MQM taxonomy coverage with granular categories and severities: includes categories such as Accuracy, Fluency, and Terminology, with severity levels including Minor, Major, and Critical.
  • Structured exports for analysis: exports annotated data in formats such as TSV/CSV (tabular) and JSONL (line-delimited JSON) to support system- and segment-level reporting.
  • Reporting & analytics: includes project scoring and insight views such as error distribution charts and session timing estimates.
  • Automated scoring based on token-normalized penalties: calculates the total penalty as Σ(error count × error weight) and normalizes it by the total token count to derive the Quality Score (%); the pass/fail threshold and error weights are configurable.
  • API integration for import/export workflows: provides a REST API to create projects, import content, and export annotated results (JSONL, TSV, CSV).
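To make the export shape concrete, a single annotated segment in a JSONL export might look like the sketch below. The field names here are illustrative assumptions, not the tool's documented schema; consult the actual export for the real keys.

```python
import json

# Hypothetical JSONL export record for one annotated segment.
# Field names are illustrative assumptions, not a documented schema.
record = {
    "doc_id": "doc-001",
    "segment_id": 17,
    "system_id": "mt-system-a",
    "source": "Schließen Sie das Fenster.",
    "target": "Close window.",
    "annotations": [
        {"category": "Fluency/Grammar", "severity": "Minor", "span": "Close window."}
    ],
}

# JSONL = one JSON object per line, so each record serializes to one line.
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Because each line is an independent JSON object, exports can be streamed and processed record by record without loading the whole file.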

How to Use Alconost MQM Annotation Tool

  1. Create or start an MQM annotation project in the tool.
  2. Upload your data containing source and target translations (and optionally segment IDs, system IDs, and document IDs).
  3. Annotate errors using the MQM categories and severity levels. To mark a segment as checked with no errors, add a “no-error” annotation.
  4. Review project reporting (including scoring and error distributions) and export the annotated data for analysis.

For automation, use the provided REST API to import segments programmatically and to export results in JSONL, TSV, or CSV.
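As a sketch of what programmatic import might look like, the snippet below only builds a JSONL import payload; it does not call the API. The field names and the endpoint mentioned in the comment are assumptions, so check the tool's OpenAPI specification for the real schema.

```python
import json

# Hypothetical import payload: one segment per JSONL line.
# Field names are assumptions; the real schema is in the OpenAPI spec.
segments = [
    {"segment_id": 1, "source": "Hello, world.", "target": "Hallo, Welt."},
    {"segment_id": 2, "source": "Good morning.", "target": "Guten Morgen."},
]
payload = "\n".join(json.dumps(s, ensure_ascii=False) for s in segments)

# A real workflow would then POST `payload` to an import endpoint
# (path and auth header assumed, e.g. /api/projects/{id}/import).
print(payload)
```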

Use Cases

  • Human translation quality evaluation: linguists annotate specific MQM error types (e.g., Accuracy/Addition, Fluency/Grammar) to produce an auditable error profile.
  • Machine translation system comparison: multiple system outputs can be annotated and compared using the normalized Quality Score and error distribution reporting.
  • LLM or neural MT evaluation workflows: annotate translation outputs from neural/LLM-based MT using the same MQM taxonomy to keep evaluations consistent.
  • Regression testing and error analysis: track how specific error categories change across model versions by exporting structured annotations.
  • Vendor or internal QA review with blind annotation: have an annotator complete MQM error annotation to create an objective basis for review of translation quality.

FAQ

What input formats are supported? Structured inputs can be provided as TSV (tabular) or JSONL (line-delimited JSON); the REST API additionally supports importing CSV, TSV, JSONL, and raw JSON.
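A minimal TSV input could look like the sketch below, parsed here with Python's standard csv module. The column names are illustrative assumptions, not the tool's documented header row.

```python
import csv
import io

# Hypothetical TSV input with an ID column plus source/target text.
# Column names are assumptions, not a documented schema.
tsv_data = (
    "segment_id\tsource\ttarget\n"
    "1\tHello.\tHallo.\n"
    "2\tThanks.\tDanke.\n"
)

# DictReader maps each row to a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(tsv_data), delimiter="\t"))
for row in rows:
    print(row["segment_id"], row["source"], "->", row["target"])
```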

How does the Quality Score (%) work? The tool computes a total penalty from annotated errors using error counts and error weights, then normalizes by the total token count using XLM-R SentencePiece tokens. The default severity weights are Critical: 25, Major: 5, Minor: 1, and a score passes at 99.0% or higher by default; both the pass/fail threshold and the weights are adjustable.
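Using the stated default weights and threshold, the scoring can be sketched as below. The exact normalization formula is an assumption here (a common MQM-style form is 100 × (1 − penalty / tokens)); the tool's implementation may differ in detail.

```python
# Sketch of MQM-style scoring with the stated default weights.
# The normalization formula is an assumption (a common MQM form),
# not necessarily the tool's exact implementation.
DEFAULT_WEIGHTS = {"Critical": 25, "Major": 5, "Minor": 1}
PASS_THRESHOLD = 99.0  # default: pass at 99.0% or higher

def quality_score(error_counts: dict, total_tokens: int,
                  weights: dict = DEFAULT_WEIGHTS) -> float:
    """Total penalty = sum(count * weight), normalized by token count."""
    penalty = sum(count * weights[sev] for sev, count in error_counts.items())
    return max(0.0, 100.0 * (1 - penalty / total_tokens))

# 2 Minor + 1 Major over 1000 tokens -> penalty 7 -> ~99.3%, which passes.
score = quality_score({"Minor": 2, "Major": 1}, total_tokens=1000)
print(f"{score:.1f}%  pass={score >= PASS_THRESHOLD}")
```

Normalizing by token count rather than segment count is what makes scores comparable across segments and languages of different lengths.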

How do I record that a segment has no errors? Add an annotation with the category no-error so the segment is counted as checked and correct rather than skipped or pending.
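In an export or API payload, such an annotation might look like the sketch below; the field names other than the no-error category are illustrative assumptions.

```python
import json

# Hypothetical annotation marking a segment as checked and error-free.
# Only the "no-error" category is from the docs; other fields are assumed.
no_error = {"segment_id": 42, "category": "no-error", "severity": None}
print(json.dumps(no_error))
```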

Can I include additional context for annotators? Yes. The context field can be provided to display extra information in the annotation interface (e.g., glossary terms, reference links, style rules).

Can I integrate MQM annotation into an automated workflow? Yes. The tool provides a REST API with an OpenAPI specification for automated import and export of projects and annotated results.

Alternatives

  • MQM annotation tools (open or self-hosted): if you want a similar MQM taxonomy and annotation workflow but manage infrastructure yourself, open MQM-inspired tools may fit; the main difference is workflow control and setup responsibility.
  • General-purpose translation error analysis with custom tag sets: spreadsheet- or UI-based tools can support error annotation, but you would need to define your own taxonomy/weighting and scoring logic rather than using an MQM-focused model.
  • Annotation platforms with export-only pipelines: platforms that support labeling tasks and structured exports can replicate the “human-in-the-loop” part, but they may not provide MQM-specific category/severity structures and token-normalized scoring out of the box.
  • Quality evaluation dashboards that focus on scoring only: some tools may focus on calculating quality metrics, but without MQM-style categorical error annotation and structured exports they may not support the same granularity for error analysis.