Language Classification

The language endpoint classifies each file by its predominant language. Zuva can classify files written in Latin-script languages and in some non-Latin scripts.

Important notes when classifying files:

  • Language detection depends in part on the OCR quality of the file. A lower OCR quality rating may lead to an incorrect language classification.
  • If a document contains multiple languages, Zuva identifies a single language, typically the predominant language in the document.
  • Zuva has encountered errors with files in non-Latin scripts, specifically Asian-language characters, where the OCR had trouble extracting the text. Even though some documents in these scripts may appear properly in Zuva, they are not officially supported.

The following languages can be classified:

Arabic       Estonian     Italian        Slovak
Bulgarian    Finnish      Japanese       Slovenian
Cantonese    French       Korean         Spanish
Catalan      German       Lithuanian     Swedish
Chinese      Greek        Luxembourgian  Tagalog
Croatian     Hebrew       Malay          Thai
Czech        Hindi        Norwegian      Turkish
Danish       Hungarian    Polish         Ukrainian
Dutch        Indonesian   Portuguese     Vietnamese
English      Icelandic    Russian        Welsh
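
If you want to check a language against this list before sending files for classification, a small lookup is enough. The sketch below is only an illustration: the names are copied from the list above, while `CLASSIFIABLE_LANGUAGES` and `is_classifiable` are hypothetical helpers and not part of any Zuva library. The strings the language endpoint actually returns may differ from these display names.

```python
# Language names copied from the documented list. The helper is a
# hypothetical convenience, not part of any Zuva client library, and the
# values returned by the language endpoint may be formatted differently.
CLASSIFIABLE_LANGUAGES = frozenset({
    "Arabic", "Bulgarian", "Cantonese", "Catalan", "Chinese",
    "Croatian", "Czech", "Danish", "Dutch", "English",
    "Estonian", "Finnish", "French", "German", "Greek",
    "Hebrew", "Hindi", "Hungarian", "Indonesian", "Icelandic",
    "Italian", "Japanese", "Korean", "Lithuanian", "Luxembourgian",
    "Malay", "Norwegian", "Polish", "Portuguese", "Russian",
    "Slovak", "Slovenian", "Spanish", "Swedish", "Tagalog",
    "Thai", "Turkish", "Ukrainian", "Vietnamese", "Welsh",
})

# Case-folded copy so lookups ignore capitalisation.
_NORMALIZED = frozenset(name.casefold() for name in CLASSIFIABLE_LANGUAGES)


def is_classifiable(language: str) -> bool:
    """Return True if `language` appears in the documented list."""
    return language.strip().casefold() in _NORMALIZED
```

For example, `is_classifiable("welsh")` is true, while `is_classifiable("Latin")` is false because Latin is not in the list.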

What if a file is classified incorrectly?

Once a document has been classified, you cannot change the classification.

In most instances, an incorrect language classification is caused by poor OCR quality of the document. Small changes in the OCR output can cause Zuva to identify a different language whose likelihood is very close to that of the correct one.

The use of multiple languages throughout a document may also affect the accuracy of the language tags. In these instances, Zuva identifies only the most likely language, not every language that appears in the document.
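
Both effects described above, close likelihoods and a single reported language, can be pictured with a toy example. This is a minimal sketch and not Zuva's actual classifier: `predominant_language` is a hypothetical function, and the scores are made-up values chosen only to show how a small shift can change the result.

```python
def predominant_language(scores: dict[str, float]) -> str:
    """Return the single highest-scoring language (one label per document)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores, key=scores.get)


# Hypothetical likelihoods for the same document, before and after a small
# change in OCR output. These numbers are illustrative, not real Zuva output.
clean_ocr = {"Norwegian": 0.48, "Danish": 0.46, "Swedish": 0.06}
noisy_ocr = {"Norwegian": 0.46, "Danish": 0.48, "Swedish": 0.06}

print(predominant_language(clean_ocr))  # Norwegian
print(predominant_language(noisy_ocr))  # Danish
```

Only the top-scoring language is reported in each case, so the runner-up never appears in the result, and a shift of a few hundredths between two closely related languages is enough to change the classification.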

If a document has been classified with the wrong language, you can try improving the quality of the original document and re-uploading it to Zuva.
