Language Classification

DocAI helps you classify your documents based on their primary language. Currently, DocAI can identify 40 different languages.

When a document contains multiple languages, Zuva identifies a single language, typically the predominant language in the document.

Language detection relies in part on the OCR quality of the file: lower-quality OCR may lead to an incorrect language classification. In some cases, you may be able to improve your language classification results by improving the quality of your documents and re-uploading them.

We have seen Zuva encounter errors with files in non-Latin-script languages, where the OCR had trouble extracting the non-Latin characters, specifically those of Asian languages. Even though some documents in these scripts may appear properly in Zuva, they are not officially supported.

For example usage of the language service, check out the language workflow page.

Languages

The following languages are the possible results of the language classification service.

Arabic       Greek          Malay
Bulgarian    Hebrew         Norwegian
Cantonese    Hindi          Polish
Catalan      Hungarian      Portuguese
Chinese      Icelandic      Russian
Croatian     Indonesian     Slovak
Czech        Italian        Slovenian
Danish       Icelandic      Spanish
Dutch        Indonesian     Swedish
English      Japanese       Tagalog
Estonian     Korean         Ukrainian
Finnish      Lithuanian     Vietnamese
French       Luxembourgian  Welsh
German

* Note: this list of languages applies only to the Language Classification service. DocAI’s other services vary in their support for non-English languages and non-Latin script.
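As a rough illustration of how the list above and the caveat about non-Latin scripts might be applied on the client side, the following Python sketch checks a returned language name. It does not call DocAI. The helper name review_language_result, the labels "ok", "review" and "unknown", and the exact spelling of each language string are assumptions made for this example, so adjust them to match what the service actually returns.

```python
# Hypothetical client-side helper; not part of the DocAI API.
# Spellings are assumptions: match them to the strings the service returns.
SUPPORTED_LANGUAGES = frozenset({
    "Arabic", "Bulgarian", "Cantonese", "Catalan", "Chinese", "Croatian",
    "Czech", "Danish", "Dutch", "English", "Estonian", "Finnish", "French",
    "German", "Greek", "Hebrew", "Hindi", "Hungarian", "Icelandic",
    "Indonesian", "Italian", "Japanese", "Korean", "Lithuanian",
    "Luxembourgian", "Malay", "Norwegian", "Polish", "Portuguese",
    "Russian", "Slovak", "Slovenian", "Spanish", "Swedish", "Tagalog",
    "Ukrainian", "Vietnamese", "Welsh",
})

# Listed languages that are normally written in a non-Latin script.
NON_LATIN_SCRIPT = frozenset({
    "Arabic", "Bulgarian", "Cantonese", "Chinese", "Greek", "Hebrew",
    "Hindi", "Japanese", "Korean", "Russian", "Ukrainian",
})


def review_language_result(language: str) -> str:
    """Label a classification result as 'ok', 'review', or 'unknown'."""
    name = language.strip()
    if name not in SUPPORTED_LANGUAGES:
        # Not one of the names listed on this page.
        return "unknown"
    if name in NON_LATIN_SCRIPT:
        # OCR may struggle with these scripts, so flag for a manual check.
        return "review"
    return "ok"
```

A result labeled "review" is only a prompt to check the document by hand; it does not mean the classification is wrong.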
