Language Classification

Zuva helps you classify your documents based on their primary language. Currently, Zuva can identify 40 different languages.

If a document contains multiple languages, Zuva identifies a single language, typically the predominant one in the document.

Language detection relies in part on the OCR quality of the file, so low OCR quality may cause a document to be misclassified. In some cases, you may be able to improve your language classification results by improving the quality of your documents and re-uploading them.

We have seen Zuva encounter errors with files in non-Latin-script languages, where OCR had trouble extracting non-Latin characters, particularly those of Asian languages. Although some documents in these scripts may appear correctly in Zuva, they are not officially supported.

For an example of how to use the language service, see the language workflow page.

Languages

The following languages are the possible results of the language classification service.


Arabic	German	Norwegian
Bulgarian	Greek	Polish
Cantonese	Hebrew	Portuguese
Catalan	Hindi	Russian
Chinese	Hungarian	Slovak
Croatian	Icelandic	Slovenian
Czech	Indonesian	Spanish
Danish	Italian	Swedish
Dutch	Japanese	Tagalog
English	Korean	Ukrainian
Estonian	Lithuanian	Vietnamese
Finnish	Luxembourgian	Welsh
French	Malay

* Note: this list of languages applies only to the Language Classification service. Zuva’s other services vary in their support for non-English languages and non-Latin scripts.
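If your application acts on the classification result, a local copy of the list above can serve as a sanity check. The Python sketch below is hypothetical and not part of any Zuva SDK: it assumes the service reports the language as the English name shown in the table (check the API reference for the actual response format), and the names SUPPORTED_LANGUAGES, NON_LATIN_SCRIPT, and route_document are illustrative only. Following the OCR caveat earlier on this page, it flags languages written in non-Latin scripts for manual review.

```python
# Hypothetical client-side helper; not a Zuva API.
# Assumption: the classification result is an English language name
# exactly as written in the Language Classification table.

SUPPORTED_LANGUAGES = frozenset({
    "Arabic", "Bulgarian", "Cantonese", "Catalan", "Chinese", "Croatian",
    "Czech", "Danish", "Dutch", "English", "Estonian", "Finnish", "French",
    "German", "Greek", "Hebrew", "Hindi", "Hungarian", "Icelandic",
    "Indonesian", "Italian", "Japanese", "Korean", "Lithuanian",
    "Luxembourgian", "Malay", "Norwegian", "Polish", "Portuguese",
    "Russian", "Slovak", "Slovenian", "Spanish", "Swedish", "Tagalog",
    "Ukrainian", "Vietnamese", "Welsh",
})

# Listed languages normally written in non-Latin scripts
# (Arabic, Cyrillic, Greek, Hebrew, Devanagari, Han, kana, Hangul).
NON_LATIN_SCRIPT = frozenset({
    "Arabic", "Bulgarian", "Cantonese", "Chinese", "Greek", "Hebrew",
    "Hindi", "Japanese", "Korean", "Russian", "Ukrainian",
})


def route_document(language: str) -> str:
    """Return a hypothetical routing decision for a classified document."""
    if language not in SUPPORTED_LANGUAGES:
        return "unexpected"      # not a value listed on this page
    if language in NON_LATIN_SCRIPT:
        return "manual_review"   # OCR may have struggled with the script
    return "automatic"


if __name__ == "__main__":
    for lang in ("English", "Japanese", "Esperanto"):
        print(lang, route_document(lang))
```

Treating non-Latin-script results as needing review is one possible policy, not a Zuva recommendation; adjust the sets to match your own requirements.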