File Submission
In order to process your documents with Zuva’s API, you must first upload them using the POST /files endpoint.
Uploaded documents expire and are removed from the system after 7 days by default. The expiration time can be configured up to a maximum of 14 days, and can later be renewed as long as the original expiration time has not yet been reached. A file can be deleted at any time using the DELETE /files endpoint.
Zuva automatically recognizes over 60 common document and image file formats. Documents in these formats will automatically be processed with OCR prior to classification and field extraction.
Zuva supports PDF, DOC, DOCX, and TXT file formats via our API.
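As a rough illustration of the upload and delete calls described above, the following Python sketch builds the two HTTP requests without sending them. Several details here are assumptions and not taken from this page: the base URL is a placeholder, the bearer-token Authorization header is a guess at the authentication scheme, application/octet-stream is used as a generic content type, and the DELETE call is assumed to take the file ID as a path segment. Check the API reference for the actual values.

```python
import urllib.request

# Hypothetical placeholder; substitute the base URL for your Zuva region.
BASE_URL = "https://zuva.example/api/v2"


def build_upload_request(content: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /files request carrying the raw document bytes."""
    return urllib.request.Request(
        url=f"{BASE_URL}/files",
        data=content,
        method="POST",
        headers={
            # Assumption: bearer-token authentication.
            "Authorization": f"Bearer {token}",
            # Assumption: a generic binary type, leaving format detection to Zuva.
            "Content-Type": "application/octet-stream",
        },
    )


def build_delete_request(file_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for a previously uploaded file."""
    return urllib.request.Request(
        # Assumption: the file ID is appended to /files as a path segment.
        url=f"{BASE_URL}/files/{file_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Either request can then be sent with urllib.request.urlopen(request), and the response body read and parsed according to the API reference.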
Plain Text Content
Plain text documents are not subject to OCR. Instead, Zuva creates pages by applying word-wrapping and inserting appropriate page breaks. Only UTF-8 encoding is currently supported. Plain text will be automatically detected, but you may also choose to specify the content type text/plain.
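Because only UTF-8 is supported, it can help to validate a text file locally before uploading it. The sketch below is one possible pre-flight check; the function name prepare_plain_text is hypothetical and not part of any Zuva SDK.

```python
def prepare_plain_text(raw: bytes) -> tuple[bytes, dict[str, str]]:
    """Check that raw is valid UTF-8 and pair it with a text/plain header.

    Raises ValueError if the bytes cannot be decoded as UTF-8.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"document is not valid UTF-8: {exc}") from exc
    # Setting the content type explicitly is optional, since plain text
    # is detected automatically.
    return raw, {"Content-Type": "text/plain"}
```

A file saved in another encoding, such as Latin-1, would need to be transcoded first, for example with raw.decode("latin-1").encode("utf-8").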
Plain text should only be used if it is the original document format, or if conversion to plain text preserves most of the document formatting (e.g. paragraph breaks, alignment of titles etc.). If possible, it is recommended to use Zuva’s eOCR format (see below) rather than plain text, in order to retain the positional information of the characters from the original document.
eOCR Content
If your documents are already in a format such as .hocr, you will need to convert them to the eOCR (Zuva OCR) format for upload. Example code to convert hOCR to eOCR is available on GitHub. The file should then be uploaded to the POST /files endpoint with content-type application/eocr.
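When a script uploads a mix of document types, the content type can be chosen per file. The following is a minimal sketch under stated assumptions: the .eocr file extension is only a local naming convention assumed here, and returning None stands for "send no explicit type and rely on automatic detection".

```python
from pathlib import Path
from typing import Optional

# Explicit content types named in this document. The ".eocr" extension
# itself is an assumption made for this example.
CONTENT_TYPES = {
    ".txt": "text/plain",
    ".eocr": "application/eocr",
}


def content_type_for(filename: str) -> Optional[str]:
    """Return an explicit content type for the file, or None to rely on detection."""
    return CONTENT_TYPES.get(Path(filename).suffix.lower())
```

For example, content_type_for("lease.eocr") returns "application/eocr", while a PDF returns None and is left to automatic format recognition.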