File Management

Before processing your file with DocAI, you first need to upload it and obtain a file ID to use in your subsequent requests.

Using this guide

The examples in this guide use plain Python 3 and the requests library for illustrative purposes. If you plan to use Python in your own code, you may want to check out our prebuilt Python wrapper.

To run the code samples, you’ll need the following imports and constants:

import os, json, requests

# Assumes you've exported your token as an environment variable
TOKEN = os.getenv('DOCAI_TOKEN')

# Change this if you are using another region
REGION_URL = "https://us.app.zuva.ai"

Step 1: Upload a file to DocAI

Make a POST request to the ‘files’ API to upload your file using the following function:

def upload_file(filename, content_type="application/octet-stream", expiration=""):
   with open(filename, 'rb') as f:
       resp=requests.request("POST", REGION_URL + "/api/v2/files",
                             headers={"Authorization": "Bearer " + TOKEN,
                                      "Content-Type": content_type,
                                      "Expiration": expiration},
                             data=f)
       if resp.status_code==201:
           return json.loads(resp.text)
       else:
           # status_code is an int, so convert it before concatenating
           raise RuntimeError('Unexpected status code: '+str(resp.status_code))

file_info = upload_file("filename.pdf")

Specifying content-type

The content type can be omitted (defaulting to application/octet-stream) for most file types. However, the content type application/eocr must be specified for eOCR documents. It is also good practice to explicitly declare the content type of text files (text/plain) and PDF files (application/pdf).
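
As a rough illustration of the rule above, the sketch below picks a content type from the file extension. The helper name guess_content_type and the ".eocr" extension are assumptions made for this example; they are not part of the DocAI API.

```python
import os

# Content types named in this guide. The ".eocr" extension is an assumption
# made for this sketch; adjust it to however your eOCR files are named.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".eocr": "application/eocr",
}

def guess_content_type(filename):
    """Return an explicit content type, falling back to the default."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, "application/octet-stream")

print(guess_content_type("contract.PDF"))   # application/pdf
print(guess_content_type("scan.tiff"))      # application/octet-stream
```

The result can be passed as the content_type argument of upload_file, for example upload_file("filename.pdf", content_type=guess_content_type("filename.pdf")).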

Specifying expiration

The expiration time of the file may be specified using the Expiration request header, as either an absolute timestamp (e.g. 2022-12-23T12:34:56Z) or an interval (e.g. 12h or 10d). The maximum allowed value is 14 days.
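
If you prefer to compute the absolute timestamp in code, a minimal sketch could look like the following. The helper name expiration_timestamp is hypothetical, and the sketch assumes the 14-day limit is measured from the time of upload.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=14)  # maximum stated in this guide

def expiration_timestamp(lifetime, now=None):
    """Format now + lifetime as an absolute UTC timestamp, e.g. 2022-12-23T12:34:56Z."""
    if lifetime <= timedelta(0) or lifetime > MAX_LIFETIME:
        raise ValueError("lifetime must be positive and at most 14 days")
    now = now or datetime.now(timezone.utc)
    return (now + lifetime).astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

start = datetime(2022, 12, 23, 0, 34, 56, tzinfo=timezone.utc)
print(expiration_timestamp(timedelta(hours=12), now=start))  # 2022-12-23T12:34:56Z
```

The returned string can be passed as the expiration argument of upload_file. For short, fixed lifetimes the interval form (such as 12h) is simpler and needs no helper.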

Step 2 (optional): File validation

If desired, you can check the sha-256 hash to ensure the integrity of your uploaded file.

The hash is included in the Submit File response, so you can compare it with a hash computed locally:

import hashlib

def get_sha256(filename):
    with open(filename, 'rb') as in_file:
        m = hashlib.sha256()
        m.update(in_file.read())
        return m.hexdigest()

print("Uploaded file hash: " + file_info['attributes']['sha-256'])
print("Original file hash: " + get_sha256('filename.pdf'))
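
For large files, or when you want the comparison done in code rather than by eye, a variant along these lines may help. The names sha256_of_file and upload_is_intact are hypothetical; the sketch assumes the response carries the hash as a hex string under attributes and sha-256, as in the sample above.

```python
import hashlib

def sha256_of_file(filename, chunk_size=1024 * 1024):
    """Hash the file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(filename, "rb") as in_file:
        for chunk in iter(lambda: in_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def upload_is_intact(file_info, filename):
    """Compare the hash reported in the upload response with the local hash."""
    reported = file_info["attributes"]["sha-256"]
    return reported.lower() == sha256_of_file(filename)
```

Calling upload_is_intact(file_info, "filename.pdf") returns True when the two hashes agree.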

You can also compute the SHA-256 hash of the local file using your operating system’s native tools:

macOS/Linux

shasum -a 256 filename.pdf

Windows PowerShell

Get-FileHash filename.pdf -Algorithm SHA256

Step 3: Process a file

Get the file_id from the response for use in any of the other workflows: classification, language, field extraction, OCR and training.

file_id = file_info["file_id"]

Step 4 (Optional): Delete a file from DocAI

Files uploaded to DocAI are automatically deleted after 48 hours. However, if you would like to delete the file sooner, you can use the Delete file endpoint as follows:

def delete_file(file_id):
    resp=requests.request("DELETE", REGION_URL + "/api/v2/files/" + file_id,
                            headers={"Authorization": "Bearer " + TOKEN})
    if resp.status_code!=204:
        # status_code is an int, so convert it before concatenating
        raise RuntimeError('Unexpected status code: '+str(resp.status_code))

delete_file(file_id)
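
If you always want the file removed once processing finishes, one possible pattern is to wrap the upload and delete steps in a context manager. This is a sketch, not part of the DocAI API: the name temporary_upload is hypothetical, and the upload and delete callables are passed in as parameters so that functions such as upload_file and delete_file from this guide can be supplied.

```python
from contextlib import contextmanager

@contextmanager
def temporary_upload(filename, upload, delete):
    """Upload a file, yield its file ID, and delete it afterwards.

    `upload` and `delete` are passed in so the helper stays independent of
    any particular implementation, e.g. upload_file and delete_file above.
    """
    file_info = upload(filename)
    file_id = file_info["file_id"]
    try:
        yield file_id
    finally:
        # Runs even if processing raised, so the file is not left behind
        delete(file_id)
```

It would be used as: with temporary_upload("filename.pdf", upload_file, delete_file) as file_id, followed by the processing requests inside the with block.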