Base URL (default local):
http://localhost:8000

1) POST /upload

Upload a PDF file to start processing.

  • Method: POST
  • Path: /upload
  • Content-Type: multipart/form-data
  • Form field: file (the PDF file)
Success response (HTTP 200):
{
  "job_id": "<uuid>"
}
Example (curl):
curl -X POST "http://localhost:8000/upload" \
  -F "file=@/path/to/document.pdf"
Example (Python requests):
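import requests
url = "http://localhost:8000/upload"
# Send the PDF as the multipart form field named "file".
with open("document.pdf", "rb") as f:
    r = requests.post(url, files={"file": f})
r.raise_for_status()  # raise on 4xx/5xx, e.g. HTTP 400 for a non-PDF upload
print(r.json())  # {"job_id": "<uuid>"}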

2) GET /status/{job_id}

Poll the job status and retrieve logs, partial progress, and final results.

  • Method: GET
  • Path: /status/{job_id}
Response JSON structure (fields vary by state):
{
  "job_id": "<uuid>",
  "status": "pending|running|done|scanned|error",
  "logs": ["...","..."],
  "data": [
    { "Key": "Full Name", "Value": "Vijay Kumar", "Comment": "" },
    ...
  ],
  "eval_result": {
    "avg_bleu": 0.848,
    "coverage": 0.848,
    "total_fields": 51,
    "non_empty_fields": 44
  }
}

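Notes: status is the job state. pending and running mean the job is still in progress; done indicates that final results are available; scanned means the PDF is image-only and cannot be processed (OCR is not supported); error means the job failed (see logs). data is an array of extracted rows (Key/Value/Comment) for display.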
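
The data rows can be collapsed into a plain dictionary for downstream use. The following is a minimal sketch, assuming Key values are unique within a job and that a missing or blank Value counts as empty. rows_to_dict and count_non_empty are hypothetical helper names, not part of the API, and the second sample row is illustrative only.

```python
from typing import Any


def rows_to_dict(payload: dict[str, Any]) -> dict[str, str]:
    """Map each row's Key to its Value; later duplicates overwrite earlier ones."""
    rows = payload.get("data") or []  # "data" may be absent before the job is done
    return {row["Key"]: (row.get("Value") or "") for row in rows}


def count_non_empty(fields: dict[str, str]) -> int:
    """Count fields whose value is not blank after stripping whitespace."""
    return sum(1 for value in fields.values() if str(value).strip())


# Sample payload shaped like the response above (values are illustrative).
sample = {
    "job_id": "<uuid>",
    "status": "done",
    "data": [
        {"Key": "Full Name", "Value": "Vijay Kumar", "Comment": ""},
        {"Key": "Date of Birth", "Value": "", "Comment": "not found"},
    ],
}

fields = rows_to_dict(sample)
print(fields["Full Name"])
print(count_non_empty(fields), "of", len(fields), "fields are non-empty")
```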

Example (curl):
curl "http://localhost:8000/status/<job_id>"
Example (Python requests):
import requests
job_id = "<job_id>"  # value returned by POST /upload
r = requests.get(f"http://localhost:8000/status/{job_id}")
r.raise_for_status()
json_resp = r.json()
print(json_resp["status"])  # pending|running|done|scanned|error

3) GET /download/{job_id}

Download the generated Excel file for a completed job.

  • Method: GET
  • Path: /download/{job_id}
  • Response: 200 OK with Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Example (curl):
curl -o output.xlsx "http://localhost:8000/download/<job_id>"
Example (Python requests):
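import requests
job_id = "<job_id>"  # value returned by POST /upload
r = requests.get(f"http://localhost:8000/download/{job_id}")
r.raise_for_status()  # avoid writing an error body into output.xlsx
with open("output.xlsx", "wb") as f:
    f.write(r.content)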
Important: the server does not persist Excel files permanently. Download the file immediately after job completion.
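
Because the file may no longer be available later, it can help to confirm that the response really is the spreadsheet before saving it. The sketch below is one way to do that, assuming the server sets the Content-Type listed above; save_excel is a hypothetical helper, and the demonstration uses stand-in bytes rather than a real response.

```python
import os
import tempfile

XLSX_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def save_excel(status_code: int, content_type: str, body: bytes, path: str) -> None:
    """Write body to path only if the response looks like the Excel file."""
    if status_code != 200:
        raise RuntimeError(f"download failed with HTTP {status_code}")
    # Compare only the media type; ignore parameters such as "; charset=...".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != XLSX_TYPE:
        raise RuntimeError(f"unexpected Content-Type: {content_type!r}")
    with open(path, "wb") as f:
        f.write(body)


# Offline demonstration with stand-in bytes instead of a real HTTP response.
demo_path = os.path.join(tempfile.mkdtemp(), "output.xlsx")
save_excel(200, XLSX_TYPE, b"PK\x03\x04 stand-in bytes", demo_path)
print(os.path.getsize(demo_path), "bytes written")
```

With requests, the call would look like save_excel(r.status_code, r.headers.get("Content-Type", ""), r.content, "output.xlsx").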

Error handling

If the upload is not a PDF or cannot be processed, /upload may return HTTP 400 with a JSON error. If the job fails due to LLM errors or unexpected conditions, /status/{job_id} will include status: "error" and logs describing the problem.
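
A client can turn the status payload into a simple ready / not-ready / failed decision. This is a sketch under assumptions: it treats scanned as a failure because no results can be produced for it, and it assumes the last few log lines are the most useful ones to surface. JobFailed and check_status are hypothetical names.

```python
class JobFailed(Exception):
    """Raised when a job ends in a state that yields no results."""


def check_status(payload: dict) -> bool:
    """Return True when results are ready, False while the job is in progress.

    Raises JobFailed for any other state, including "error" and "scanned".
    """
    status = payload.get("status")
    if status == "done":
        return True
    if status in ("pending", "running"):
        return False
    logs = payload.get("logs") or []
    tail = "; ".join(logs[-3:])  # assumption: recent log lines describe the cause
    if status == "scanned":
        raise JobFailed(f"image-only PDF, OCR not supported. {tail}".strip())
    raise JobFailed(f"job ended with status {status!r}. {tail}".strip())


print(check_status({"status": "done"}))
print(check_status({"status": "running"}))
```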

Integration tips

  • Poll /status/{job_id} every 1–3s for near-real-time updates; when tracking many jobs at once, use exponential backoff to reduce load on the server.
  • For bulk processing, use pipeline_runner.py for CLI batch runs.
  • Keep your API keys secure; do not embed them in client-side code.
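
The polling advice above can be sketched as a small loop with exponential backoff. The delays, the cap, the attempt limit, and the set of terminal states are assumptions chosen for illustration; poll_with_backoff is a hypothetical helper, and fetch is any callable that returns the parsed /status/{job_id} JSON.

```python
import time
from typing import Callable

TERMINAL = {"done", "scanned", "error"}  # assumed terminal states


def poll_with_backoff(
    fetch: Callable[[], dict],
    initial_delay: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 30.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the job reaches a terminal state, waiting longer each time."""
    delay = initial_delay
    for _ in range(max_attempts):
        payload = fetch()
        if payload.get("status") in TERMINAL:
            return payload
        sleep(delay)
        delay = min(delay * factor, max_delay)  # exponential growth, capped
    raise TimeoutError(f"job still not finished after {max_attempts} polls")
```

With requests, fetch could be lambda: requests.get(f"http://localhost:8000/status/{job_id}").json(). Passing sleep as a parameter keeps the loop easy to test without real waiting.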