Base URL: http://localhost:8000
1) POST /upload
Upload a PDF file to start processing.
- Method: POST
- Path: /upload
- Content-Type: multipart/form-data
- Form field: file (the PDF file)
Response:
{
  "job_id": "<uuid>"
}
curl -X POST "http://localhost:8000/upload" \
  -F "file=@/path/to/document.pdf"
import requests

url = "http://localhost:8000/upload"
# Send the PDF as the multipart form field "file"
with open("document.pdf", "rb") as f:
    r = requests.post(url, files={"file": f})
print(r.json())  # {"job_id": "<uuid>"}
2) GET /status/{job_id}
Poll the job status and retrieve logs, partial progress, and final results.
- Method: GET
- Path: /status/{job_id}
Response:
{
  "job_id": "<uuid>",
  "status": "pending|running|done|scanned|error",
  "logs": ["...", "..."],
  "data": [
    { "Key": "Full Name", "Value": "Vijay Kumar", "Comment": "" },
    ...
  ],
  "eval_result": {
    "avg_bleu": 0.848,
    "coverage": 0.848,
    "total_fields": 51,
    "non_empty_fields": 44
  }
}
Notes:
- status is the job state; done indicates that final results are available.
- scanned means the PDF is image-only (OCR is not supported).
- data is an array of extracted rows (Key/Value/Comment) for display.
curl "http://localhost:8000/status/<job_id>"
import requests

job_id = "<uuid>"  # the job_id returned by POST /upload
r = requests.get(f"http://localhost:8000/status/{job_id}")
json_resp = r.json()
print(json_resp["status"])  # pending|running|done|scanned|error
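The polling described in this section can be wrapped in a small loop that stops once the job reaches a final state. This is a minimal sketch, not part of the API: `wait_for_job` and `fetch_status` are hypothetical names, and treating `done`, `scanned`, and `error` as terminal states is an assumption based on the status values listed above. The network call is passed in as a callable so the loop can be exercised without a running server.

```python
import time
from typing import Callable

# Assumed terminal states, taken from the status values documented above.
TERMINAL_STATES = {"done", "scanned", "error"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status() until the job reaches a terminal state.

    fetch_status is a hypothetical callable that returns the parsed JSON
    body of GET /status/{job_id}. Raises TimeoutError once roughly
    `timeout` seconds have been spent waiting.
    """
    waited = 0.0
    while True:
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if waited >= timeout:
            raise TimeoutError(
                f"job still {payload.get('status')!r} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval
```

With requests, `fetch_status` could be `lambda: requests.get(f"http://localhost:8000/status/{job_id}").json()`.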
3) GET /download/{job_id}
Download the generated Excel file for a completed job.
- Method: GET
- Path: /download/{job_id}
- Response: 200 OK with Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
curl -o output.xlsx "http://localhost:8000/download/<job_id>"
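import requests

job_id = "<uuid>"  # the job_id returned by POST /upload
r = requests.get(f"http://localhost:8000/download/{job_id}")
r.raise_for_status()  # avoid saving an error response as output.xlsx
with open("output.xlsx", "wb") as f:
    f.write(r.content)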
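Before writing the response to disk, it can be worth checking that the body really is a spreadsheet rather than an error payload. A minimal sketch: `looks_like_xlsx` is a hypothetical helper, and how the server answers for unfinished or unknown jobs is not documented here, so the check relies only on the Content-Type shown above and on the fact that .xlsx files are ZIP archives.

```python
XLSX_CONTENT_TYPE = (
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
)


def looks_like_xlsx(status_code: int, content_type: str, body: bytes) -> bool:
    """Return True if a /download response plausibly carries an .xlsx file."""
    # Content-Type headers may carry parameters (e.g. "; charset=..."),
    # so compare only the media type itself.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return (
        status_code == 200
        and media_type == XLSX_CONTENT_TYPE
        # .xlsx is a ZIP container; ZIP local file headers start with PK\x03\x04.
        and body.startswith(b"PK\x03\x04")
    )
```

With requests, the call would be `looks_like_xlsx(r.status_code, r.headers.get("Content-Type", ""), r.content)`.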
Error handling
If the upload is not a PDF or cannot be processed, /upload may return HTTP 400 with a JSON error.
If the job fails due to LLM errors or unexpected conditions, /status/{job_id} will include status: "error" and logs describing the problem.
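The two failure paths above can be handled in client code roughly as follows. This is a sketch under assumptions: the exception classes and helper names are hypothetical, and because the shape of the JSON error returned by /upload is not specified here, the body is surfaced as-is rather than parsed.

```python
class UploadRejected(Exception):
    """Raised when /upload answers with HTTP 400."""


class JobFailed(Exception):
    """Raised when /status/{job_id} reports status "error"."""


def job_id_from_upload(status_code: int, body: dict) -> str:
    """Return the job ID from an /upload response, or raise on HTTP 400."""
    if status_code == 400:
        # The error schema is undocumented here, so keep the whole body.
        raise UploadRejected(body)
    return body["job_id"]


def rows_from_status(payload: dict) -> list:
    """Return the extracted rows of a finished job; raise otherwise."""
    status = payload["status"]
    if status == "error":
        # logs describe the problem, per the API description above.
        raise JobFailed("\n".join(payload.get("logs", [])))
    if status != "done":
        # pending, running, or scanned: no final results to return.
        raise ValueError(f"no results available, status is {status!r}")
    return payload.get("data", [])
```

With requests, these would be called as `job_id_from_upload(r.status_code, r.json())` after the upload and `rows_from_status(r.json())` after a status request.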
Integration tips
- Poll /status/{job_id} every 1–3s for near-real-time updates; use exponential backoff for many jobs.
- For bulk processing, use pipeline_runner.py for CLI batch runs.
- Keep your API keys secure; do not embed them in client-side code.
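The backoff suggestion in the first tip can be sketched as a delay generator. The starting delay, growth factor, and cap below are illustrative assumptions, not values prescribed by the API, and `backoff_delays` is a hypothetical helper.

```python
import itertools
from typing import Iterator


def backoff_delays(
    initial: float = 1.0, factor: float = 2.0, cap: float = 30.0
) -> Iterator[float]:
    """Yield polling delays in seconds, growing by `factor` up to `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First six delays with the defaults: 1, 2, 4, 8, 16, 30 seconds.
first_six = list(itertools.islice(backoff_delays(), 6))
```

Each delay would be passed to `time.sleep` between calls to /status/{job_id}. Adding random jitter is a common refinement when many clients poll at once.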