AI Document Anonymization Platform
Production-style event-driven microservices platform using FastAPI, RabbitMQ, Keycloak, OCR, NER and RBAC.
Pipeline Snapshot
10 Core Services
3 Security Layers
7 Smoke Checks
System Topology
Interactive Architecture Flow
Gateway authentication and role checks control ingress, while outbox-backed event publication drives asynchronous OCR and NER pipelines.
Execution Graph
Pipeline orchestration as a live control-plane walkthrough
A cinematic runtime view of the anonymization DAG, from authenticated ingress through outbox publication, OCR extraction, entity detection, and review-ready artifacts.
Orchestration Surface
Control-plane view of sequential state transitions, event publication, and downstream worker execution.
Security Model
Authentication and RBAC Guardrails
Authentication terminates at the gateway boundary. Internal services do not validate tokens themselves; they trust only the identity headers that the gateway propagates from validated requests.
Keycloak OIDC Authentication
Users authenticate through Keycloak. Access tokens carry issuer, subject and role claims.
JWT Validation at API Gateway
Gateway validates signature, expiry and audience before forwarding to internal services.
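The three checks named above (signature, expiry, audience) can be sketched with the standard library alone. This is a minimal illustration, not the platform's gateway code: it assumes an HS256 shared secret so it stays self-contained, whereas a Keycloak realm would normally sign with an asymmetric key published through its JWKS endpoint. The names `validate_token` and `sign_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_token(claims: dict, secret: bytes) -> str:
    """Demo helper that builds an HS256 token (assumption: stands in for Keycloak)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    digest = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(digest)}"


def validate_token(token: str, secret: bytes, audience: str,
                   now: Optional[float] = None) -> dict:
    """Return the claims only if signature, expiry and audience all check out."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError) as exc:
        raise PermissionError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise PermissionError("malformed token")
    # Pin the algorithm so a token cannot choose its own verification method.
    if header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise PermissionError("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise PermissionError("wrong audience")
    return claims
```

Only a token that passes all three checks yields claims; any failure raises `PermissionError`, which a gateway would typically map to a 401 response.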
Role-Based Access Control
Route-level policies enforce uploader, reviewer and admin permissions at ingress.
Trusted Identity Propagation
Gateway injects trusted `X-User-Sub` and role context for internal service auditing.
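One way to keep propagated headers trustworthy is to drop any client-supplied `X-User-*` header before injecting values derived from the validated token. The sketch below assumes roles sit under `realm_access.roles`, as in a default Keycloak access token; the `X-User-Roles` header name and the `build_forward_headers` helper are hypothetical.

```python
TRUSTED_PREFIX = "x-user-"


def build_forward_headers(incoming: dict, claims: dict) -> dict:
    """Build the header set the gateway forwards to internal services."""
    # Discard anything the client sent under the trusted prefix, so a caller
    # cannot spoof an identity by setting X-User-Sub on the request.
    forwarded = {
        name: value
        for name, value in incoming.items()
        if not name.lower().startswith(TRUSTED_PREFIX)
    }
    # Assumption: realm roles live under realm_access.roles in the token.
    roles = claims.get("realm_access", {}).get("roles", [])
    forwarded["X-User-Sub"] = claims["sub"]
    forwarded["X-User-Roles"] = ",".join(sorted(roles))  # hypothetical header
    return forwarded
```

Internal services can then log `X-User-Sub` for auditing without parsing tokens, provided they are reachable only through the gateway.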
Role Matrix
Policy enforced at API Gateway before internal routing.
| Role | Upload | Review | Admin |
|---|---|---|---|
| uploader | Allow | Deny | Deny |
| reviewer | Deny | Allow | Deny |
| admin | Allow | Allow | Allow |
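The matrix above translates directly into a lookup table. The sketch below is illustrative only; `ROLE_MATRIX` and `is_allowed` are hypothetical names, and unknown roles are denied by default.

```python
# Actions each role may perform, mirroring the role matrix.
ROLE_MATRIX = {
    "uploader": {"upload"},
    "reviewer": {"review"},
    "admin": {"upload", "review", "admin"},
}


def is_allowed(roles, action: str) -> bool:
    """Allow the action if any of the caller's roles grants it."""
    return any(action in ROLE_MATRIX.get(role, set()) for role in roles)
```

In a FastAPI gateway, a check like this would typically run in a route dependency and return 403 when it fails.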
Engineering Notes
Platform Design Decisions
The system emphasizes reliability, controlled trust boundaries and observable asynchronous workflows.
Outbox Pattern
Document and event records are persisted atomically before asynchronous publication.
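A minimal sketch of that atomic write, using SQLite so it runs without external services. The table layout, the `document.uploaded` event name and the `publish` callback are assumptions for illustration; in the platform the relay would publish to RabbitMQ.

```python
import json
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE documents (id TEXT PRIMARY KEY, filename TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def create_document(conn: sqlite3.Connection, filename: str) -> str:
    """Persist the document row and its event row in one transaction."""
    document_id = f"doc_{uuid.uuid4().hex[:8]}"
    payload = json.dumps({"document_id": document_id, "filename": filename})
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute(
            "INSERT INTO documents (id, filename) VALUES (?, ?)",
            (document_id, filename),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("document.uploaded", payload),  # hypothetical event name
        )
    return document_id


def relay_pending(conn: sqlite3.Connection, publish) -> int:
    """Publish unpublished events in order and mark each one as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # If publish raises, the row stays unpublished and is retried later.
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because an event is marked only after `publish` returns, a crash between the two steps causes a resend. Delivery is therefore at-least-once, and consumers should be idempotent.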
Event-Driven Architecture
RabbitMQ decouples ingestion and downstream compute workloads with durable messaging.
Async Workers
OCR and NER workers consume jobs from the broker independently, so each pool scales on its own without adding load to the gateway.
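The idea of independently sized worker pools can be shown in-process with `asyncio.Queue` standing in for broker queues. This is a simplified stand-in, not how the platform's workers are implemented: the OCR and NER steps are placeholders, and every function name here is hypothetical.

```python
import asyncio


async def ocr_worker(jobs: asyncio.Queue, texts: asyncio.Queue) -> None:
    while True:
        document_id = await jobs.get()
        # Placeholder for a real OCR engine call.
        await texts.put((document_id, f"text of {document_id}"))
        jobs.task_done()


async def ner_worker(texts: asyncio.Queue, results: dict) -> None:
    while True:
        document_id, text = await texts.get()
        # Placeholder for a real NER model: just counts tokens.
        results[document_id] = len(text.split())
        texts.task_done()


async def run_pipeline(document_ids, ocr_workers: int = 2, ner_workers: int = 2) -> dict:
    jobs, texts = asyncio.Queue(), asyncio.Queue()
    results = {}
    # Each stage has its own pool size, so one can grow without the other.
    tasks = [asyncio.create_task(ocr_worker(jobs, texts)) for _ in range(ocr_workers)]
    tasks += [asyncio.create_task(ner_worker(texts, results)) for _ in range(ner_workers)]
    for document_id in document_ids:
        jobs.put_nowait(document_id)
    await jobs.join()   # every OCR job has been handed to the NER queue
    await texts.join()  # every NER job has finished
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

The submitting side only enqueues work and never waits on a specific worker, which is the property that keeps processing load off the gateway.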
Artifact Registration
Each generated artifact is addressable through metadata links and traceable through its job state.
MinIO Object Storage
Binary documents and extracted artifacts are versioned and retained in object storage.
Dockerized Local Platform
Services, broker and storage are orchestrated in reproducible local environments.
Smoke-Tested End-to-End
Critical user and worker paths are validated with deterministic smoke scenarios.
Keycloak + Gateway Security
Identity issuance and request authorization remain centralized and auditable.
Artifacts
Generated Outputs
Pipeline stages produce concrete intermediate artifacts and review-ready responses, not opaque background tasks.
Uploaded PDF
{
  "document_id": "doc_8f19",
  "filename": "fichier_de_test.pdf",
  "size_bytes": 294212,
  "content_type": "application/pdf",
  "stored_at": "s3://documents/doc_8f19/original.pdf"
}
OCR Output
{
  "artifact": "ocr_json",
  "document_id": "doc_8f19",
  "language": "fr",
  "pages": 2,
  "extract": "Fichier de test ..."
}
Detected Entities
Review Response
{
  "document_id": "doc_8f19",
  "status": "ready_for_review",
  "entities": 2,
  "artifacts": {
    "ner_json": "s3://artifacts/doc_8f19/ner.json",
    "ocr_json": "s3://artifacts/doc_8f19/ocr.json"
  }
}
Verification
Validated Scenarios
Core security and processing paths are smoke-tested to verify role boundaries and artifact lifecycles.
This project demonstrates platform engineering concerns across security, messaging, async processing, storage and API design.