Delma
AI-powered healthcare platform. 13 Spring Boot microservices. 4 LLM features. Built from scratch.
13 Microservices · 4 AI Features · 3 Kafka Topics · 9 Databases · 2 Arch Targets
01 — Architecture
13 Services
Each service owns its data, scales independently, and fails without taking down the rest of the platform.
- Port 8761: Eureka service registry
- Port 8089: Spring Cloud Gateway · JWT auth · WebFlux
- Port 8111: Auth · OTP · JWT · refresh tokens
- Port 8010: Profiles · Redis cache · approval workflow
- Port 8012: Slots · booking · ZEGOCLOUD video tokens
- Port 8083: Razorpay integration · HMAC-SHA256 verify
- Port 8091: AWS S3 · presigned URLs · Kafka events
- Port 8095: All AI features · Groq · Voyage AI · pgvector
- Port 8017: Kafka consumer · Gmail SMTP
- Port 8016: E-store product catalog
- Port 8015: Category + slug management
- Port 8013: Cart · orders · checkout
- No port: Shared ApiResponse · exceptions · DTOs
Kafka Topics
- notification-topic: Appointment confirmations, doctor approvals
- document-uploaded: Triggers RAG indexing in aiservice
- consultation-notes-ready: Triggers AI report generation
02 — AI Features
4 LLM-Powered Features
Each feature solves a real clinical problem. Each has specific engineering decisions worth explaining.
AI Symptom Checker
POST /api/v1/ai/symptom-check
How it works
- 01 Patient types symptoms in plain English
- 02 Prompt uses a fixed specialization list matching platform doctor categories
- 03 JSON response parsed → fallback to General Medicine on parse failure (never returns 500)
- 04 Frontend auto-filters doctor listing based on returned specialization
Engineering decision
Fixed specialization list prevents the LLM from inventing specialties that don't exist on the platform. Fallback to General Medicine ensures the endpoint is always usable even if Groq is degraded.
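A minimal sketch of the parse-then-fallback step, using only the JDK. The category list, the `specialization` field name, and the class name are assumptions for illustration; the real service presumably uses a proper JSON parser rather than a regex.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecializationResolver {
    // Hypothetical list; the real one would mirror the platform's doctor categories.
    static final List<String> ALLOWED =
            List.of("General Medicine", "Cardiology", "Dermatology", "Orthopedics");
    static final String FALLBACK = "General Medicine";

    // Assumes the model was asked to reply with {"specialization": "..."}.
    private static final Pattern FIELD =
            Pattern.compile("\"specialization\"\\s*:\\s*\"([^\"]*)\"");

    // Never throws: any unparseable or unknown answer maps to the fallback.
    static String resolve(String llmReply) {
        if (llmReply == null) {
            return FALLBACK;
        }
        Matcher m = FIELD.matcher(llmReply);
        if (!m.find()) {
            return FALLBACK;
        }
        String candidate = m.group(1).trim();
        for (String allowed : ALLOWED) {
            if (allowed.equalsIgnoreCase(candidate)) {
                return allowed; // return the canonical spelling
            }
        }
        return FALLBACK; // the model invented a specialty
    }
}
```

Because the result is always a member of the fixed list, the frontend filter never receives a category it cannot match.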
RAG Document Summarizer
GET /api/v1/ai/summarize/{userId}
How it works
- 01 Patient uploads PDF → documentservice saves to S3 → publishes Kafka event
- 02 aiservice consumes event: downloads PDF → Apache PDFBox text extraction
- 03 Fixed-size chunking: 500 chars, 100 char overlap
- 04 Voyage AI voyage-3-lite embeds each chunk (512-dim), stored in pgvector via JdbcTemplate + ::vector cast
- 05 21s Thread.sleep between chunks (Voyage free tier: 3 RPM)
- 06 On doctor query: embed query → cosine similarity search → top-K chunks → Groq generates 3-point clinical summary
Engineering decision
Fixed-size chunking was chosen over semantic or agentic chunking: medical documents are 1-5 pages, and agentic chunking would have tripled API calls against the 3 RPM limit for no measurable quality gain. JdbcTemplate is used instead of Hibernate to bypass Hibernate's custom column type limitations with pgvector.
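The 500-character window with 100-character overlap can be sketched as a sliding window with a step of 400. The class and method names are hypothetical, and the sketch covers only the splitting, not the embedding or the 21-second pause between calls.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizeChunker {
    // Splits text into windows of `size` chars; consecutive windows share `overlap` chars.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // 500 - 100 = 400 for the settings above
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // last window reached the end; avoid a redundant tail chunk
            }
        }
        return chunks;
    }
}
```

Calling `FixedSizeChunker.chunk(text, 500, 100)` on a 1,200-character string yields three chunks covering characters 0-499, 400-899, and 800-1199.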
MCP Booking Agent (ReAct)
POST /api/v1/ai/agent/chat
How it works
- 01 5 tools: search_doctors · get_available_slots · book_appointment · get_my_appointments · create_payment_order
- 02 ReAct loop: reason → call tool → observe result → reason again (max 10 iterations)
- 03 Redis slot session injection: real doctorId + slotIds stored in Redis per userId, injected as SYSTEM message at position 1 before every LLM call
- 04 400 correction loop: if LLM passes string slotId instead of integer, error injected back → forces re-fetch
- 05 Floating chat widget in frontend (patients only)
Engineering decision
LLM hallucination problem: qwen3-32b would invent IDs such as slotId=123 in later turns because conversation history only carries text, not structured tool results. Redis injection gives the LLM ground-truth IDs on every call, so it no longer has to recall them from earlier turns, which eliminated the invented IDs. This was the hardest engineering problem in the project.
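A rough sketch of the injection step, with an in-memory map standing in for Redis. The `Message` and `SlotSession` types, the wording of the injected message, and the reading of "position 1" as list index 1 (right after the main system prompt) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SlotSessionInjector {
    record Message(String role, String content) {}
    record SlotSession(long doctorId, List<Long> slotIds) {}

    // In-memory stand-in for Redis, keyed by userId.
    private final Map<String, SlotSession> sessions = new ConcurrentHashMap<>();

    // Called after a tool returns real IDs for this user.
    void remember(String userId, long doctorId, List<Long> slotIds) {
        sessions.put(userId, new SlotSession(doctorId, List.copyOf(slotIds)));
    }

    // Returns a copy of the history with the ground-truth IDs inserted at index 1.
    List<Message> withGroundTruth(String userId, List<Message> history) {
        List<Message> out = new ArrayList<>(history);
        SlotSession session = sessions.get(userId);
        if (session == null) {
            return out; // nothing fetched yet, send history unchanged
        }
        String content = "Valid IDs for this conversation: doctorId=" + session.doctorId()
                + ", slotIds=" + session.slotIds() + ". Use only these IDs.";
        out.add(Math.min(1, out.size()), new Message("system", content));
        return out;
    }
}
```

The stored history is never mutated, so the injected message is rebuilt from the latest session state before every call instead of accumulating across turns.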
Post-Consultation AI Notes
Kafka: consultation-notes-ready
How it works
- 01 Doctor types notes during video call (ConsultationNotesPanel auto-saves every 30s to DRAFT status)
- 02 Doctor clicks 'End Call & Save' → appointmentservice saves notes → publishes to consultation-notes-ready Kafka topic
- 03 aiservice consumer picks up event: formats medications as numbered list (Drug | Dose | Freq | Route)
- 04 buildPrompt() creates a 7-section contract: Consultation Summary · Diagnosis · Medications · Tests · Recovery · Follow-up · Disclaimer
- 05 temperature=0.3: consistency over creativity, because medical docs must be literal
- 06 AI report saved back via Feign → AppointmentClient.updateAiReport()
- 07 Patient sees report in MyConsultationReports between Upcoming/History sections
Engineering decision
Kafka is used (not Feign) because report generation takes 5-10 seconds. A direct call would freeze the doctor's End Call button for that long, which is unacceptable UX. Exceptions are swallowed in the consumer (not rethrown): the notes are already saved in the DB, and the AI report is an enhancement, not a blocker. Rethrowing would make the consumer retry the same record, stalling the partition and holding up every other patient's report queued behind it.
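The swallow-and-continue behaviour can be sketched without Kafka at all. `ReportGenerator` and `ReportSink` below are hypothetical stand-ins for the LLM call and the Feign save-back; in the real consumer this logic would sit inside the listener method.

```java
class NotesReadyHandler {
    // Stand-in for the LLM call that turns notes into a report.
    interface ReportGenerator {
        String generate(String notes) throws Exception;
    }

    // Stand-in for the Feign call that stores the report on the appointment.
    interface ReportSink {
        void save(long appointmentId, String report) throws Exception;
    }

    private final ReportGenerator generator;
    private final ReportSink sink;

    NotesReadyHandler(ReportGenerator generator, ReportSink sink) {
        this.generator = generator;
        this.sink = sink;
    }

    // Returns true if a report was saved. Never throws, so one bad record
    // cannot hold up the records behind it.
    boolean handle(long appointmentId, String notes) {
        try {
            sink.save(appointmentId, generator.generate(notes));
            return true;
        } catch (Exception e) {
            // The notes are already persisted; log the failure and move on.
            System.err.println("AI report failed for appointment " + appointmentId
                    + ": " + e.getMessage());
            return false;
        }
    }
}
```

The trade-off is that a failed report is only visible in logs, so a retry or dead-letter path would have to be added separately if reports ever became mandatory.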
03 — Platform Features
Core Platform
Video Consultations
- ZEGOCLOUD SDK with AES-256 per-session encryption
- Appointment-scoped video tokens generated server-side
- ConsultationNotesPanel for doctors (auto-saves every 30s)
- Authorization check: only participants can get video token
Auth & Security
- Email OTP verification — 6-digit, Redis TTL 10min, one-time use
- JWT access (15min) + refresh rotation (7 days, HttpOnly cookie)
- Refresh tokens stored in DB — explicit revocation on logout
- Gateway-level JWT validation — X-User-Id/X-Roles header injection
Medical Documents
- Upload to AWS S3 — presigned URLs with 10-minute TTL
- Access control: only the uploading patient + their assigned doctor
- Kafka event on upload triggers RAG indexing in aiservice
- Metadata stored in PostgreSQL, files in S3 (separation of concerns)
Medical E-Store
- Product catalog via productservice + categoryservice
- Cart management and order lifecycle in orderservice
- Razorpay payment with HMAC-SHA256 signature verification
- Order confirmation via Kafka → notificationservice → email
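A sketch of the HMAC-SHA256 check using only the JDK. It assumes the signed payload is the order ID and payment ID joined by `|` and that the signature is lowercase hex, which matches my understanding of Razorpay's checkout flow; confirm both against the Razorpay docs. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class PaymentSignature {
    // Computes hex(HMAC-SHA256(orderId + "|" + paymentId, secret)).
    static String sign(String orderId, String paymentId, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal((orderId + "|" + paymentId).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 not available", e);
        }
    }

    // Compares with MessageDigest.isEqual to avoid leaking the match position via timing.
    static boolean verify(String orderId, String paymentId, String signature, String secret) {
        byte[] expected = sign(orderId, paymentId, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

The order is marked paid only when `verify` returns true. A mismatch means the callback was tampered with or signed with a different key.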
Real-time Notifications
- Kafka consumer in notificationservice (notification-topic)
- Triggers on: booking confirmation, doctor approval, order updates
- Gmail SMTP delivery + in-app notification persistence
- Frontend: 5-minute module-level TTL cache (not useState — survives remounts)
Doctor Workflow
- Apply → pending review → admin approves via Feign chain (user→doctor→user)
- Redis cache-aside on listings (@Cacheable + @CacheEvict + TTL)
- Slot management: doctor creates availability windows
- Doctor profile search by name or specialization
04 — Engineering Decisions
Why I built it this way
Every non-obvious decision has a reason. These are the ones worth explaining to a senior engineer.
Double-Booking Race Condition
Problem
Two patients could book the same slot simultaneously — both pass the AVAILABLE check before either writes BOOKED.
Solution
@Version on DoctorSlot entity. Hibernate generates: UPDATE doctor_slot SET status='BOOKED', version=2 WHERE id=5 AND version=1. Only one thread succeeds. The other gets ObjectOptimisticLockingFailureException → HTTP 409.
Why this approach
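Optimistic over pessimistic because true slot contention is rare. Pessimistic locking (SELECT FOR UPDATE) would hold a row lock for every booking and make concurrent requests for the same slot wait on each other, paying that cost even when there is no conflict.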
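The version check can be illustrated without a database. In this sketch an `AtomicReference` plays the role of the row, and `compareAndSet` plays the role of the `WHERE version = ?` clause. It is an analogy for what Hibernate does with `@Version`, not the project's code.

```java
import java.util.concurrent.atomic.AtomicReference;

class VersionedSlot {
    record State(String status, long version) {}

    private final AtomicReference<State> row =
            new AtomicReference<>(new State("AVAILABLE", 1));

    // Like SELECT: returns the snapshot a request bases its decision on.
    State read() {
        return row.get();
    }

    // Like UPDATE ... WHERE version = seen.version: succeeds only if nobody
    // has written since `seen` was read. compareAndSet compares by reference,
    // so `seen` must be the object returned by read().
    boolean book(State seen) {
        if (!"AVAILABLE".equals(seen.status())) {
            return false;
        }
        return row.compareAndSet(seen, new State("BOOKED", seen.version() + 1));
    }
}
```

Two requests that both read version 1 will both attempt the write; the first call to `book` succeeds and the second returns false, which is the case the service maps to HTTP 409.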
JWT + Refresh Token Rotation
Problem
Access tokens that never expire are a security liability. But requiring re-login every 15 minutes breaks UX.
Solution
Access token (15 min) sent in the Authorization header. Refresh token (7 days) delivered in an HttpOnly cookie and persisted in the DB. On refresh: old token deleted, new token issued. A stolen refresh token becomes invalid after the first legitimate refresh.
Why this approach
The HttpOnly flag keeps the refresh token unreadable from JavaScript, so an XSS payload cannot exfiltrate it. DB storage enables explicit revocation on logout. Rotation limits a compromised token's attack window to the time before the next legitimate refresh.
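Rotation reduces to "delete the presented token, then issue a new one". The sketch below uses an in-memory map in place of the refresh-token table and leaves out the 7-day expiry check; names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class RefreshTokenStore {
    // token -> userId; stands in for the refresh-token table.
    private final Map<String, String> tokenToUser = new ConcurrentHashMap<>();

    String issue(String userId) {
        String token = UUID.randomUUID().toString();
        tokenToUser.put(token, userId);
        return token;
    }

    // Consumes the presented token and returns a replacement.
    // Empty means the token is unknown, already rotated, or revoked.
    Optional<String> rotate(String presented) {
        if (presented == null) {
            return Optional.empty();
        }
        String userId = tokenToUser.remove(presented); // atomic: only one caller gets the user
        if (userId == null) {
            return Optional.empty();
        }
        return Optional.of(issue(userId));
    }

    // Logout: explicit revocation.
    void revoke(String token) {
        if (token != null) {
            tokenToUser.remove(token);
        }
    }
}
```

Because `remove` is atomic, two concurrent refreshes with the same token cannot both succeed, which is what makes a replayed token detectable.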
Redis Cache-Aside + TTL (Defense in Depth)
Problem
Doctor listings hit the DB on every request. A single cache eviction failure would serve stale data indefinitely.
Solution
@Cacheable on GET /api/v1/doctor/all (doctors::all key) + @CacheEvict on any approval/rejection + 10-minute TTL as safety net.
Why this approach
@CacheEvict alone: if Redis has a brief outage during approval, eviction fails silently → stale data forever. TTL alone: stale for up to 10 minutes after every state change. Together: immediate freshness + bounded staleness window.
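How eviction and TTL interact can be shown with a single-entry cache and an injectable clock. This is a plain-Java illustration of the behaviour, not the Spring `@Cacheable` configuration; the class name and clock parameter are assumptions.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class TtlCache<V> {
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested
    private V value;
    private long loadedAt;
    private boolean present;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Cache-aside read: serve the cached value while fresh, otherwise reload.
    synchronized V get(Supplier<V> loader) {
        long now = clock.getAsLong();
        if (present && now - loadedAt < ttlMillis) {
            return value;
        }
        value = loader.get();
        loadedAt = now;
        present = true;
        return value;
    }

    // Explicit eviction on a state change (the @CacheEvict path).
    synchronized void evict() {
        present = false;
        value = null;
    }
}
```

If `evict()` is never called because the eviction failed, the entry still expires once the TTL elapses, which is the bounded staleness window described above.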
Kafka for Async Notification Decoupling
Problem
If appointmentservice called notificationservice via Feign and the notification service was down, the booking would fail — two completely unrelated concerns sharing a failure surface.
Solution
appointmentservice publishes NotificationEvent to Kafka and returns immediately. notificationservice consumes when available. Kafka retains events during downtime.
Why this approach
Temporal decoupling. Booking is a synchronous, critical path. Notification delivery is best-effort. They should have independent failure surfaces.
OTP via Redis (not PostgreSQL)
Problem
Email OTPs are ephemeral — they expire in 10 minutes, are one-time use, and need fast lookup.
Solution
Redis key: otp:{email} with 10-minute EXPIRE. On successful verification: key deleted immediately (one-time use). SecureRandom for generation (not java.util.Random).
Why this approach
Redis native TTL eliminates the need for a cron cleanup job. Sub-millisecond reads. Zero schema overhead. PostgreSQL would be the wrong tool for ephemeral data.
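A sketch of the issue and verify flow, with an in-memory map and an injectable clock standing in for Redis and its EXPIRE. The `otp:{email}` key shape and the 10-minute TTL come from the description above; everything else is an assumption.

```java
import java.security.SecureRandom;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class OtpService {
    static final long TTL_MILLIS = 10 * 60 * 1000L;

    private record Entry(String code, long expiresAt) {}

    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<String, Entry> store = new ConcurrentHashMap<>(); // stand-in for Redis
    private final LongSupplier clock;

    OtpService(LongSupplier clock) {
        this.clock = clock;
    }

    // Generates a zero-padded 6-digit code and stores it under otp:{email}.
    String issue(String email) {
        String code = String.format(Locale.ROOT, "%06d", RANDOM.nextInt(1_000_000));
        store.put("otp:" + email, new Entry(code, clock.getAsLong() + TTL_MILLIS));
        return code;
    }

    // True only for the first correct, unexpired attempt; the key is then deleted.
    boolean verify(String email, String code) {
        String key = "otp:" + email;
        Entry entry = store.get(key);
        if (entry == null) {
            return false;
        }
        if (clock.getAsLong() >= entry.expiresAt()) {
            store.remove(key, entry); // emulate Redis expiring the key
            return false;
        }
        if (!entry.code().equals(code)) {
            return false;
        }
        return store.remove(key, entry); // atomic delete enforces one-time use
    }
}
```

With real Redis the expiry branch disappears, because the key simply stops existing after 10 minutes.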
Spring Cloud Gateway (WebFlux, not MVC)
Problem
A gateway that blocks one thread per request would become a bottleneck under load — it's the single entry point for all 13 services.
Solution
Spring Cloud Gateway 5.0 built on Project Reactor (Netty). Non-blocking, event-loop-based. JwtAuthGatewayFilter validates JWT, injects X-User-Id and X-Roles headers. Downstream services read headers — they do NOT re-validate JWT.
Why this approach
The gateway does almost nothing except validate and forward. Non-blocking I/O handles thousands of concurrent connections with a small thread pool. JWT validation at gateway = single point of trust, not repeated in every service.
05 — Stack
Full Tech Stack
Backend
AI & LLM
Messaging & Cache
Frontend
Infrastructure
Database
CI/CD Pipeline
Multi-arch builds mean the same image runs on both the Hetzner production server (amd64) and Apple Silicon development machines (arm64) without architecture-specific Dockerfiles.
Want to dig deeper?
Full source code on GitHub. README covers every service in detail.