Full Case Study

Delma

AI-powered healthcare platform. 13 Spring Boot microservices. 4 LLM features. Built from scratch.

13 Microservices · 4 AI Features · 3 Kafka Topics · 9 Databases · 2 Arch Targets

01 — Architecture

13 Services

Each service owns its data, scales independently, and fails without taking down the rest of the platform.

  • discovery-server (Infra, :8761): Eureka service registry
  • gateway (Infra, :8089): Spring Cloud Gateway · JWT auth · WebFlux
  • userservice (Core, :8111): Auth · OTP · JWT · refresh tokens
  • doctorservice (Core, :8010): Profiles · Redis cache · approval workflow
  • appointmentservice (Core, :8012): Slots · booking · ZEGOCLOUD video tokens
  • paymentservice (Core, :8083): Razorpay integration · HMAC-SHA256 verify
  • documentservice (Core, :8091): AWS S3 · presigned URLs · Kafka events
  • aiservice (AI, :8095): All AI features · Groq · Voyage AI · pgvector
  • notificationservice (Async, :8017): Kafka consumer · Gmail SMTP
  • productservice (Store, :8016): E-store product catalog
  • categoryservice (Store, :8015): Category + slug management
  • orderservice (Store, :8013): Cart · orders · checkout
  • common-lib (Lib): Shared ApiResponse · exceptions · DTOs

Kafka Topics

  • notification-topic: Appointment confirmations, doctor approvals
  • document-uploaded: Triggers RAG indexing in aiservice
  • consultation-notes-ready: Triggers AI report generation

02 — AI Features

4 LLM-Powered Features

Each feature solves a real clinical problem. Each has specific engineering decisions worth explaining.

AI Symptom Checker

POST /api/v1/ai/symptom-check

Groq llama-3.1-8b-instant

How it works

  • 01 Patient types symptoms in plain English
  • 02 Prompt uses a fixed specialization list matching platform doctor categories
  • 03 JSON response parsed → fallback to General Medicine on parse failure (never returns 500)
  • 04 Frontend auto-filters doctor listing based on returned specialization

Engineering decision

Fixed specialization list prevents the LLM from inventing specialties that don't exist on the platform. Fallback to General Medicine ensures the endpoint is always usable even if Groq is degraded.
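A minimal sketch of the fallback, under stated assumptions: the model is asked to reply with a JSON object containing a `specialization` field (the field name is assumed), and the allowed list below is hypothetical. A regex stands in for a real JSON parser to keep the example dependency-free.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecializationResolver {
    // Hypothetical list; the real one mirrors the platform's doctor categories.
    static final Set<String> ALLOWED = Set.of(
            "General Medicine", "Cardiology", "Dermatology", "Neurology", "Orthopedics");
    static final String FALLBACK = "General Medicine";

    // Assumed response field name: "specialization".
    private static final Pattern FIELD =
            Pattern.compile("\"specialization\"\\s*:\\s*\"([^\"]*)\"");

    // Returns a specialization from ALLOWED, or FALLBACK when the response
    // is missing, unparseable, or names a specialty outside the list.
    static String resolve(String llmResponse) {
        if (llmResponse == null) return FALLBACK;
        Matcher m = FIELD.matcher(llmResponse);
        if (!m.find()) return FALLBACK;
        String candidate = m.group(1).trim();
        return ALLOWED.contains(candidate) ? candidate : FALLBACK;
    }
}
```

Checking the parsed value against the list as well as the prompt means an off-list answer degrades to General Medicine instead of producing an empty doctor filter.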

RAG Document Summarizer

GET /api/v1/ai/summarize/{userId}

Voyage AI voyage-3-lite · pgvector · Groq LLaMA 3.1

How it works

  • 01 Patient uploads PDF → documentservice saves to S3 → publishes Kafka event
  • 02 aiservice consumes event: downloads PDF → Apache PDFBox text extraction
  • 03 Fixed-size chunking: 500 chars, 100 char overlap
  • 04 Voyage AI voyage-3-lite embeds each chunk (512-dim), stored in pgvector via JdbcTemplate + ::vector cast
  • 05 21s Thread.sleep between chunks (Voyage free tier: 3 RPM)
  • 06 On doctor query: embed query → cosine similarity search → top-K chunks → Groq generates 3-point clinical summary
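In the platform the similarity search runs inside PostgreSQL (pgvector exposes cosine distance through its `<=>` operator). As an illustration of what that search computes, here is an in-memory sketch; the class and method names are hypothetical and this is not the service's query code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CosineTopK {
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the indexes of the k chunks most similar to the query, best first.
    static List<Integer> topK(float[] query, List<float[]> chunks, int k) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) indexes.add(i);
        // Negate the similarity so the ascending sort puts the best match first.
        indexes.sort(Comparator.comparingDouble((Integer i) -> -cosine(query, chunks.get(i))));
        return indexes.subList(0, Math.min(k, indexes.size()));
    }
}
```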

Engineering decision

Fixed-size chunking chosen over semantic/agentic — medical documents are 1-5 pages, agentic chunking would have tripled API calls against the 3 RPM limit for zero measurable quality gain. JdbcTemplate used instead of Hibernate to bypass custom column type limitations with pgvector.
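A minimal sketch of fixed-size chunking with overlap, using the 500/100 settings described above. The class name is hypothetical; the window arithmetic is the standard sliding-window approach, not necessarily the exact code in aiservice.

```java
import java.util.ArrayList;
import java.util.List;

class FixedSizeChunker {
    // Splits text into windows of `size` characters, each sharing `overlap`
    // characters with the previous window.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // 400 with the 500/100 settings
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // this window reached the end of the text
        }
        return chunks;
    }
}
```

With these settings a 10,000-character document (an assumed size for illustration) yields 25 chunks. At one embedding call every 21 seconds that is 24 pauses, or about 8.4 minutes of indexing, which is why any strategy that multiplies calls is expensive under a 3 RPM limit.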

MCP Booking Agent (ReAct)

POST /api/v1/ai/agent/chat

Groq qwen3-32b

How it works

  • 01 5 tools: search_doctors · get_available_slots · book_appointment · get_my_appointments · create_payment_order
  • 02 ReAct loop: reason → call tool → observe result → reason again (max 10 iterations)
  • 03 Redis slot session injection: real doctorId + slotIds stored in Redis per userId, injected as SYSTEM message at position 1 before every LLM call
  • 04 400 correction loop: if LLM passes string slotId instead of integer, error injected back → forces re-fetch
  • 05 Floating chat widget in frontend (patients only)
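The loop in step 02 can be sketched as below. Everything here is a simplification under stated assumptions: the `Step` record, the string-based transcript, and the give-up message are hypothetical, and the model and tools are plain functions standing in for the Groq call and the five real tools.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ReactLoop {
    static final int MAX_ITERATIONS = 10;
    static final String GAVE_UP = "Sorry, I could not complete that request."; // hypothetical text

    // One model turn: either a tool request or a final answer (hypothetical shape).
    record Step(String toolName, String toolArgs, String finalAnswer) {
        boolean isFinal() { return finalAnswer != null; }
    }

    static String run(Function<List<String>, Step> model,
                      Map<String, Function<String, String>> tools,
                      String userMessage) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user: " + userMessage);
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            Step step = model.apply(transcript); // reason
            if (step.isFinal()) return step.finalAnswer();
            Function<String, String> tool =
                    step.toolName() == null ? null : tools.get(step.toolName());
            // Errors are fed back as observations so the model can correct itself.
            String observation = (tool == null)
                    ? "error: unknown tool " + step.toolName()
                    : tool.apply(step.toolArgs()); // call tool
            transcript.add("tool " + step.toolName() + ": " + observation); // observe
        }
        return GAVE_UP; // iteration cap reached without a final answer
    }
}
```

The cap bounds both latency and token spend when the model keeps calling tools without converging.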

Engineering decision

The problem was ID hallucination: qwen3-32b would invent values such as slotId=123 in later turns, because the conversation history carries only text, not structured tool results. Redis injection gives the LLM ground-truth IDs on every call, which eliminated the invented IDs. This was the hardest engineering problem in the project.
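A sketch of the injection step, with an in-memory map standing in for Redis. The record shapes, the wording of the injected message, and reading "position 1" as list index 1 (directly after the base system prompt) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SlotSessionInjector {
    record Message(String role, String content) {}
    record SlotSession(long doctorId, List<Long> slotIds) {}

    // In-memory stand-in for Redis, keyed by userId.
    private final Map<String, SlotSession> sessions = new ConcurrentHashMap<>();

    // Called after a tool returns real IDs, e.g. from get_available_slots.
    void remember(String userId, long doctorId, List<Long> slotIds) {
        sessions.put(userId, new SlotSession(doctorId, List.copyOf(slotIds)));
    }

    // Returns a copy of the history with the ground-truth IDs inserted at index 1.
    List<Message> withGroundTruth(String userId, List<Message> history) {
        List<Message> out = new ArrayList<>(history);
        SlotSession session = sessions.get(userId);
        if (session == null) return out; // nothing fetched yet for this user
        String content = "Valid IDs for this user. doctorId=" + session.doctorId()
                + ", slotIds=" + session.slotIds() + ". Use only these integer IDs.";
        int position = Math.min(1, out.size());
        out.add(position, new Message("system", content));
        return out;
    }
}
```

Because the message is rebuilt from the store on every call, the IDs stay correct even after many turns of free-text conversation.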

Post-Consultation AI Notes

Kafka: consultation-notes-ready

Groq llama-3.1-8b-instant · temperature=0.3

How it works

  • 01 Doctor types notes during video call (ConsultationNotesPanel auto-saves every 30s to DRAFT status)
  • 02 Doctor clicks 'End Call & Save' → appointmentservice saves notes → publishes to consultation-notes-ready Kafka topic
  • 03 aiservice consumer picks up event: formats medications as numbered list (Drug | Dose | Freq | Route)
  • 04 buildPrompt() creates a 7-section contract: Consultation Summary · Diagnosis · Medications · Tests · Recovery · Follow-up · Disclaimer
  • 05 temperature=0.3: consistency over creativity (medical docs must be literal)
  • 06 AI report saved back via Feign → AppointmentClient.updateAiReport()
  • 07 Patient sees report in MyConsultationReports between Upcoming/History sections
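The medication formatting in step 03 might look like the sketch below. The `Medication` record and the text used for an empty list are hypothetical; only the `Drug | Dose | Freq | Route` column order comes from the description above.

```java
import java.util.List;

class MedicationFormatter {
    record Medication(String drug, String dose, String frequency, String route) {}

    // Renders one numbered line per medication: "1. Drug | Dose | Freq | Route".
    static String format(List<Medication> medications) {
        if (medications.isEmpty()) return "None prescribed"; // hypothetical placeholder text
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < medications.size(); i++) {
            Medication m = medications.get(i);
            sb.append(i + 1).append(". ")
              .append(m.drug()).append(" | ")
              .append(m.dose()).append(" | ")
              .append(m.frequency()).append(" | ")
              .append(m.route()).append('\n');
        }
        return sb.toString().stripTrailing();
    }
}
```

Pre-formatting the list in code leaves the model with less structure to infer, which fits the low-temperature, literal-output goal.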

Engineering decision

Kafka used (not Feign) because report generation takes 5-10 seconds. A direct call would freeze the doctor's End Call button for that whole time, which is unacceptable UX. Exceptions are swallowed in the consumer (not rethrown): the notes are already saved in the DB, and the AI report is an enhancement, not a blocker. Rethrowing would make the consumer retry the failed record and hold up the partition, delaying every other patient's report queued behind it.
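The swallow-and-continue behaviour can be illustrated without Kafka. In this sketch `onNotesReady` stands in for the listener method and a plain function stands in for the LLM call; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class ReportConsumer {
    private final Function<String, String> reportGenerator; // stand-in for the LLM call
    final List<String> savedReports = new ArrayList<>();
    final List<String> failedAppointments = new ArrayList<>();

    ReportConsumer(Function<String, String> reportGenerator) {
        this.reportGenerator = reportGenerator;
    }

    // Stand-in for the Kafka listener method. It never rethrows, so one bad
    // record cannot hold up the records queued behind it.
    void onNotesReady(String appointmentId, String notes) {
        try {
            savedReports.add(appointmentId + ":" + reportGenerator.apply(notes));
        } catch (RuntimeException e) {
            // Notes are already persisted; record the failure and move on.
            failedAppointments.add(appointmentId);
        }
    }
}
```

Recording the failed appointment somewhere (a log line, a table, or a dead-letter topic) keeps the swallowed error visible for a later retry.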

03 — Platform Features

Core Platform

Video Consultations

  • ZEGOCLOUD SDK with AES-256 per-session encryption
  • Appointment-scoped video tokens generated server-side
  • ConsultationNotesPanel for doctors (auto-saves every 30s)
  • Authorization check: only participants can get video token

Auth & Security

  • Email OTP verification — 6-digit, Redis TTL 10min, one-time use
  • JWT access (15min) + refresh rotation (7 days, HttpOnly cookie)
  • Refresh tokens stored in DB — explicit revocation on logout
  • Gateway-level JWT validation — X-User-Id/X-Roles header injection

Medical Documents

  • Upload to AWS S3 — presigned URLs with 10-minute TTL
  • Access control: only the uploading patient + their assigned doctor
  • Kafka event on upload triggers RAG indexing in aiservice
  • Metadata stored in PostgreSQL, files in S3 (separation of concerns)

Medical E-Store

  • Product catalog via productservice + categoryservice
  • Cart management and order lifecycle in orderservice
  • Razorpay payment with HMAC-SHA256 signature verification
  • Order confirmation via Kafka → notificationservice → email
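A sketch of the signature check using only the JDK. Razorpay's checkout signature is an HMAC-SHA256 over `order_id|payment_id`, keyed with the API secret and hex-encoded; the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class PaymentSignature {
    // Computes hex(HMAC-SHA256(orderId + "|" + paymentId, secret)).
    static String sign(String orderId, String paymentId, String secret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal((orderId + "|" + paymentId).getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // Compares in constant time so the check does not leak how many
    // leading characters of a forged signature were correct.
    static boolean verify(String orderId, String paymentId, String signature, String secret) {
        if (signature == null) return false;
        byte[] expected = sign(orderId, paymentId, secret).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        return MessageDigest.isEqual(expected, actual);
    }
}
```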

Real-time Notifications

  • Kafka consumer in notificationservice (notification-topic)
  • Triggers on: booking confirmation, doctor approval, order updates
  • Gmail SMTP delivery + in-app notification persistence
  • Frontend: 5-minute module-level TTL cache (not useState — survives remounts)

Doctor Workflow

  • Apply → pending review → admin approves via Feign chain (user→doctor→user)
  • Redis cache-aside on listings (@Cacheable + @CacheEvict + TTL)
  • Slot management: doctor creates availability windows
  • Doctor profile search by name or specialization

04 — Engineering Decisions

Why I built it this way

Every non-obvious decision has a reason. These are the ones worth explaining to a senior engineer.

Double-Booking Race Condition

Problem

Two patients could book the same slot simultaneously — both pass the AVAILABLE check before either writes BOOKED.

Solution

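@Version on DoctorSlot entity. Hibernate generates: UPDATE doctor_slot SET status='BOOKED', version=2 WHERE id=5 AND version=1. Only one transaction's UPDATE matches the row. The other matches zero rows because the version has already moved to 2, so Spring raises ObjectOptimisticLockingFailureException, which is mapped to HTTP 409.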

Why this approach

Optimistic over pessimistic because true slot contention is rare. Pessimistic (SELECT FOR UPDATE) would queue all booking requests under load.
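To show the version check in isolation, here is an in-memory simulation of the conditional UPDATE. It is not JPA code: the map stands in for the doctor_slot table, `synchronized` stands in for the database applying a single UPDATE atomically, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SlotBooking {
    static final class Slot {
        String status = "AVAILABLE";
        int version = 1;
    }

    private final Map<Long, Slot> table = new ConcurrentHashMap<>();

    void addSlot(long id) { table.put(id, new Slot()); }

    synchronized int readVersion(long id) { return table.get(id).version; }

    // Mirrors: UPDATE doctor_slot SET status='BOOKED', version=version+1
    //          WHERE id=? AND version=?   (returns rows affected)
    synchronized int updateIfVersionMatches(long id, int expectedVersion) {
        Slot slot = table.get(id);
        if (slot == null || slot.version != expectedVersion) return 0;
        slot.status = "BOOKED";
        slot.version = expectedVersion + 1;
        return 1;
    }

    // Zero rows affected means another booking won the race: HTTP 409.
    int book(long id, int versionSeenAtRead) {
        return updateIfVersionMatches(id, versionSeenAtRead) == 1 ? 200 : 409;
    }
}
```

Two patients who both read version 1 get different outcomes: the first `book` call returns 200 and the second returns 409, with no lock held between the read and the write.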

JWT + Refresh Token Rotation

Problem

Access tokens that never expire are a security liability. But requiring re-login every 15 minutes breaks UX.

Solution

Access token (15 min) in the Authorization header. Refresh token (7 days) in an HttpOnly cookie, with a matching record stored in the DB. On refresh: old token deleted, new token issued. A stolen refresh token becomes invalid as soon as the legitimate client refreshes.

Why this approach

An HttpOnly cookie cannot be read by JavaScript, which blocks XSS token theft. DB storage enables explicit revocation on logout. Rotation limits a compromised token to the window before the next legitimate refresh.
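A sketch of rotation with an in-memory map standing in for the refresh-token table. The names are hypothetical, and the 7-day expiry check is left out to keep the example short.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class RefreshTokenStore {
    // Stand-in for the refresh-token table: token -> userId.
    private final Map<String, String> tokenToUser = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();

    String issue(String userId) {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        tokenToUser.put(token, userId);
        return token;
    }

    // Deletes the presented token and issues a replacement. A token that was
    // already used or revoked yields an empty result.
    Optional<String> rotate(String presentedToken) {
        if (presentedToken == null) return Optional.empty();
        String userId = tokenToUser.remove(presentedToken); // atomic delete-and-fetch
        if (userId == null) return Optional.empty();
        return Optional.of(issue(userId));
    }

    // Logout: explicit revocation.
    void revoke(String token) {
        if (token != null) tokenToUser.remove(token);
    }
}
```

The single `remove` call is what makes each token one-time: a replayed token finds no row and is rejected.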

Redis Cache-Aside + TTL (Defense in Depth)

Problem

Doctor listings hit the DB on every request. A single cache eviction failure would serve stale data indefinitely.

Solution

@Cacheable on GET /api/v1/doctor/all (doctors::all key) + @CacheEvict on any approval/rejection + 10-minute TTL as safety net.

Why this approach

@CacheEvict alone: if Redis has a brief outage during approval, eviction fails silently → stale data forever. TTL alone: stale for up to 10 minutes after every state change. Together: immediate freshness + bounded staleness window.
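The platform gets this behaviour from Spring's cache annotations plus a Redis TTL. The sketch below shows the same two mechanisms in plain Java so the interaction is visible; the class is hypothetical and the clock is injected only to make expiry testable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class TtlCache<V> {
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Cache-aside read: serve a live entry, otherwise load from the source and store it.
    V get(String key, Supplier<V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now < entry.expiresAtMillis()) return entry.value();
        V fresh = loader.get();
        entries.put(key, new Entry<>(fresh, now + ttlMillis));
        return fresh;
    }

    // Explicit eviction on a state change (the @CacheEvict role).
    void evict(String key) { entries.remove(key); }
}
```

If an eviction is ever missed, the entry still expires after `ttlMillis`, so staleness is bounded by the TTL instead of lasting until the next successful eviction.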

Kafka for Async Notification Decoupling

Problem

If appointmentservice called notificationservice via Feign and the notification service was down, the booking would fail — two completely unrelated concerns sharing a failure surface.

Solution

appointmentservice publishes NotificationEvent to Kafka and returns immediately. notificationservice consumes when available. Kafka retains events during downtime.

Why this approach

Temporal decoupling. Booking is a synchronous, critical path. Notification delivery is best-effort. They should have independent failure surfaces.

OTP via Redis (not PostgreSQL)

Problem

Email OTPs are ephemeral — they expire in 10 minutes, are one-time use, and need fast lookup.

Solution

Redis key: otp:{email} with 10-minute EXPIRE. On successful verification: key deleted immediately (one-time use). SecureRandom for generation (not java.util.Random).

Why this approach

Redis native TTL eliminates the need for a cron cleanup job. Sub-millisecond reads. Zero schema overhead. PostgreSQL would be the wrong tool for ephemeral data.
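A sketch of the OTP lifecycle with an in-memory map standing in for Redis. The `otp:{email}` key format, the 10-minute lifetime, the 6-digit code and SecureRandom come from the description above; the class shape and the injected clock are assumptions made so expiry can be tested.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class OtpStore {
    private record Entry(String code, long expiresAtMillis) {}

    static final long TTL_MILLIS = 10 * 60 * 1000L;

    private final Map<String, Entry> entries = new ConcurrentHashMap<>(); // stand-in for Redis
    private final SecureRandom random = new SecureRandom();
    private final LongSupplier clock;

    OtpStore(LongSupplier clock) { this.clock = clock; }

    // Generates a zero-padded 6-digit code and stores it under otp:{email}.
    String issue(String email) {
        String code = String.format("%06d", random.nextInt(1_000_000));
        entries.put("otp:" + email, new Entry(code, clock.getAsLong() + TTL_MILLIS));
        return code;
    }

    // Succeeds at most once per issued code, and only before expiry.
    boolean verify(String email, String submitted) {
        String key = "otp:" + email;
        Entry entry = entries.get(key);
        if (entry == null || submitted == null) return false;
        if (clock.getAsLong() >= entry.expiresAtMillis()) {
            entries.remove(key); // Redis would have expired this key already
            return false;
        }
        boolean matches = MessageDigest.isEqual(
                entry.code().getBytes(StandardCharsets.UTF_8),
                submitted.getBytes(StandardCharsets.UTF_8));
        // Conditional remove makes concurrent verifications single-use.
        return matches && entries.remove(key, entry);
    }
}
```

With Redis, the expiry branch is unnecessary because the key disappears by itself, which is the "no cron cleanup job" point.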

Spring Cloud Gateway (WebFlux, not MVC)

Problem

A gateway that blocks one thread per request would become a bottleneck under load — it's the single entry point for all 13 services.

Solution

Spring Cloud Gateway 5.0 built on Project Reactor (Netty). Non-blocking, event-loop-based. JwtAuthGatewayFilter validates JWT, injects X-User-Id and X-Roles headers. Downstream services read headers — they do NOT re-validate JWT.

Why this approach

The gateway does almost nothing except validate and forward. Non-blocking I/O handles thousands of concurrent connections with a small thread pool. JWT validation at gateway = single point of trust, not repeated in every service.

05 — Stack

Full Tech Stack

Backend

Java 21 · Spring Boot 4 · Spring Cloud Gateway 5.0 · Eureka · OpenFeign · Resilience4j · Hibernate 7 · JPA · Maven

AI & LLM

Groq llama-3.1-8b-instant · Groq qwen3-32b · Voyage AI voyage-3-lite · pgvector · Spring AI · Apache PDFBox

Messaging & Cache

Apache Kafka 3.x · Redis 7.x · Spring Kafka

Frontend

Next.js 14 (App Router) · TypeScript · Redux Toolkit · Tailwind CSS · shadcn/ui · Framer Motion

Infrastructure

Docker · Docker Compose · GitHub Actions CI/CD · Multi-arch builds (amd64 + arm64) · AWS S3 · ZEGOCLOUD · Razorpay

Database

PostgreSQL (x9 separate DBs) · pgvector extension · Redis

CI/CD Pipeline

Push to main → GitHub Actions → mvn clean install → Docker Buildx (amd64 + arm64) → Docker Hub (aakash354/delma-*) → Hetzner CX31 (8GB RAM)

Multi-arch builds mean the same image runs on both the Hetzner production server (amd64) and Apple Silicon development machines (arm64) without architecture-specific Dockerfiles.

Want to dig deeper?

Full source code on GitHub. README covers every service in detail.