☀ SOLAR STORM

How it works, layer by layer

The game is 100% static (vanilla JS + HTML5 Canvas, no build step) and is served by GitHub Pages. The only thing that goes out to the network is the chat with the bums: a fetch to our proxy. That request crosses an entire self-hosted Kubernetes infra, and this page follows the full end-to-end journey of one message. None of it is magic or someone else's cloud: the infra runs on our own hardware, and everything is declared via API.

Where does the game come from? — GitHub Pages vs. our own infra

Two different things travel: the game's static files (HTML/JS/CSS) and the AI chat. The static files can be served from GitHub Pages (today) or from our own infra (nginx in the cluster); the chat always goes through the self-hosted infra. Same browser, two origins:

🖥️ Your browser
▼ asks for two things ▼

① The game — static files

HTML · JS · CSS · assets (no build)
MODE A (TODAY) · GitHub Pages · GitHub's CDN · villadalmine.github.io/tormenta-solar
Free, global, zero own infra.
MODE B (self-hosted, ✅ LIVE) · nginx in the cluster · tormenta-solar.cybercirujas.club
Via HAProxy → Cilium Gateway → nginx pod. "All local".

② The chat — AI

fetch to the proxy (in both modes)
ALWAYS · our self-hosted infra · llm-tormenta-solar.cybercirujas.club
HAProxy (SNI) → Cilium Gateway (TLS) → Node proxy → LiteLLM → OpenRouter / GPU (HAMi+Ollama) / RK1 NPU.
No matter where the game comes from: the AI, the key and the hardware are always ours. That's why it's free for the player.
Legend: GitHub Pages = static files only (mode A) · Own infra = the chat always, and the static files too in mode B.

The journey of a chat message

🖥️ Browser / GitHub Pages: the static game · does a fetch to the proxy
▼ HTTPS
🌐 Public DNS + WAN IP: llm-tormenta-solar… → home IP · :443
▼
🧱 HAProxy (Mac mini G4 · OpenBSD): TCP · SNI passthrough (doesn't terminate TLS)
▼ raw TCP
🚪 Cilium Gateway API (VIP .200): TLS terminates HERE (Let's Encrypt) · HTTPRoute + cilium-envoy
▼
⚙️ Node proxy (tormenta-ai-proxy): CORS · personas · guardrails
▼ POST /v1/chat/completions
🔀 LiteLLM (router): key pool · fallback · routing by model
▼ by model, to one of:
☁️ OpenRouter: cloud · free
🎮 NVIDIA GPU: HAMi + Ollama
🔌 RK1 NPU ×4: local inference
See the same path in ASCII
   [ Browser / GitHub Pages ]              the game (static)
              │  HTTPS  fetch(llm-tormenta-solar.cybercirujas.club)
              ▼
   [ Public DNS + WAN IP ]                 A → home IP
              │  :443
              ▼
   [ HAProxy  (edge, Mac mini G4 · OpenBSD) ]   TCP mode · SNI passthrough
              │  forwards raw TCP by hostname (doesn't terminate TLS)
              ▼
   [ Cilium Gateway API  192.168.178.200 ] ← TLS terminates HERE (Let's Encrypt)
              │  cilium-envoy  +  HTTPRoute (hostname → Service)
              ▼
   [ Service → Pod: tormenta-ai-proxy ]    Node · CORS · personas · guardrails
              │  POST /v1/chat/completions
              ▼
   [ LiteLLM  (the central router) ]       key pool · fallback · routing
              ├──────────────┬──────────────┐
              ▼              ▼              ▼
       [ OpenRouter ]   [ NVIDIA GPU ]   [ RK1 NPUs ]
        cloud · free     HAMi + Ollama    4× local inference
    
0 · The client: the game in your browser

Vanilla JS + Canvas, no framework, no build, hosted on GitHub Pages. The chat does a fetch POST with the NPC, your message and a bit of context. An API key of your own is optional: by default the chat goes through our proxy (free); if you set an OpenRouter key, it is kept only in your browser and used as an override. The server's "real" key never touches the client.

vanilla JS · HTML5 Canvas · GitHub Pages · i18n ES/EN
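As a rough sketch of that request: the text only says the POST carries the NPC, your message and some context, so the `/chat` path and the field names `npc`, `message` and `context` below are hypothetical, not the game's actual contract.

```javascript
// Hypothetical endpoint path: only the hostname comes from the text.
const PROXY_URL = "https://llm-tormenta-solar.cybercirujas.club/chat";

// Build the fetch() arguments for one chat message.
// Field names (npc, message, context) are assumptions for illustration.
function buildChatRequest(npc, message, context) {
  return {
    url: PROXY_URL,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // No API key here: the server-side key never reaches the browser.
      body: JSON.stringify({ npc, message, context }),
    },
  };
}

// Send the message and return the parsed JSON reply.
async function sendChat(npc, message, context) {
  const { url, options } = buildChatRequest(npc, message, context);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`proxy answered ${res.status}`);
  return res.json();
}
```

Keeping the request builder separate from the network call makes the payload easy to inspect without hitting the proxy.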
1 · DNS + TLS: getting in without opening extra ports

The domain llm-tormenta-solar.cybercirujas.club resolves to our home public IP. The certificate comes from Let's Encrypt; cert-manager requests it using a DNS-01 challenge via acme-dns, so there's no need to expose :80 or validate over HTTP. cert-manager also renews it automatically.

cert-manager · Let's Encrypt · DNS-01 / acme-dns
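For illustration, a cert-manager Certificate can be written as a plain object and printed as JSON, which `kubectl apply -f -` accepts. This is a sketch, not our actual manifest: the Secret name and the ClusterIssuer name are hypothetical, and the DNS-01 / acme-dns solver is configured on the ClusterIssuer, so it doesn't appear here.

```javascript
// Build a cert-manager Certificate for one hostname.
function certificateFor(hostname, secretName, issuerName) {
  return {
    apiVersion: "cert-manager.io/v1",
    kind: "Certificate",
    metadata: { name: secretName },
    spec: {
      secretName, // Secret where cert-manager stores the key pair
      dnsNames: [hostname],
      issuerRef: { kind: "ClusterIssuer", name: issuerName },
    },
  };
}

// "llm-tormenta-solar-tls" and "letsencrypt-dns01" are hypothetical names.
const cert = certificateFor(
  "llm-tormenta-solar.cybercirujas.club",
  "llm-tormenta-solar-tls",
  "letsencrypt-dns01"
);
console.log(JSON.stringify(cert, null, 2));
```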
2 · HAProxy: the edge (outside the cluster)

The edge is the most "junkyard-hacker" part of all: a PowerPC Mac mini G4 running OpenBSD. Yes, a ~2005 machine recycled as a TLS router. HAProxy runs there in TCP mode with SNI passthrough: it reads the hostname (req.ssl_sni) from the TLS ClientHello and forwards the raw TCP stream to the matching backend without decrypting anything (TLS is terminated by the gateway, further in). Several domains share the same backend, which points at the cluster VIP. This is also where we tune maxconn and the timeouts so long LLM responses don't get cut off.

HAProxy · OpenBSD / PPC G4 · TCP / L4 · SNI routing
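The routing decision itself is tiny. The sketch below is not HAProxy configuration, only the same idea in JavaScript: look at the SNI hostname, pick a backend, never decrypt. The two hostnames and the VIP come from this page; port 443 on the VIP is an assumption.

```javascript
// Assumption: the gateway listens on 443 at the cluster VIP.
const CLUSTER_VIP = "192.168.178.200:443";

// Hostname -> backend table. Both hostnames share one backend (the VIP).
const SNI_BACKENDS = new Map([
  ["llm-tormenta-solar.cybercirujas.club", CLUSTER_VIP],
  ["tormenta-solar.cybercirujas.club", CLUSTER_VIP],
]);

// Return the backend for an SNI value, or null to refuse the connection.
function pickBackend(sni) {
  if (typeof sni !== "string") return null;
  return SNI_BACKENDS.get(sni.toLowerCase()) ?? null;
}
```

Unknown hostnames get `null`, which models the edge refusing anything it has no rule for.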
3 · Cilium Gateway API: the cluster's front door

Traffic enters the Kubernetes cluster through the cluster-gateway (GatewayClass cilium), which listens on a fixed VIP assigned by Cilium LB-IPAM. TLS terminates here: a per-host HTTPS listener presents the certificate (from the Secret that cert-manager populates). It's Gateway API, not Ingress: routing is a declarative, standard resource.

Cilium 1.19 · Gateway API · eBPF · LB-IPAM
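A per-host HTTPS listener could look roughly like this, written as a plain object that prints to JSON for `kubectl`. The gateway name and class come from the text above; the listener name and the Secret name are hypothetical.

```javascript
// One HTTPS listener that terminates TLS with a cert-manager Secret.
function httpsListener(hostname, secretName) {
  return {
    name: hostname.split(".")[0], // e.g. "llm-tormenta-solar"
    hostname,
    port: 443,
    protocol: "HTTPS",
    tls: {
      mode: "Terminate",
      certificateRefs: [{ kind: "Secret", name: secretName }],
    },
  };
}

const gateway = {
  apiVersion: "gateway.networking.k8s.io/v1",
  kind: "Gateway",
  metadata: { name: "cluster-gateway" },
  spec: {
    gatewayClassName: "cilium",
    listeners: [
      // "llm-tormenta-solar-tls" is a hypothetical Secret name.
      httpsListener("llm-tormenta-solar.cybercirujas.club", "llm-tormenta-solar-tls"),
    ],
  },
};
console.log(JSON.stringify(gateway, null, 2));
```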
4 · HTTPRoute + Envoy: L7 routing

An HTTPRoute matches the hostname and routes to the proxy's Service. The data plane is cilium-envoy, acting as an HTTP reverse proxy inside the cluster. Adding a new domain means, quite literally, adding an HTTPRoute and a listener, without touching HAProxy beyond the SNI rule.

HTTPRoute · cilium-envoy · L7
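A minimal HTTPRoute for the chat hostname might look like this. It's a sketch under assumptions: the Service is taken to be named tormenta-ai-proxy like the pod, port 8080 is hypothetical, and the route is assumed to live in the same namespace as the gateway (otherwise `parentRefs` needs a `namespace`).

```javascript
// Route one hostname to one Service through the shared gateway.
function httpRouteFor(hostname, serviceName, servicePort) {
  return {
    apiVersion: "gateway.networking.k8s.io/v1",
    kind: "HTTPRoute",
    metadata: { name: serviceName },
    spec: {
      parentRefs: [{ name: "cluster-gateway" }],
      hostnames: [hostname],
      // No matches listed: every path on this hostname goes to the Service.
      rules: [{ backendRefs: [{ name: serviceName, port: servicePort }] }],
    },
  };
}

// Port 8080 is a hypothetical value.
const route = httpRouteFor("llm-tormenta-solar.cybercirujas.club", "tormenta-ai-proxy", 8080);
console.log(JSON.stringify(route, null, 2));
```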
5 · The game proxy: tormenta-ai-proxy

A tiny Node service (our own image, arm64). It does three things: it sets the CORS headers so the page served by GitHub Pages can call it; it keeps the personas (each bum's system prompt) server-side; and it applies the guardrails (if the model is slow, it returns the "the solar storm is interfering with the model" line instead of leaving you hanging). Then it forwards the request to LiteLLM with the real key, which never reaches the browser.

Node · CORS · server-side personas · ClusterIP Service
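Those three jobs can be sketched as three small functions. This is an illustration, not the proxy's source: the persona text, the NPC id `rusty` and the timeout handling are assumptions, and only the GitHub Pages origin is shown (mode B would need its own origin allowed too).

```javascript
// Origin of the GitHub Pages site (scheme + host, no path).
const ALLOWED_ORIGIN = "https://villadalmine.github.io";
const FALLBACK_LINE = "The solar storm is interfering with the model.";

// Hypothetical persona table: system prompts stay on the server.
const PERSONAS = new Map([
  ["rusty", "You are Rusty, a scavenger who answers in short, wary sentences."],
]);

// 1. CORS: headers that let the browser page call this service.
function corsHeaders() {
  return {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

// 2. Personas: prepend the NPC's system prompt to the player's message.
function buildUpstreamBody(npc, message, model) {
  const persona = PERSONAS.get(npc);
  if (!persona) throw new Error(`unknown NPC: ${npc}`);
  return {
    model,
    messages: [
      { role: "system", content: persona },
      { role: "user", content: message },
    ],
  };
}

// 3. Guardrail: answer with the fallback line if the model takes too long.
function withGuardrail(replyPromise, timeoutMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(FALLBACK_LINE), timeoutMs);
  });
  return Promise.race([replyPromise, timeout]).finally(() => clearTimeout(timer));
}
```

The body built in step 2 has the shape the OpenAI-compatible `/v1/chat/completions` endpoint expects, so it can be forwarded to LiteLLM as-is with the server-side key in the Authorization header.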
6 · LiteLLM: the model router

LiteLLM exposes a single OpenAI-compatible endpoint (/v1/chat/completions) and is the routing brain. It keeps an API key pool, falls back between models if one fails or is saturated, and decides where to send each request based on the model_name. Switching from "cloud" to "own hardware" means changing a model name; the game has no idea. The chat currently uses a free model (Gemma family) by default.

LiteLLM · OpenAI API · key pool · fallbacks
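To make the fallback idea concrete, here is a conceptual sketch of "try the next model if this one fails". It is not LiteLLM's code or configuration; LiteLLM does this internally. The `callModel` function is injected and the model names in the usage comment are hypothetical.

```javascript
// Try each model in order and return the first successful reply.
// callModel(model, messages) must return a promise of the reply text.
async function completeWithFallback(models, callModel, messages) {
  const errors = [];
  for (const model of models) {
    try {
      return { model, reply: await callModel(model, messages) };
    } catch (err) {
      // Remember why this model failed, then move on to the next one.
      errors.push(`${model}: ${err.message}`);
    }
  }
  throw new Error(`all models failed: ${errors.join("; ")}`);
}

// Usage (hypothetical model names):
//   completeWithFallback(["cloud-free", "local-gpu"], callModel, messages)
```

The caller only sees which model answered, which is why the game never needs to know where inference ran.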
7 · Inference: cloud and/or own hardware

Behind LiteLLM there are three interchangeable destinations:

  • OpenRouter (cloud): free models, $0 cost — the chat's current default.
  • Self-hosted NVIDIA GPU: a node with a GPU, shared via HAMi (vGPU slicing, several workloads on one card), running Ollama.
  • Self-hosted RK1 NPUs: 4 RK1 boards (one per arm64 node) doing local inference, round-robin.

The idea: start free in the cloud and, when it makes sense, move the chat to our own hardware without touching the game or the proxy.

OpenRouter · NVIDIA + HAMi · Ollama · RK1 NPU ×4
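Round-robin across the four NPU boards is simple enough to show in a few lines. This is only a sketch of the scheduling idea; the board names are hypothetical, and in practice the rotation is done by the router, not by code like this.

```javascript
// Return a function that cycles through the targets in order.
function roundRobin(targets) {
  if (targets.length === 0) throw new Error("no targets");
  let next = 0;
  return () => {
    const target = targets[next];
    next = (next + 1) % targets.length;
    return target;
  };
}

// Hypothetical names for the four RK1 boards.
const pickNpu = roundRobin(["rk1-a", "rk1-b", "rk1-c", "rk1-d"]);
```

Each call to `pickNpu()` hands out the next board, wrapping back to the first after the fourth.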
8 · Observability: seeing everything that happens

Hubble (from Cilium) shows every L3/L4/L7 network flow in the cluster: you can watch, live, the request leaving the proxy toward LiteLLM and on to inference. Prometheus scrapes the metrics (LiteLLM exposes requests, latency, spend, fallbacks) and Grafana charts them. If something is slow or failing, you see it on a dashboard instead of guessing blind.

Hubble · Prometheus · Grafana
9 · Build & deploy: no Docker daemon, fully reproducible

The proxy image is built inside the cluster with Kaniko, orchestrated by Argo Workflows (no Docker daemon or external CI needed), on an arm64 node, and pushed to an internal registry. The deploy is a Helm chart that creates everything declaratively: the HTTPRoute, the Certificate, and even an idempotent hook that adds the HTTPS listener to the shared gateway. Standing it up on another cluster takes one command.

Kaniko · Argo Workflows · internal registry · Helm
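The build step could be expressed as an Argo Workflow along these lines, again as a plain object printed to JSON. It's a sketch under assumptions: the git context URL and the registry address are hypothetical, and the real workflow likely has more to it (registry credentials, caching).

```javascript
// One-step Argo Workflow that runs Kaniko on an arm64 node.
function kanikoWorkflow(contextUrl, destination) {
  return {
    apiVersion: "argoproj.io/v1alpha1",
    kind: "Workflow",
    // generateName needs `kubectl create` (or `argo submit`), not `apply`.
    metadata: { generateName: "build-tormenta-ai-proxy-" },
    spec: {
      entrypoint: "build",
      nodeSelector: { "kubernetes.io/arch": "arm64" },
      templates: [
        {
          name: "build",
          container: {
            image: "gcr.io/kaniko-project/executor:latest",
            args: [
              `--context=${contextUrl}`,
              "--dockerfile=Dockerfile",
              `--destination=${destination}`,
            ],
          },
        },
      ],
    },
  };
}

// Both arguments are hypothetical placeholders.
const workflow = kanikoWorkflow(
  "git://example.invalid/tormenta-ai-proxy.git",
  "registry.example.internal/tormenta-ai-proxy:latest"
);
console.log(JSON.stringify(workflow, null, 2));
```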

Everything is API / declarative

None of this was done "by hand and let's see if it works": Gateway and HTTPRoute (Gateway API), Certificate and ClusterIssuer (cert-manager CRDs), the build (Argo CRD), the deploy (Helm values). Everything is a versioned object that re-applies the same way every time. That's what makes a home-grown infra serious: it's reproducible.

🔜

Coming soon — Telegram + Hermes

A Telegram bot wired to Hermes (an agent already running in the same cluster) to run the game from a Telegram chat: administer it, generate new content and orchestrate the world from your phone. The game gets a conversational "control panel", on the same infra.

Telegram Bot · Hermes (agent) · conversational control