Tenki Cloud System Architecture
Last updated: 2025-06-12
Overview
Tenki Cloud is a cloud compute marketplace that provides GitHub Actions runner management as a service. The system is a set of distributed microservices, each with a clearly separated responsibility.
High-Level Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   GitHub.com    │────▶│   GitHub Proxy   │────▶│    Temporal     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐              ▼
│   Next.js App   │────▶│   tRPC Gateway   │     ┌─────────────────┐
└─────────────────┘     └──────────────────┘     │ Backend Engine  │
                                 │               └─────────────────┘
                                 ▼                        │
                        ┌──────────────────┐              ▼
                        │   Backend API    │     ┌─────────────────┐
                        │  (Connect RPC)   │     │   PostgreSQL    │
                        └──────────────────┘     └─────────────────┘
Core Components
Frontend Layer
Next.js Application (apps/app/)
- Server-side rendered React application
- TypeScript with tRPC for type-safe API calls
- Tailwind CSS with Radix UI components
- Authentication via Kratos sessions
API Gateway Layer
tRPC Router (apps/app/src/server/api/)
- Type-safe RPC layer between frontend and backend
- Handles session management and authentication
- Routes requests to appropriate backend services
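The session handling described above can be pictured as a wrapper that refuses to run a handler without a valid session. This is a minimal sketch in plain TypeScript, not the project's tRPC code: `Session`, `Context`, `protectedProcedure`, and `listRunners` are hypothetical names, and the real router would obtain the session from Kratos.

```typescript
// Hypothetical shapes for illustration; the real gateway uses tRPC and Kratos sessions.
type Session = { userId: string; expiresAt: number }; // expiresAt: epoch milliseconds
type Context = { session: Session | null; now: number };

// Wraps a handler so it only runs when the context carries an unexpired session.
function protectedProcedure<I, O>(
  handler: (input: I, session: Session) => O,
): (ctx: Context, input: I) => O {
  return (ctx, input) => {
    if (ctx.session === null) {
      throw new Error("UNAUTHORIZED: no session");
    }
    if (ctx.session.expiresAt <= ctx.now) {
      throw new Error("UNAUTHORIZED: session expired");
    }
    return handler(input, ctx.session);
  };
}

// Example procedure; a real one would forward the call to a backend service.
const listRunners = protectedProcedure(
  (input: { workspaceId: string }, session) =>
    `runners for ${input.workspaceId} as ${session.userId}`,
);
```

Passing the clock in through `ctx.now` keeps the expiry check deterministic, which makes the wrapper easy to test.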
Backend Services
Engine (backend/cmd/engine/)
- Main orchestrator for all backend operations
- Implements Connect RPC (gRPC-Web compatible)
- Manages service lifecycle and dependencies
Domain Services (backend/internal/domain/)
- Identity: User authentication (Kratos) and authorization (Keto)
- Workspace: Multi-tenant workspace and project management
- Runner: GitHub Actions runner lifecycle management
- Billing: Usage tracking, TigerBeetle ledger, Stripe integration
- Compute: VM provisioning via CloudStack/Kubernetes
Event Processing
GitHub Proxy (backend/cmd/github-proxy/)
- Receives GitHub webhooks
- Validates and transforms events
- Publishes to Temporal for processing
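As an illustration of the "validate and transform" step, the sketch below maps a GitHub `workflow_job` webhook payload to an internal event. The payload fields used (`action`, `workflow_job.id`, `workflow_job.labels`, `repository.full_name`) come from GitHub's webhook format; `RunnerEvent` and its `kind` values are assumptions made for this example, and signature verification is deliberately left out.

```typescript
// Hypothetical internal event; the proxy's real event schema is not shown in this document.
type RunnerEventKind = "runner.requested" | "runner.started" | "runner.finished";
type RunnerEvent = { kind: RunnerEventKind; jobId: number; repo: string; labels: string[] };

// Maps the webhook `action` to an internal kind; unknown actions are ignored.
function kindForAction(action: unknown): RunnerEventKind | null {
  switch (action) {
    case "queued":
      return "runner.requested";
    case "in_progress":
      return "runner.started";
    case "completed":
      return "runner.finished";
    default:
      return null;
  }
}

// Returns the internal event, or null when the payload is malformed or not relevant.
function transformWorkflowJob(raw: unknown): RunnerEvent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const payload = raw as { action?: unknown; workflow_job?: unknown; repository?: unknown };
  const kind = kindForAction(payload.action);
  const job = payload.workflow_job as { id?: unknown; labels?: unknown } | null | undefined;
  const repo = payload.repository as { full_name?: unknown } | null | undefined;
  if (kind === null || !job || !repo) return null;
  if (typeof job.id !== "number" || !Array.isArray(job.labels)) return null;
  if (typeof repo.full_name !== "string") return null;
  // Keep only string labels so downstream code never sees unexpected types.
  const labels = job.labels.filter((l): l is string => typeof l === "string");
  return { kind, jobId: job.id, repo: repo.full_name, labels };
}
```

Returning `null` instead of throwing lets the proxy acknowledge irrelevant deliveries without treating them as errors.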
Temporal Workflows
- Long-running business processes
- Runner provisioning workflows
- Billing cycle management
- Retry and failure handling
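Retry handling in a workflow engine is usually driven by an exponential backoff policy. The sketch below computes the wait before each retry from such a policy. The field names resemble a typical retry-policy shape but are this example's own, not the Temporal SDK's types, and the values in the usage note are illustrative.

```typescript
// Illustrative policy shape; not the Temporal SDK's RetryPolicy type.
type RetryPolicy = {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first one
};

// Delay before retry number `retry` (1 = first retry after the initial attempt).
// Returns null when `retry` is invalid or the attempts are exhausted.
function retryDelayMs(policy: RetryPolicy, retry: number): number | null {
  if (retry < 1 || retry >= policy.maximumAttempts) return null;
  const uncapped = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry - 1);
  return Math.min(uncapped, policy.maximumIntervalMs);
}
```

With an initial interval of 1000 ms, a coefficient of 2, a 5000 ms cap, and 5 total attempts, the four retries wait 1000, 2000, 4000, and 5000 ms.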
Data Layer
PostgreSQL
- Primary data store
- Managed via migrations (backend/schema/)
- Type-safe queries via sqlc
Redpanda
- Event streaming platform
- Audit log collection
- Inter-service communication
TigerBeetle
- Financial ledger for billing
- Double-entry bookkeeping
- High-performance transaction processing
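Double-entry bookkeeping means every transfer debits one account and credits another by the same amount, so total debits always equal total credits. The in-memory sketch below illustrates that invariant only. It is not the TigerBeetle client API; `Ledger`, `Transfer`, and the account names in the usage note are hypothetical.

```typescript
// Illustrative transfer shape; amounts are integer cents to avoid floating-point drift.
type Transfer = { debitAccount: string; creditAccount: string; amountCents: number };

// Sums all values in a per-account total map.
function sum(totals: Map<string, number>): number {
  let total = 0;
  totals.forEach((value) => {
    total += value;
  });
  return total;
}

class Ledger {
  private debits = new Map<string, number>();
  private credits = new Map<string, number>();

  // Records one transfer: the same amount is debited and credited.
  post(t: Transfer): void {
    if (!Number.isInteger(t.amountCents) || t.amountCents <= 0) {
      throw new Error("amount must be a positive integer number of cents");
    }
    if (t.debitAccount === t.creditAccount) {
      throw new Error("debit and credit accounts must differ");
    }
    this.debits.set(t.debitAccount, this.debitsPosted(t.debitAccount) + t.amountCents);
    this.credits.set(t.creditAccount, this.creditsPosted(t.creditAccount) + t.amountCents);
  }

  debitsPosted(account: string): number {
    return this.debits.get(account) ?? 0;
  }

  creditsPosted(account: string): number {
    return this.credits.get(account) ?? 0;
  }

  // The double-entry invariant: total debits equal total credits.
  isBalanced(): boolean {
    return sum(this.debits) === sum(this.credits);
  }
}
```

For example, posting 250 cents from a hypothetical `customer:ws1` account to `revenue:compute` raises one account's debits and the other's credits by 250, and `isBalanced()` stays true.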
Key Design Decisions
1. Monorepo Structure
See ADR-001
2. Temporal for Workflows
See ADR-002
3. Connect RPC over REST
See ADR-003
Security Architecture
Authentication Flow
User → Next.js → Kratos → Session Cookie → tRPC → Backend
Authorization Model
- Keto for fine-grained permissions
- Workspace-based multi-tenancy
- Project-level access control
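Keto models permissions as relation tuples of the form `namespace:object#relation@subject`. The sketch below shows how workspace membership and project-level grants might combine in a check. It is an in-memory illustration, not a call to Keto; the namespaces (`workspaces`, `projects`), the relations (`member`, `viewer`), and the rule that workspace members may view projects are all assumptions.

```typescript
type RelationTuple = { namespace: string; object: string; relation: string; subject: string };

// Canonical string form: namespace:object#relation@subject
function tupleKey(t: RelationTuple): string {
  return `${t.namespace}:${t.object}#${t.relation}@${t.subject}`;
}

// In-memory stand-in for a relation-tuple store.
class TupleStore {
  private keys = new Set<string>();

  add(t: RelationTuple): void {
    this.keys.add(tupleKey(t));
  }

  has(t: RelationTuple): boolean {
    return this.keys.has(tupleKey(t));
  }
}

// Assumed rule: a user may view a project if granted on the project directly,
// or if they are a member of the workspace that owns it.
function canViewProject(
  store: TupleStore,
  user: string,
  project: string,
  workspace: string,
): boolean {
  return (
    store.has({ namespace: "projects", object: project, relation: "viewer", subject: user }) ||
    store.has({ namespace: "workspaces", object: workspace, relation: "member", subject: user })
  );
}
```

Because the workspace ID is part of every check, a grant in one workspace can never leak into another, which is the core of workspace-based multi-tenancy.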
Secrets Management
- SOPS for encrypted configuration
- Kubernetes secrets for runtime
- No secrets in environment variables
Deployment Architecture
Kubernetes Deployment
- GitOps via Flux
- Horizontal pod autoscaling
- Service mesh for inter-service communication
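Horizontal pod autoscaling follows a simple ratio rule, documented by Kubernetes as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that rule and clamps the result to a replica range. The function and parameter names are this example's own, and the tolerance band that the real autoscaler applies is omitted.

```typescript
// Computes the replica count suggested by the HPA ratio rule, clamped to [min, max].
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number,
  targetMetric: number,
  minReplicas: number,
  maxReplicas: number,
): number {
  if (targetMetric <= 0) {
    throw new Error("target metric must be positive");
  }
  const suggested = Math.ceil(currentReplicas * (currentMetric / targetMetric));
  return Math.max(minReplicas, Math.min(maxReplicas, suggested));
}
```

With 4 replicas at 90% average CPU against a 60% target, the rule suggests 6 replicas; a suggestion above the maximum is capped at the maximum.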
Infrastructure Components
- Ingress: Traefik with automatic TLS
- Monitoring: Prometheus + Grafana
- Logging: Loki + Grafana
- Tracing: Tempo
Data Flow Examples
Runner Provisioning
- GitHub sends webhook to proxy
- Proxy validates and publishes to Kafka
- Backend consumes event, starts Temporal workflow
- Workflow provisions runner in Kubernetes
- Runner registers with GitHub
- Status updates flow back via Temporal
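One way to reason about the provisioning steps above is as a state machine in which each step is a legal transition. The states and transitions below are assumptions chosen to mirror those steps; the document does not define the actual runner states.

```typescript
// Hypothetical runner lifecycle states.
type RunnerState =
  | "requested"
  | "provisioning"
  | "registered"
  | "running"
  | "completed"
  | "failed";

// Allowed transitions; any non-terminal state may also fail.
const NEXT_STATES: Record<RunnerState, RunnerState[]> = {
  requested: ["provisioning", "failed"],
  provisioning: ["registered", "failed"],
  registered: ["running", "failed"],
  running: ["completed", "failed"],
  completed: [],
  failed: [],
};

// Returns the new state, or throws when the transition is not allowed.
function advance(from: RunnerState, to: RunnerState): RunnerState {
  if (NEXT_STATES[from].indexOf(to) === -1) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Terminal states have no outgoing transitions.
function isTerminal(state: RunnerState): boolean {
  return NEXT_STATES[state].length === 0;
}
```

Rejecting illegal transitions makes duplicate or out-of-order webhook deliveries visible instead of silently corrupting runner status.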
Billing Flow
1. Runner usage tracked via Temporal activities
2. Usage events written to TigerBeetle
3. Daily aggregation job calculates costs
4. Monthly billing workflow generates invoices
5. Stripe processes payments
6. Payment status updates ledger
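The daily aggregation step can be sketched as a fold over usage events. Everything specific here is an assumption made for illustration: the `UsageEvent` shape, billing per started minute, and rates expressed in integer cents per minute by runner type.

```typescript
// Hypothetical usage record emitted when a runner finishes a job.
type UsageEvent = { workspaceId: string; runnerType: string; seconds: number };

// Sums the cost per workspace, billing each event per started minute (an assumed rule).
function dailyCostCents(
  events: UsageEvent[],
  centsPerMinute: Map<string, number>,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    const rate = centsPerMinute.get(e.runnerType);
    if (rate === undefined) {
      throw new Error(`no rate configured for runner type ${e.runnerType}`);
    }
    const cost = Math.ceil(e.seconds / 60) * rate;
    totals.set(e.workspaceId, (totals.get(e.workspaceId) ?? 0) + cost);
  }
  return totals;
}
```

Working in integer cents keeps the totals exact, which matters once they are posted to a ledger.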
Scalability Considerations
Horizontal Scaling
- Stateless services scale via replicas
- Database uses read replicas for queries
- Temporal workers scale independently
Performance Optimization
- Redis for session caching
- CDN for static assets
- Database query optimization via indexes
Reliability
- Circuit breakers for external services
- Retry logic in Temporal workflows
- Graceful degradation for non-critical features
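A circuit breaker stops calling a failing dependency for a while and serves a fallback instead, which is also one way to degrade gracefully. The sketch below is a minimal synchronous version with an injectable clock. The class name, threshold, and timeout are illustrative, and a production breaker for services such as Stripe or GitHub would wrap asynchronous calls.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  // An open breaker becomes half-open once the reset timeout has elapsed.
  currentState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetAfterMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Runs fn unless the breaker is open; on failure or when open, returns fallback().
  call<T>(fn: () => T, fallback: () => T): T {
    const state = this.currentState();
    if (state === "open") return fallback();
    try {
      const result = fn();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch {
      this.failures += 1;
      // A failed trial call in half-open state reopens the breaker immediately.
      if (state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

Injecting `now` keeps the timeout logic testable without real waiting.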
Future Architecture Plans
- Multi-region deployment for global latency optimization
- GraphQL federation for more flexible API access
- Event sourcing for complete audit trail
- Service mesh for advanced traffic management