Architecture

Football Infrastructure is designed for high-throughput, real-time processing of match and viewer-engagement events, with analytics served from ClickHouse.

System Overview

                                    ┌─────────────────────────────────────────────────────────────┐
                                    │                        AWS Cloud                             │
                                    │  ┌────────────────────────────────────────────────────────┐ │
                                    │  │                    Docker Swarm                         │ │
┌──────────────────┐                │  │                                                        │ │
│   Mobile Apps    │                │  │  ┌─────────┐    ┌─────────┐    ┌──────────┐          │ │
│   Web Clients    │───HTTPS───────▶│──│──│ Traefik │───▶│ go-api  │───▶│  Kafka   │          │ │
│   Smart TVs      │                │  │  │  (SSL)  │    │(3 repl) │    │ (KRaft)  │          │ │
│   100K+ viewers  │                │  │  └─────────┘    └─────────┘    └────┬─────┘          │ │
└──────────────────┘                │  │                                      │               │ │
                                    │  │                                      ▼               │ │
                                    │  │  ┌─────────┐    ┌─────────┐    ┌──────────┐          │ │
                                    │  │  │ Grafana │◀───│Promethe.│◀───│go-consum.│          │ │
                                    │  │  │         │    │         │    │(2 repl)  │          │ │
                                    │  │  └─────────┘    └─────────┘    └────┬─────┘          │ │
                                    │  │                                      │               │ │
                                    │  │                                      ▼               │ │
                                    │  │                              ┌──────────────┐        │ │
                                    │  │                              │  ClickHouse  │        │ │
                                    │  │                              │   (OLAP DB)  │        │ │
                                    │  │                              └──────────────┘        │ │
                                    │  └────────────────────────────────────────────────────────┘ │
                                    └─────────────────────────────────────────────────────────────┘

Components

API Service (go-api)

The API service handles all incoming HTTP requests from clients.

Responsibilities:

Technology:

Scaling:

Consumer Service (go-consumer)

The consumer processes events from Kafka and persists them to ClickHouse.

Responsibilities:

Technology:

Configuration:

Message Queue (Kafka)

Apache Kafka provides reliable, ordered message delivery.

Configuration:

Topics:

| Topic | Purpose |
|-------|---------|
| football_simulator.events | Match events (goals, passes, fouls) |
| football_simulator.engagements | Viewer engagement events |
| football_simulator.retry | Failed events for retry |
| football_simulator.dead | Events that exceeded max retries |
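
The retry and dead topics imply a routing rule for failed events. A minimal sketch of that rule in Go follows. The `maxRetries` value and the `nextTopic` helper are assumptions made for illustration, as is the idea that the retry count travels with the message (for example in a header); none of these are specified in this document.

```go
package main

import "fmt"

// Topic names as listed under Topics in this document.
const (
	topicRetry = "football_simulator.retry"
	topicDead  = "football_simulator.dead"
)

// maxRetries is a hypothetical limit; the real value is not documented here.
const maxRetries = 3

// nextTopic returns the topic a failed event should be published to,
// given how many times it has already been retried.
func nextTopic(retries int) string {
	if retries >= maxRetries {
		return topicDead
	}
	return topicRetry
}

func main() {
	for _, n := range []int{0, 2, 3} {
		fmt.Printf("retries=%d -> %s\n", n, nextTopic(n))
	}
}
```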

Analytics Database (ClickHouse)

ClickHouse provides fast OLAP queries for real-time analytics.

Tables:

| Table | Purpose | Engine |
|-------|---------|--------|
| match_events | Game events from the field | MergeTree |
| engagement_events | Viewer engagement tracking | MergeTree |
| api_events | API request/response logging | MergeTree |
| active_sessions | Concurrent viewer tracking | ReplacingMergeTree |

Materialized Views:

Analytics Views:

Reverse Proxy (Traefik)

Traefik handles SSL termination and load balancing.

Features:

Routes:

Monitoring Stack

Prometheus:

Grafana:

Data Flow

Match Event Flow

1. Client sends POST /api/events
2. API validates event structure
3. API produces to Kafka topic: football_simulator.events
4. Consumer batches events (flushes at 1,000 events or after 5 seconds, whichever comes first)
5. Consumer writes batch to ClickHouse match_events table
6. Materialized views update automatically

Engagement Event Flow

1. Client sends POST /api/engagements (batch of events)
2. API validates each event
3. API produces to Kafka topic: football_simulator.engagements
4. Consumer batches events
5. Consumer writes to ClickHouse engagement_events table
6. Materialized views aggregate data in real-time
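
Step 2 validates each event in the batch individually. One way this might look in Go is sketched below. The `Engagement` fields and the choice to accept valid events while reporting rejected ones by index are assumptions; this document does not show the real schema or say whether one bad event fails the whole request.

```go
package main

import (
	"errors"
	"fmt"
)

// Engagement is a hypothetical payload shape; the real schema is not shown here.
type Engagement struct {
	ViewerID string
	MatchID  string
	Kind     string
}

// validate checks the fields this sketch assumes are required.
func validate(e Engagement) error {
	switch {
	case e.ViewerID == "":
		return errors.New("viewer id is required")
	case e.MatchID == "":
		return errors.New("match id is required")
	case e.Kind == "":
		return errors.New("kind is required")
	}
	return nil
}

// splitBatch validates every event and returns the accepted events plus
// the validation errors keyed by the event's index in the request.
func splitBatch(batch []Engagement) ([]Engagement, map[int]error) {
	valid := make([]Engagement, 0, len(batch))
	rejected := make(map[int]error)
	for i, e := range batch {
		if err := validate(e); err != nil {
			rejected[i] = err
			continue
		}
		valid = append(valid, e)
	}
	return valid, rejected
}

func main() {
	batch := []Engagement{
		{ViewerID: "v1", MatchID: "m1", Kind: "reaction"},
		{ViewerID: "", MatchID: "m1", Kind: "reaction"},
	}
	valid, rejected := splitBatch(batch)
	fmt.Println(len(valid), "accepted;", len(rejected), "rejected")
}
```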

Query Flow

1. Client sends GET /api/matches/{matchId}/metrics
2. API queries ClickHouse views
3. ClickHouse returns aggregated results
4. API formats and returns JSON response

Deployment Architecture

Development

Docker Compose (bridge network)
├── go-api (1 replica)
├── go-consumer (1 replica)
├── kafka (single broker)
├── clickhouse (single instance)
├── prometheus
├── grafana
└── kafka-ui (debugging)

Production

Docker Swarm (overlay network)
├── traefik (1 replica, manager node)
├── go-api (3 replicas, rolling updates)
├── go-consumer (2 replicas)
├── kafka (1 broker, persistent volume)
├── clickhouse (1 instance, labeled node)
├── prometheus (1 replica, persistent volume)
└── grafana (1 replica, persistent volume)

AWS Infrastructure

VPC (10.0.0.0/16)
└── Public Subnet (10.0.1.0/24)
    └── EC2 Instance (r5.xlarge)
        ├── Docker Swarm Manager
        ├── 300GB gp3 EBS (6000 IOPS)
        └── Elastic IP

ECR Repositories
├── go-api (with image scanning)
└── go-consumer (with image scanning)

IAM Roles
├── EC2 Instance Role (SSM, CloudWatch, ECR)
└── GitHub Actions Role (ECR push, SSM deploy)

Scalability Considerations

Current Limits (Single Node)

| Metric | Capacity |
|--------|----------|
| Concurrent viewers | 100K+ |
| Events per minute | 10K+ |
| API replicas | 3 |
| Consumer replicas | 2 |

Scaling Strategies

Horizontal (add more nodes):

  1. Add worker nodes to Swarm
  2. Scale API replicas: docker service scale football_go-api=10
  3. Scale consumer replicas, up to the partition count of each Kafka topic (add partitions first if more consumers are needed)
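
Consumer scaling is bounded by partition count, because Kafka assigns each partition to at most one consumer within a group. The small Go helper below makes that relationship explicit. The function name and the partition counts in the example are illustrative; this document does not state how many partitions the topics have.

```go
package main

import "fmt"

// activeConsumers returns how many replicas in one consumer group can
// actually receive messages from a topic. Replicas beyond the partition
// count are assigned nothing and sit idle.
func activeConsumers(replicas, partitions int) int {
	if replicas < partitions {
		return replicas
	}
	return partitions
}

func main() {
	fmt.Println(activeConsumers(2, 6))  // 2: every replica has work
	fmt.Println(activeConsumers(10, 6)) // 6: four replicas would be idle
}
```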

Vertical (bigger instance):

Multi-Node Kafka:

  1. Add Kafka broker nodes
  2. Increase topic partitions
  3. Configure replication factor

ClickHouse Cluster:

  1. Add ClickHouse shards
  2. Configure distributed tables
  3. Set up ZooKeeper (or ClickHouse Keeper) for coordination

Security

Network Security

Authentication

Container Security

Monitoring & Observability

Metrics (Prometheus)

API Metrics:

Consumer Metrics:

Logs

Health Checks

| Endpoint | Purpose | Interval |
|----------|---------|----------|
| /health | Basic liveness | 30s |
| /ready | Dependency check | 30s |
| /metrics | Prometheus scrape | 15s |

Failure Handling

API Failures

Kafka Failures

ClickHouse Failures