Gateway Horizontal Scaling
The Cornserve Gateway is designed to scale horizontally to handle increased application registration and invocation traffic. Scaling out the gateway means running multiple replicas of the gateway service behind a load balancer, with shared state managed through Kubernetes Custom Resource Definitions (CRDs).
Configuring Replicas
Horizontal scaling is controlled by the replicas field in the gateway deployment configuration. By default, Cornserve runs with a single gateway replica.
To scale out, update the replicas count in kubernetes/kustomize/cornserve/base/gateway/deployment.yaml.
After updating the configuration, apply it to the cluster using kubectl apply -k kubernetes/kustomize/cornserve/base/gateway/.
Load Balancing
The gateway is exposed via a standard Kubernetes Service (ClusterIP). This service does not use sessionAffinity (it defaults to None), meaning that kube-proxy distributes incoming HTTP and WebSocket requests across all available gateway pods using a round-robin strategy.
Because all critical control-plane state is distributed via CRDs, the system does not require sticky sessions for application registration or invocation. A client can register an application through one gateway replica and immediately invoke it through another. The development overlay's NodePort service behaves in the same manner.
Distributed State via CRDs
The gateway uses the Kubernetes API server as its shared source of truth. The CRDs involved fall into two categories:
Pre-existing CRDs (unchanged)
These CRDs existed before the scale-out work and are used as-is:
- AppInstance CRD (
appinstances.cornserve.ai): Stores the authoritative definition for each application, including its unique ID, Python source code, required task keys, streaming flags, and lifecycle state (NOT_READYorREADY). - LatestAppRV CRD (
latestapprvs.cornserve.ai): A singleton resource that tracks the latest resource version (RV) of app-related changes. It acts as a synchronization barrier to ensure all replicas have observed a specific update.
Extended CRD
- UnitTaskInstance CRD (
unittaskinstances.cornserve.ai): This CRD was pre-existing and manages the lifecycle of unit tasks (storingdefinitionRef,config,executionDescriptorName). For scale-out, we extended its schema with ausageRefcountinteger field. This field enables cross-gateway task deduplication and shared reference counting — multiple gateways can share the same physical unit task deployment, and the task is only torn down when the last reference is released.
AppRegistry Watcher
Each gateway replica runs an AppRegistry background watcher that uses the Kubernetes "list then watch" pattern. The watcher maintains a local in-memory cache of AppInstance resources.
On ADDED or MODIFIED events from the Kubernetes API server, the watcher updates its local cache. When an application is requested on a gateway that did not handle the original registration, the gateway uses this cache to discover the app's definition and materialize its execution logic.
Cross-Gateway Task Refcounting
The usageRefcount field on UnitTaskInstance CRs is the core mechanism for safe task lifecycle management across gateway replicas.
Registration (declare_used)
When a gateway registers an app and needs to deploy unit tasks:
- The gateway computes a canonical task key and acquires a Kubernetes
Lease(coordination.k8s.io/v1) for that key. - While holding the lease, it searches existing
UnitTaskInstanceCRs for an equivalent configuration (matchingdefinitionRef,config, andexecutionDescriptorName). - If a matching CR is found: The task is already deployed in the cluster by another gateway. The gateway increments the CR's
usageRefcountand skips the gRPC deployment call to the Resource Manager. - If no matching CR is found: The gateway creates a new CR with
usageRefcount=1and issues a gRPCDeployUnitTaskcall to the Resource Manager. - The lease is released after the critical section completes.
Unregistration (declare_not_used)
When a gateway unregisters an app and releases unit tasks:
- The gateway decrements its local usage counter. If it reaches 0, the gateway proceeds to check the CR.
- The gateway acquires the same per-task Kubernetes
Leaseused by registration. - While holding the lease, it decrements the CR's
usageRefcount. - If the CR refcount reaches 0: The gateway performs gRPC
TeardownUnitTaskand CR deletion while still holding the lease. - If the CR refcount is still > 0: The gateway releases only local state; the shared deployment remains active.
Concurrency Notes
Lease-based serialization removes the lost-update race for concurrent register/unregister operations on the same logical task by enforcing a single writer per task key.
The lock is time-bounded (leaseDurationSeconds), so very long critical sections can still require careful tuning of lease duration and retry behavior.
Invocation on Non-Registering Gateways
When an invocation request lands on a gateway that did not handle the original registration, its local TaskManager state may be empty. Before matching tasks, the gateway calls _ensure_tasks_from_crs() which lists all UnitTaskInstance CRs and reconstructs UnitTask objects via the TaskRegistry. This populates local state on-demand so the invocation can proceed normally.
What Stays Local
While control-plane metadata is shared, certain state remains local to each gateway process:
- WebSocket sessions (
SessionManager): Active sessions are bound to specific TCP connections and cannot be migrated between replicas. - In-process module caches: Each gateway replica independently executes (
exec()) the application source code from the CRD to materialize PythonModuleTypeobjects. - Active streams: HTTP streaming responses and SSE connections are tied to the replica handling the request.
- Transport clients: Local
aiohttpsessions, gRPC channels, and OpenTelemetry exporters are per-process.
Registration Flow
When a client registers an application, the following distributed flow occurs:
- A client sends a
POST /app/registerrequest, which is routed to any gateway replica. - Validation of the source code happens first, followed by the creation of an
AppInstanceCR withstate=NOT_READY. - Next, the gateway requests task deployment through the
TaskManager. For each task, the CR-based refcount mechanism is used (see above) to either reuse an existing deployment or create a new one. - Upon successful deployment, the
AppInstanceCR state is updated toREADYand theLatestAppRVsingleton is bumped. - A call to
sync_watchers()then blocks until the local watcher has caught up to the new resource version. - Finally, the gateway returns success to the client. The application is now ready to be invoked from any replica in the cluster.
Invocation Flow
When an invocation request hits a gateway replica:
- The gateway checks its local cache for the application's materialized module.
- If the module is not found, a lookup occurs in the
AppRegistrycache (populated by the CRD watcher). - When an app is found and its state is
READY, the gateway materializes theModuleTypefrom the stored source code and caches it locally. - If the local
TaskManagerhas no tasks, it synchronizes fromUnitTaskInstanceCRs to populate task state. - Normal invocation logic then proceeds as usual.