The Story of RSS3 VSL Chain and Cloud Native

RSS3's own L2 mainnet has now been live and stable for some time. From a DevOps perspective, I would like to share the story of running our own L2 network and how we integrated it with cloud-native Kubernetes.

The Helm charts mentioned in this article have been open-sourced under the RSS3-Network org:
https://github.com/RSS3-Network/helm-charts

Optimism Node Architecture#

The architecture is illustrated below:

(Figure: Optimism node architecture)

An Optimism (OP) L2 network node mainly consists of op-geth and op-node. As with an L1 node, geth is still responsible for storing chain data and serving RPC requests, while op-node builds the decentralized peer-to-peer network and drives data synchronization.

Cloud-native upgrade#

Perhaps I am simply uninformed, but I am not aware of how other L2 RPC operators handle cloud-native deployment, and the OP-based Helm charts I found on GitHub were not very user-friendly. So I started writing a set of VSL charts (which should in theory also work for deploying other OP-based L2 chains, although this has not been tested).

Q1 securing auth with JWT#

op-node and op-geth must be deployed in a strict 1-to-1 pairing. Therefore, the Kubernetes sidecar pattern is used: they run as two containers in the same RPC Pod, share the same network namespace, and can reach each other via 127.0.0.1.

op-geth has the environment variable GETH_AUTHRPC_JWTSECRET and op-node has OP_NODE_L2_ENGINE_AUTH; both need to be configured so that the two containers authenticate their engine-API communication with the same JWT secret.

As mentioned above, because both containers share the Pod's network namespace, this authenticated endpoint stays on 127.0.0.1 and its port is only reachable from localhost, which already solves the exposure problem. Beyond that, managing the JWT secret's value is itself a problem, so we use an initContainer running openssl rand -hex 32 > {{ .Values.jwt.path }} to generate a fresh secret every time a Pod is created. This guarantees isolation at the network-namespace level and randomness of the secret.
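Putting the pieces together, here is a minimal, simplified sketch of what the resulting Pod layout could look like. The names, image tags, and the /secrets path are illustrative assumptions rather than the chart's actual values, and I am assuming GETH_AUTHRPC_ADDR and OP_NODE_L2_ENGINE_RPC are the env-var forms of the corresponding CLI flags:

```yaml
# Sketch only: sidecar Pod with a per-Pod JWT secret shared via an in-memory volume
apiVersion: v1
kind: Pod
metadata:
  name: vsl-rpc-0                      # placeholder name
spec:
  volumes:
    - name: jwt                        # shared volume for the JWT secret
      emptyDir:
        medium: Memory
  initContainers:
    - name: generate-jwt
      image: openquantumsafe/openssl3
      command: ["/bin/sh", "-c"]
      args:
        # fresh random secret on every Pod creation
        - openssl rand -hex 32 > /secrets/jwt.txt
      volumeMounts:
        - name: jwt
          mountPath: /secrets
  containers:
    - name: op-geth
      image: op-geth:example           # placeholder image
      env:
        - name: GETH_AUTHRPC_JWTSECRET # geth side of the shared secret
          value: /secrets/jwt.txt
        - name: GETH_AUTHRPC_ADDR      # keep the auth endpoint on localhost
          value: 127.0.0.1
      volumeMounts:
        - name: jwt
          mountPath: /secrets
    - name: op-node
      image: op-node:example           # placeholder image
      env:
        - name: OP_NODE_L2_ENGINE_AUTH # op-node side of the shared secret
          value: /secrets/jwt.txt
        - name: OP_NODE_L2_ENGINE_RPC  # engine API over the Pod's loopback
          value: http://127.0.0.1:8551
      volumeMounts:
        - name: jwt
          mountPath: /secrets
```

Since the secret never leaves the Pod and is regenerated on every restart, there is nothing to rotate or store externally.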

Q2 idempotent op-node peer id#

In Q1, we used the openssl rand command to generate a random string as the JWT secret inside each pod, making the communication between op-geth and op-node more secure.

However, op-node also has a peer id, and other nodes reference it as an initial synchronization peer through a fixed OP_NODE_P2P_STATIC entry. As node operators, we naturally want each node to keep a fixed peer id.

Achieving complete randomness is easy; achieving idempotence without adding complexity is the hard part. After some research, we found that as long as the private-key file pointed to by op-node's OP_NODE_P2P_PRIV_PATH environment variable stays unchanged, the peer id does not change. The problem therefore shifts from fixing a peer id to fixing the content of a file.

The naive solution is to write out a long list of environment variables, say POD_PRIV_0 through POD_PRIV_N. But this is not elegant, and it prevents spinning up new pods quickly: each new pod needs yet another variable created and injected into its container, because a StatefulSet cannot configure its replicas individually.

Fortunately, openssl can derive a deterministic value from an input we supply as a seed: as long as the seed is the same, the generated file content is the same, which gives us idempotence for free. So what differs between the pods of a StatefulSet? Exactly: the pod's own metadata, such as its name and namespace. We can therefore build an idempotent seed from the pod's own information exposed through the Kubernetes downward API.

Partial configuration is as follows:

        - name: generate-key
          image: openquantumsafe/openssl3
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: P2P_ROOT_KEY
              valueFrom:
                secretKeyRef:
                  key: {{ .Values.node.p2p.generateSecretKey }}
                  name: {{ .Values.node.p2p.generateSecretName }}
            - name: P2P_GENERATE_KEY
              value: $(P2P_ROOT_KEY)-$(POD_NAMESPACE)-$(POD_NAME)
          command: ["/bin/sh", "-c"]
          args:
            - |
              echo ${P2P_GENERATE_KEY} | openssl sha256 -binary | xxd -p -c 32 > {{ .Values.node.p2p.privateKeyPath }};
              openssl rand -hex 32 > {{ .Values.jwt.path }};

A new environment variable, P2P_ROOT_KEY, is introduced here to ensure that if there are multiple clusters in the same environment, the peer ids will not conflict.
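For reference, the root key is just an ordinary Kubernetes Secret that the chart values above point at. A hypothetical example, with placeholder name, key, and value:

```yaml
# Hypothetical per-cluster root key (name, key, and value are placeholders)
apiVersion: v1
kind: Secret
metadata:
  name: vsl-p2p-root-key          # would be referenced by .Values.node.p2p.generateSecretName
type: Opaque
stringData:
  rootKey: my-cluster-root-key    # would be referenced by .Values.node.p2p.generateSecretKey
```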

After these steps, each Pod of our L2 RPC has its own unique peer id, and rescheduling Pods no longer changes those ids.
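With stable peer ids in place, downstream nodes can pin our pods as static peers via OP_NODE_P2P_STATIC. A hedged sketch of what that might look like; the hostnames and peer ids are made up, and 9222 is assumed to be the op-node p2p port:

```yaml
# Hypothetical static-peer configuration on a downstream op-node (values are placeholders)
env:
  - name: OP_NODE_P2P_STATIC
    # comma-separated libp2p multiaddrs pointing at our pods' fixed peer ids
    value: "/dns4/node-0.vsl.example.com/tcp/9222/p2p/16Uiu2HAmExamplePeerId0,/dns4/node-1.vsl.example.com/tcp/9222/p2p/16Uiu2HAmExamplePeerId1"
```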

Q3 HA sequencer#

The third problem is that an L2 network allows only one active sequencer to produce blocks at any given time, so making the sequencer highly available while controlling which instance produces blocks is also a headache.

This is where VSL-Reconcile comes in.

First, we deploy multiple sequencer nodes with the sequencer and admin API enabled, but all of them start with block production stopped. Reconcile then decides which node should currently produce blocks and routes the sequencer domain's traffic to that node.
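In op-node terms, that "enabled but stopped" state roughly corresponds to settings like the following sketch. I am assuming these env-var names map to op-node's --sequencer.enabled, --sequencer.stopped, and --rpc.enable-admin flags:

```yaml
# Sketch: sequencer candidates start with production stopped and the admin API on
env:
  - name: OP_NODE_SEQUENCER_ENABLED   # this node is capable of sequencing
    value: "true"
  - name: OP_NODE_SEQUENCER_STOPPED   # ...but does not produce blocks until told to
    value: "true"
  - name: OP_NODE_RPC_ENABLE_ADMIN    # expose the admin API for start/stop control
    value: "true"
```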

This part is inspired by Vault HA mode.

(Figure: Vault high-availability mode, one active instance with the rest on standby)

Vault is a secrets manager. In its high-availability mode, only one instance is active while the others stand by, and our sequencer nodes work the same way.

Borrowing Vault's Kubernetes service-discovery approach, we also label the sequencer pods to control which pods are put into service and how traffic is switched.

First, every sequencer pod is labeled vsl.rss3.io/synced=false, meaning block synchronization has not caught up yet. We then check each pod's sync progress and flip the label to vsl.rss3.io/synced=true once it is in sync. The Service only selects pods labeled vsl.rss3.io/synced=true as backends, so when RPC traffic is high and the replica count is scaled up, freshly created pods receive no requests until they have finished syncing.
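A hedged sketch of what such a Service selector could look like; the names and port are placeholders rather than the chart's actual manifests:

```yaml
# Sketch: RPC Service that only routes to fully synced sequencer pods
apiVersion: v1
kind: Service
metadata:
  name: vsl-sequencer-rpc                     # placeholder name
spec:
  selector:
    app.kubernetes.io/name: vsl-sequencer     # placeholder app label
    vsl.rss3.io/synced: "true"                # only synced pods become endpoints
  ports:
    - name: http-rpc
      port: 8545                              # assumed RPC port
      targetPort: 8545
```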

Next, all sequencer pods with vsl.rss3.io/synced=true are additionally labeled vsl.rss3.io/active=false, which marks them as candidates that can become the block-producing node.

When reconcile enables block production on a node, its vsl.rss3.io/active label flips to true, and the load balancer behind the sequencer domain uses the label selector vsl.rss3.io/active=true. This ensures that whichever pod is currently producing blocks, traffic is correctly and automatically routed to it.
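Correspondingly, the sequencer domain's Service would select only the active pod, along the lines of this sketch (again with placeholder names and an assumed port):

```yaml
# Sketch: Service behind the sequencer domain, following the active block producer
apiVersion: v1
kind: Service
metadata:
  name: vsl-sequencer-active                  # placeholder name
spec:
  selector:
    app.kubernetes.io/name: vsl-sequencer     # placeholder app label
    vsl.rss3.io/synced: "true"
    vsl.rss3.io/active: "true"                # exactly one pod carries this label
  ports:
    - name: http-rpc
      port: 8545                              # assumed RPC port
      targetPort: 8545
```

Because changing a pod's labels immediately updates the Service's endpoints, failover is nothing more than a label flip performed by reconcile.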
