Technical Specification

This document describes the technical specification of Phala Network, including the overall protocol and the detailed data structures and algorithms. It is still a work in progress.

Blockchain Entities

In Phala Network, there are three kinds of entities:

  • Client, which operates on normal devices without any special hardware requirement;
  • Worker, which operates on a TEE (Trusted Execution Environment) and serves as a computation node for confidential smart contracts;
  • Gatekeeper, which operates on a TEE and serves as an authority and key manager.

We present the interactions between the different entities as follows.

[Figure: Phala Network]

The basic design of Phala Network is meant to ensure the security and confidentiality of the blockchain and the smart contracts on it. With the introduction of further security improvements, Phala Network is able to defend against advanced attacks.

Entity Key Initialization

In Phala, communication between any two entities should be encrypted, so each entity generates the following key pairs with a pseudorandom number generator during initialization:

  1. IdentityKey
    • a secp256k1 key pair to uniquely identify an entity;
  2. EcdhKey
    • a secp256r1 key pair for secure communication;

[Improvement]

Switch both IdentityKey and EcdhKey to sr25519 in the future. The Ristretto group has a good ecosystem in Rust (which can be easily compiled to WASM) and JavaScript, even with ECDH support.

For clients, the key pairs are generated by the user’s wallet. For workers and gatekeepers, the key pairs are fully managed by pRuntime and their usage is strictly limited.

pRuntime Initialization

During initialization, pRuntime automatically generates the entity key pairs above with a pseudorandom number generator. The generated key pairs are managed by pRuntime inside the TEE, which means that workers and gatekeepers can only use them through the limited APIs exported by pRuntime, and can never obtain the plaintext key pairs to read encrypted data outside the TEE.

The generated key pairs can be locally encrypted and cached on the disk by SGX Sealing, and can be decrypted and loaded when restarting. This applies to both gatekeepers and workers.

Secure Communication Channels

The EcdhKey public key in the pRuntime of a worker or gatekeeper is publicly available. Therefore, an ECDH key agreement protocol can be applied to establish a secure communication channel between a worker (or a gatekeeper) and any other entity non-interactively.

A channel between two entities A and B is denoted as $Channel(Pk_A, Pk_B)$, where $Pk_A$ and $Pk_B$ are the public keys of their respective ECDH key pairs. A shared secret can be derived from one’s ECDH private key and the counterpart’s public key via the Diffie-Hellman algorithm. The final communication key CommKey(A, B) is then calculated from the shared secret via a one-way function, and is used to encrypt the messages between the two entities.

In pre-mainnet, the EcdhKey is a secp256r1 key pair. We can adopt the child key derivation (CKD) functions from Bitcoin BIP32 to derive CommKey(A, B) from the key agreed by ECDH.

The messages are end-to-end encrypted with AES-256-GCM.

[Improvement] When we switch to sr25519, we should adopt Substrate’s key derivation algorithm instead of BIP32.
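
The derivation of CommKey(A, B) can be sketched as follows, under loudly stated assumptions: Python's standard library has no elliptic-curve support, so classic finite-field Diffie-Hellman over a toy group stands in for secp256r1 ECDH, and HMAC-SHA256 stands in for the BIP32 CKD one-way function. The names `P`, `G`, `public_key` and `comm_key` are illustrative and are not part of pRuntime.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group (assumption): the prime 2**255 - 19 with base 2.
# It stands in for secp256r1 ECDH and is NOT a secure parameter choice.
P = 2**255 - 19
G = 2


def generate_private_key() -> int:
    """Pick a random private exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2


def public_key(private_key: int) -> int:
    """The public key is G raised to the private exponent."""
    return pow(G, private_key, P)


def comm_key(own_private: int, peer_public: int) -> bytes:
    """Derive a 32-byte CommKey from the Diffie-Hellman shared secret."""
    shared = pow(peer_public, own_private, P)
    shared_bytes = shared.to_bytes(32, "big")
    # One-way step (assumption): HMAC-SHA256 with a fixed label.
    return hmac.new(b"CommKey", shared_bytes, hashlib.sha256).digest()


a_priv, b_priv = generate_private_key(), generate_private_key()
key_at_a = comm_key(a_priv, public_key(b_priv))
key_at_b = comm_key(b_priv, public_key(a_priv))
```

Both sides arrive at the same key using only their own private key and the counterpart's public key, which is what makes the channel non-interactive.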

The public keys of the entities are registered on-chain, so we can build both on-chain and off-chain communication channels:

  • On-chain Communication
    1. Both A and B know each other’s public key from the blockchain, so they can derive CommKey(A, B);
    2. A posts a cipher message encrypted by CommKey(A, B) to the blockchain;
    3. B receives it, and decrypts it with CommKey(A, B);
  • Off-chain Communication (A is off-chain and B is an on-chain worker)
    1. A learns B’s public key from the blockchain and derives CommKey(A, B);
    2. A learns the API endpoint of B from its WorkerInfo in WorkerState on chain;
    3. A sends a signed cipher message (encrypted by CommKey(A, B)) with its public key to B directly;
    4. B gets A’s public key from the message, and derives CommKey(A, B) to decrypt it;
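
The on-chain flow above can be sketched as follows. This is a toy model, not the actual implementation: a Python list stands in for the blockchain, CommKey(A, B) is a placeholder value, and because the standard library has no AES-GCM, an encrypt-then-MAC construction (SHA-256 keystream plus HMAC-SHA256) stands in for it. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode; a toy stand-in for the AES-GCM keystream."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(comm_key: bytes, plaintext: bytes) -> bytes:
    """Return nonce || ciphertext || tag (encrypt-then-MAC)."""
    nonce = secrets.token_bytes(12)
    stream = _keystream(comm_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Toy simplification: one key for both encryption and authentication.
    tag = hmac.new(comm_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(comm_key: bytes, message: bytes) -> bytes:
    """Check the tag, then decrypt; raise ValueError on tampering."""
    nonce, ciphertext, tag = message[:12], message[12:-32], message[-32:]
    expected = hmac.new(comm_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(comm_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# A list stands in for the blockchain in this sketch.
chain = []
shared_comm_key = hashlib.sha256(b"CommKey(A, B) placeholder").digest()

# Step 2: A posts a cipher message to the chain.
chain.append(encrypt(shared_comm_key, b"hello B"))
# Step 3: B reads the message and decrypts it.
received = decrypt(shared_comm_key, chain[-1])
```

The off-chain flow differs only in transport: A sends the same kind of cipher message to B's API endpoint together with its public key, instead of posting it to the chain.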

Client-worker Payload Example

A client communicates with a worker only for contract invocation. An invocation consists of at least the following payload:

{
    from: Client_IdentityKey,
    payload: {
        to: Contract_IdentityKey,
        input: "0xdeadbeef",
    },
    nonce: 12345,
    sig: UserSignature,
}
  • nonce is necessary for defending against double-spend and replay attacks.
  • from shows the identity of the caller and can be verified with sig. from is further passed to the contract.
  • Since a worker can run multiple contracts (or even different instances of the same contract), to is needed to specify the invocation target.
  • input encodes the invoked function and its arguments. It should be serialized according to the ABI of the contract.
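
As an illustration of how the nonce can block replays, here is a minimal sketch. It assumes that a worker requires strictly increasing nonces per caller, which the specification does not state, and it leaves out signature verification entirely. `ReplayGuard` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayGuard:
    """Tracks the highest nonce accepted from each caller."""
    last_nonce: dict = field(default_factory=dict)

    def accept(self, invocation: dict) -> bool:
        """Accept an invocation only if its nonce is strictly increasing."""
        sender = invocation["from"]
        nonce = invocation["nonce"]
        if nonce <= self.last_nonce.get(sender, -1):
            return False  # replayed or stale invocation
        self.last_nonce[sender] = nonce
        return True


invocation = {
    "from": "Client_IdentityKey",
    "payload": {"to": "Contract_IdentityKey", "input": "0xdeadbeef"},
    "nonce": 12345,
    "sig": "UserSignature",
}

guard = ReplayGuard()
first_attempt = guard.accept(invocation)     # accepted
replayed_attempt = guard.accept(invocation)  # same nonce, rejected
```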

[Improvement]

Serialization

Currently the payloads are serialized in browser-friendly JSON, which is very space-inefficient. A compact binary format (e.g. Protobuf or parity-scale-codec) should be used instead.
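
To give a feel for the size difference, the sketch below compares a JSON encoding of the invocation with a hand-rolled fixed-width binary layout. The layout is purely hypothetical (it is neither Protobuf nor SCALE), and the field widths of 32 bytes for keys and 64 bytes for the signature are assumptions chosen for illustration.

```python
import json
import struct

# Hypothetical layout: 32-byte keys, 8-byte nonce, 64-byte signature, and a
# 4-byte length prefix in front of the variable-length input.
HEADER = struct.Struct(">32s32sQ64sI")


def encode_binary(sender: bytes, target: bytes, nonce: int, sig: bytes, data: bytes) -> bytes:
    """Pack the fixed-width header followed by the raw input bytes."""
    return HEADER.pack(sender, target, nonce, sig, len(data)) + data


def decode_binary(blob: bytes):
    """Inverse of encode_binary."""
    sender, target, nonce, sig, length = HEADER.unpack_from(blob)
    data = blob[HEADER.size:HEADER.size + length]
    return sender, target, nonce, sig, data


def encode_json(sender: bytes, target: bytes, nonce: int, sig: bytes, data: bytes) -> bytes:
    """JSON encoding with hex strings, mirroring the payload example."""
    return json.dumps({
        "from": "0x" + sender.hex(),
        "payload": {"to": "0x" + target.hex(), "input": "0x" + data.hex()},
        "nonce": nonce,
        "sig": "0x" + sig.hex(),
    }).encode()


fields = (b"\x01" * 32, b"\x02" * 32, 12345, b"\x03" * 64, bytes.fromhex("deadbeef"))
binary_blob = encode_binary(*fields)
json_blob = encode_json(*fields)
```

Hex encoding alone doubles the size of every binary field in JSON, before counting field names and punctuation.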

EcdhKey Rotation

Unlike the IdentityKey, which represents the identity of a worker or gatekeeper and thus should not be changed, the EcdhKey should be rotated regularly to ensure the security of the communication channels between entities. In the future, pRuntime will automatically rotate the managed EcdhKey after a certain time interval.

Worker

Worker Registration

Registration is required before a worker or gatekeeper can join the network. Any party with a TEE-capable device can serve as a worker. To register as a validated worker on the blockchain, TEE runners need to run pRuntime and let it send a signed attestation report to the gatekeepers.

pRuntime requests a Remote Attestation with a hash of the WorkerInfo committed in the attestation report. WorkerInfo includes the public keys of the IdentityKey and EcdhKey, as well as other data collected from the enclave. By verifying the report, gatekeepers learn the hardware information of workers and can ensure that they are running an unmodified pRuntime.

Remote Attestation

The attestation report is relayed to the blockchain by a register_worker() call. The blockchain holds the trusted certificates needed to validate the attestation report. It validates that:

  1. The signature of the report is correct;
  2. The embedded hash in the report matches the hash of the submitted WorkerInfo;

register_worker() is called by workers, and a worker can only be assigned contracts once it has staked a certain amount of PHA tokens. The blockchain keeps a WorkerState map from each worker to its WorkerInfo entry. Gatekeepers update the WorkerState map after they receive and verify the submitted WorkerInfo.
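
The two validation checks and the WorkerState update can be sketched as follows. Everything beyond the names WorkerInfo and WorkerState is an assumption: the JSON encoding and SHA-256 hash of WorkerInfo, the `committed_hash` field, and the `verify_signature` callback (which stands in for certificate-based verification of the attestation report) are all hypothetical.

```python
import hashlib
import json


def worker_info_hash(worker_info: dict) -> str:
    """Hash a canonical JSON encoding of WorkerInfo (encoding is an assumption)."""
    encoded = json.dumps(worker_info, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def register_worker(worker_state: dict, worker_info: dict, report: dict, verify_signature) -> bool:
    """Apply the two checks above and record the worker on success."""
    # Check 1: the report signature is valid under the trusted certificates.
    if not verify_signature(report):
        return False
    # Check 2: the hash committed in the report matches the submitted WorkerInfo.
    if report["committed_hash"] != worker_info_hash(worker_info):
        return False
    worker_state[worker_info["identity_key"]] = worker_info
    return True


info = {"identity_key": "0xaaaa", "ecdh_key": "0xbbbb", "cpu_cores": 4}
good_report = {"committed_hash": worker_info_hash(info), "signature": "placeholder"}
bad_report = {"committed_hash": "00" * 32, "signature": "placeholder"}

state = {}
accepted = register_worker(state, info, good_report, lambda report: True)
rejected = register_worker({}, info, bad_report, lambda report: True)
```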

Offline Worker Detection

The pRuntime of a worker is regularly required to answer an online challenge, recorded as a heartbeat event on chain. The blockchain detects the liveness of workers by monitoring the interval between their heartbeat events. A worker is penalized by slashing its staked tokens if it goes offline during contract execution.
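
A minimal sketch of interval-based liveness detection is shown below. The threshold `max_interval` and the bookkeeping of the last heartbeat block per worker are assumptions made for illustration; the specification does not fix either.

```python
def find_offline_workers(last_heartbeat: dict, current_block: int, max_interval: int) -> list:
    """Return workers whose latest heartbeat is older than max_interval blocks."""
    return sorted(
        worker
        for worker, block in last_heartbeat.items()
        if current_block - block > max_interval
    )


# Block heights of the most recent heartbeat per worker (made-up values).
heartbeats = {"worker-a": 990, "worker-b": 940, "worker-c": 1000}
offline = find_offline_workers(heartbeats, current_block=1000, max_interval=50)
```

Here only `worker-b` is reported, since its last heartbeat is 60 blocks old.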

[Improvement]

Contract deployers are allowed to set a configurable timeout for contract execution. An execution is regarded as failed if the worker does not provide the results within the timeout, in which case a minor penalty (compared to the offline penalty) must be paid.

[TODO: Current Impl]

Now we have a random challenge for workers. If a worker correctly answers the challenge, it gets rewarded (based on the tokenomics). Otherwise if it fails to respond within a time window, it gets slashed.

Subject to change: we may want to occupy 100% of the CPU cycles.

Gatekeeper

Gatekeeper Election

Gatekeepers share the same pRuntime as normal workers. To distinguish gatekeepers, their IdentityKey public keys are recorded in the GatekeeperState list on blockchain.

In the pre-mainnet of Phala Network, the list of gatekeepers is hard-coded in the genesis block of the blockchain.

[Improvement]

Gatekeepers are elected on the blockchain by an NPoS mechanism similar to Polkadot’s. This is done by the Staking pallet, where nominators can stake their tokens and vote for the gatekeepers they trust. Once a gatekeeper is elected, both the gatekeeper and its nominators receive PoS rewards from PHA inflation.

MasterKey Generation

MasterKey is used to derive the keys that encrypt the states of confidential smart contracts and the communication with them. In Phala Network, only the pRuntime of a gatekeeper is authorized to manage the MasterKey. Note that since MasterKey is managed by pRuntime and its usage is limited, even a malicious gatekeeper cannot decrypt any contract state without fully compromising the TEE and pRuntime.

MasterKey is a secp256k1 key pair generated and managed by gatekeepers.

[Improvement]

Switch to sr25519 in the future.

In the pre-mainnet of Phala Network, all the gatekeepers share the same pre-generated MasterKey.

[Improvement]

Introduce DKG (distributed key generation) so that more than one gatekeeper is required to produce the MasterKey, and each gatekeeper only holds a share of the key. When DKG is enabled, the contract key shares are provisioned to the workers by each gatekeeper separately.

[Improvement] Rotation of Shared MasterKey

Similar to the rotation of EcdhKey, the MasterKey needs to be rotated regularly to achieve forward secrecy and to defend against attempts to leak the MasterKey and decrypt the contract states.

The rotation of MasterKey is triggered after a certain number of blocks. The core of MasterKey rotation is the re-encryption of the saved contract states, which may take several blocks to complete. MasterKey rotation consists of the following steps:

  • Gatekeepers generate a new MasterKey;
  • Gatekeepers use the old MasterKey to decrypt the saved contract states and the new MasterKey to re-encrypt them, in parallel;
  • The old MasterKey and the old saved contract states are discarded;

Again, since all these operations happen inside pRuntime in TEE, the gatekeepers themselves cannot take a peek at the contract states.

State Migration

We must make sure that the data can be migrated to a new version of the blockchain and pRuntime without revealing its contents. State migration is triggered by an on-chain governance decision, signaled by an event, and can be achieved in the same way as proposed for MasterKey rotation.

Confidential Smart Contracts

Contract Key Generation

A client should upload the signed contract code together with its code hash to the blockchain. Once a client uploads a confidential contract, the blockchain emits a ContractUploaded(deployer_id, code_hash, sequence) event. Gatekeepers keep listening for such events and generate a contract key for each newly deployed contract.

The contract key is generated by a KDF (key derivation function). In pre-mainnet, we adopt the child key derivation (CKD) functions from Bitcoin, and extra data like deployer_id serves as entropy during key derivation:

$$ ContractKey_{deployer\_id, code\_hash, sequence} = KDF(MasterKey, deployer\_id, code\_hash, sequence) $$
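
As a rough illustration of the formula, the sketch below derives a ContractKey secret with HMAC-SHA256. This is a stand-in chosen because it is in the Python standard library; it is not the BIP32 child key derivation used in pre-mainnet, and the length-prefixed input encoding is an assumption.

```python
import hashlib
import hmac


def derive_contract_key(master_key: bytes, deployer_id: bytes, code_hash: bytes, sequence: int) -> bytes:
    """Derive a 32-byte ContractKey secret from MasterKey and the contract metadata."""
    # Length-prefix each field so that different inputs never share an encoding.
    message = b"".join(
        len(part).to_bytes(4, "big") + part
        for part in (deployer_id, code_hash, sequence.to_bytes(8, "big"))
    )
    return hmac.new(master_key, message, hashlib.sha256).digest()


master_key = bytes(32)  # placeholder; the real MasterKey never leaves pRuntime
code_hash = hashlib.sha256(b"contract code").digest()

key_v0 = derive_contract_key(master_key, b"deployer-1", code_hash, 0)
key_v0_again = derive_contract_key(master_key, b"deployer-1", code_hash, 0)
key_v1 = derive_contract_key(master_key, b"deployer-1", code_hash, 1)
```

Because the derivation is deterministic, the same inputs always yield the same key, while changing the sequence number yields an unrelated one.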

The following keys are needed for a contract, and are derived from the ContractKey:

  • IdentityKey
    • a secp256k1 key pair, used to sign the output messages of the contract;
  • EcdhKey
    • a secp256r1 key pair, used to encrypt the IO to the contract (including both Commands and Queries);
  • StorageKey
    • an AES-256-GCM key. StorageKey can be generated in the same way as EcdhKey by introducing an extra nonce. It is used to encrypt the contract states (i.e., the key-value pairs in contract storage);

In the pre-mainnet of Phala Network, the contract keys above do not need to be stored inside gatekeepers' pRuntime storage because it is easy to generate them on-the-fly.

When the key is generated, the ContractKey public key is included in the ContractInfo. ContractInfo should also include the identity of contract deployer, sequence, contract code hash and (optional) contract source code. Gatekeepers can easily reproduce ContractKey on-the-fly given the ContractInfo (since MasterKey is managed by them) for future verification and migration.

Contract Key Provision

To assign a contract to a worker, gatekeepers first retrieve the ContractInfo of the contract and generate the ContractKey on-the-fly.

Gatekeepers only provision the keys to qualified workers. They establish an on-chain secure communication channel and pass the ContractInfo and ContractKey key pairs through it.

[Improvement]

Allow deployers to specify the hardware requirements and the number of replicas in ContractInfo, so that gatekeepers assign suitable workers.

[TODO]

Allow the creation of worker whitelists (subnets), and allow users to choose which list to deploy contracts to. All workers in a subnet are replicas.

Gatekeepers emit a ContractDeployed(Worker_IdentityKey, ContractKey) event (multiple events should be emitted if there are multiple workers). We keep a ContractState map from ContractKey to the worker on chain. Gatekeepers will keep the ContractState map up-to-date so deployers can locate the assigned workers.

Commands Invocation

A client takes the following steps to send Commands to a contract:

  1. Apply ECDH to the contract’s EcdhKey public key and the client’s private key;
  2. Use the generated key to encrypt invocation data;
  3. Post a ContractCommand(ContractKey, Client_IdentityKey, encrypted_data) event on chain;

Note that since the invocation data is encrypted with a secret generated from the client’s private key and the contract’s public key, only the executing contract itself (not the assigned worker) can decrypt the invocation data. Also, new workers can be assigned for re-execution if the previously assigned worker is offline.

The worker should keep listening to the ContractCommand events to the contract after deployment.

Queries Invocation

A client can post a ContractQuery(query_id, ContractKey, Client_IdentityKey, encrypted_data) event in the same way as above. Workers should listen for such events and return a ContractReturn(return_id, query_id, encrypted_return_value) accordingly.

[Improvement]

We will include the API endpoint of workers in WorkerState. A client can directly find the workers of a certain contract by listening to ContractDeployed events, then establish a secure communication channel with those workers and send queries.

[TODO]

Ensure the connectivity of workers.

[TODO]

How to punish inaccessible workers? Until we have a way to make connectivity stable, we must either use a whitelist (so users can trust the operators) or allow querying through the blockchain (very high latency and expensive).

Contract Execution

The key to confidential contract execution is the decryption and update of contract states (i.e., all the key-value pairs in contract storage).

For now, we prefer the following confidential storage model: the key and value are first encrypted with the StorageKey of the contract and then inserted into the trie storage, so the underlying database engine can remain agnostic of the encryption. Each key-value pair is encrypted with a different key derived from StorageKey:

  • $StorageKey_{key} = KDF(StorageKey, key)$
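
A minimal sketch of the per-entry derivation, again assuming HMAC-SHA256 as the KDF (the specification does not name one for this step). The `entry:` label and the function name are hypothetical.

```python
import hashlib
import hmac


def derive_entry_key(storage_key: bytes, entry_key: bytes) -> bytes:
    """StorageKey_key = KDF(StorageKey, key), with HMAC-SHA256 as the KDF (assumption)."""
    return hmac.new(storage_key, b"entry:" + entry_key, hashlib.sha256).digest()


storage_key = hashlib.sha256(b"StorageKey placeholder").digest()
balance_key = derive_entry_key(storage_key, b"balance/alice")
owner_key = derive_entry_key(storage_key, b"owner")
```

Each storage entry gets its own encryption key, so leaking one derived key does not expose the other entries.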

State Update

The state update written back to the chain needs to be signed with ContractKey, so a worker cannot provide a false state update without breaking the TEE and pRuntime. A trivial solution for state update is that the worker writes back all updated key-value pairs after the execution of each command. The worker should also update the timestamp (i.e., the block height of the latest processed transaction) in storage, so that we know which transactions have already been processed.

This solution is suitable when only a few contracts are deployed and a limited number of commands are processed.

[Improvement]

As the number of contracts increases, a caching mechanism becomes necessary: the worker can cache and merge all the changes to the contract state (by key, preserving only the latest value of each key) and only update the Substrate storage after a certain interval. The interval needs to be chosen carefully (e.g., 13 or 17 blocks) to prevent all workers from updating their states in the same block. Note that the longer the interval, the fewer updates are applied to the storage, but the more transactions need to be replayed if the worker goes down.
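
The merge-and-flush behavior can be sketched as below. `StateWriteCache` is a hypothetical name, a plain dictionary stands in for the Substrate storage, and signing of the state update is omitted.

```python
class StateWriteCache:
    """Merges state changes by key and flushes them every `interval` blocks."""

    def __init__(self, interval: int):
        self.interval = interval
        self.pending = {}
        self.last_flush_block = 0

    def put(self, key: str, value: bytes) -> None:
        # A later write to the same key overwrites the earlier one.
        self.pending[key] = value

    def on_block(self, block_height: int, storage: dict) -> bool:
        """Flush pending changes to storage once the interval has elapsed."""
        if block_height - self.last_flush_block < self.interval:
            return False
        storage.update(self.pending)
        self.pending.clear()
        self.last_flush_block = block_height
        return True


on_chain_storage = {}
cache = StateWriteCache(interval=13)
cache.put("counter", b"1")
cache.put("counter", b"2")
flushed_early = cache.on_block(5, on_chain_storage)
flushed_later = cache.on_block(13, on_chain_storage)
```

Two writes to `counter` result in a single storage update carrying only the latest value.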

[Improvement]

State Update Arbitration

Observable state update conflicts between multiple workers are automatically handled by consensus. If only one worker is executing the contract, it can provide a wrong state update without being noticed. Note that this can only be achieved if the malicious worker manages to extract the ContractKey from pRuntime. In this case, we allow the contract deployers to launch an arbitration against suspicious state updates by posting an ArbitrationRequest on chain within a limited time window. The gatekeepers keep listening for such requests and assign extra workers for re-execution and validation. If the state update is proved to be wrong, the gatekeepers vote for the correct final state and slash the malicious workers.

State Decryption

Since the pRuntime of a worker receives the ContractKey from the gatekeepers during contract deployment, and the ContractKey is used to recover IdentityKey, EcdhKey, and StorageKey, the pRuntime can decrypt any key-value pair of the contract in the trie. Note that the use of ContractKey for decryption is entirely managed by pRuntime, and only the contract code running in the WASM interpreter inside pRuntime can access the plaintext.

When a worker tries to resume the execution of a contract, it first needs to fetch the latest state of a contract from blockchain. We can fetch all key-value pairs for now.

[Improvement]

Introduce a cache mechanism. The cache policy, such as one based on locality or FIFO, can be chosen after we evaluate the common access patterns of contracts in pre-mainnet. We can also allow developers to choose the most appropriate policy.

Since the contract state is stored with a timestamp, the worker only needs to replay the transactions after that timestamp.
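
Selecting the transactions to replay can be sketched as follows, assuming each transaction records the block it was included in. The `block` and `command` field names are made up for the example.

```python
def transactions_to_replay(transactions: list, state_height: int) -> list:
    """Select transactions in blocks after the height recorded in the state."""
    pending = [tx for tx in transactions if tx["block"] > state_height]
    return sorted(pending, key=lambda tx: tx["block"])


history = [
    {"block": 98, "command": "a"},
    {"block": 101, "command": "c"},
    {"block": 100, "command": "b"},
]
# The stored state was last updated at block 99 (made-up value).
to_replay = transactions_to_replay(history, state_height=99)
```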

[Improvement]

ContractKey Rotation

The key rotation mechanism of Phala Network is crucial to the security and confidentiality of smart contracts. By combining the random assignment of contracts with key rotation, Phala Network is able to defend against known advanced attacks on Intel SGX, since attackers have to locate the target and leak the secret within a limited time window. The rotation mechanism also promises forward secrecy even if some secrets are leaked.

Key Rotation

In the pre-mainnet of Phala Network, the MasterKey and ContractKey are bound, and any ContractKey can be generated on-the-fly given the MasterKey and the corresponding ContractInfo. With key rotation, we detach the mapping between MasterKey and ContractKey and rotate them separately. That is, ContractKey is rotated after a certain epoch by increasing the sequence number used in key generation. Once a ContractKey has been generated from MasterKey, it is stored in the gatekeepers and is not affected by subsequent rotations of MasterKey. The gatekeepers keep a certain number of the latest ContractKeys and rotate the MasterKey after a certain time interval.

Detaching MasterKey from ContractKey means that historical ContractKeys cannot be re-generated after the rotation of MasterKey, which ensures forward secrecy. However, it also requires an extra mechanism to ensure the availability of the ContractKey list: a re-election of gatekeepers and a backup of the ContractKey list are triggered immediately if 1/3 of the gatekeepers are down.

Performance Optimization for MasterKey Rotation

Since MasterKey is a distributed key, the rotation of MasterKey requires several rounds of communication between gatekeepers and can thus be expensive. By leveraging the homomorphic property, this cost can be greatly reduced. For example, one of the most widely used building blocks of DKG is Shamir’s Secret Sharing, and it has been proved to be (+, +)-homomorphic in 1998. This means that, under a proper design, we can rotate the Shamir secret shares without any communication between gatekeepers.
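
The (+, +)-homomorphic property can be demonstrated with a small sketch of Shamir's Secret Sharing over a prime field. The modulus, the threshold and the share counts are illustrative choices, and the sketch only shows the additive property itself: how each gatekeeper obtains its share of the update value `delta` is left out of scope.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime used as the field modulus (assumption)


def split(secret: int, threshold: int, shares: int) -> dict:
    """Split `secret` into points of a random polynomial of degree threshold - 1."""
    coefficients = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return {
        x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coefficients)) % PRIME
        for x in range(1, shares + 1)
    }


def reconstruct(points: dict) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for xi, yi in points.items():
        numerator, denominator = 1, 1
        for xj in points:
            if xj != xi:
                numerator = numerator * (-xj) % PRIME
                denominator = denominator * (xi - xj) % PRIME
        secret += yi * numerator * pow(denominator, -1, PRIME)
    return secret % PRIME


old_secret, delta = 123456789, 987654321
old_shares = split(old_secret, threshold=2, shares=3)
delta_shares = split(delta, threshold=2, shares=3)

# Each gatekeeper adds its two shares locally; no share is sent to anyone.
new_shares = {x: (old_shares[x] + delta_shares[x]) % PRIME for x in old_shares}

# Any two of the new shares reconstruct old_secret + delta.
subset = {x: new_shares[x] for x in (1, 3)}
rotated_secret = reconstruct(subset)
```

Adding shares point-wise yields valid shares of the sum of the secrets, so the shared key changes without any party ever holding it in full.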

Rolling Re-encryption of Contract States

The rotation of ContractKey requires regular re-encryption of contract states. To minimize the performance impact of state re-encryption, we amortize the cost over epochs. That is, the key-value pairs of contracts are encrypted with the ContractKey of the current epoch. Workers can get the historical ContractKeys from the key list of the gatekeepers. After a certain number of epochs (e.g., 1000 epochs), only the untouched key-value pairs still encrypted with outdated ContractKeys need to be re-encrypted.
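
The amortization idea can be sketched as below. Several simplifications are assumed: per-epoch ContractKeys are derived by hashing a base secret with the epoch number (in the design above they would come from the gatekeepers' key list), a toy XOR cipher stands in for AES-256-GCM, and each stored entry is tagged with the epoch of the key that encrypted it. All names are hypothetical.

```python
import hashlib


def epoch_key(contract_secret: bytes, epoch: int) -> bytes:
    """Hypothetical per-epoch ContractKey: a hash of a base secret and the epoch."""
    return hashlib.sha256(contract_secret + epoch.to_bytes(8, "big")).digest()


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR cipher (stand-in for AES-256-GCM); limited to 32-byte values."""
    if len(data) > len(key):
        raise ValueError("value too long for the toy cipher")
    return bytes(d ^ k for d, k in zip(data, key))


def write(store: dict, secret: bytes, name: str, value: bytes, epoch: int) -> None:
    """Writes always use the ContractKey of the current epoch."""
    store[name] = (epoch, xor_cipher(epoch_key(secret, epoch), value))


def read(store: dict, secret: bytes, name: str) -> bytes:
    """Decrypt an entry with the key of the epoch it was written in."""
    epoch, ciphertext = store[name]
    return xor_cipher(epoch_key(secret, epoch), ciphertext)


def sweep(store: dict, secret: bytes, current_epoch: int, max_age: int) -> int:
    """Re-encrypt only entries whose key is older than max_age epochs."""
    stale = [n for n, (epoch, _) in store.items() if current_epoch - epoch > max_age]
    for name in stale:
        write(store, secret, name, read(store, secret, name), current_epoch)
    return len(stale)


secret = b"contract secret placeholder"
store = {}
write(store, secret, "cold", b"rarely touched", epoch=1)
write(store, secret, "hot", b"v1", epoch=1)
write(store, secret, "hot", b"v2", epoch=1500)  # touched recently
re_encrypted = sweep(store, secret, current_epoch=1600, max_age=1000)
```

Entries that are written regularly are re-keyed as a side effect of normal updates, so the sweep only has to process the untouched ones.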