


Optimizing the Internet Computer Memory System’s Performance | by DFINITY | The Internet Computer Review | Oct, 2021

The Internet Computer blockchain makes it possible for the world’s systems and services to be created using advanced canister smart contracts that run entirely on-chain. Canisters enable autonomous service composability that can drive extraordinary network effects, allowing practically any enterprise to be fundamentally reimagined. Since network Genesis in May, the DFINITY Canister SDK has been used to create thousands of canister smart contracts on the Internet Computer, many of which are complete Web 3.0 dapps.

The rapid growth of canisters and users on the Internet Computer blockchain presents interesting engineering challenges. A recent increase in memory-intensive canisters demonstrated that the memory system had a performance bottleneck under heavy load. This blog post describes performance optimizations from NNS proposal 20461, providing details about scaling and WebAssembly (Wasm) memory.

After the rollout of the optimizations, we observed two main improvements:

  1. Increased and more stable finalization rate: The choppy finalization rate recovered from 0.5 blocks per second to the expected level of 1 block per second.
  2. Improved message execution time: The average message execution time improved by ~3x and the maximum improved by ~10x.
Figure 1. Block finalization rate before and after the rollout of the optimizations. The red line shows the time when the replica was restarted after the upgrade.
Figure 2. The average message execution duration before and after the rollout of the optimizations.
Figure 3. The maximum message execution duration before and after the rollout of the optimizations.

Canister memory on the Internet Computer is orthogonally persistent: a canister's Wasm memory is automatically preserved across messages, with no explicit code needed to save or load it. Any implementation of orthogonal persistence has to solve two problems:

  1. How to map the persisted memory into the Wasm memory.
  2. How to keep track of all modifications in the Wasm memory so that they can be persisted later on.

The current implementation uses page protection to solve both problems. When a message starts executing, we divide the entire address range of the Wasm memory into 4KiB chunks called pages. Initially, all pages are marked as inaccessible using the page protection flags of the operating system. This means that the first memory access triggers a page fault, pauses the execution, and invokes our signal handler. The signal handler then fetches the corresponding page from persisted memory and marks the page as read-only. Subsequent read accesses to that page will succeed without any help from the signal handler. The first write access will trigger another page fault, however, and allow the signal handler to remember the page as modified and mark the page as readable and writable. All subsequent accesses to that page (both read and write) will succeed without invoking the signal handler.

Invoking a signal handler and changing page protection flags are expensive operations. Messages that read or write large chunks of memory cause a storm of such operations, degrading the performance of the whole system. This is the performance bottleneck observed under heavy load. Note that the signal handler was written before the launch of the Internet Computer, and its main priority was correctness and not performance.

A naive solution to this problem would be to copy the entire memory after each update message. That would be slow and would use a lot of storage. Thus, the current implementation takes a different route. It keeps the modified memory pages in a persistent tree data structure called PageDelta that is based on Fast Mergeable Integer Maps. At a regular interval (every N rounds), a checkpoint event commits the modified pages into the checkpoint file after cloning the file to preserve its previous version. Figure 4 shows how the Wasm memory is constructed from PageDelta and the checkpoint file.

Figure 4. a) The checkpoint file stores the Wasm memory at the last checkpoint. b) Pages modified since the last checkpoint are stored in a persistent data structure called PageDelta. c) The Wasm memory is constructed lazily by the signal handler by copying the checkpoint file pages and modified pages.

Optimization 1: memory-mapping the checkpoint file

Optimization 2: page tracking in queries

Optimization 3: amortized prefetching of pages