September 14th 2021 - Post Mortem Japanese Translation
Background on finalization
Finality is an important concept in blockchains: it guarantees that a transaction cannot be changed, cancelled, or revoked. Bitcoin and most other chains rely on probabilistic finality. Symbol has both probabilistic and deterministic finalization, implemented as separate functions. Under deterministic finalization, a transaction is considered final once it is included in a finalized block.
To finalize a block, 2/3 of the active voting nodes must vote on the block's hash. Participation in the finalization process is voluntary. Membership changes take effect at the next epoch. In addition, at the end of each epoch, the total active voting rights are calculated, and that amount is used as the denominator in the next epoch.
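The threshold rule above can be sketched as follows. This is a minimal illustration, not the Symbol client's code: the function name `can_finalize`, the use of an integer "weight" for voting rights, and the choice of `>=` rather than a strict `>` comparison are all assumptions made for the example.

```python
from fractions import Fraction

# Threshold taken from the "2/3" figure in the text above.
SUPERMAJORITY = Fraction(2, 3)


def can_finalize(voted_weight: int, active_weight: int,
                 threshold: Fraction = SUPERMAJORITY) -> bool:
    """Return True when the votes for a block hash meet the threshold.

    active_weight: the denominator calculated at the end of the previous epoch
                   (hypothetical representation as an integer weight).
    voted_weight:  the combined weight of voters that voted for the hash.
    """
    if active_weight <= 0:
        # Nothing to compare against, so nothing can be finalized.
        return False
    # Fraction keeps the comparison exact (no floating-point rounding).
    return Fraction(voted_weight, active_weight) >= threshold
```

Because the denominator is fixed for the whole epoch, voters that stop voting mid-epoch lower the numerator but not the denominator, which is why a drop in turnout can stall finalization.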
Issue
On Sunday, September 12, we noticed that the network was having trouble finalizing epoch 361. This was not unexpected: we had anticipated that this epoch would put the finalization process under heavy strain.
Voting nodes that registered their voting keys with symbol-bootstrap at network launch had keys registered for epochs 1 through 360. Many users did not register keys for epoch 361 and beyond, because the importance of renewing voting keys regularly was not widely known. As a result, many nodes that had voted in epoch 360 did not vote in epoch 361.
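To make the expiry concrete, here is a small sketch of checking whether an account's registered keys cover a given epoch. The `VotingKey` type and `can_vote` helper are hypothetical names for illustration, and the renewed key range (361 to 720) is an invented example value, not a figure from the incident.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VotingKey:
    start_epoch: int  # first epoch the key is valid for (inclusive)
    end_epoch: int    # last epoch the key is valid for (inclusive)


def can_vote(keys: Iterable[VotingKey], epoch: int) -> bool:
    """Return True if at least one registered key covers the given epoch."""
    return any(k.start_epoch <= epoch <= k.end_epoch for k in keys)


# Keys registered at launch, as described above: epochs 1 through 360.
launch_keys = [VotingKey(1, 360)]

# The same account after registering a follow-up key (hypothetical range).
renewed_keys = launch_keys + [VotingKey(361, 720)]

print(can_vote(launch_keys, 360))   # True
print(can_vote(launch_keys, 361))   # False
print(can_vote(renewed_keys, 361))  # True
```

A check like this, run ahead of the last covered epoch, is the kind of reminder that would have prompted operators to renew their keys in time.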
Nevertheless, there was still enough consensus to proceed to the next epoch (362). Unfortunately, finalization stalled there. Several more large nodes stopped voting, and the network could no longer reach the required supermajority of voting nodes.
The stall appeared to be related to the XYM holdings of voting nodes with expired keys still being counted, which is not the intended behavior. Upon investigation, we found that the client handled accounts that had updated their voting key registrations correctly, but still counted the XYM holdings of accounts that had not updated their voting keys. The data stored in the importance blocks showed immediately that something was wrong: had the client been working as designed, the total voting balance would have decreased significantly, but it had barely changed.
Primary fix
We fixed this bug by excluding accounts that have no voting key registered for the current epoch. In addition, the supermajority threshold was lowered from 70% to 67%. With these changes, finalization will resume as soon as the fix has been deployed to a supermajority of voting accounts.
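The effect of the fix on the denominator can be illustrated with a simplified model. Everything here is an assumption made for the example: the `Voter` type, the function names, the balances, and the key ranges are invented, and the model only tracks key expiry, not the other rules the client applies.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Sequence


@dataclass(frozen=True)
class Voter:
    balance: int        # XYM holdings, in illustrative units
    key_end_epoch: int  # last epoch covered by the account's voting key


def denominator_before_fix(voters: Sequence[Voter], epoch: int) -> int:
    # Behavior described in the Issue section: holdings of accounts with
    # expired keys are still counted, so the epoch argument has no effect.
    return sum(v.balance for v in voters)


def denominator_after_fix(voters: Sequence[Voter], epoch: int) -> int:
    # Fix described above: count only accounts with a key for this epoch.
    return sum(v.balance for v in voters if v.key_end_epoch >= epoch)


def reaches_threshold(voted: int, total: int, threshold: Fraction) -> bool:
    return total > 0 and Fraction(voted, total) >= threshold


# Hypothetical voter set: one large account whose key expired at epoch 360.
voters = [Voter(40, 360), Voter(35, 720), Voter(25, 720)]
epoch = 362

# Only accounts with a valid key can vote; assume all of them do.
voted = denominator_after_fix(voters, epoch)  # 35 + 25 = 60

old_total = denominator_before_fix(voters, epoch)  # 100
new_total = denominator_after_fix(voters, epoch)   # 60

print(reaches_threshold(voted, old_total, Fraction(70, 100)))  # False
print(reaches_threshold(voted, new_total, Fraction(67, 100)))  # True
```

In this toy example every eligible voter votes, yet finalization still fails before the fix because the expired account inflates the denominator. After the fix, the same votes clear the 67% threshold.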
A hard fork will occur at block 528,000. This is necessary to correct the calculation of the statistics stored in importance blocks, since those statistics must be verified by all nodes.
Because deterministic finalization is implemented as a separate function in Symbol, the chain continues to produce blocks while deterministic finalization is stalled. Once finalization resumes, we expect it to catch up on all outstanding blocks quickly.
Secondary fixes
When a voting node's voting key expires, the node crashes by default. To alleviate this, we have changed the behavior so that the condition is simply logged as an error. This means less noise; however, running nodes in this state for extended periods is not recommended. Operators who wish to stop voting should set enableVoting to false and restart the node.
Multisig accounts have always had the inconvenience of requiring another account (the parent account) to pay the fee. This restriction will be relaxed: from the fork block onward, a multisig account can pay the fee indirectly by transferring the fee amount to the parent account. Note that this does not change the sending behavior of partial transactions that require a hash lock.
Future work
In the current version of Symbol, there is no reward or penalty for voting nodes. While this greatly simplified the implementation of finalization for launch, it does not create the incentives we intended for voting nodes. We plan to add an appropriate mechanism to reward voters in the near future.
We realize that we need to invest in better tooling. Being able to monitor the number of active voting nodes and any drop in turnout would have alerted us to the problem before the finalization process stalled. In the coming months, we will prioritize infrastructure, including a new block explorer, an improved REST API, and data analysis that is easy to understand and useful for users and researchers. If you would like to work with us, join us on Discord.