Upgradeable Smart Contracts: In Defense of delegateCall()

If privacy-preserving solutions are unaffordable, the security discussion is already a non-starter.
In the weird world of upgradeable smart contracts (or have we moved to calling them "persistent scripts" yet?), three patterns account for the great majority of implementations: migrations (though these are sometimes positioned as an alternative to upgrades), the "proxy pattern", and the "data separation pattern." (Another possibility is that every upgrade is a hard fork, though calling this pattern "upgradeable" is controversial, and in any case it is beyond the scope of this post.)
At NuCypher, we have opted for the proxy pattern, although we are aware that there are substantial security and implementation pitfalls inherent in this pattern.
At the core of the proxy pattern is the use of delegateCall, which makes a naked call to a target contract while keeping the context of the calling contract. The target's code runs against the caller's storage, with the caller's msg.sender and msg.value, and with no expensive checks or mappings in between. This makes delegateCall cheap but potentially quite dangerous.
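To make "the context of the calling contract" concrete, here is a toy Python model. It is not Solidity and not how the EVM is implemented. Each contract is a dict of storage slots plus a function standing in for its code. Under a plain call, the target's code touches the target's own storage. Under delegateCall, the same code touches the caller's storage and sees the original sender. The names Contract, increment_v1 and increment_v2, and the slot numbers, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Storage = Dict[int, object]  # slot index -> value


def no_code(storage: Storage, sender: str) -> None:
    """Placeholder for contracts (like a bare proxy) whose own logic we ignore here."""
    return None


@dataclass
class Contract:
    name: str
    # "Code" is a function that operates on whatever storage it is handed.
    code: Callable[[Storage, str], None] = no_code
    storage: Storage = field(default_factory=dict)


def call(caller: Contract, target: Contract) -> None:
    """Plain call: target code, target storage, msg.sender is the caller."""
    target.code(target.storage, caller.name)


def delegatecall(caller: Contract, target: Contract, original_sender: str) -> None:
    """delegatecall: target code, *caller's* storage, original msg.sender preserved."""
    target.code(caller.storage, original_sender)


def increment_v1(storage: Storage, sender: str) -> None:
    # Slot 0 holds a counter; slot 1 records the last msg.sender seen.
    storage[0] = storage.get(0, 0) + 1
    storage[1] = sender


def increment_v2(storage: Storage, sender: str) -> None:
    # "Upgraded" logic: must keep using the same slots to read the old state correctly.
    storage[0] = storage.get(0, 0) + 10
    storage[1] = sender


logic_v1 = Contract("LogicV1", increment_v1)
logic_v2 = Contract("LogicV2", increment_v2)
proxy = Contract("Proxy")

delegatecall(proxy, logic_v1, original_sender="alice")
print(proxy.storage)     # {0: 1, 1: 'alice'}
print(logic_v1.storage)  # {}

# "Upgrade": the proxy now delegates to LogicV2, and its state carries over.
delegatecall(proxy, logic_v2, original_sender="bob")
print(proxy.storage)     # {0: 11, 1: 'bob'}

# Contrast: a plain call runs LogicV1 against LogicV1's own storage.
call(proxy, logic_v1)
print(logic_v1.storage)  # {0: 1, 1: 'Proxy'}
```

This model also shows where the danger comes from. If increment_v2 had decided to keep its counter in slot 1 instead of slot 0, it would silently read and overwrite the sender field the proxy already holds. Nothing in the call itself would complain.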
To get a primer on some of the basics of the proxy pattern and its variants, you might start with this ZeppelinOS blog post, which in turn links to their documentation and codebase.
Trail of Bits lays out the other side of the argument in an excellent blog post here, which discusses the dangers of the proxy pattern and the risk that delegateCall may introduce critical bugs during an upgrade.
In short: the proxy pattern, and particularly the use of delegateCall, confers both otherwise unattainable benefits and distinct risks.
The proxy pattern at NuCypher
I sat down to discuss these tradeoffs with my colleague Victoria, who has done a lot of the technical and emotional labor to allow us to adopt this pattern in a way that we think is both economically sensible and considerate of the risks that we need to mitigate.
"In data separation you need to always access data in another contract. You can't just work with data right there in your code," says Vicky, a native Russian speaker. "If we aren't so optimal about creating policies, it will be a big increase of the price of using our network - two times I think; maybe more. And it's pure waste - it's like burning oil when a green energy option is available."
Let's pause for a moment to underscore the stakes here. History tells us with some clarity that as the price of privacy-preserving tech increases, the incidence of insecure, cost-saving workarounds also increases. If there is no affordable privacy layer for the blockchain - and soon - we are sentencing future beginners to an experience of insecurity. That's the future NuCypher was born to prevent.
On contract upgrades and future features
Central to our activities at NuCypher is our ongoing practice of researching cryptographic primitives that we think are best implemented in a Byzantine fault-tolerant and collusion-resistant environment (for which we currently use the Ethereum blockchain). Our first offering is our Proxy Re-Encryption network (underwritten by our cryptological character Ursula, who performs the PRE operation). However, we're excited by the prospect of additional cryptographic primitives and functionality being added to the NuCypher network in the future, through a decentralized governance mechanism.
In other words, contract upgrades are a particularly sensitive component of our offering, and one that Vicky believes is far better served by delegateCall than by the other patterns.
"We have in StakingEscrow, for example, one structure with ten fields. Now imagine we want to extend this as the features of our network expand. With the data separation approach, these fields mean additional round-trips via call, and each of these will add at least 2,000 gas beyond what our users will pay via delegateCall."
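Vicky's numbers can be turned into a rough lower bound. The sketch below assumes, hypothetically, that data separation costs one extra external call for each field read. It also takes her "at least 2,000 gas" per call at face value. Under those assumptions, reading the full ten-field structure costs at least ten times that on top of the delegateCall path.

```python
# Figures taken from Vicky's quote; the one-call-per-field model is an assumption.
FIELDS_IN_STRUCT = 10
MIN_EXTRA_GAS_PER_CALL = 2_000


def min_extra_gas(fields_accessed: int, gas_per_call: int = MIN_EXTRA_GAS_PER_CALL) -> int:
    """Lower bound on extra gas under data separation, assuming (hypothetically)
    one external call to the storage contract per field accessed."""
    return fields_accessed * gas_per_call


print(min_extra_gas(FIELDS_IN_STRUCT))  # 20000
```

Every field added as the network grows raises this bound by the same per-call amount. That is the scaling concern behind the quote.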
If we (and other similarly situated projects) are gun-shy about adding the features that will best support the privacy rights of our users, we will have failed at our mission from the first deploy.
delegateCall in the wild at LivePeer
We are not alone in putting this particular sensibility to work - late last year, we started noticing similar development occurring at LivePeer, a project with whom we tend to (inadvertently or otherwise) cluster at various conferences and hackathons.
I caught up with LivePeer's Yondon Fu, who has authored quite a bit of their contract upgrade infrastructure.
"We initially decided to use the delegatecall proxy pattern for the Livepeer contracts because we thought it resulted in less code complexity," Yondon told me via Discord. He went on to point out that "if you're using a storage contract that serves as a generic KV store then in order to use structs in your logic contract you'll have to translate the struct schema into corresponding writes to the storage contract," echoing Vicky's concerns.
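Here is a minimal Python sketch of the translation Yondon describes. It assumes a hypothetical generic key-value storage contract in which every get or set is a separate external call. KeyValueStorage and StakerInfo are illustrative names, and StakerInfo's fields do not reflect the real StakingEscrow layout. The struct must be flattened into one write per field and reassembled from one read per field.

```python
from dataclasses import asdict, dataclass, fields
from typing import Dict, Tuple


class KeyValueStorage:
    """Hypothetical generic storage contract: each get/set is one external call."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], object] = {}
        self.external_calls = 0

    def set(self, key: Tuple[str, str], value: object) -> None:
        self.external_calls += 1
        self._data[key] = value

    def get(self, key: Tuple[str, str]) -> object:
        self.external_calls += 1
        return self._data[key]


@dataclass
class StakerInfo:
    # Illustrative fields only.
    value: int
    locked_value: int
    worker: str


def save_staker(store: KeyValueStorage, staker_id: str, info: StakerInfo) -> None:
    # The struct schema is translated into one write per field.
    for name, value in asdict(info).items():
        store.set((staker_id, name), value)


def load_staker(store: KeyValueStorage, staker_id: str) -> StakerInfo:
    # ...and reassembled from one read per field.
    return StakerInfo(**{f.name: store.get((staker_id, f.name)) for f in fields(StakerInfo)})


store = KeyValueStorage()
save_staker(store, "alice", StakerInfo(value=100, locked_value=50, worker="0xabc"))
restored = load_staker(store, "alice")
print(store.external_calls)  # 6
```

With delegateCall, the logic contract instead declares the struct directly and reads it in place from the proxy's storage. That avoids both the translation code and the per-field external calls.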
I asked Yondon if, after performing a couple of mainnet contract upgrades, he still felt good about the decision to use delegateCall.
"We've been happy with the pattern selection thus far because it has enabled us to deploy a few upgrades to our contracts some of which were feature additions and some of which were bug fixes. The latter was particularly important and we knew that some type of a bug was inevitably going to arise given that we were deploying an alpha version of the software."
Mitigating the risks
The pressing question for us, then, is, how do we use Vicky's "green energy option" in a way that mitigates the risks we know are inherent in the proxy pattern, and particularly with delegateCall?
Yondon posed a similar question during our exchange, saying "up until now, we've mainly been very careful and thorough in our upgrade process. This process has been very manual though (in particular the storage layout validation) and going forward we would like to automate these storage layout checks."
The centerpiece of our mitigation strategy at NuCypher is the way that we verify the layout of our previously-deployed and candidate contracts prior to an upgrade. We deploy candidate contracts to the blockchain and then we perform checks to verify compatibility between the layout of the previously-deployed contract and the candidate contract.
On layout verification, Victoria says: "We aren't merely checking storage - the check itself is on-chain. You always check only with bytecode, with direct access for slots. One minus: it's costly because you pay for the on-chain parts of this verification."
"We are not 100% safe - nobody is. We can't check everything, but we can check the main slots in the storage layout. And again, because the check is on-chain, this means that even if we put an incompatible contract in our repo, we won't be able to upgrade to it even by force."
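NuCypher's check runs on-chain against bytecode and reads slots directly. The sketch below is not that. It is a simpler, off-chain Python illustration of the kind of storage layout check Yondon wants to automate. It enforces two rules:

- Every variable in the deployed layout must keep its slot, offset and type in the candidate.
- New variables, conservatively, may only be appended after the last deployed slot.

The Slot records are loosely modeled on the storage layout entries that solc can report. find_layout_problems and the example variables are hypothetical.

```python
from typing import List, NamedTuple


class Slot(NamedTuple):
    # One state variable: name, storage slot, byte offset within the slot, type label.
    name: str
    slot: int
    offset: int
    type: str


def find_layout_problems(deployed: List[Slot], candidate: List[Slot]) -> List[str]:
    """Return human-readable problems; an empty list means no incompatibility was found."""
    problems: List[str] = []
    candidate_at = {(v.slot, v.offset): v for v in candidate}
    deployed_positions = {(v.slot, v.offset) for v in deployed}
    last_deployed_slot = max((v.slot for v in deployed), default=-1)

    # 1. Every deployed variable must still occupy the same position with the same type.
    for old in deployed:
        new = candidate_at.get((old.slot, old.offset))
        if new is None:
            problems.append(f"{old.name}: slot {old.slot}, offset {old.offset} is no longer declared")
        elif new.type != old.type:
            problems.append(f"{old.name}: type changed from {old.type} to {new.type}")

    # 2. Conservatively, new variables may only be appended after the last deployed slot.
    for new in candidate:
        if (new.slot, new.offset) not in deployed_positions and new.slot <= last_deployed_slot:
            problems.append(f"{new.name}: added inside the deployed layout at slot {new.slot}")
    return problems


deployed = [Slot("owner", 0, 0, "address"), Slot("totalStaked", 1, 0, "uint256")]
appended = deployed + [Slot("newFeature", 2, 0, "uint256")]
inserted = [Slot("newFeature", 0, 0, "uint256")] + [s._replace(slot=s.slot + 1) for s in deployed]

print(find_layout_problems(deployed, appended))       # []
print(len(find_layout_problems(deployed, inserted)))  # 2
```

Like any such check, this one is partial. Swapping two variables of the same type passes unnoticed. This is one concrete instance of "we can't check everything." Running the check on-chain, as NuCypher does, also means a candidate that fails it cannot be forced through.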
Best practices for writing upgradeable smart contracts securely are still emerging and far from settled. While we think the proxy pattern best fits our project's requirements, we also recognize that the other patterns make sense elsewhere.
For our part, we will continue to pursue our mission by ensuring that privacy-preserving solutions are not only available but affordable.
We're hopeful that this post will contribute to the broader conversation and help us get to a point where we're all more comfortable writing, deploying, and upgrading smart contracts.