Possible futures of the Ethereum protocol, part 5: The Purge
2024 Oct 26
Special thanks to Justin Drake, Tim Beiko, Matt Garnett, Piper
Merriam, Marius van der Wijden and Tomasz Stanczak for feedback and
review
One of Ethereum's challenges is that by default, any blockchain
protocol's bloat and complexity grows over time. This happens in two
places:
- Historical data: any transaction made and any
account created at any point in history needs to be stored by all
clients forever, and downloaded by any new clients making a full sync to
the network. This causes client load and sync time to keep increasing
over time, even as the chain's capacity remains the same.
- Protocol features: it's much easier to add a new
feature than to remove an old one, causing code complexity to increase
over time.
For Ethereum to sustain itself into the long term, we need a strong
counter-pressure against both of these trends, reducing
complexity and bloat over time. But at the same time, we need
to preserve one of the key properties that make blockchains
great: their permanence. You can put an NFT, a love note in
transaction calldata, or a smart contract containing a million dollars
onchain, go into a cave for ten years, come out and find it still there
waiting for you to read and interact with. For dapps to feel comfortable
going fully decentralized and removing their upgrade keys, they need to
be confident that their dependencies are not going to upgrade
in a way that breaks them - especially the L1 itself.
The Purge, 2023 roadmap.
Balancing between these two needs, and minimizing or reversing bloat,
complexity and decay while preserving continuity, is absolutely possible
if we put our minds to it. Living organisms can do it: while most age
over time, a lucky few
do not. Even social systems can have
extreme longevity. On a few occasions, Ethereum has already shown
successes: proof of work is gone, the SELFDESTRUCT
opcode
is mostly gone, and beacon chain nodes already store old data for only up to six months. Figuring out this path for Ethereum in a more generalized way, and moving toward an eventual outcome that is stable for the long term, is the ultimate challenge of Ethereum's long-term scalability, technical sustainability and even security.
The Purge: key goals
- Reducing client storage requirements by reducing or
removing the need for every node to permanently store all history, and
perhaps eventually even state
- Reducing protocol complexity by eliminating
unneeded features
In this chapter
- History expiry
- State expiry
- Feature cleanup
History expiry
What problem does it solve?
As of the time of this writing, a fully synced Ethereum node requires
roughly
1.1 terabytes of disk space for the execution
client, plus another few hundred gigabytes for the consensus client.
The great majority of this is history: data about historical
blocks, transactions and receipts, the bulk of which are many years old.
This means that the size of a node keeps increasing by hundreds of
gigabytes each year, even if the gas limit does not increase at all.
What is it, and how does it work?
A key simplifying feature of the history storage problem is that
because each block points to the previous block via a hash link (and other
structures),
having consensus on the present is enough to have consensus on history.
As long as the network has consensus on the latest block, any historical
block or transaction or state (account balance, nonce, code, storage)
can be provided by any single actor along with a Merkle proof, and the
proof allows anyone else to verify its correctness. While consensus is
an N/2-of-N trust model, history is a 1-of-N
trust model.
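To make the 1-of-N property concrete, here is a minimal sketch (assuming a toy binary SHA-256 tree; real clients use Ethereum's actual tree structures and hash functions) of how anyone holding only a trusted root can check a historical claim supplied by a single untrusted peer:

```python
import hashlib

def sha256(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def verify_merkle_proof(leaf, proof, trusted_root):
    # `proof` is a list of (sibling_hash, sibling_is_on_right) pairs,
    # walking from the leaf up to the root.
    h = sha256(leaf)
    for sibling, sibling_on_right in proof:
        h = sha256(h + sibling) if sibling_on_right else sha256(sibling + h)
    return h == trusted_root

# Toy two-leaf "block": consensus gives us only the root...
root = sha256(sha256(b"tx1") + sha256(b"tx2"))
# ...and any single peer can convince us that tx1 was included:
assert verify_merkle_proof(b"tx1", [(sha256(b"tx2"), True)], root)
```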
This opens up a lot of options for how we can store the history. One
natural option is a network where each node only stores a small
percentage of the data. This is how torrent networks have worked for
decades: while the network altogether stores and distributes millions of
files, each participant only stores and distributes a few of them.
Perhaps counterintuitively, this approach does not even necessarily
decrease the robustness of the data. If, by making node running
more affordable, we can get to a network with 100,000 nodes, where each
node stores a random 10% of the history, then each piece of data would
get replicated 10,000 times - exactly the same replication factor as a
10,000-node network where each node stores everything.
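A quick back-of-envelope sketch of that claim (purely illustrative helper names, assuming each node samples its share of history independently):

```python
def replication_factor(num_nodes: int, fraction_stored: float) -> float:
    # Expected number of copies of any given piece of history.
    return num_nodes * fraction_stored

def p_piece_lost(num_nodes: int, fraction_stored: float) -> float:
    # Chance that not a single node holds a given piece.
    return (1 - fraction_stored) ** num_nodes

# 100,000 nodes each storing a random 10% of history...
print(replication_factor(100_000, 0.10))  # 10000.0 copies
# ...matches 10,000 nodes each storing everything:
print(replication_factor(10_000, 1.00))   # 10000.0 copies
# And the chance of losing any given piece is astronomically small
# (0.9^100000 underflows to 0.0 in floating point):
print(p_piece_lost(100_000, 0.10))
```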
Today, Ethereum has already started to move away from the model of
all nodes storing all history forever. Consensus blocks (ie. the parts
related to proof of stake consensus) are only stored for ~6 months.
Blobs are only stored for ~18 days. EIP-4444 aims to
introduce a one-year storage period for historical blocks and receipts.
A long-term goal is to have a harmonized period (which could be ~18
days) during which each node is responsible for storing everything, and
then have a peer-to-peer network made up of Ethereum nodes storing older
data in a distributed way.
Erasure codes can be used to increase robustness while keeping the
replication factor the same. In fact, blobs already come erasure-coded
in order to support data availability sampling. The simplest solution
may well be to re-use this erasure coding, and put execution and
consensus block data into blobs as well.
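As a toy illustration of the underlying idea (Reed-Solomon-style coding via polynomial interpolation over a small prime field; real systems use much larger fields and far more efficient algorithms): extending k chunks to n > k chunks means any k survivors recover everything.

```python
P = 2**31 - 1  # a small prime modulus, for illustration only

def lagrange_eval(points, x):
    # Evaluate, at position x, the unique degree-(k-1) polynomial
    # passing through the k given (xi, yi) points, mod P.
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

# k = 2 data chunks, encoded as evaluations at x = 1, 2...
data = [(1, 5), (2, 9)]
# ...extended to n = 4 chunks (a rate-1/2 code):
extended = data + [(x, lagrange_eval(data, x)) for x in (3, 4)]
# Even if both original chunks are lost, any 2 survivors recover them:
survivors = extended[2:]
assert [lagrange_eval(survivors, x) for x in (1, 2)] == [5, 9]
```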
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The main remaining work involves building out and integrating a
concrete distributed solution for storing history - at least execution
history, but ultimately also consensus and blobs. The easiest solutions for this are (i) to simply adopt an existing torrent library, and (ii) to use an Ethereum-native solution called the Portal network. Once either of
these is introduced, we can turn EIP-4444 on. EIP-4444 itself does
not require a hard fork, though it does require a new network
protocol version. For this reason, there is value in enabling it for all
clients at the same time, because otherwise there are risks of clients
malfunctioning from connecting to other nodes expecting to download the
full history but not actually getting it.
The main tradeoff involves how hard we try to make "ancient"
historical data available. The easiest solution would be to simply stop
storing ancient history tomorrow, and rely on existing archive nodes and
various centralized providers for replication. This is easy, but this
weakens Ethereum's position as a place to make permanent records. The
harder, but safer, path is to first build out and integrate the torrent
network for storing history in a distributed way. Here, there are two
dimensions of "how hard we try":
1. How hard do we try to make sure that a maximally large set of nodes really is storing all the data?
2. How deeply do we integrate the historical storage into the protocol?
A maximally paranoid approach for (1) would involve proof
of custody: actually requiring each proof of stake validator to
store some percentage of history, and regularly cryptographically
checking that they do so. A more moderate approach is to set a voluntary
standard for what percentage of history each client stores.
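As a deliberately simplified sketch of the flavor a proof-of-custody scheme could take (real proposals, such as the old Eth2 custody game, are considerably more involved; all names here are hypothetical):

```python
import hashlib, os

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

class CustodyProver:
    def __init__(self, chunks):
        self.chunks = chunks
        self.secret = os.urandom(32)  # per-period private secret
        # Commit to H(secret, chunk) for each assigned history chunk;
        # producing these commitments requires actually holding the chunks.
        self.commitments = [h(self.secret, c) for c in chunks]

# After the custody period, the prover reveals the secret, and anyone
# who stores chunk i can audit that the commitment covered real data:
def audit(commitment: bytes, revealed_secret: bytes, chunk: bytes) -> bool:
    return commitment == h(revealed_secret, chunk)

prover = CustodyProver([b"chunk0", b"chunk1"])
assert audit(prover.commitments[1], prover.secret, b"chunk1")
```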
For (2), a basic implementation involves just taking the work that is
already done today: Portal already stores ERA files containing the
entire Ethereum history. A more thorough implementation would involve
actually hooking this up to the syncing process, so that if someone
wanted to sync a full-history-storing node or an archive node, they
could do so even if no other archive nodes existed online, by syncing
straight from the Portal network.
How does it interact with other parts of the roadmap?
Reducing history storage requirements is arguably even more important
than statelessness if we want to make it extremely easy to run or spin
up a node: out of the 1.1 TB that a node needs to have, ~300 GB is
state, and the remaining ~800 GB is history. The vision of an Ethereum
node running on a smart watch and taking only a few minutes to set up is
only achievable if both statelessness and EIP-4444 are
implemented.
Limiting history storage also makes it more viable for newer Ethereum
node implementations to only support recent versions of the
protocol, which allows them to be much simpler. For example, many lines of code can be safely removed now that the empty accounts created during the 2016 DoS attacks have all been cleared from the state. Now that the switch to proof of stake is ancient history, clients can safely remove all proof-of-work-related code.
State expiry
What problem does it solve?
Even if we remove the need for clients to store history, a client's
storage requirement will continue to grow, by around 50 GB per year,
because of ongoing growth to the state:
account balances and nonces, contract code and contract storage. Users
are able to pay a one-time cost to impose a burden on present and future
Ethereum clients forever.
State is much harder to "expire" than history, because the EVM is
fundamentally designed around an assumption that once a state object is
created, it will always be there and can be read by any transaction at
any time. If we introduce statelessness, there is an argument that maybe
this problem is not that bad: only a specialized class of block builders
would need to actually store the state, and all other nodes (even inclusion
list production!) can run statelessly. However, there is an argument
that we don't want to lean on statelessness too much, and
eventually we may want to expire state to keep Ethereum
decentralized.
What is it, and how does it work?
Today, when you create a new state object (which can happen in one of
three ways: (i) sending ETH to a new account, (ii) creating a new
account with code, (iii) setting a previously-untouched storage slot),
that state object is in the state forever. What we want instead is for objects to automatically expire over time. The key challenge is doing this in a way that accomplishes three goals:
- Efficiency: don't require huge amounts of extra
computation to run the expiry process
- User-friendliness: if someone goes into a cave for
five years and comes back, they should not lose access to their ETH,
ERC20s, NFTs, CDP positions...
- Developer-friendliness: developers should not have
to switch to a completely unfamiliar mental model. Additionally,
applications that are ossified today and do not update should continue
to work reasonably well.
It's easy to solve the problem without satisfying these goals. For
example, you could have each state object also store a counter for its
expiry date (which could be extended by burning ETH, which could happen
automatically any time it's read or written), and have a process that
loops through the state to remove expired state objects. However, this
introduces extra computation (and even storage requirements), and it
definitely does not satisfy the user-friendliness requirement.
Developers too would have a hard time reasoning about edge cases
involving storage values sometimes resetting to zero. If you make the
expiry timer contract-wide, this makes life technically easier
for developers, but it makes the economics harder: developers
would have to think about how to "pass through" the ongoing costs of
storage to their users.
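To make the strawman concrete, here is a minimal sketch of such a per-object expiry scheme (names and the renewal period are illustrative), showing both the extra per-object bookkeeping and the sweep loop:

```python
RENEWAL_PERIOD = 6 * 30 * 86400  # eg. ~6 months, in seconds

class NaiveExpiringState:
    def __init__(self):
        self.objects = {}  # key -> (value, expiry_time): extra storage!

    def access(self, key, now, value=None):
        # Reads and writes both refresh the expiry timer
        # (in the text, paid for by burning ETH).
        if key not in self.objects and value is None:
            raise KeyError("expired or never created")  # cave-dweller loses out
        old = self.objects.get(key, (None, 0))[0]
        self.objects[key] = (old if value is None else value,
                             now + RENEWAL_PERIOD)

    def sweep(self, now):
        # Extra computation: loop over the state deleting expired objects.
        for key in [k for k, (_, exp) in self.objects.items() if exp <= now]:
            del self.objects[key]
```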
These are problems that the Ethereum core development community
struggled with for many years, including proposals like "blockchain rent"
and "regenesis".
Eventually, we combined the best parts of the proposals and converged on
two categories of "known least bad solutions":
- Partial state expiry solutions
- Address-period-based state expiry proposals
Partial state expiry
Partial state expiry proposals all work along the same principle. We
split the state into chunks. Everyone permanently stores the "top-level
map" of which chunks are empty or nonempty. The data within
each chunk is only stored if that data has been recently accessed. There
is a "resurrection" mechanism where if a chunk is no longer stored,
anyone can bring that data back by providing a proof of what the data
was.
The main distinctions between these proposals are: (i) how do we
define "recently", and (ii) how do we define "chunk"? One concrete
proposal is EIP-7736, which
builds upon the "stem-and-leaf" design introduced
for Verkle trees (though compatible with any form of statelessness,
eg. binary trees). In this design, header, code and storage slots that
are adjacent to each other are stored under the same "stem". The data
stored under a stem can be at most 256 * 31 = 7,936
bytes.
In many cases, the entire header and code, and many key storage slots,
of an account will all be stored under the same stem. If the data under
a given stem is not read or written for 6 months, the data is no longer
stored, and instead only a 32-byte commitment ("stub") to the data is
stored. Future transactions that access that data would need to
"resurrect" the data, with a proof that would be checked against the
stub.
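A minimal sketch of that stem lifecycle (the commitment function here is a stand-in; EIP-7736 would use the actual tree commitment, and "six months" would be measured in blocks or epochs rather than seconds):

```python
import hashlib

EXPIRY_WINDOW = 6 * 30 * 86400  # "~6 months", illustrative units

def commit(leaves: dict) -> bytes:
    # Stand-in for the real stem commitment.
    return hashlib.sha256(repr(sorted(leaves.items())).encode()).digest()

class Stem:
    def __init__(self, leaves: dict, now: float):
        assert len(leaves) <= 256  # up to 256 leaves of 31 bytes each
        self.leaves, self.stub, self.last_access = leaves, None, now

    def maybe_expire(self, now: float):
        if self.leaves is not None and now - self.last_access > EXPIRY_WINDOW:
            # Drop the data; keep only the 32-byte stub.
            self.stub, self.leaves = commit(self.leaves), None

    def access(self, now: float, resurrection_data: dict = None):
        if self.leaves is None:
            # Resurrection: the supplied data must match the stored stub.
            assert resurrection_data is not None, "resurrection proof required"
            assert commit(resurrection_data) == self.stub
            self.leaves, self.stub = resurrection_data, None
        self.last_access = now
        return self.leaves
```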
There are other ways to implement a similar idea. For example, if
account-level granularity is not enough, we could make a scheme where
each 1/2^32 fraction of the tree is governed by a similar
stem-and-leaf mechanism.
This is trickier because of incentives: an attacker could force
clients to permanently store a very large amount of state by putting a
very large amount of data into a single subtree and sending a single
transaction every year to "renew the tree". If you make the renewal cost
proportional (or renewal duration inversely-proportional) to the tree
size, then someone could grief another user by putting a very large
amount of data into the same subtree as them. One could try to limit
both problems by making the granularity dynamic based on the subtree
size: for example, each consecutive 2^16 = 65536 state objects
could be treated as a "group". However, these ideas are more complex;
the stem-based approach is simple, and it aligns incentives, because
typically all the data under a stem is related to the same application
or user.
Address-period-based state expiry proposals
What if we wanted to avoid any permanent state growth at
all, even 32-byte stubs? This is a hard problem because of resurrection
conflicts: what if a state object gets removed, later EVM execution
puts another state object in the exact same position, but then after
that someone who cares about the original state object comes back and
tries to recover it? With partial state expiry, the "stub" prevents new
data from being created. With full state expiry, we cannot afford to
store even the stub.
The address-period-based design is the best known idea for solving
this. Instead of having one state tree storing the whole state, we have
a constantly growing list of state trees, and any state that gets read
or written gets saved in the most recent state tree. A new empty state
tree gets added once per period (think: 1 year). Older state trees are
frozen solid. Full nodes are only expected to store the most recent two
trees. If a state object was not touched for two periods and thus falls
into an expired tree, it still can be read or written to, but the transaction would need to provide a Merkle proof for it - and once it does, a copy will be saved in the latest tree again.
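A minimal sketch of this tree-list mechanism (`verify_and_extract` is a hypothetical stand-in for checking a Merkle proof against an expired tree's root):

```python
def verify_and_extract(proof):
    # Hypothetical: verify a Merkle proof against the frozen tree's
    # root and return the proven value. Stubbed out in this sketch.
    raise NotImplementedError

class PeriodedState:
    def __init__(self):
        self.trees = [{}]  # one tree per period; index = period number

    def new_period(self):
        # Think: once per year. Full nodes keep only the newest two trees.
        self.trees.append({})

    def access(self, key, value=None, proof=None):
        current = len(self.trees) - 1
        # Look in the two trees that full nodes still store:
        for period in (current, max(current - 1, 0)):
            if key in self.trees[period]:
                stored = self.trees[period][key]
                break
        else:
            # Absent or expired: a proof is required (the address-period
            # rule described next lets brand-new objects skip this).
            assert proof is not None, "expired state: Merkle proof required"
            stored = verify_and_extract(proof)
        # Reads and writes both save a fresh copy in the newest tree.
        self.trees[current][key] = stored if value is None else value
        return self.trees[current][key]
```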
A key idea for making this all user and developer-friendly is the
concept of address periods. An address period is a
number that is part of an address. A key rule is that an address
with address period N can only be read or written to during or after
period N (ie. when the state tree list reaches length N). If you're saving a new state object (eg. a new contract, or a new ERC20 balance) and you make sure to put it into a contract whose address period is either N or N-1, then you can save it immediately, without needing to provide proofs that there was nothing
there before. Any additions or edits to state in older address periods,
on the other hand, do require a proof.
This design preserves most of Ethereum's current properties, is very
light on extra computation, allows applications to be written almost as
they are today (ERC20s will need to be rewritten, to ensure that balances of
addresses with address period N are stored in a child contract which
itself has address period N), and solves the "user goes into a cave for
five years" problem. However, it has one big issue: addresses
need to be expanded beyond 20 bytes to fit address periods.
Address space extension
One proposal is to introduce a new 32-byte address format, which
includes a version number, an address period number and an expanded
hash.
0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF
The first byte (01) is a version number. The next four zeroes are intended as empty space, which could fit a shard number in the future. The following six hex characters (000001) are an address period number. The remaining 26 bytes are a hash.
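A small parser for this hypothetical format, with field widths inferred from the worked example above (1-byte version, 2 bytes of empty space, 3-byte address period, 26-byte hash; the real format is not finalized):

```python
def parse_extended_address(addr: str) -> dict:
    raw = bytes.fromhex(addr.removeprefix("0x"))
    assert len(raw) == 32, "extended addresses are 32 bytes"
    return {
        "version": raw[0],                                  # eg. 0x01
        "reserved": raw[1:3],                               # future shard space
        "address_period": int.from_bytes(raw[3:6], "big"),  # eg. 1
        "hash": raw[6:],                                    # 26-byte hash
    }

parsed = parse_extended_address(
    "0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF")
assert parsed["version"] == 1 and parsed["address_period"] == 1
```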
The key challenge here is backwards compatibility. Existing contracts are designed around 20-byte addresses, and often use tight byte-packing techniques that explicitly assume addresses are exactly 20 bytes long.
One
idea for solving this involves a translation map, where old-style
contracts interacting with new-style addresses would see a 20-byte hash
of the new-style address. However, there are significant complexities
involved in making this safe.
Address space contraction
Another approach goes the opposite direction: we immediately forbid some 2^128-sized sub-range of addresses (eg. all addresses starting with 0xffffffff), and then use that range to introduce addresses with address periods and 14-byte hashes.
0xffffffff000169125d5dFcb7B8C2659029395bdF
The key sacrifice that this approach makes is that it introduces security risks for counterfactual addresses: addresses that hold assets or permissions, but whose code has not yet been published to chain. The risk involves someone creating an address which claims to have one piece of (not-yet-published) code, but also has another valid piece of code which hashes to the same address. Computing such a collision requires 2^80 hashes today; address space contraction would reduce this number to a very accessible 2^56 hashes.
The key risk area, counterfactual addresses that are not
wallets held by a single owner, is a relatively rare case today, but is
likely to become more common as we enter a multi-L2 world. The only
solution is to simply accept this risk, but identify all common use
cases where this may be an issue, and come up with effective
workarounds.
What are some links to existing research?
What is left to do, and what are the tradeoffs?
I see four viable paths for the future:
- We do statelessness, and never introduce state
expiry. State is ever-growing (albeit slowly: we may not see it
exceed 8 TB for decades), but only needs to be held by a relatively
specialized class of users: not even PoS validators would need the
state.
The one function that needs access to parts of
the state is inclusion list production, but we can
accomplish this in a decentralized way: each user is responsible for
maintaining the portion of the state tree that contains their own
accounts. When they broadcast a transaction, they broadcast it with a
proof of the state objects accessed during the verification
step (this works for both EOAs and ERC-4337 accounts). Stateless
validators can then combine these proofs into a proof for the whole
inclusion list.
- We do partial state expiry, and accept a much lower
but still nonzero rate of permanent state size growth. This outcome is
arguably similar to how history expiry proposals involving peer-to-peer
networks accept a much lower but still nonzero rate of permanent history
storage growth from each client having to store a low but fixed
percentage of the historical data.
- We do state expiry, with address space
expansion. This will involve a multi-year process of making
sure that the address format conversion approach works and is safe,
including for existing applications.
- We do state expiry, with address space
contraction. This will involve a multi-year process of making
sure that all of the security risks involving address collisions,
including cross-chain situations, are handled.
One important point is that the difficult issues around
address space expansion and contraction will eventually have to be
addressed regardless of whether or not state expiry schemes that depend
on address format changes are ever implemented. Today, it takes roughly 2^80 hashes to generate an address collision, a computational load that is already feasible for extremely well-resourced actors: a GPU can do around 2^27 hashes per second, so running for a year it can compute ~2^52, and so all ~2^30 GPUs in the world could compute a collision in ~1/4 of a year, and
FPGAs and ASICs could accelerate this further. In the future, such
attacks will become open to more and more people. Hence, the actual cost
of implementing full state expiry may not be as high as it seems, since
we have to solve this very challenging address problem regardless.
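Worked through explicitly (the hash-rate and GPU-count figures are the rough ones from the paragraph above):

```python
from math import log2

# Birthday bound: finding a collision on an n-bit hash costs ~2^(n/2) work.
print(160 / 2)  # 80.0 -> today's 20-byte (160-bit) hashes: ~2^80
print(112 / 2)  # 56.0 -> contracted 14-byte (112-bit) hashes: ~2^56

GPU_HASHES_PER_SEC = 2**27
SECONDS_PER_YEAR = 365 * 24 * 3600
WORLD_GPUS = 2**30

per_gpu_year = GPU_HASHES_PER_SEC * SECONDS_PER_YEAR
print(log2(per_gpu_year))                   # ~51.9, ie. ~2^52 per GPU-year
print(2**80 / (per_gpu_year * WORLD_GPUS))  # ~0.27, ie. ~1/4 of a year
```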
How does it interact with other parts of the roadmap?
Doing state expiry potentially makes transitions from one state tree
format to another easier, because there will be no need for a transition
procedure: you could simply start making new trees using a new format,
and then later do a hard fork to convert the older trees. Hence, while
state expiry is complex, it does have benefits in simplifying other
aspects of the roadmap.
Feature cleanup
What problems does it solve?
One of the key preconditions of security, accessibility and credible neutrality
is simplicity. If a protocol is beautiful and simple, it reduces the
chance that there will be bugs. It increases the chance that new
developers will be able to come in and work with any part of it. It's
more likely to be fair and easier to defend against special interests.
Unfortunately, protocols, like any social system, by default become more
complex over time. If we do not want Ethereum to go into a black hole of ever-increasing complexity, we need to do one of two things: (i) stop making changes and ossify the protocol, or (ii) be able to actually remove features and reduce complexity. An intermediate route, of making fewer changes to the protocol and also removing at least a little complexity over time, is also possible. This section will talk about how we can reduce or remove complexity.
What is it, and how does it work?
There is no big single fix that can reduce protocol complexity; the
inherent nature of the problem is that there are many little fixes.
One example that is mostly finished already, and can serve as a
blueprint for how to handle the others, is the removal of the
SELFDESTRUCT opcode. The SELFDESTRUCT opcode was the only
opcode that could modify an unlimited number of storage slots within a
single block, requiring clients to implement significantly more
complexity to avoid DoS attacks. The opcode's original purpose was to
enable voluntary state clearing, allowing the state size to decrease
over time. In practice, very few people ended up using it. In the Dencun hard fork, the opcode was nerfed to only allow self-destructing accounts that were created within the same transaction. This solves the DoS issue and allows for
significant simplification in client code. In the future, it likely
makes sense to eventually remove the opcode completely.
Some key examples of protocol simplification opportunities that have
been identified so far include the following. First, some examples that
are outside the EVM; these are relatively non-invasive, and thus easier
to get consensus on and implement in a shorter timeframe.
- RLP → SSZ transition: originally, Ethereum objects
were serialized using an encoding called RLP.
RLP is untyped, and needlessly complex. Today, the beacon chain uses SSZ,
which is significantly better in many ways, including supporting not
just serialization but also hashing. Eventually, we want to get rid of
RLP entirely, and move all data types into being SSZ structs, which would in turn make upgradability much easier. Current EIPs for this include [1] [2] [3]. (A toy comparison of the two encodings follows after this list.)
- Removal of old transaction types: there are too
many transaction types today, many of them could potentially be removed.
A more moderate alternative to full removal is an account abstraction
feature by which smart accounts could include the code to process and
verify old-style transactions if they so choose.
- LOG reform: logs create bloom filters and other logic that add complexity to the protocol but are not actually used by clients because they are too slow. We could remove these
features, and instead put effort into alternatives, such as
extra-protocol decentralized log reading tools that use modern
technology like SNARKs.
- Eventual removal of the beacon chain sync committee
mechanism: the sync
committee mechanism was originally introduced to enable light client
verification of Ethereum. However, it adds significant complexity to the
protocol. Eventually, we will be able to verify
the Ethereum consensus layer directly using SNARKs, which would
remove the need for a dedicated light client verification protocol.
Potentially, changes to consensus could enable us to remove sync
committees even earlier, by creating a more "native" light client
protocol that involves verifying signatures from a random subset of the
Ethereum consensus validators.
- Data format harmonization: today, execution state
is stored in a Merkle Patricia tree, consensus state is stored in an SSZ
tree, and blobs are committed to with KZG
commitments. In the future, it makes sense to make a single unified
format for block data and a single unified format for state. These
formats would cover all important needs: (i) easy proofs for stateless
clients, (ii) serialization and erasure coding for data, (iii)
standardized data structures.
- Removal of beacon chain committees: this mechanism
was originally introduced to support a particular
version of execution sharding. Instead, we ended up doing sharding
through
L2s and blobs. Hence, committees are unnecessary, and so there is an
in-progress move
toward removing them.
- Removal of mixed-endianness: the EVM is big-endian and the consensus layer is little-endian. It may make sense to re-harmonize and make everything one or the other (likely big-endian, because the EVM is harder to change).
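To make the RLP → SSZ item above concrete, here is a toy comparison (simplified to single short byte-strings for RLP and a single fixed-size type for SSZ; real codecs handle lists, containers and Merkleization):

```python
def rlp_encode_bytes(b: bytes) -> bytes:
    # Toy RLP for short strings: untyped length-prefixed bytes - a decoder
    # cannot tell an integer from an address without out-of-band knowledge.
    if len(b) == 1 and b[0] < 0x80:
        return b
    assert len(b) < 56  # long strings use a different prefix form
    return bytes([0x80 + len(b)]) + b

def ssz_encode_uint64(x: int) -> bytes:
    # SSZ: typed and fixed-size - the layout follows from the declared
    # type, which also makes hashing (hash tree roots) straightforward.
    return x.to_bytes(8, "little")

print(rlp_encode_bytes((1000).to_bytes(2, "big")).hex())  # 8203e8
print(ssz_encode_uint64(1000).hex())  # e803000000000000
```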
Now, some examples that are inside the EVM:
- Simplification of gas mechanics: the current gas
rules are not quite well-optimized to give clear limits to the quantity
of resources required to verify a block. Key examples of this include
(i) storage read/write costs, which are meant to bound
the number of reads/writes in a block but are currently pretty
haphazard, and (ii) memory filling rules, where it is
currently hard to estimate the max memory consumption of the EVM.
Proposed fixes include statelessness gas cost
changes, which harmonize all storage-related costs into a simple
formula, and this proposal
for memory pricing.
- Removal of precompiles: many of the precompiles
that Ethereum has today are both needlessly complex and relatively
unused, and make up a large percentage of consensus failure near-misses
while not actually being used by any applications. Two ways of dealing
with this are (i) just removing the precompile, and (ii) replacing it
with an (inevitably more expensive) piece of EVM code that implements the
same logic. This draft EIP
proposes to do this for the identity precompile as a first step;
later on, RIPEMD160, MODEXP and BLAKE may be candidates for
removal.
- Removal of gas observability: make the EVM
execution no longer able to see how much gas it has left. This would
break a few applications (most notably, sponsored transactions), but
would enable much easier upgrading in the future (eg. for more advanced
versions of multidimensional
gas). The EOF spec
already makes gas unobservable, though to be useful for protocol
simplification EOF would need to become mandatory.
- Improvements to static analysis: today EVM code is
difficult to statically analyze, particularly because jumps can be
dynamic. This also makes it more difficult to make optimized EVM
implementations that pre-compile EVM code into some other language. We
can potentially fix this by removing dynamic jumps (or making them much more expensive, eg. gas cost linear in the total number of JUMPDESTs in a contract). EOF does this, though getting protocol simplification gains out of it would require making EOF mandatory; a toy version of such a static check is sketched after this list.
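Here is a toy version of the static check mentioned in the last item (a sketch only: among other simplifications, it does not exclude JUMPDEST bytes that fall inside push data, which a real analyzer must):

```python
PUSH1, PUSH32 = 0x60, 0x7F
JUMP, JUMPI, JUMPDEST = 0x56, 0x57, 0x5B

def jumps_statically_resolvable(code: bytes) -> bool:
    # Accept only jumps whose target is a constant supplied by the
    # immediately preceding PUSH (EOF enforces this structurally).
    i, prev_push = 0, None
    while i < len(code):
        op = code[i]
        if PUSH1 <= op <= PUSH32:
            n = op - PUSH1 + 1
            prev_push = int.from_bytes(code[i + 1:i + 1 + n], "big")
            i += 1 + n
            continue
        if op in (JUMP, JUMPI):
            # Dynamic jump, or constant jump to a non-JUMPDEST: reject.
            if prev_push is None or prev_push >= len(code) \
                    or code[prev_push] != JUMPDEST:
                return False
        prev_push = None
        i += 1
    return True

# PUSH1 0x04, JUMP, STOP, JUMPDEST -> statically analyzable
assert jumps_statically_resolvable(bytes([0x60, 0x04, 0x56, 0x00, 0x5B]))
# DUP1, JUMP -> target comes from arbitrary stack computation
assert not jumps_statically_resolvable(bytes([0x80, 0x56]))
```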
What are some links to existing research?
What is left to do, and what are the tradeoffs?
The main tradeoff in doing this kind of feature simplification is (i)
how much we simplify and how quickly vs (ii) backwards compatibility.
Ethereum's value as a chain comes from it being a platform where you can
deploy an application and be confident that it will still work many
years from now. At the same time, it's possible to take that ideal too
far, and, to paraphrase
William Jennings Bryan, "crucify Ethereum on a cross of backwards
compatibility". If there are only two applications in all of Ethereum
that use a given feature, and one has had zero users for years and the
other is almost completely unused and secures a total of $57 of value,
then we should just remove the feature, and if needed pay the victims
$57 out of pocket.
The broader social problem is in creating a standardized pipeline for
making non-emergency backwards-compatibility-breaking changes. One way
to approach this is to examine and extend existing precedents, such as
the SELFDESTRUCT process. The pipeline looks something like the following:
- Step 1: start talking about removing feature X
- Step 2: do analysis to identify how much removing X
breaks applications, depending on the results either (i) abandon the
idea, (ii) proceed as planned, or (iii) identify a modified
"least-disruptive" way to remove X and proceed with that
- Step 3: make a formal EIP to deprecate X. Make sure
that popular higher-level infrastructure (eg. programming languages,
wallets) respect this and stop using that feature.
- Step 4: finally, actually remove X
There should be a multi-year-long pipeline between step 1 and step 4,
with clear information about which items are at which step. At that
point, there is a tradeoff between how vigorous and fast the
feature-removal pipeline is, versus being more conservative and putting
more resources into other areas of protocol development, but we are
still far from the Pareto frontier.
EOF
A major set of changes that has been proposed to the EVM is the EVM Object Format (EOF). EOF introduces a large number of changes, such as banning gas observability and code observability (ie. no CODECOPY), and allowing only static jumps. The
goal is to allow the EVM to be upgraded more, in a way that has stronger
properties, while preserving backwards compatibility (as the pre-EOF EVM
will still exist).
This has the advantage that it creates a natural path to adding new
EVM features and encouraging migration to a more restrictive EVM with
stronger guarantees. It has the disadvantage that it significantly
increases protocol complexity, unless we can find a way to
eventually deprecate and remove the old EVM. One major question is:
what role does EOF play in EVM simplification proposals,
especially if the goal is to reduce the complexity of the EVM as a
whole?
How does it interact with other parts of the roadmap?
Many of the "improvement" proposals in the rest of the roadmap are
also opportunities to do simplifications of old features. To repeat some
examples from above:
- Switching to single-slot finality gives us an opportunity to remove
committees, rework economics, and do other proof-of-stake-related
simplifications.
- Fully implementing account abstraction lets us remove a lot of
existing transaction-handling logic, by moving it into a piece of
"default account EVM code" that all EOAs could be replaced by.
- If we move the Ethereum state to binary hash trees, this could be
harmonized with a new version of SSZ, so that all Ethereum data
structures could be hashed in the same way.
A more radical approach: turn big parts of the protocol into contract code
A more radical Ethereum simplification strategy is to keep the
protocol as is, but move large parts of it from being protocol features
to being contract code.
The most extreme version of this would be to make the Ethereum L1
"technically" be just the beacon chain, and introduce a minimal VM (eg.
RISC-V, Cairo, or something even more
minimal specialized for proving systems) which allows anyone else to
create their own rollup. The EVM would then turn into the first one of
these rollups. This is ironically exactly the same outcome as the execution
environment proposals from 2019-20, though SNARKs make it
significantly more viable to actually implement.
A more moderate approach would be to keep the relationship between
the beacon chain and the current Ethereum execution environment as-is,
but do an in-place swap of the EVM. We could choose RISC-V, Cairo or
another VM to be the new "official Ethereum VM", and then force-convert
all EVM contracts into new-VM code that interprets the logic of the
original code (by compiling or interpreting it). Theoretically, this
could even be done with the "target VM" being a version of EOF.