As blockchain technology evolves rapidly, performance optimization has become a key issue. Ethereum's roadmap is now clear, with Rollups at its core. However, the EVM's serial transaction processing is a bottleneck that cannot satisfy future high-concurrency computing scenarios.
In a previous article, “The Road to Optimizing Parallel EVM from Reddio,” we briefly outlined the design ideas behind Reddio's parallel EVM. In this article, we delve into its technical solution and the scenarios where it intersects with AI.
Since Reddio's technical solution adopts CuEVM, a project that uses GPUs to improve EVM execution efficiency, let's start with CuEVM.
Overview of CUDA
CuEVM is a project that accelerates the EVM with GPUs. It converts Ethereum EVM opcodes into CUDA kernels for parallel execution on NVIDIA GPUs, using the GPU's parallel computing power to improve the efficiency of EVM instruction execution. NVIDIA users may often hear the term CUDA.
CUDA, short for Compute Unified Device Architecture, is a parallel computing platform and programming model developed by NVIDIA. It lets developers use the GPU's parallel computing power for general-purpose computing (such as crypto mining and ZK operations), not just graphics processing.
As an open parallel computing framework, CUDA is essentially an extension of C/C++, and any low-level programmer familiar with C/C++ can pick it up quickly. A key concept in CUDA is the kernel, which is itself a kind of C++ function.
But unlike regular C++ functions, which execute once per call, a kernel invoked with the <<<…>>> launch syntax is executed in parallel N times by N different CUDA threads.
Each CUDA thread is assigned a unique thread ID, and CUDA organizes threads into a hierarchy of blocks and grids to manage large numbers of parallel threads. With NVIDIA's nvcc compiler, CUDA code can be compiled into programs that run on the GPU.
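To make the launch model concrete, here is a plain C++ sketch (an illustration only, no GPU required) that mimics what <<<…>>> does: the same kernel body runs once per thread, each identified by its own thread ID. In real CUDA, the kernel would be a __global__ function and the ID would be computed from blockIdx and threadIdx.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sketch of the CUDA launch model using CPU threads (illustration only).
// In real CUDA, add_kernel would be a __global__ function, the loop below
// would be the launch add_kernel<<<blocks, threadsPerBlock>>>(...), and
// tid would be computed as blockIdx.x * blockDim.x + threadIdx.x.
void add_kernel(std::size_t tid, const std::vector<int>& a,
                const std::vector<int>& b, std::vector<int>& out) {
    out[tid] = a[tid] + b[tid];  // each thread handles exactly one element
}

std::vector<int> parallel_add(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> out(a.size());
    std::vector<std::thread> threads;
    for (std::size_t tid = 0; tid < a.size(); ++tid)  // "launch" N threads
        threads.emplace_back(add_kernel, tid, std::cref(a),
                             std::cref(b), std::ref(out));
    for (auto& t : threads) t.join();  // wait for every thread to finish
    return out;
}
```

The point of the analogy: the kernel body contains no loop over elements; parallelism comes entirely from launching one thread per element.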
The basic workflow of CuEVM
With these basic CUDA concepts in hand, let's look at CuEVM's workflow.
The main entry point of CuEVM is run_interpreter(), which receives the transactions to be processed in parallel as input in the form of a JSON file. As the project's test cases show, the input is all standard EVM content; developers do not need to convert or translate it separately.
In run_interpreter(), the kernel_evm() kernel function is invoked with CUDA's <<<…>>> launch syntax. As mentioned earlier, kernel functions are executed in parallel on the GPU.
kernel_evm() calls evm->run(), which contains a large number of branches that map EVM opcodes to CUDA operations.
Taking the EVM addition opcode OP_ADD as an example, we can see that ADD is mapped to cgbn_add. CGBN (Cooperative Groups Big Numbers) is a high-performance CUDA library for multiple-precision integer arithmetic.
These two steps convert EVM opcodes into CUDA operations; CuEVM can be seen as an implementation of the full EVM instruction set on CUDA. Finally, run_interpreter() returns the computation result, namely the world state and other information.
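As an illustration of this dispatch pattern (a simplified sketch, not CuEVM's actual code), here is a toy interpreter with one branch per opcode. Where it uses uint64_t, CuEVM operates on 256-bit words through CGBN calls such as cgbn_add:

```cpp
#include <cstdint>
#include <stack>
#include <vector>

// Toy sketch of CuEVM-style dispatch: one branch per EVM opcode, each
// forwarding to the corresponding GPU routine. Real CuEVM works on 256-bit
// words via CGBN (e.g. cgbn_add); this illustration uses uint64_t instead.
// Opcode values match the EVM spec: STOP = 0x00, ADD = 0x01, MUL = 0x02.
enum Opcode : uint8_t { OP_STOP = 0x00, OP_ADD = 0x01, OP_MUL = 0x02 };

uint64_t run(const std::vector<uint8_t>& code, std::stack<uint64_t> st) {
    for (uint8_t op : code) {
        switch (op) {
            case OP_ADD: {  // in CuEVM: cgbn_add(arith_env, r, a, b)
                uint64_t a = st.top(); st.pop();
                uint64_t b = st.top(); st.pop();
                st.push(a + b);
                break;
            }
            case OP_MUL: {  // in CuEVM: the corresponding cgbn multiply
                uint64_t a = st.top(); st.pop();
                uint64_t b = st.top(); st.pop();
                st.push(a * b);
                break;
            }
            case OP_STOP:
                return st.top();
        }
    }
    return st.top();
}
```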
That covers the basic operating logic of CuEVM.
Although CuEVM can process transactions in parallel, its stated purpose (or main use case) is fuzzing: an automated software testing technique that feeds a program large amounts of invalid, unexpected, or random data and observes its responses in order to uncover potential bugs and security issues.
Fuzzing lends itself well to parallel processing, but CuEVM does not handle transaction conflicts; that is simply outside its scope. Anyone who wants to integrate CuEVM must therefore deal with conflicting transactions first.
In our previous article, “The Road to Optimizing Parallel EVM from Reddio,” we already introduced Reddio's conflict-handling mechanism, so we will not repeat it here. After ordering transactions through the conflict-handling mechanism, Reddio can send them to CuEVM for execution. In other words, Reddio L2's transaction execution pipeline consists of two parts: conflict handling and parallel execution in CuEVM.
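As a rough illustration of why conflict handling must precede parallel execution, the following sketch (our own simplification, not Reddio's actual algorithm) greedily groups transactions into batches whose read/write sets do not overlap; each batch could then be handed to a parallel executor such as CuEVM:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified conflict batching (illustration, not Reddio's algorithm).
// Two transactions conflict if one writes state the other reads or writes.
struct Tx {
    std::set<std::string> reads, writes;  // touched state keys
};

bool conflicts(const Tx& a, const Tx& b) {
    for (const auto& w : a.writes)
        if (b.reads.count(w) || b.writes.count(w)) return true;
    for (const auto& w : b.writes)
        if (a.reads.count(w)) return true;
    return false;
}

// Greedily place each tx into the first batch it does not conflict with.
// Each returned group of indices can safely execute in parallel.
std::vector<std::vector<std::size_t>> batch(const std::vector<Tx>& txs) {
    std::vector<std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < txs.size(); ++i) {
        bool placed = false;
        for (auto& g : groups) {
            bool ok = true;
            for (std::size_t j : g)
                if (conflicts(txs[i], txs[j])) { ok = false; break; }
            if (ok) { g.push_back(i); placed = true; break; }
        }
        if (!placed) groups.push_back({i});
    }
    return groups;
}
```

For example, a transaction that writes account A and one that reads account A land in different batches, while a transaction touching only account B can join either.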
Layer 2, Parallel EVM, and AI: A Three-Way Crossroads
In the previous article, we mentioned that parallel EVM and L2 are only the beginning for Reddio: its roadmap will clearly integrate with the AI narrative. Using GPUs for high-speed parallel transaction execution makes Reddio inherently well suited to AI workloads in many respects:
GPUs have strong parallel processing capabilities, making them well suited to the convolution operations in deep learning; these are essentially large-scale matrix multiplications, a task GPUs are optimized for.
The GPU's thread hierarchy can be mapped onto the data structures in AI computation, improving computational efficiency and hiding memory latency through thread oversubscription and warp execution units.
Computing power is a key measure of AI performance, and GPUs raise it through features such as Tensor Cores, which accelerate the matrix multiplications at the heart of AI workloads and strike an effective balance between computation and data transfer.
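The independence claim above is easy to see in code. In this minimal CPU matrix multiplication, each output cell C[i][j] depends only on row i of A and column j of B, which is exactly the independence a GPU exploits by assigning one thread per cell:

```cpp
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Naive matrix multiplication. No output cell depends on another, so a GPU
// can assign one thread per (i, j) and fill the whole of C in parallel;
// Tensor Cores accelerate exactly this inner multiply-accumulate pattern.
Mat matmul(const Mat& A, const Mat& B) {
    std::size_t n = A.size(), k = B.size(), m = B[0].size();
    Mat C(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            for (std::size_t t = 0; t < k; ++t)
                C[i][j] += A[i][t] * B[t][j];
    return C;
}
```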
So how does AI combine with L2?
We know that in Rollup architectures, the network is composed not only of sequencers but also of roles such as watchers and forwarders that verify or collect transactions. They essentially run the same client as the sequencer but take on different functions. In traditional Rollups, the functions and permissions of these secondary roles are very limited: the watcher role in Arbitrum, for example, is essentially passive, defensive, and altruistic, and its profit model is questionable.
Reddio will adopt a decentralized sequencer architecture, with miners providing GPUs as nodes. The Reddio network can thus evolve from a simple L2 into a comprehensive L2 + AI network, enabling some compelling AI + blockchain use cases:
A Base Network for AI Agent Interaction
As blockchain technology continues to evolve, AI agents have enormous potential applications in blockchain networks. Take AI agents that execute financial trades: these agents can autonomously make complex decisions and execute trades, even reacting quickly under high-frequency conditions. An L1, however, fundamentally cannot bear the huge transaction load that such intensive operations generate.
As an L2 project, Reddio can greatly improve parallel transaction processing through GPU acceleration. Compared with an L1, an L2 that supports parallel transaction execution offers higher throughput, efficiently handling large volumes of high-frequency transaction requests from AI agents and keeping the network running smoothly.
In high-frequency trading, AI agents have extremely stringent requirements for transaction speed and response time. L2 shortens transaction verification and execution, significantly lowering latency, which is crucial for AI agents that need millisecond-level responses. Migrating large volumes of transactions to L2 also relieves mainnet congestion, making AI agents more economical to operate.
As L2 projects such as Reddio mature, AI agents will play a greater role on-chain, driving innovation in DeFi and other blockchain applications combined with AI.
A Decentralized Computing Power Market
Reddio will eventually adopt a decentralized sequencer architecture in which miners compete for sequencing rights based on GPU computing power. Driven by this competition, the GPU performance of network participants will gradually improve, potentially even reaching the level required for AI training.
This builds a decentralized GPU computing power market that provides lower-cost resources for AI training and inference. Providers at every scale, from individual computers to data-center clusters, can contribute idle GPUs to the market and earn revenue at their respective capability tiers. This model can reduce AI computing costs and allow more people to participate in AI model development and applications.
In this decentralized computing power market, the sequencer may not run AI workloads directly; its main functions are to process transactions and coordinate AI computing power across the network. There are two modes for allocating computing power and tasks:
Top-down centralized allocation. The sequencer assigns incoming computing power requests to nodes that meet the requirements and have good reputations. Although this allocation method theoretically raises centralization and fairness concerns, in practice its efficiency advantages far outweigh the drawbacks. In the long run, the sequencer must also keep the network's participants motivated in order to thrive, which imposes implicit but direct constraints against serious bias.
Bottom-up spontaneous task selection. Users can also submit AI computing requests directly to third-party nodes, which is clearly more efficient in specialized AI application areas than routing everything through the sequencer, and also prevents the sequencer from censoring or skewing requests. After the computation completes, the node synchronizes the result to the sequencer, which records it on-chain.
As we can see, in an L2 + AI architecture, the computing power market is highly flexible: it can gather computing power from both directions to maximize resource utilization.
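As a hypothetical sketch of the top-down mode (the node fields and selection rule here are illustrative assumptions, not Reddio's actual design), a sequencer might match a compute request to the eligible node with the best reputation:

```cpp
#include <string>
#include <vector>

// Hypothetical top-down allocation sketch. The Node fields and the rule
// "enough capacity, best reputation wins" are illustrative, not Reddio's API.
struct Node {
    std::string id;
    int gpu_tflops;     // advertised GPU capacity
    double reputation;  // accumulated track record, 0.0 to 1.0
};

// Returns the id of the best eligible node, or "" if none qualifies.
std::string assign(const std::vector<Node>& nodes, int required_tflops) {
    const Node* best = nullptr;
    for (const auto& n : nodes) {
        if (n.gpu_tflops < required_tflops) continue;  // cannot serve request
        if (!best || n.reputation > best->reputation) best = &n;
    }
    return best ? best->id : "";
}
```

The reputation term is what gives the sequencer its implicit constraint: nodes that cheat lose standing and stop receiving work.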
On-Chain AI Inference
Today's open-source models are mature enough to meet diverse needs, and as AI inference services become standardized, it becomes possible to explore bringing computing power on-chain with automated pricing. Several technical challenges must be overcome first:
Efficient request distribution and recording: large-model inference has strict latency requirements, so an efficient request-distribution mechanism is crucial. Requests and responses are large and confidential and should not be published on-chain in full, yet a balance between recording and verification must still be found, for example by storing only their hashes.
Verifying node outputs: did the node actually complete the assigned computing task? A node might cheat, for example, by substituting the output of a small model for that of the requested large model.
Smart contract inference: many scenarios require combining AI models with smart contracts. Because AI inference is non-deterministic, it cannot be used in every on-chain context, so the logic of future AI dApps will likely be split between off-chain components and on-chain contracts, with the contracts validating the legitimacy of inputs supplied from off-chain. In the Ethereum ecosystem, combining with smart contracts also means confronting the inefficiency of the EVM's serial execution.
However, in the Reddio architecture, these are relatively easy to solve:
The sequencer distributes requests far more efficiently than an L1 and can be considered comparable to Web2 in efficiency. As for where and how the data is recorded, various inexpensive DA solutions can handle that.
The correctness and honesty of AI computation results can ultimately be verified with ZKPs. ZKPs are very fast to verify but slow to generate, and proof generation can itself be accelerated with GPUs or TEEs.
Solidity → CUDA → GPU: this parallel-EVM main line is Reddio's foundation, so on the surface this is the easiest of the three problems for Reddio. Reddio is currently working with ai16z's Eliza to bring its modules into Reddio, a direction well worth exploring.
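The "store hashes, not payloads" idea from the challenges above can be sketched as follows. This is an illustration only: a real system would use a cryptographic hash such as keccak256 rather than std::hash, and an on-chain contract rather than an in-memory map.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Sketch of "record hashes, not payloads": the full AI request and response
// stay off-chain; only digests are posted for later verification. std::hash
// stands in for a cryptographic hash such as keccak256, and the map stands
// in for an on-chain contract's storage.
struct OnChainRecord {
    std::size_t request_hash;
    std::size_t response_hash;
};

std::map<uint64_t, OnChainRecord> ledger;  // request id -> digests

void record(uint64_t id, const std::string& req, const std::string& resp) {
    ledger[id] = {std::hash<std::string>{}(req),
                  std::hash<std::string>{}(resp)};
}

// Anyone holding the off-chain payloads can check them against the record.
bool verify(uint64_t id, const std::string& req, const std::string& resp) {
    auto it = ledger.find(id);
    return it != ledger.end()
        && it->second.request_hash == std::hash<std::string>{}(req)
        && it->second.response_hash == std::hash<std::string>{}(resp);
}
```

This keeps confidential inference data off-chain while still anchoring a verifiable commitment to it, which is the balance between recording and verification described above.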
Summary
Overall, Layer 2, parallel EVM, and AI may seem unrelated, but Reddio cleverly combines these major areas of innovation by fully exploiting the computational characteristics of GPUs.
By leveraging the parallel computing capabilities of GPUs, Reddio improves transaction speed and efficiency on Layer 2, strengthening the performance of Ethereum's Layer 2. Integrating AI into blockchain is a novel and promising attempt: AI can provide intelligent analysis and decision support for on-chain operations, enabling smarter, more dynamic blockchain applications. This cross-disciplinary integration opens up new paths and opportunities for the entire industry.
However, it is important to note that this field is still in its early stages and requires a lot of research and exploration. The continuous iteration and optimization of technology, as well as the imagination and action of market pioneers, will be the key driving forces to push this innovation towards maturity. Reddio has taken an important and bold step at this intersection, and we look forward to seeing more breakthroughs and surprises in this integrated field in the future.
Original article: “Reddio Technology Overview: A Narrative Overview from Parallel EVM to AI”
Author: Wuyue, Geek Web3