Publications
2025
- MOS-Attack: A Scalable Multi-Objective Adversarial Attack Framework
  Ping Guo, Cheng Gong, Xi Lin, and 4 more authors
  In 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025
Crafting adversarial examples is crucial for evaluating and enhancing the robustness of Deep Neural Networks (DNNs), presenting a challenge equivalent to maximizing a non-differentiable 0-1 loss function. However, existing single-objective methods, namely adversarial attacks that optimize a single surrogate loss function, do not fully harness the benefits of engaging multiple loss functions, owing to an insufficient understanding of their synergistic and conflicting nature. To overcome these limitations, we propose the Multi-Objective Set-Based Attack (MOS Attack), a novel adversarial attack framework that leverages multiple loss functions and automatically uncovers their interrelations. The MOS Attack adopts a set-based multi-objective optimization strategy, enabling the incorporation of numerous loss functions without additional parameters. It also automatically mines synergistic patterns among various losses, facilitating the generation of potent adversarial attacks with fewer objectives. Extensive experiments have shown that our MOS Attack outperforms single-objective attacks. Furthermore, by harnessing the identified synergistic patterns, MOS Attack continues to show superior results with a reduced number of loss functions.
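For intuition, here is a minimal PGD-style step that aggregates two common surrogate losses into one ascent direction. This is a generic illustration of attacking with multiple losses, not the paper's set-based MOS Attack; the loss choices and the fixed 0.5/0.5 weighting are placeholder assumptions.

```python
# Minimal multi-loss PGD sketch (illustration only, not the MOS Attack itself).
# The paper's set-based strategy maintains a set of objective combinations
# rather than the single fixed weighting used here.
import torch
import torch.nn.functional as F

def multi_loss_pgd_step(model, x, y, x_orig, eps=8/255, alpha=2/255):
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    # Two common surrogate losses; MOS-Attack mines synergies among many more.
    ce = F.cross_entropy(logits, y)                          # cross-entropy
    onehot = F.one_hot(y, logits.size(-1)).bool()
    cw = (logits.masked_fill(onehot, float('-inf')).amax(dim=-1)
          - logits.gather(-1, y[:, None]).squeeze(-1)).mean()  # CW margin loss
    loss = 0.5 * ce + 0.5 * cw                               # placeholder weighting
    loss.backward()
    with torch.no_grad():
        x_adv = x + alpha * x.grad.sign()                    # L_inf ascent step
        x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)   # project to eps-ball
        return x_adv.clamp(0, 1)
```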
@inproceedings{guo2025mos,
  address = {Los Alamitos, CA, USA},
  author = {Guo, Ping and Gong, Cheng and Lin, Xi and Liu, Fei and Lu, Zhichao and Zhang, Qingfu and Wang, Zhenkun},
  booktitle = {2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  dimensions = {true},
  doi = {10.1109/CVPR52734.2025.00475},
  keywords = {Computer vision, Codes, Artificial neural networks, Robustness, Adversarial machine learning, Pattern recognition, Optimization},
  month = jun,
  pages = {5041-5051},
  publisher = {IEEE Computer Society},
  title = {MOS-Attack: A Scalable Multi-Objective Adversarial Attack Framework},
  year = {2025}
}
- CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models
  Ping Guo, Qingfu Zhang, and Xi Lin
  In AAAI-26, Sponsored by the Association for the Advancement of Artificial Intelligence, 2025
The discovery of symbolic solutions—mathematical expressions, logical rules, and algorithmic structures—is fundamental to advancing scientific and engineering progress. However, traditional methods often struggle with search efficiency and fail to integrate knowledge effectively. While recent large language model-based (LLM-based) approaches have demonstrated improvements in search efficiency, they lack the ability to continually refine and expand upon discovered solutions and their underlying knowledge, limiting their potential for open-ended innovation. To address these limitations, we introduce CoEvo, a novel framework that leverages large language models within an evolutionary search methodology to continually generate and refine symbolic solutions. CoEvo integrates a dynamic knowledge library, enabling open-ended innovation of solutions through effective knowledge management. Additionally, CoEvo leverages multiple representations of solutions—including natural language, mathematical expressions, and code—to further enhance search efficiency. By combining the reasoning capabilities of LLMs with the exploratory power of evolutionary algorithms, CoEvo significantly improves the efficiency and scope of symbolic discovery. Our experimental results demonstrate that this method not only enhances the efficiency of searching for symbolic solutions but also supports the ongoing discovery process, akin to human scientific endeavors. This study represents a first effort in conceptualizing the search for symbolic solutions as a lifelong, iterative process, marking a significant step towards harnessing LLMs in the perpetual pursuit of scientific and engineering breakthroughs. Our code is available at https://github.com/pgg3/CoEvo.
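Schematically, the continual loop couples an evolutionary population with a growing knowledge library. The sketch below captures that shape only; propose_with_llm and evaluate are hypothetical stand-ins, not CoEvo's actual interfaces.

```python
# Sketch of an LLM-in-the-loop evolutionary search with a knowledge library
# (a schematic of the idea, not CoEvo's implementation).
import random

def propose_with_llm(parents, knowledge):
    """Hypothetical: prompt an LLM with parent solutions plus retrieved
    knowledge, then parse a new candidate expression/program from the reply."""
    raise NotImplementedError

def evolve(initial_pop, evaluate, generations=50, pop_size=20):
    knowledge = []                                   # growing knowledge library
    pop = [(cand, evaluate(cand)) for cand in initial_pop]
    for _ in range(generations):
        parents = random.sample(pop, k=min(2, len(pop)))
        child = propose_with_llm([p for p, _ in parents], knowledge)
        score = evaluate(child)
        knowledge.append(f"candidate scored {score:.3f}")  # distilled insight
        pop.append((child, score))
        pop.sort(key=lambda ps: ps[1], reverse=True)       # higher is better
        pop = pop[:pop_size]                               # keep the best
    return pop[0]
```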
@inproceedings{guo2026aaai,
  author = {Guo, Ping and Zhang, Qingfu and Lin, Xi},
  title = {CoEvo: Continual Evolution of Symbolic Solutions Using Large Language Models},
  booktitle = {AAAI-26, Sponsored by the Association for the Advancement of Artificial Intelligence},
  dimensions = {true},
  publisher = {{AAAI} Press},
  year = {2025}
}
- SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning
  Yiting Wang, Wanghao Ye, Ping Guo, and 11 more authors
  In Advances in Neural Information Processing Systems 38 (NeurIPS 2025), 2025
Optimizing Register Transfer Level (RTL) code is crucial for improving the efficiency and performance of digital circuits in the early stages of synthesis. Manual rewriting, guided by synthesis feedback, can yield high-quality results but is time-consuming and error-prone. Most existing compiler-based approaches have difficulty handling complex design constraints. Large Language Model (LLM)-based methods have emerged as a promising alternative to address these challenges. However, LLM-based approaches often face difficulties in ensuring alignment between the generated code and the provided prompts. This paper introduces SymRTLO, a neuron-symbolic framework that integrates LLMs with symbolic reasoning for the efficient and effective optimization of RTL code. Our method incorporates a retrieval-augmented system of optimization rules and Abstract Syntax Tree (AST)-based templates, enabling LLM-based rewriting that maintains syntactic correctness while minimizing undesired circuit behaviors. A symbolic module is proposed for analyzing and optimizing finite state machine (FSM) logic, allowing fine-grained state merging and partial specification handling beyond the scope of pattern-based compilers. Furthermore, an efficient verification pipeline, combining formal equivalence checks with test-driven validation, further reduces the complexity of verification. Experiments on the RTL-Rewriter benchmark with Synopsys Design Compiler and Yosys show that SymRTLO improves power, performance, and area (PPA) by up to 43.9%, 62.5%, and 51.1%, respectively, compared to the state-of-the-art methods. Our code is available at https://github.com/NellyW8/SymRTLO.
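The FSM component can be pictured with a toy state-merging pass over an explicit transition table. This is a simplified illustration of fine-grained state merging, not SymRTLO's symbolic module, and the dict-based FSM encoding is an assumption made for the sketch.

```python
# Toy FSM state merging: two states are merged here when they produce the same
# output and agree on every transition, iterated to a fixpoint. This is a
# conservative subset of full FSM minimization; SymRTLO's symbolic module
# handles far richer, partially specified FSMs.
def merge_equivalent_states(transitions, outputs):
    """transitions: {state: {input_symbol: next_state}}, outputs: {state: out}."""
    changed = True
    while changed:
        changed = False
        states = list(transitions)
        for i, s in enumerate(states):
            for t in states[i + 1:]:
                if outputs[s] == outputs[t] and transitions[s] == transitions[t]:
                    # Redirect every edge into t over to s, then drop t.
                    for trans in transitions.values():
                        for sym, nxt in trans.items():
                            if nxt == t:
                                trans[sym] = s
                    del transitions[t], outputs[t]
                    changed = True
                    break
            if changed:
                break
    return transitions, outputs
```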
@inproceedings{wang2025symrtlo,
  author = {Wang, Yiting and Ye, Wanghao and Guo, Ping and He, Yexiao and Wang, Ziyao and Tian, Bowei and He, Shwai and Sun, Guoheng and Shen, Zheyu and Chen, Sihan and Srivastava, Ankur and Zhang, Qingfu and Qu, Gang and Li, Ang},
  booktitle = {Advances in Neural Information Processing Systems 38 (NeurIPS 2025)},
  dimensions = {true},
  title = {SymRTLO: Enhancing {RTL} Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning},
  year = {2025}
}
- Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume
  Ping Guo, Cheng Gong, Xi Lin, and 2 more authors
  IEEE Transactions on Emerging Topics in Computational Intelligence, 2025
The escalating threat of adversarial attacks on deep learning models, particularly in security-critical fields, has underscored the need for robust deep learning systems. Conventional robustness evaluations have relied on adversarial accuracy, which measures a model’s performance under a specific perturbation intensity. However, this singular metric does not fully encapsulate the overall resilience of a model against varying degrees of perturbation. To address this gap, we propose a new metric termed adversarial hypervolume, assessing the robustness of deep learning models comprehensively over a range of perturbation intensities from a multi-objective optimization standpoint. This metric allows for an in-depth comparison of defense mechanisms and recognizes the trivial improvements in robustness afforded by less potent defensive strategies. Additionally, we adopt a novel training algorithm that enhances adversarial robustness uniformly across various perturbation intensities, in contrast to methods narrowly focused on optimizing adversarial accuracy. Our extensive empirical studies validate the effectiveness of the adversarial hypervolume metric, demonstrating its ability to reveal subtle differences in robustness that adversarial accuracy overlooks. This research contributes a new measure of robustness and establishes a standard for assessing and benchmarking the resilience of current and future defensive models against adversarial threats.
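A minimal reading of the metric: treat the robustness profile as a staircase of (perturbation budget, robust accuracy) points and take the area it dominates with respect to a reference point. The sketch below assumes this two-dimensional staircase form; see the paper for the exact formulation.

```python
# Minimal sketch: hypervolume of robust-accuracy points across perturbation
# budgets, with reference point (eps_max, 0). The axis choices are assumed
# for illustration, not taken from the paper.
def adversarial_hypervolume(points, eps_max):
    """points: non-empty [(eps, robust_accuracy)], accuracy non-increasing in eps."""
    pts = sorted(points)                       # ascending in eps
    hv, prev_eps, prev_acc = 0.0, None, None
    for eps, acc in pts:
        if prev_eps is not None:
            hv += (eps - prev_eps) * prev_acc  # rectangle up to this budget
        prev_eps, prev_acc = eps, acc
    hv += (eps_max - prev_eps) * prev_acc      # last rectangle to the reference
    return hv

print(adversarial_hypervolume([(0.0, 0.94), (4/255, 0.61), (8/255, 0.47)], 8/255))
```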
@article{10885038,
  author = {Guo, Ping and Gong, Cheng and Lin, Xi and Yang, Zhiyuan and Zhang, Qingfu},
  journal = {IEEE Transactions on Emerging Topics in Computational Intelligence},
  title = {Exploring the Adversarial Frontier: Quantifying Robustness via Adversarial Hypervolume},
  year = {2025},
  volume = {9},
  number = {2},
  pages = {1367-1378},
  keywords = {Robustness, Perturbation methods, Measurement, Accuracy, Optimization, Training, Computational modeling, Resilience, Probabilistic logic, Glass box, Adversarial attacks, multiobjective optimization, hypervolume},
  doi = {10.1109/TETCI.2025.3535656},
  dimensions = {true}
}
2024
- L-AutoDA: Large Language Models for Automatically Evolving Decision-based Adversarial Attacks
  Ping Guo, Fei Liu, Xi Lin, and 2 more authors
  In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '24 Companion), Melbourne, VIC, Australia, 2024
In the rapidly evolving field of machine learning, adversarial attacks pose a significant threat to the robustness and security of models. Amongst these, decision-based attacks are particularly insidious due to their nature of requiring only the model’s decision output, which makes them notably challenging to counteract. This paper presents L-AutoDA (Large Language Model-based Automated Decision-based Adversarial Attacks), an innovative methodology that harnesses the generative capabilities of large language models (LLMs) to streamline the creation of such attacks. L-AutoDA employs an evolutionary strategy, where iterative interactions with LLMs lead to the autonomous generation of potent attack algorithms, thereby reducing human intervention. The performance of L-AutoDA was evaluated on the CIFAR-10 dataset, where it demonstrated substantial superiority over existing baseline methods in terms of success rate and computational efficiency. Ultimately, our results highlight the formidable utility of language models in crafting adversarial attacks and reveal promising directions for constructing more resilient AI systems.
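The search space L-AutoDA explores is update rules of roughly the following shape. This particular contract-and-perturb rule is a generic boundary-walk illustration, not one of the paper's LLM-generated attacks.

```python
# Sketch of the kind of candidate-update rule a decision-based attack uses,
# which L-AutoDA asks an LLM to evolve (a generic illustration only).
import numpy as np

def propose(x_adv, x_orig, step=0.01, noise=0.02, rng=np.random.default_rng(0)):
    d = x_orig - x_adv
    cand = x_adv + step * d                      # contract toward the original
    cand += noise * rng.standard_normal(x_adv.shape) * np.linalg.norm(d)
    return np.clip(cand, 0.0, 1.0)

def attack(is_adversarial, x_orig, x_start, iters=1000):
    """is_adversarial: hard-label oracle, the only model access assumed."""
    x_adv = x_start
    for _ in range(iters):
        cand = propose(x_adv, x_orig)
        if is_adversarial(cand):                 # keep only label-preserving moves
            x_adv = cand
    return x_adv
```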
@inproceedings{10.1145/3638530.3664121,
  author = {Guo, Ping and Liu, Fei and Lin, Xi and Zhao, Qingchuan and Zhang, Qingfu},
  title = {L-AutoDA: Large Language Models for Automatically Evolving Decision-based Adversarial Attacks},
  year = {2024},
  isbn = {9798400704956},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3638530.3664121},
  doi = {10.1145/3638530.3664121},
  booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
  dimensions = {true},
  pages = {1846-1854},
  numpages = {9},
  keywords = {large language models, adversarial attacks, automated algorithm design, evolutionary algorithms},
  location = {Melbourne, VIC, Australia},
  series = {GECCO '24 Companion}
}
- LTR-HSS: A Learning-to-Rank Based Framework for Hypervolume Subset Selection
  Cheng Gong, Ping Guo, Tianye Shu, and 2 more authors
  In Parallel Problem Solving from Nature – PPSN XVIII: 18th International Conference, PPSN 2024, Hagenberg, Austria, September 14–18, 2024, Proceedings, Part IV, 2024
Hypervolume subset selection (HSS) plays an important role in various aspects of the field of evolutionary multi-objective optimization (EMO), such as environmental selection and post-processing for decision-making. The goal of these problems is to find the optimal subset that maximizes the hypervolume from a given candidate solution set. Many methods have been developed to solve or approximately solve different types of HSS problems. However, existing approaches cannot effectively solve HSS problems with a large number of objectives within a short computation time. This drawback directly limits their applicability as a component for developing new EMO algorithms. In this paper, we propose a novel learning-to-rank based framework, named LTR-HSS, for solving the challenging HSS problems with a large number of objectives. The experimental results show that, compared to other state-of-the-art HSS methods, our proposed LTR-HSS requires a shorter computation time to solve HSS problems with large numbers of objectives while achieving superior or competitive hypervolume performance. This demonstrates the potential of our method to be integrated into algorithms for many-objective optimization.
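The framework's core move is to replace exact hypervolume-contribution computation with a learned ranking of candidates inside a greedy loop. A sketch under that reading follows; the hand-rolled scorer is a crude stand-in for the trained learning-to-rank model.

```python
# Sketch of ranking-based subset selection: a scoring model replaces exact
# hypervolume-contribution computation (the scorer below is a stand-in;
# LTR-HSS trains a real learning-to-rank model on solution features).
import numpy as np

def score(candidates, selected):
    """Stand-in scorer: favor candidates close to the ideal point and far
    from the already-selected set (a crude proxy for HV contribution)."""
    conv = -candidates.sum(axis=1)                        # convergence proxy
    if len(selected) == 0:
        return conv
    dists = np.min(np.linalg.norm(
        candidates[:, None, :] - np.asarray(selected)[None, :, :], axis=2), axis=1)
    return conv + dists                                   # add diversity proxy

def ltr_style_select(candidates, k):
    candidates = np.asarray(candidates, dtype=float)      # minimization assumed
    selected, remaining = [], list(range(len(candidates)))
    for _ in range(k):
        s = score(candidates[remaining], [candidates[i] for i in selected])
        best = remaining[int(np.argmax(s))]
        selected.append(best)
        remaining.remove(best)
    return selected
```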
@inproceedings{10.1007/978-3-031-70085-9_3,
  author = {Gong, Cheng and Guo, Ping and Shu, Tianye and Zhang, Qingfu and Ishibuchi, Hisao},
  title = {LTR-HSS: A Learning-to-Rank Based Framework for Hypervolume Subset Selection},
  year = {2024},
  isbn = {978-3-031-70084-2},
  publisher = {Springer-Verlag},
  address = {Berlin, Heidelberg},
  url = {https://doi.org/10.1007/978-3-031-70085-9_3},
  doi = {10.1007/978-3-031-70085-9_3},
  booktitle = {Parallel Problem Solving from Nature – PPSN XVIII: 18th International Conference, PPSN 2024, Hagenberg, Austria, September 14–18, 2024, Proceedings, Part IV},
  pages = {36--51},
  numpages = {16},
  keywords = {Hypervolume subset selection, Multi-objective optimization, Machine learning},
  location = {Hagenberg, Austria},
  dimensions = {true}
}
- DPP-HSS: Towards Fast and Scalable Hypervolume Subset Selection for Many-objective Optimization
  Cheng Gong, Yang Nan, Ke Shang, and 3 more authors
  IEEE Transactions on Evolutionary Computation, 2024
Hypervolume subset selection (HSS) has received significant attention since it has a strong connection with evolutionary multi-objective optimization (EMO), such as environment selection and post-processing to identify representative solutions for decision-makers. The goal of HSS is to find the optimal subset that maximizes the hypervolume indicator subject to a given cardinality constraint. However, existing HSS algorithms or related methods are not efficient in achieving good performance in high-dimensional objective spaces. This is primarily because HSS problems become NP-hard when the number of objectives exceeds two, and the calculation of hypervolume contribution is very time-consuming. To efficiently solve HSS problems while maintaining a good solution quality, we propose a fast and scalable hypervolume subset selection method for many-objective optimization based on the determinantal point process (DPP), named DPP-HSS, which is fully free of hypervolume contribution calculation. Specifically, DPP-HSS constructs a hypervolume kernel matrix by extracting the convergence and diversity representations of each solution for a given HSS problem. This matrix is then used to build a DPP model. Subsequently, the original HSS problem is reformulated as a new maximization optimization problem based on the constructed model. A greedy DPP-based hypervolume subset selection algorithm is implemented to solve this transformed problem. Extensive experiments show that the proposed DPP-HSS achieves significant speedup and good hypervolume performance in comparison with state-of-the-art HSS algorithms on benchmark problems. Furthermore, DPP-HSS demonstrates very good scalability with respect to the number of objectives.
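Once the hypervolume kernel matrix L is built, the selection step is the standard greedy DPP MAP routine sketched below. The toy kernel here is a stand-in, since the paper's contribution lies in constructing L from convergence and diversity representations.

```python
# Greedy MAP sketch for a DPP: repeatedly add the item that yields the largest
# log det of the selected kernel submatrix (generic greedy DPP routine;
# DPP-HSS's hypervolume-aware kernel L is mocked up here).
import numpy as np

def greedy_dpp(L, k):
    selected, remaining = [], list(range(L.shape[0]))
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for j in remaining:
            idx = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:   # sign <= 0 only numerically
                best, best_logdet = j, logdet
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy kernel: quality and similarity from random features (PSD by construction).
rng = np.random.default_rng(1)
feats = rng.standard_normal((8, 3))
L = feats @ feats.T + 1e-6 * np.eye(8)
print(greedy_dpp(L, 3))
```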
@article{10742945,
  author = {Gong, Cheng and Nan, Yang and Shang, Ke and Guo, Ping and Ishibuchi, Hisao and Zhang, Qingfu},
  journal = {IEEE Transactions on Evolutionary Computation},
  title = {DPP-HSS: Towards Fast and Scalable Hypervolume Subset Selection for Many-objective Optimization},
  year = {2024},
  pages = {1-1},
  keywords = {Optimization, Approximation algorithms, Search problems, Evolutionary computation, Pareto optimization, Kernel, Convergence, Urban areas, Scalability, Greedy algorithms, Hypervolume subset selection, many-objective optimization, evolutionary multi-objective optimization (EMO)},
  doi = {10.1109/TEVC.2024.3491155},
  dimensions = {true}
}
2023
- Approximation of a Pareto Set Segment Using a Linear Model with Sharing Variables
  Ping Guo, Qingfu Zhang, and Xi Lin
  In Evolutionary Multi-Criterion Optimization (EMO), 2023
In many real-world applications, the Pareto Set (PS) of a continuous multiobjective optimization problem can be a piecewise continuous manifold. A decision maker may want to find a solution set that approximates a small part of the PS and requires that the solutions in this set share some similarities. This paper makes a first attempt to address this issue. We first develop a performance metric that considers both optimality and variable sharing. Then we design an algorithm for finding the model that minimizes the metric to meet the user’s requirements. Experimental results illustrate that we can obtain a linear model that approximates the mapping from the preference vectors to solutions in a local area well.
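A minimal sketch of the modeled object: fit a linear map from preferences to decision variables and read off which variables are shared (near-zero slope). The scalar preference parameterization and the plain least-squares fit below are simplifying assumptions; the paper instead fits the model by minimizing its combined optimality-and-sharing metric.

```python
# Sketch: a PS segment approximated as a linear function of the preference,
# x(t) ~ a*t + b per variable; a near-zero slope marks a shared variable.
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, size=40)                 # scalar preference along segment
# Hypothetical solutions: two variables track the preference, one is shared.
X = np.column_stack([0.8 * t + 0.1,
                     np.full(40, 0.5),         # shared across the segment
                     0.9 - 0.3 * t])

W = np.column_stack([t, np.ones(40)])          # [t, 1] design matrix
coef, *_ = np.linalg.lstsq(W, X, rcond=None)   # least-squares fit
slope, intercept = coef[0], coef[1]
print("near-zero slope marks a shared variable:", np.round(slope, 3))
```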
@inproceedings{10.1007/978-3-031-27250-9_18,
  author = {Guo, Ping and Zhang, Qingfu and Lin, Xi},
  editor = {Emmerich, Michael and Deutz, Andr{\'e} and Wang, Hao and Kononova, Anna V. and Naujoks, Boris and Li, Ke and Miettinen, Kaisa and Yevseyeva, Iryna},
  title = {Approximation of a Pareto Set Segment Using a Linear Model with Sharing Variables},
  booktitle = {Evolutionary Multi-Criterion Optimization},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  pages = {247--259},
  doi = {10.1007/978-3-031-27250-9_18},
  dimensions = {true},
  isbn = {978-3-031-27250-9}
}
2021
- Stars Can Tell: A Robust Method to Defend against GPS Spoofing Attacks using Off-the-shelf Chipset
  Shinan Liu, Xiang Cheng, Hanchao Yang, and 6 more authors
  In 30th USENIX Security Symposium (USENIX Security 21), Aug 2021
The GPS has empowered billions of users and various critical infrastructures with its positioning and time services. However, GPS spoofing attacks also become a growing threat to GPS-dependent systems. Existing detection methods either require expensive hardware modifications to current GPS devices or lack the basic robustness against sophisticated attacks, hurting their adoption and usage in practice. In this paper, we propose a novel GPS spoofing detection framework that works with off-the-shelf GPS chipsets. Our basic idea is to rotate a one-side-blocked GPS receiver to derive the angles-of-arrival (AoAs) of received signals and compare them with the GPS constellation (which consists of tens of GPS satellites). We first demonstrate the effectiveness of this idea by implementing a smartphone prototype and evaluating it against a real spoofer in various field experiments (in both open air and urban canyon environments). Our method achieves a high accuracy (95%–100%) in 5 seconds. Then we implement an adaptive attack, assuming the attacker becomes aware of our defense method and actively modulates the spoofing signals accordingly. We study this adaptive attack and propose enhancement methods (using the rotation speed as the "secret key") to fortify the defense. Further experiments are conducted to validate the effectiveness of the enhanced defense.
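A toy version of the consistency test: genuine satellites, spread across the sky, produce signal-strength dips at different rotation angles, while a single spoofer makes all dips coincide. The heuristic below is an illustration on synthetic data, not the paper's detector.

```python
# Toy sketch of the rotation defense: each genuine satellite's C/N0 dips when
# the blocked side faces its azimuth; spoofed signals (one transmitter) all
# dip at the same rotation angle. Thresholds and data are synthetic.
import numpy as np

def dip_angles(cn0_traces, rotation_angles):
    """Per-satellite rotation angle of minimum carrier-to-noise density.
    cn0_traces: (n_sats, n_angles); rotation_angles: (n_angles,)."""
    return rotation_angles[np.argmin(cn0_traces, axis=1)]

def looks_spoofed(dips, sat_azimuths, tol_deg=20.0):
    # Genuine case: dip angles spread out like the satellite azimuths do.
    # Spoofed case: dips cluster even though azimuths are spread.
    return np.ptp(dips) < tol_deg and np.ptp(sat_azimuths % 360) > tol_deg
```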
@inproceedings{274743,
  author = {Liu, Shinan and Cheng, Xiang and Yang, Hanchao and Shu, Yuanchao and Weng, Xiaoran and Guo, Ping and Zeng, Kexiong (Curtis) and Wang, Gang and Yang, Yaling},
  dimensions = {true},
  title = {Stars Can Tell: A Robust Method to Defend against {GPS} Spoofing Attacks using Off-the-shelf Chipset},
  booktitle = {30th USENIX Security Symposium (USENIX Security 21)},
  year = {2021},
  isbn = {978-1-939133-24-3},
  publisher = {USENIX Association},
  pages = {3935--3952},
  url = {https://www.usenix.org/conference/usenixsecurity21/presentation/liu-shinan},
  month = aug
}
Preprints
2025
- PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks
  Ping Guo, Xiang Li, Zhiyuan Yang, and 3 more authors
  arXiv preprint, 2025
Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model’s architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
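The defense's randomness can be sketched in a few lines: split the input into patches and purify each with a model drawn at random from an ensemble. The identity and box-blur purifiers below are stand-ins; PuriDefense uses lightweight local-implicit-function models.

```python
# Sketch of randomized patch-wise purification (illustration only; the
# purifiers here are trivial stand-ins for the paper's learned models).
import numpy as np

def box_blur(patch):                        # stand-in "purifier"
    out = patch.copy()
    out[1:-1, 1:-1] = (patch[:-2, 1:-1] + patch[2:, 1:-1] +
                       patch[1:-1, :-2] + patch[1:-1, 2:] + patch[1:-1, 1:-1]) / 5
    return out

PURIFIERS = [lambda p: p, box_blur]         # ensemble of purification models

def purify(image, patch=8, rng=np.random.default_rng()):
    h, w = image.shape[:2]
    out = image.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            f = PURIFIERS[rng.integers(len(PURIFIERS))]   # random model per patch
            out[i:i+patch, j:j+patch] = f(out[i:i+patch, j:j+patch])
    return out
```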
@misc{guo2025puridefenserandomizedlocalimplicit,
  author = {Guo, Ping and Li, Xiang and Yang, Zhiyuan and Lin, Xi and Zhao, Qingchuan and Zhang, Qingfu},
  title = {PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based Attacks},
  year = {2025},
  eprint = {2401.10586},
  archiveprefix = {arXiv},
  primaryclass = {cs.CR},
  dimensions = {true}
}
- EvoVerilog: Large Language Model Assisted Evolution of Verilog Code
  Ping Guo, Yiting Wang, Wanghao Ye, and 5 more authors
  arXiv preprint, 2025
Large Language Models (LLMs) have demonstrated great potential in automating the generation of Verilog hardware description language code for hardware design. This automation is critical to reducing human effort in the complex and error-prone process of hardware design. However, existing approaches predominantly rely on human intervention and fine-tuning using curated datasets, limiting their scalability in automated design workflows. Although recent iterative search techniques have emerged, they often fail to explore diverse design solutions and may underperform simpler approaches such as repeated prompting. To address these limitations, we introduce EvoVerilog, a novel framework that combines the reasoning capabilities of LLMs with evolutionary algorithms to automatically generate and refine Verilog code. EvoVerilog utilizes a multiobjective, population-based search strategy to explore a wide range of design possibilities without requiring human intervention. Extensive experiments demonstrate that EvoVerilog achieves state-of-the-art performance, with pass@10 scores of 89.1 and 80.2 on the VerilogEval-Machine and VerilogEval-Human benchmarks, respectively. Furthermore, the framework showcases its ability to explore diverse designs by simultaneously generating a variety of functional Verilog code while optimizing resource utilization.
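The reported pass@10 scores presumably follow the standard unbiased pass@k estimator used by code-generation benchmarks (an assumption about the evaluation protocol, sketched below).

```python
# Standard unbiased pass@k estimator: with n samples of which c are correct,
# pass@k = 1 - C(n-c, k) / C(n, k). Assumed, not confirmed, to be the
# protocol behind the VerilogEval numbers quoted above.
from math import comb

def pass_at_k(n, c, k):
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(round(pass_at_k(n=20, c=5, k=10), 3))   # e.g. 20 samples, 5 correct
```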
@misc{guo2025evoveriloglargelangugagemodel,
  author = {Guo, Ping and Wang, Yiting and Ye, Wanghao and He, Yexiao and Wang, Ziyao and Dai, Xiaopeng and Li, Ang and Zhang, Qingfu},
  title = {EvoVerilog: Large Language Model Assisted Evolution of Verilog Code},
  year = {2025},
  eprint = {2508.13156},
  archiveprefix = {arXiv},
  primaryclass = {cs.AR},
  dimensions = {true}
}
- EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models
  Ping Guo, Chenyu Zhu, Siyuan Chen, and 4 more authors
  arXiv preprint, 2025
CUDA kernel optimization has become a critical bottleneck for AI performance, as deep learning training and inference efficiency directly depends on highly optimized GPU kernels. Despite the promise of Large Language Models (LLMs) for automating kernel optimization, this field suffers from a fragmented ecosystem of isolated and incomparable approaches with unclear problem formulations. Furthermore, general-purpose LLM code evolution methods cannot meet strict correctness requirements of CUDA kernel optimization. We address these fundamental challenges by first formalizing CUDA kernel optimization as a code optimization task with a clear objective, constraints, and evaluation metrics. We then establish the first systematic LLM-based code evolution framework, EvoEngineer, that provides guidance for designing and adapting optimization strategies to achieve a balance between performance and correctness. Finally, we implement a kernel optimization system based on this framework and conduct extensive experiments on 91 real-world CUDA kernels. Our results demonstrate that EvoEngineer achieves a principled balance between performance and correctness, with the highest averaged median speedup of 2.72× over baseline CUDA kernels and a code validity rate of 69.8%, outperforming existing methods on both dimensions. Our method achieves a maximum speedup of 36.75× among all operations over PyTorch kernels and delivers the highest speedup on 28 (56.0%) of 50 operations that achieve over 2× acceleration.
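On how the headline numbers aggregate, one plausible reading (an assumption, not the paper's stated procedure): per-kernel speedup is baseline time over candidate time, the median is taken across kernels, and validity is the fraction of generated kernels passing correctness checks.

```python
# Hypothetical aggregation of kernel-evolution results; field names and the
# exact aggregation rule are assumptions for illustration.
import statistics

def summarize(results):
    """results: [(baseline_ms, candidate_ms_or_None)]; None = invalid kernel."""
    valid = [(b, c) for b, c in results if c is not None]
    speedups = [b / c for b, c in valid]
    return {"validity_rate": len(valid) / len(results),
            "median_speedup": statistics.median(speedups) if speedups else None}

print(summarize([(10.0, 4.0), (8.0, None), (6.0, 5.0)]))
```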
@misc{guo2025evoengineermasteringautomatedcuda,
  author = {Guo, Ping and Zhu, Chenyu and Chen, Siyuan and Liu, Fei and Lin, Xi and Lu, Zhichao and Zhang, Qingfu},
  title = {EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models},
  year = {2025},
  eprint = {2510.03760},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
  dimensions = {true}
}
2024
- A Systematic Survey on Large Language Models for Algorithm Design
  Fei Liu, Yiming Yao, Ping Guo, and 8 more authors
  arXiv preprint, 2024
Algorithm Design (AD) is crucial for effective problem-solving across various domains. The advent of Large Language Models (LLMs) has notably enhanced the automation and innovation within this field, offering new perspectives and promising solutions. Over the past three years, the integration of LLMs into AD (LLM4AD) has seen substantial progress, with applications spanning optimization, machine learning, mathematical reasoning, and scientific discovery. Given the rapid advancements and expanding scope of this field, a systematic review is both timely and necessary. This paper provides a systematic review of LLM4AD. First, we offer an overview and summary of existing studies. Then, we introduce a taxonomy and review the literature across four dimensions: the roles of LLMs, search methods, prompt methods, and application domains with a discussion of potential and achievements of LLMs in AD. Finally, we identify current challenges and highlight several promising directions for future research.
@misc{liu2024systematicsurveylargelanguage,
  author = {Liu, Fei and Yao, Yiming and Guo, Ping and Yang, Zhiyuan and Zhao, Zhe and Lin, Xi and Tong, Xialiang and Yuan, Mingxuan and Lu, Zhichao and Wang, Zhenkun and Zhang, Qingfu},
  title = {A Systematic Survey on Large Language Models for Algorithm Design},
  year = {2024},
  eprint = {2410.14716},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
  dimensions = {true}
}