Arrow Research search

Author name cluster

Junfeng Long

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

LLMs Unleashed: Generating Protocol Code from RFC Specifications

  • Junfeng Long
  • Jinshu Su
  • Biao Han

RFC (Request for Comments) documents constitute the foundation of network protocol standardization. However, because they are expressed in natural language, they tend to be lengthy and ambiguous, forcing protocol implementers to rely on extensive manual parsing and coding, a process that is both labor-intensive and error-prone. This makes the automated parsing and comprehension of RFC documents a major challenge in network protocol research. To address this gap, we introduce large language models (LLMs) into the task of automatic network protocol code generation from RFC documents (RFC2Code) and propose a comprehensive evaluation framework to quantitatively assess LLM performance. We develop an end-to-end automated protocol generation system, APG (Automated Protocol-Generation), which supports implementations of ICMP, IGMP, NTP, and TCP. Compared to prior natural language processing (NLP) methods, APG achieves a fully automated workflow with approximately 3.17× faster processing, 95% compile success and behavioral correctness for stateless protocols like ICMP, and 90% interoperability for complex stateful protocols such as TCP, requiring only minimal manual intervention.

ICRA Conference 2025 Conference Paper

Learning Humanoid Locomotion with Perceptive Internal Model

  • Junfeng Long
  • Junli Ren
  • Moji Shi
  • Zirui Wang
  • Tao Huang
  • Ping Luo 0002
  • Jiangmiao Pang

In contrast to quadruped robots that can navigate diverse terrains using a “blind” policy, humanoid robots require accurate perception for stable locomotion due to their high degrees of freedom and inherently unstable morphology. However, incorporating perceptual signals often introduces additional disturbances to the system, potentially reducing its robustness, generalizability, and efficiency. This paper presents the Perceptive Internal Model (PIM), which relies on onboard, continuously updated elevation maps centered around the robot to perceive its surroundings. We train the policy using ground-truth obstacle heights surrounding the robot in simulation, optimizing it based on the Hybrid Internal Model (HIM), and perform inference with heights sampled from the constructed elevation map. Unlike previous methods that directly encode depth maps or raw point clouds, our approach allows the robot to perceive the terrain beneath its feet clearly and is less affected by camera movement or noise. Furthermore, since depth map rendering is not required in simulation, our method introduces minimal additional computational costs and can train the policy in 3 hours on an RTX 4090 GPU. We verify the effectiveness of our method across various humanoid robots, various indoor and outdoor terrains, stairs, and various sensor configurations. Our method can enable a humanoid robot to continuously climb stairs and has the potential to serve as a foundational algorithm for the development of future humanoid control methods.

ICLR Conference 2024 Conference Paper

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

  • Junfeng Long
  • Zirui Wang
  • Quanyi Li
  • Liu Cao
  • Jiawei Gao 0004
  • Jiangmiao Pang

Robust locomotion control depends on accurate state estimation. However, the sensors of most legged robots can only provide partial and noisy observations, making the estimation particularly challenging, especially for external states like terrain friction and elevation maps. Inspired by the classical Internal Model Control principle, we consider these external states as disturbances and introduce the Hybrid Internal Model (HIM) to estimate them according to the response of the robot. The response, which we refer to as the hybrid internal embedding, contains the robot’s explicit velocity and implicit stability representation, corresponding to two primary goals for locomotion tasks: explicitly tracking velocity and implicitly maintaining stability. We use contrastive learning to optimize the embedding to be close to the robot’s successor state, in which the response is naturally embedded. HIM has several appealing benefits: It only needs the robot’s proprioception, i.e., readings from joint encoders and the IMU, as observations. It innovatively maintains consistent observations between simulation reference and reality, which avoids information loss in mimicking learning. It exploits batch-level information, which is more robust to noise and yields better sample efficiency. It only requires 1 hour of training on an RTX 4090 to enable a quadruped robot to traverse any terrain under any disturbances. A wealth of real-world experiments demonstrates its agility, even in high-difficulty tasks and cases that never occurred during training, revealing remarkable open-world generalizability.

NeurIPS Conference 2024 Conference Paper

Parallelizing Model-based Reinforcement Learning Over the Sequence Length

  • Zirui Wang
  • Yue Deng
  • Junfeng Long
  • Yin Zhang

Recently, Model-based Reinforcement Learning (MBRL) methods have demonstrated stunning sample efficiency in various RL domains. However, achieving this extraordinary sample efficiency comes with additional training costs in terms of computations, memory, and training time. To address these challenges, we propose the Parallelized Model-based Reinforcement Learning (PaMoRL) framework. PaMoRL introduces two novel techniques: the Parallel World Model (PWM) and the Parallelized Eligibility Trace Estimation (PETE) to parallelize both model learning and policy learning stages of current MBRL methods over the sequence length. Our PaMoRL framework is hardware-efficient and stable, and it can be applied to various tasks with discrete or continuous action spaces using a single set of hyperparameters. The empirical results demonstrate that the PWM and PETE within PaMoRL significantly increase training speed without sacrificing inference efficiency. In terms of sample efficiency, PaMoRL maintains an MBRL-level sample efficiency that outperforms other no-look-ahead MBRL methods and model-free RL methods, and it even exceeds the performance of planning-based MBRL methods and methods with larger networks in certain tasks.