Author name cluster

Ling Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers

2 author rows

YNIMG Journal 2026 Journal Article

Distinct frontal lobe subregions mediate the emergence and reporting of visual consciousness

Yan Zhang
Xiarong Li
Zhenlan Jin
Junjun Zhang
Ling Li

Details DOI

AAAI Conference 2026 Conference Paper

Efficient Diffusion Planning with Temporal Diffusion

Jiaming Guo
Rui Zhang
Zerun Li
Yunkai Gao
Shaohui Peng
Siming Lan
Xing Hu
Zidong Du

Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid the impact of discrepancies between planning and reality on performance, previous works generate new plans at each time step. However, this incurs significant computational overhead and leads to lower decision frequencies, and frequent plan switching may also affect performance. In contrast, humans might create detailed short-term plans and more general, sometimes vague, long-term plans, and adjust them over time. Inspired by this, we propose the Temporal Diffusion Planner (TDP) which improves decision efficiency by distributing the denoising steps across the time dimension. TDP begins by generating an initial plan that becomes progressively more vague over time. At each subsequent time step, rather than generating an entirely new plan, TDP updates the previous one with a small number of denoising steps. This reduces the average number of denoising steps, improving decision efficiency. Additionally, we introduce an automated replanning mechanism to prevent significant deviations between the plan and reality. Experiments on D4RL show that, compared to previous works that generate new plans every time step, TDP significantly improves the decision-making frequency by 11-24.8 times while achieving higher or comparable performance.

PDF Details DOI

YNIMG Journal 2026 Journal Article

Gray matter volume predicts decision speed and reveals stage-specific contributions of large-scale brain networks in gambling tasks

Tingting Zhang
Qiuzhu Zhang
Ronglong Xiong
Junjun Zhang
Zhenlan Jin
Ling Li

Details DOI

AAAI Conference 2026 Conference Paper

QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation

Xinguo Zhu
Shaohui Peng
Jiaming Guo
Yunji Chen
Qi Guo
Yuanbo Wen
Hang Qin
Ruizhi Chen

Developing high-performance GPU kernels is critical for AI and scientific computing, but remains challenging due to its reliance on expert crafting and poor portability. While large language models (LLMs) offer promise for automation, both general-purpose and finetuned LLMs suffer from two fundamental and conflicting limitations: correctness and efficiency. The key reason is that existing LLM-based approaches directly generate the entire optimized low-level programs, requiring exploration of an extremely vast space encompassing both optimization policies and implementation codes. To address the challenge of exploring an intractable space, we propose Macro Thinking Micro Coding (MTMC), a hierarchical framework inspired by the staged optimization strategy of human experts. It decouples optimization strategy from implementation details, ensuring efficiency through high-level strategy and correctness through low-level implementation. Specifically, Macro Thinking employs reinforcement learning to guide lightweight LLMs in efficiently exploring and learning semantic optimization strategies that maximize hardware utilization. Micro Coding leverages general-purpose LLMs to incrementally implement the stepwise optimization proposals from Macro Thinking, avoiding full-kernel generation errors. Together, they effectively navigate the vast optimization space and intricate implementation details, enabling LLMs for high-performance GPU kernel generation. Comprehensive results on widely adopted benchmarks demonstrate the superior performance of MTMC on GPU kernel generation in both accuracy and running time. On KernelBench, MTMC achieves near 100% and 70% accuracy at Levels 1-2 and 3, respectively, over 50% higher than SOTA general-purpose and domain-finetuned LLMs, with up to 7.3× speedup over LLMs, and 2.2× over expert-optimized PyTorch Eager kernels. On the more challenging TritonBench, MTMC attains up to 59.64% accuracy and 34× speedup. All models and datasets will be made publicly available.

PDF Details DOI

YNIMG Journal 2026 Journal Article

The Erlangen Program in lateral occipital cortex: Hierarchical encoding of emergent features

Junjun Zhang
Shi Zeng
Baochen Wang
Jingyu He
Zhenlan Jin
Ling Li

Details DOI

EAAI Journal 2025 Journal Article

A hyperparameter-fusion neural networks for deposition prediction

Li Ding
Kun Pang
Junjie Li
Hua Shao
Nan Liu
Rui Chen
Zhiqiang Li
Zhenjie Yao

Details DOI

IROS Conference 2025 Conference Paper

A Safety-Enhanced Autonomous Resection Method for Precision Laparoscopic Surgery amid Tissue Deformation

Yudong Shi
Hangjie Mo
Xilin Xiao
Ruiming Duan
Ling Li
Xiaojian Li

Resection of pathological tissue is a common procedure in surgical oncology for treating tumors. In robot-assisted electrosurgery, the use of predefined markers to guide autonomous robotic resection is gaining traction. Accurate tracking of these markers and minimizing electrocautery damage are critical for the safe and effective autonomous resection of tumors. This paper introduces a safety enhanced autonomous resection method for laparoscopic surgery, designed to mitigate the risks posed by tissue deformation during the resection process. Initially, we pre-plan the cutting path and design a switching strategy for navigation waypoints based on a preview tracking mechanism. Then, we develop a depth-fused navigation controller and a safe withdrawal motion controller. Next, an inertial tracking mechanism is established to evaluate tissue deformation over short periods. Finally, we develop a confidence generator to fuse the two controllers, ensuring that tissue deformation during the resection process does not cause additional electrocautery damage. Simulation and phantom experiments were conducted, demonstrating the effectiveness of our proposed method. This work represents a significant step toward achieving autonomous robotic resection.

Details

IJCAI Conference 2025 Conference Paper

Automated Superscalar Processor Design by Learning Data Dependencies

Shuyao Cheng
Rui Zhang
Wenkai He
Pengwei Jin
Chongxiao Li
Zidong Du
Xing Hu
Yifan Hao

Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail on superscalar processor design because they cannot address inter-instruction data dependencies, leading to inefficient sequential instruction execution. This paper proposes a novel approach to automatically designing superscalar processors using a hardware-friendly model called the Stateful Binary Speculation Diagram (State-BSD). We observe that processor parallelism can be enhanced through on-the-fly inter-instruction dependent data predictors, reusing the processor's internal states to learn the data dependency. To meet the challenge of both hardware-resource limitation and design functional correctness, State-BSD consists of two components: 1) a lightweight state-selector trained by simulated annealing method to detect the most reusable processor states and store them in a small buffer; and 2) a highly precise state-speculator trained by BSD expansion method to predict the inter-instruction dependent data using the selected states. It is the first work to achieve the automated superscalar processor design, i.e., QiMeng-CPU-v2, which improves performance by about 380x over the state-of-the-art automated design and is comparable to human-designed superscalar processors such as ARM Cortex A53.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization

Yize Wu
Ke Gao
Ling Li
Yanjun Wu

Speculative decoding is an effective and lossless method for Large Language Model (LLM) inference acceleration. It employs a smaller model to generate a draft token sequence, which is then verified by the original base model. In multi-GPU systems, inference latency can be further reduced through tensor parallelism (TP), while the optimal TP size of the draft model is typically smaller than that of the base model, leading to GPU idling during the drafting stage. We observe that such inefficiency stems from the sequential execution of layers, which is seemingly natural but actually unnecessary. Therefore, we propose EasySpec, a layer-parallel speculation strategy that optimizes the efficiency of multi-GPU utilization. EasySpec breaks the inter-layer data dependencies in the draft model, enabling multiple layers to run simultaneously across multiple devices as "fuzzy" speculation. After each drafting-and-verification iteration, the draft model’s key-value cache is calibrated in a single forward pass, preventing long-term fuzzy-error accumulation at minimal additional latency. EasySpec is a training-free and plug-in method. We evaluated EasySpec on several mainstream open-source LLMs, using smaller versions of models from the same series as drafters. The results demonstrate that EasySpec can achieve a peak speedup of 4.17x compared to vanilla decoding, while preserving the original distributions of the base LLMs. Specifically, the drafting stage can be accelerated by up to 1.62x with a maximum speculation accuracy drop of only 7%. The code is available at https://github.com/Yize-Wu/EasySpec.

PDF Details

ICML Conference 2025 Conference Paper

Equivalence is All: A Unified View for Self-supervised Graph Learning

Yejiang Wang
Yuhai Zhao
Zhengkui Wang
Ling Li
Jiapu Wang
Fangting Li
Miaomiao Huang
Shirui Pan

Node equivalence is common in graphs, such as computing networks, encompassing automorphic equivalence (preserving adjacency under node permutations) and attribute equivalence (nodes with identical attributes). Despite their importance for learning node representations, these equivalences are largely ignored by existing graph models. To bridge this gap, we propose a GrAph self-supervised Learning framework with Equivalence (GALE) and analyze its connections to existing techniques. Specifically, we: 1) unify automorphic and attribute equivalence into a single equivalence class; 2) enforce the equivalence principle to make representations within the same class more similar while separating those across classes; 3) introduce approximate equivalence classes with linear time complexity to address the NP-hardness of exact automorphism detection and handle node-feature variation; 4) analyze existing graph encoders, noting limitations in message passing neural networks and graph transformers regarding equivalence constraints; 5) show that graph contrastive learning is a degenerate form of equivalence constraint; and 6) demonstrate that GALE achieves superior performance over baselines.

Details

EAAI Journal 2025 Journal Article

Etching process prediction based on cascade recurrent neural network

Zhenjie Yao
Ziyi Hu
Panpan Lai
Fengling Qin
Wenrui Wang
Zhicheng Wu
Lingfei Wang
Hua Shao

Details DOI

JBHI Journal 2025 Journal Article

Exploring Neural Mechanisms of Visual Working Memory for Real-World Stimuli Categories: Insights from the Fusiform Gyrus

Ronglong Xiong
Xiaotong Wei
Junjun Zhang
Zhenlan Jin
Ling Li

Objective: To dissect the neural mechanisms underlying visual working memory (VWM) processing of real-world stimuli (Body, Face, Place, Tool). Methods: This study leveraged task-fMRI data from the Human Connectome Project (HCP) n-back paradigm. The Neurofunctional Integration and Specificity Analysis (NISA) framework was proposed to synergistically combine Representational Similarity Analysis (RSA) and multivoxel machine learning classification and regression, enabling distinct characterization of visual perception and VWM processes. Functional connectivity (FC) patterns of NISA-selected regions of interest were further integrated with transcriptomic data to probe molecular substrates. Results: Bilateral fusiform gyrus (FFG) voxel patterns showed maximal stimulus representation fidelity (r = -0.43 to -0.42, q r = 0.42 to, 0.56, q ⁻7). Transcriptomic decoding revealed associations between bilateral FFG FC profiles and genes implicated in mental and psychiatric disorders (q < 0.05). Conclusion: The FFG operates as a dual-process hub, concurrently mediating visual perceptual categorization and working memory maintenance. Its synergistic excitation-inhibition in FFG may optimize the behavior performance through dynamic resource allocation, while FC-transcriptome coupling further revealed gene networks implicated in cognitive vulnerability across VWM categories.

Details DOI

AAAI Conference 2025 Conference Paper

Multi-View 3D Human Pose Estimation with Weakly Synchronized Images

Ling Li
Ruiwen Gu
Chongyang Wang
Junliang Xing
Xinchun Yu
Xiao-Ping Zhang

Multi-view 3D human pose estimation (MHPE) is an important research task in computer vision. To maintain consistency during the data collection, hardware synchronization devices are commonly used to connect cameras, ensuring that images from different views are captured simultaneously. However, synchronizing with extra devices has two apparent limitations: the hardware is i) usually expensive and ii) less flexible for deployment in outdoor open scenarios. If the model can improve its tolerance for the time differences in multi-view image capture, the difficulty and cost of deployment will be greatly reduced, and MHPE will become more widespread. In this paper, we try to answer how to build a model that performs pose estimation directly using "weakly synchronized images" from multiple views, where the captured images shift from each other within a frame. To this end, we introduce a new multi-view 3D human pose estimation task given weakly synchronized image inputs. Apart from existing well-synchronized datasets, we present the first weakly synchronized dataset comprising 800k images. Thereon, we propose SyncDiffPose, a novel model based on the diffusion method for pose estimation to denoise the error in such data. By combining simple synchronization strategies, e.g., the timer method, our approach can perform pose estimation without hardware calibration.

PDF Details DOI

ICML Conference 2025 Conference Paper

N2GON: Neural Networks for Graph-of-Net with Position Awareness

Yejiang Wang
Yuhai Zhao
Zhengkui Wang
Wen Shan
Ling Li
Qian Li 0043
Miaomiao Huang
Meixia Wang

Graphs, fundamental in modeling various research subjects such as computing networks, consist of nodes linked by edges. However, they typically function as components within larger structures in real-world scenarios, such as in protein-protein interactions where each protein is a graph in a larger network. This study delves into the Graph-of-Net (GON), a structure that extends the concept of traditional graphs by representing each node as a graph itself. It provides a multi-level perspective on the relationships between objects, encapsulating both the detailed structure of individual nodes and the broader network of dependencies. To learn node representations within the GON, we propose a position-aware neural network for Graph-of-Net which processes both intra-graph and inter-graph connections and incorporates additional data like node labels. Our model employs dual encoders and graph constructors to build and refine a constraint network, where nodes are adaptively arranged based on their positions, as determined by the network’s constraint system. Our model demonstrates significant improvements over baselines in empirical evaluations on various datasets.

Details

YNIMG Journal 2025 Journal Article

Noise and artifact suppression in SQUID and wearable OPM-MEG: A systematic review of background, physiological, and Technical interference

Ruonan Wang
Yujie Ma
Ruochen Zhao
Jin Ding
Ling Li
Yanfei Yang
Fulong Wang
Zhiqiang Cao

Details DOI

AAAI Conference 2025 Conference Paper

QiMeng-GEMM: Automatically Generating High-Performance Matrix Multiplication Code by Exploiting Large Language Models

Qirui Zhou
Yuanbo Wen
Ruizhi Chen
Ke Gao
Weiqiang Xiong
Ling Li
Qi Guo
Yanjun Wu

As a crucial operator in numerous scientific and engineering computing applications, the automatic optimization of General Matrix Multiplication (GEMM) with full utilization of ever-evolving hardware architectures (e.g. GPUs and RISC-V) is of paramount importance. While Large Language Models (LLMs) can generate functionally correct code for simple tasks, they have yet to produce high-performance code. The key challenge resides in deeply understanding diverse hardware architectures and crafting prompts that effectively unleash the potential of LLMs to generate high-performance code. In this paper, we propose a novel prompt mechanism called QiMeng-GEMM which enables LLMs to comprehend the architectural characteristics of different hardware platforms and automatically search for the optimization combinations for GEMM. The key of QiMeng-GEMM is a set of informative, adaptive, and iterative meta-prompts. Based on this, a searching strategy for optimal combinations of meta-prompts is used to iteratively generate high-performance code. Extensive experiments conducted on 4 leading LLMs, various paradigmatic hardware platforms, and representative matrix dimensions unequivocally demonstrate QiMeng-GEMM’s superior performance in auto-generating optimized GEMM code. Compared to vanilla prompts, our method achieves a performance enhancement of up to 113×. Even when compared to human experts, our method can reach 115% of cuBLAS on NVIDIA GPUs and 211% of OpenBLAS on RISC-V CPUs. Notably, while human experts often take months to optimize GEMM, our approach reduces the development cost by over 240×.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

QiMeng-MuPa: Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

Changxin Ke
Rui Zhang
Shuo Wang
Li Ding
Guangli Li
Yuanbo Wen
Shuoming Zhang
Ruiyuan Xu

The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet, the inherent complexity of parallel programming creates a demand for the automated sequential-to-parallel approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translated code. In this paper, we propose QiMeng-MuPa, a novel Mutual-Supervised Learning framework for Sequential-to-Parallel code translation, to address the functional equivalence issue. QiMeng-MuPa consists of two models, a Translator and a Tester. Through an iterative loop consisting of Co-verify and Co-evolve steps, the Translator and the Tester mutually generate data for each other and improve collectively. The Tester generates unit tests to verify and filter functionally equivalent translated code, thereby evolving the Translator, while the Translator generates translated code as augmented input to evolve the Tester. Experimental results demonstrate that QiMeng-MuPa significantly enhances the performance of the base models: when applied to Qwen2.5-Coder, it not only improves Pass@1 by up to 28.91% and boosts Tester performance by 68.90%, but also outperforms the previous state-of-the-art method CodeRosetta by 1.56 and 6.92 in BLEU and CodeBLEU scores, while achieving performance comparable to DeepSeek-R1 and GPT-4.1. Our code is available at https://github.com/kcxain/mupa.

PDF Details

IJCAI Conference 2025 Conference Paper

QiMeng-TensorOp: One-Line Prompt is Enough for High-Performance Tensor Operator Generation with Hardware Primitives

Xuzhi Zhang
Shaohui Peng
Qirui Zhou
Yuanbo Wen
Qi Guo
Ruizhi Chen
Xinguo Zhu
Weiqiang Xiong

Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures like RISC-V, ARM, and GPUs, as manually optimized implementation takes at least months and lacks portability. LLMs excel at generating high-level language codes, but they struggle to fully comprehend hardware characteristics and produce high-performance tensor operators. We introduce a tensor-operator auto-generation framework with a one-line user prompt (QiMeng-TensorOp), which enables LLMs to automatically exploit hardware characteristics to generate tensor operators with hardware primitives, and tune parameters for optimal performance across diverse hardware. Experimental results on various hardware platforms, SOTA LLMs, and typical tensor operators demonstrate that QiMeng-TensorOp effectively unleashes the computing capability of various hardware platforms, and automatically generates tensor operators of superior performance. Compared with vanilla LLMs, QiMeng-TensorOp achieves up to 1291× performance improvement. Even compared with human experts, QiMeng-TensorOp could reach 251% of OpenBLAS on RISC-V CPUs, and 124% of cuBLAS on NVIDIA GPUs. Additionally, QiMeng-TensorOp also significantly reduces development costs by 200× compared with human experts.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models

Ling Li
Yao Zhou
Yuxuan Liang
Fugee Tsung
Jiaheng Wei

Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise of large vision-language models (LVLMs) has enabled a rethinking of geo-localization as a reasoning-driven task grounded in visual cues. However, two major challenges persist. On the data side, existing reasoning-focused datasets are primarily based on street-view imagery, offering limited scene diversity and constrained viewpoints. On the modeling side, current approaches predominantly rely on supervised fine-tuning, which yields only marginal improvements in reasoning capabilities. To address these challenges, we propose a novel pipeline that constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. We introduce GLOBE, Group-relative policy optimization for Localizability assessment and Optimized visual-cue reasoning, yielding Bi-objective geo-Enhancement for the VLM in recognition and reasoning. GLOBE incorporates task-specific rewards that jointly enhance localizability assessment, visual-cue reasoning, and geolocation accuracy. Both qualitative and quantitative results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks, particularly in diverse visual scenes, while also generating more insightful and interpretable reasoning trajectories. The data and code are available at https://github.com/lingli1996/GLOBE.

PDF Details

YNIMG Journal 2025 Journal Article

Study on individual differences in visual working memory tasks based on spatiotemporal brain functional metrics and biological perspectives

Ronglong Xiong
Qiuzhu Zhang
Junjun Zhang
Zhenlan Jin
Ling Li

Details DOI

ICRA Conference 2024 Conference Paper

A Force-driven and Vision-driven Hybrid Control Method of Autonomous Laparoscope-Holding Robot

Jin Fang
Ling Li
Xiaojian Li
Hangjie Mo
Pengxin Guo
Xilin Xiao
Yanwei Qu

Laparoscope-holding robots significantly enhance the stability and precision of visualization in minimally invasive surgeries. Most existing robots of this kind depend on visual servo systems and struggle with efficient, rapid adjustments in the field-of-view (FOV), especially when identifying organs and needles outside the FOV. This paper presents a laparoscope-holding robot system capable of employing both vision-driven and force-driven mechanisms for continuous and large-scale FOV adjustments, respectively. The system features an integrated tactile handle, enabling the reception of human-robot interaction forces during surgical navigation. We propose a hybrid control method that leverages both force and vision inputs for laparoscopic FOV adjustments. This approach integrates a virtual wrench, generated from visual information, and an interaction wrench, obtained from the tactile handle, into the robot's dynamic model, which complies with remote center of motion constraints. The interaction wrench's gain is adjusted with the gripping force on the integrated tactile handle, ensuring that unintended movements caused by accidental contacts are prevented, thus safeguarding operational safety. The proposed method eliminates the need to switch control modes, enabling simultaneous visual tracking and tactile interaction guidance. Experimental results demonstrate that the proposed method not only allows for FOV adjustments with surgical instrument guiding but also adapts well to large-scale FOV adjustment tasks.

Details

NeurIPS Conference 2024 Conference Paper

DA-Ada: Learning Domain-Aware Adapter for Domain Adaptive Object Detection

Haochen Li
Rui Zhang
Hantao Yao
Xin Zhang
Yifan Hao
Xinkai Song
Xiaqing Li
Yongwei Zhao

Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. As the visual-language models (VLMs) can provide essential general knowledge on unseen images, freezing the visual encoder and inserting a domain-agnostic adapter can learn domain-invariant knowledge for DAOD. However, the domain-agnostic adapter is inevitably biased to the source domain. It discards some beneficial knowledge discriminative on the unlabelled domain, i.e., domain-specific knowledge of the target domain. To solve the issue, we propose a novel Domain-Aware Adapter (DA-Ada) tailored for the DAOD task. The key point is exploiting domain-specific knowledge between the essential general knowledge and domain-invariant knowledge. DA-Ada consists of the Domain-Invariant Adapter (DIA) for learning domain-invariant knowledge and the Domain-Specific Adapter (DSA) for injecting the domain-specific knowledge from the information discarded by the visual encoder. Comprehensive experiments over multiple DAOD tasks show that DA-Ada can efficiently infer a domain-aware visual encoder for boosting domain adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Ada.

PDF Details DOI

IROS Conference 2024 Conference Paper

DESectBot: Design and Validation of a Novel Two-Segment Decoupled Continuum Robotic System for Endoscopic Submucosal Dissection

Wenjie Liu
Yuancheng Shao
Yao Zhang 0029
Zixi Chen
Di Wu 0053
Yuqiao Chen
Cesare Stefanini
Ling Li

Endoscopic Submucosal Dissection (ESD) is a minimally invasive procedure designed to remove precancerous and cancerous lesions from the gastrointestinal (GI) tract. Given the GI tract’s tortuous and narrow shape, along with the need for varied movements during dissection, this requires highly flexible and compact instruments, making flexible continuum robots suitable candidates. In this paper, we propose a novel two-segment continuum robot system named DESectBot, featuring a diameter of 5.5 mm and a total length of the active bending module of 48 mm, while the robot’s total length exceeds 1 m. We designed a novel joint combination structure called the spatial cross-curved disk skeleton for the robot, which addresses the mechanical coupling problem between flexible robot actuators. The DESectBot boasts six degrees of freedom, and its kinematic modeling has been derived and utilized in the closed-loop control of the DESectBot. The validation of the DESectBot was conducted through a two-stage test: first, the decoupling performance of the DESectBot was validated. The results show that when one active bending segment bends, the other segment remains almost uninfluenced, with a maximum variation of 1.15 degrees, demonstrating the robot’s effective decoupling capability. Secondly, the accuracy of DESectBot was validated through trajectory-following experiments. The results reveal that the average tracking error for both trajectories is less than 2 mm, and the maximum tracking error is below 2.5 mm. Taking marking, one of the ESD procedures with a 5 mm tolerance, as an example, the DESectBot has the potential to be utilized for ESD procedure.

Details

ICML Conference 2024 Conference Paper

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Ling Li
Yu Ye 0002
Bingchuan Jiang
Wei Zeng 0004

This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM - existing street-view datasets often contain numerous low-quality images lacking visual clues, and lack any reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify the degree of street-view images being locatable, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are utilized to train GeoReasoner, which undergoes fine-tuning through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations illustrate that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP performance while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.

Details

AAAI Conference 2024 Conference Paper

Hypothesis, Verification, and Induction: Grounding Large Language Models with Self-Driven Skill Learning

Shaohui Peng
Xing Hu
Qi Yi
Rui Zhang
Jiaming Guo
Di Huang
Zikang Tian
Ruizhi Chen

Large language models (LLMs) show their powerful automatic reasoning and planning capability with a wealth of semantic knowledge about the human world. However, the grounding problem still hinders the applications of LLMs in the real-world environment. Existing studies try to fine-tune the LLM or utilize pre-defined behavior APIs to bridge the LLMs and the environment, which not only costs huge human efforts to customize for every single task but also weakens the generality strengths of LLMs. To autonomously ground the LLM onto the environment, we proposed the Hypothesis, Verification, and Induction (HYVIN) framework to automatically and progressively ground the LLM with self-driven skill learning. HYVIN first employs the LLM to propose the hypothesis of sub-goals to achieve tasks and then verify the feasibility of the hypothesis via interacting with the underlying environment. Once verified, HYVIN can then learn generalized skills with the guidance of these successfully grounded subgoals. These skills can be further utilized to accomplish more complex tasks that fail to pass the verification phase. Verified in the famous instruction following task set, BabyAI, HYVIN achieves comparable performance in the most challenging tasks compared with imitation learning methods that cost millions of demonstrations, proving the effectiveness of learned skills and showing the feasibility and efficiency of our framework.

PDF Details DOI

YNIMG Journal 2024 Journal Article

Identifying individual's distractor suppression using functional connectivity between anatomical large-scale brain regions

Lei Zhuo
Zhenlan Jin
Ke Xie
Simeng Li
Feng Lin
Junjun Zhang
Ling Li

Details DOI

AAAI Conference 2024 Conference Paper

OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning

Fan Wu
Rui Zhang
Qi Yi
Yunkai Gao
Jiaming Guo
Shaohui Peng
Siming Lan
Husheng Han

Model-based offline reinforcement learning (RL) algorithms have emerged as a promising paradigm for offline RL. These algorithms usually learn a dynamics model from a static dataset of transitions, use the model to generate synthetic trajectories, and perform conservative policy optimization within these trajectories. However, our observations indicate that the policy optimization methods used in these model-based offline RL algorithms are not effective at exploring the learned model and induce biased exploration, which ultimately impairs the performance of the algorithm. To address this issue, we propose Offline Conservative ExplorAtioN (OCEAN), a novel rollout approach to model-based offline RL. In our method, we incorporate additional exploration techniques and introduce three conservative constraints based on uncertainty estimation to mitigate the potential impact of significant dynamics errors resulting from exploratory transitions. Our work is a plug-in method and can be combined with classical model-based RL algorithms, such as MOPO, COMBO, and RAMBO. Experimental results on the D4RL MuJoCo benchmark show that OCEAN significantly improves the performance of existing algorithms.

PDF Details DOI

YNIMG Journal 2024 Journal Article

Task functional networks predict individual differences in the speed of emotional facial discrimination

Toluwani Joan Amos
Bishal Guragai
Qianru Rao
Wenjuan Li
Zhenlan Jin
Junjun Zhang
Ling Li

Details DOI

IJCAI Conference 2023 Conference Paper

ALL-E: Aesthetics-guided Low-light Image Enhancement

Ling Li
Dong Liang
Yuanhang Gao
Sheng-Jun Huang
Songcan Chen

Evaluating the performance of low-light image enhancement (LLE) is highly subjective, thus making integrating human preferences into image enhancement a necessity. Existing methods fail to consider this and present a series of potentially valid heuristic criteria for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and motivates training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project_pages/ALLE.html

PDF Details DOI

AAAI Conference 2023 Conference Paper

Conceptual Reinforcement Learning for Language-Conditioned Tasks

Shaohui Peng
Xing Hu
Rui Zhang
Jiaming Guo
Qi Yi
Ruizhi Chen
Zidong Du
Ling Li

Despite the broad application of deep reinforcement learning (RL), transferring and adapting a policy to unseen but similar environments is still a significant challenge. Recently, language-conditioned policies have been proposed to facilitate policy transfer by learning a joint representation of observation and text that captures the compact and invariant information across various environments. Existing language-conditioned RL methods often learn the joint representation as a simple latent layer for the given instances (episode-specific observation and text), which inevitably includes noisy or irrelevant information and causes spurious, instance-dependent correlations, thus hurting generalization performance and training efficiency. To address this issue, we propose a conceptual reinforcement learning (CRL) framework to learn a concept-like joint representation for language-conditioned policies. The key insight is that concepts are compact and invariant representations in human cognition, formed by extracting similarities from numerous instances in the real world. In CRL, we propose a multi-level attention encoder and two mutual information constraints for learning compact and invariant concepts. Verified in two challenging environments, RTFM and Messenger, CRL significantly improves training efficiency (up to 70%) and generalization ability (up to 30%) under new environment dynamics.

PDF Details DOI