YNIMG Journal 2026 Journal Article
Distinct frontal lobe subregions mediate the emergence and reporting of visual consciousness
- Yan Zhang
- Xiarong Li
- Zhenlan Jin
- Junjun Zhang
- Ling Li
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
YNIMG Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid performance loss caused by discrepancies between plans and reality, previous works generate a new plan at every time step. However, this incurs significant computational overhead and lowers the decision frequency, and frequent plan switching can also degrade performance. In contrast, humans may make detailed short-term plans and more general, sometimes vague, long-term plans, and adjust them over time. Inspired by this, we propose the Temporal Diffusion Planner (TDP), which improves decision efficiency by distributing the denoising steps across the time dimension. TDP begins by generating an initial plan that becomes progressively vaguer further into the future. At each subsequent time step, rather than generating an entirely new plan, TDP updates the previous one with a small number of denoising steps. This reduces the average number of denoising steps per decision and thus improves decision efficiency. Additionally, we introduce an automated replanning mechanism to prevent significant deviations between the plan and reality. Experiments on D4RL show that, compared with previous works that generate new plans at every time step, TDP increases the decision-making frequency by 11 to 24.8 times while achieving higher or comparable performance.
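The efficiency claim rests on amortizing denoising steps: a full plan is paid for once, and later steps only pay for a cheap update unless replanning is triggered. The sketch below only does that bookkeeping. The function name, the per-step deviation values, the step counts and the threshold are all hypothetical illustrations, not TDP's actual schedule or replanning rule.

```python
from typing import Sequence


def average_denoising_steps(deviations: Sequence[float], full_steps: int,
                            update_steps: int, threshold: float) -> float:
    """Average denoising steps per decision for a TDP-style schedule (sketch).

    deviations[t] is a hypothetical plan-vs-reality deviation at decision t.
    Step 0 always generates a full plan from noise; later steps only refine the
    previous plan, unless the deviation exceeds `threshold` (full replan).
    """
    if not deviations:
        raise ValueError("need at least one decision step")
    total = 0
    for t, dev in enumerate(deviations):
        if t == 0 or dev > threshold:
            total += full_steps      # generate a fresh plan
        else:
            total += update_steps    # small update of the existing plan
    return total / len(deviations)


baseline = 20                        # regenerate a 20-step plan every decision
devs = [0.0, 0.1, 0.2, 0.9, 0.1, 0.1, 0.2, 0.1]
tdp = average_denoising_steps(devs, full_steps=20, update_steps=1, threshold=0.5)
speedup = baseline / tdp             # illustrative only, not a reported result
```

Here one replan (at the deviation of 0.9) plus the initial plan dominate the cost, so the average drops from 20 to 5.75 steps per decision; the 11 to 24.8× figures in the abstract come from the paper's experiments, not from this toy arithmetic.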
YNIMG Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Developing high-performance GPU kernels is critical for AI and scientific computing, but remains challenging due to its reliance on expert crafting and its poor portability. While large language models (LLMs) offer promise for automation, both general-purpose and finetuned LLMs suffer from two fundamental and conflicting limitations, in correctness and in efficiency. The key reason is that existing LLM-based approaches directly generate entire optimized low-level programs, requiring exploration of an extremely vast space encompassing both optimization policies and implementation code. To address the challenge of exploring this intractable space, we propose Macro Thinking Micro Coding (MTMC), a hierarchical framework inspired by the staged optimization strategy of human experts. It decouples optimization strategy from implementation details, ensuring efficiency through high-level strategy and correctness through low-level implementation. Specifically, Macro Thinking employs reinforcement learning to guide lightweight LLMs in efficiently exploring and learning semantic optimization strategies that maximize hardware utilization. Micro Coding leverages general-purpose LLMs to incrementally implement the stepwise optimization proposals from Macro Thinking, avoiding full-kernel generation errors. Together, they effectively navigate the vast optimization space and intricate implementation details, enabling LLMs to generate high-performance GPU kernels. Comprehensive results on widely adopted benchmarks demonstrate the superior performance of MTMC on GPU kernel generation in both accuracy and running time. On KernelBench, MTMC achieves nearly 100% accuracy at Levels 1-2 and 70% at Level 3, outperforming SOTA general-purpose and domain-finetuned LLMs by over 50%, with up to 7.3× speedup over LLMs and 2.2× over expert-optimized PyTorch Eager kernels. On the more challenging TritonBench, MTMC attains up to 59.64% accuracy and 34× speedup. All models and datasets will be made publicly available.
YNIMG Journal 2026 Journal Article
EAAI Journal 2025 Journal Article
IROS Conference 2025 Conference Paper
Resection of pathological tissue is a common procedure in surgical oncology for treating tumors. In robot-assisted electrosurgery, the use of predefined markers to guide autonomous robotic resection is gaining traction. Accurate tracking of these markers and minimizing electrocautery damage are critical for the safe and effective autonomous resection of tumors. This paper introduces a safety-enhanced autonomous resection method for laparoscopic surgery, designed to mitigate the risks posed by tissue deformation during the resection process. First, we pre-plan the cutting path and design a switching strategy for navigation waypoints based on a preview tracking mechanism. Then, we develop a depth-fused navigation controller and a safe withdrawal motion controller. Next, an inertial tracking mechanism is established to evaluate tissue deformation over short periods. Finally, we develop a confidence generator to fuse the two controllers, ensuring that tissue deformation during the resection process does not cause additional electrocautery damage. Simulation and phantom experiments demonstrate the effectiveness of the proposed method. This work represents a significant step toward autonomous robotic resection.
IJCAI Conference 2025 Conference Paper
Automated processor design, which can significantly reduce human effort and accelerate design cycles, has received considerable attention. While recent advances have automatically designed single-cycle processors that execute one instruction per cycle, their performance cannot compete with modern superscalar processors that execute multiple instructions per cycle. Previous methods fail at superscalar processor design because they cannot handle inter-instruction data dependencies, which forces inefficient sequential instruction execution. This paper proposes a novel approach to automatically designing superscalar processors using a hardware-friendly model called the Stateful Binary Speculation Diagram (State-BSD). We observe that processor parallelism can be enhanced through on-the-fly predictors of inter-instruction dependent data that reuse the processor's internal states to learn the data dependencies. To meet the dual challenge of limited hardware resources and functional correctness, State-BSD consists of two components: 1) a lightweight state-selector, trained by simulated annealing, that detects the most reusable processor states and stores them in a small buffer; and 2) a highly precise state-speculator, trained by the BSD expansion method, that predicts the inter-instruction dependent data using the selected states. This is the first work to achieve automated superscalar processor design: the resulting processor, QiMeng-CPU-v2, improves performance by about 380× over the state-of-the-art automated design and is comparable to human-designed superscalar processors such as the ARM Cortex-A53.
NeurIPS Conference 2025 Conference Paper
Speculative decoding is an effective and lossless method for accelerating Large Language Model (LLM) inference. It employs a smaller model to generate a draft token sequence, which is then verified by the original base model. In multi-GPU systems, inference latency can be further reduced through tensor parallelism (TP), but the optimal TP size of the draft model is typically smaller than that of the base model, leaving GPUs idle during the drafting stage. We observe that this inefficiency stems from the sequential execution of layers, which seems natural but is actually unnecessary. Therefore, we propose EasySpec, a layer-parallel speculation strategy that optimizes the efficiency of multi-GPU utilization. EasySpec breaks the inter-layer data dependencies in the draft model, enabling multiple layers to run simultaneously across multiple devices as "fuzzy" speculation. After each drafting-and-verification iteration, the draft model's key-value cache is calibrated in a single forward pass, preventing long-term accumulation of fuzzy errors at minimal additional latency. EasySpec is a training-free, plug-in method. We evaluated EasySpec on several mainstream open-source LLMs, using smaller models from the same series as drafters. The results demonstrate that EasySpec achieves a peak speedup of 4.17× over vanilla decoding while preserving the original distributions of the base LLMs. Specifically, the drafting stage can be accelerated by up to 1.62× with a maximum speculation accuracy drop of only 7%. The code is available at https://github.com/Yize-Wu/EasySpec.
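The difference between sequential and "fuzzy" layer-parallel execution can be illustrated with toy residual layers. This is a minimal sketch under a simplifying assumption: each layer adds a branch output `f_i(h)` to a residual stream, and in the parallel variant every layer in a group reads the same group input, so the group's layers no longer depend on each other. The helper names (`sequential`, `layer_parallel`, `scale`) and the toy layers are hypothetical; this is not EasySpec's code, and the KV-cache calibration step is not modeled.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sequential(h: Vector, layers: Sequence[Layer]) -> Vector:
    # Standard execution: layer i consumes layer i-1's output (a hard dependency).
    for f in layers:
        h = add(h, f(h))
    return h


def layer_parallel(h: Vector, layers: Sequence[Layer], group: int) -> Vector:
    # "Fuzzy" execution: all layers in a group read the same group input, so
    # they are independent and could run on different GPUs at the same time.
    for start in range(0, len(layers), group):
        updates = [f(h) for f in layers[start:start + group]]
        for u in updates:  # merge the residual updates once the group finishes
            h = add(h, u)
    return h


def scale(c: float) -> Layer:
    # Toy residual branch: multiply the hidden state by a constant.
    return lambda h: [c * x for x in h]


layers = [scale(0.1), scale(0.2)]
exact = sequential([1.0], layers)                # about [1.32]
fuzzy = layer_parallel([1.0], layers, group=2)   # about [1.30]
```

With `group=1` the two functions coincide; larger groups trade a small approximation error (here about 0.02) for independence between layers, which is the kind of error the per-iteration calibration described above is meant to keep from accumulating.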
ICML Conference 2025 Conference Paper
Node equivalence is common in graphs, such as computing networks, and encompasses automorphic equivalence (preserving adjacency under node permutations) and attribute equivalence (nodes with identical attributes). Despite their importance for learning node representations, these equivalences are largely ignored by existing graph models. To bridge this gap, we propose a GrAph self-supervised Learning framework with Equivalence (GALE) and analyze its connections to existing techniques. Specifically, we: 1) unify automorphic and attribute equivalence into a single equivalence class; 2) enforce the equivalence principle to make representations within the same class more similar while separating those across classes; 3) introduce approximate equivalence classes with linear time complexity to address the NP-hardness of exact automorphism detection and to handle node-feature variation; 4) analyze existing graph encoders, noting limitations of message passing neural networks and graph transformers with respect to equivalence constraints; 5) show that graph contrastive learning is a degenerate form of the equivalence constraint; and 6) demonstrate that GALE achieves superior performance over baselines.
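One standard way to obtain approximate equivalence classes cheaply is 1-WL color refinement seeded with node attributes: nodes that end up with the same color have identical attributes and indistinguishable refined neighborhoods, which is a necessary (not sufficient) condition for automorphic equivalence. The sketch below uses that swapped-in technique as an assumption; GALE's own approximation may differ. Each round costs roughly O(E log d) because of neighbor sorting.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def _relabel(signatures: Dict[int, Hashable]) -> Dict[int, int]:
    # Compress arbitrary hashable signatures into small integer colors.
    table: Dict[Hashable, int] = {}
    return {v: table.setdefault(s, len(table)) for v, s in signatures.items()}


def approx_equivalence_classes(adj: Dict[int, List[int]],
                               attrs: Dict[int, Hashable],
                               rounds: int = 3) -> List[List[int]]:
    # Attribute equivalence: nodes with identical attributes start with one color.
    color = _relabel({v: attrs[v] for v in adj})
    for _ in range(rounds):
        # Refine: new color = (own color, sorted multiset of neighbor colors).
        sig = {v: (color[v], tuple(sorted(color[u] for u in adj[v]))) for v in adj}
        new_color = _relabel(sig)
        stable = len(set(new_color.values())) == len(set(color.values()))
        color = new_color
        if stable:
            break  # no class was split, so the partition has converged
    classes: Dict[int, List[int]] = defaultdict(list)
    for v, c in color.items():
        classes[c].append(v)
    return sorted(classes.values())


# Path graph 0-1-2-3: the two ends and the two middle nodes are symmetric.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
same = approx_equivalence_classes(path, {v: "x" for v in path})
```

With identical attributes the path graph splits into `[[0, 3], [1, 2]]`, matching its mirror symmetry; giving the endpoints different attributes separates them, showing how attribute and structural equivalence combine into one partition.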
EAAI Journal 2025 Journal Article
JBHI Journal 2025 Journal Article
Objective: To dissect the neural mechanisms underlying visual working memory (VWM) processing of real-world stimuli (Body, Face, Place, Tool). Methods: This study leveraged task-fMRI data from the Human Connectome Project (HCP) n-back paradigm. The Neurofunctional Integration and Specificity Analysis (NISA) framework was proposed to synergistically combine Representational Similarity Analysis (RSA) with multivoxel machine-learning classification and regression, enabling distinct characterization of visual perception and VWM processes. Functional connectivity (FC) patterns of NISA-selected regions of interest were further integrated with transcriptomic data to probe molecular substrates. Results: Bilateral fusiform gyrus (FFG) voxel patterns showed maximal stimulus representation fidelity (r = -0.43 to -0.42, q …; r = 0.42 to 0.56, q …⁻⁷). Transcriptomic decoding revealed associations between bilateral FFG FC profiles and genes implicated in mental and psychiatric disorders (q < 0.05). Conclusion: The FFG operates as a dual-process hub, concurrently mediating visual perceptual categorization and working memory maintenance. Synergistic excitation-inhibition within the FFG may optimize behavioral performance through dynamic resource allocation, while FC-transcriptome coupling further revealed gene networks implicated in cognitive vulnerability across VWM categories.
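RSA compares the geometry of two representations rather than the raw patterns: each is summarized as a representational dissimilarity matrix (RDM) over conditions, and the RDMs are then correlated. The pure-Python sketch below uses two common RSA choices, 1 − Pearson r as the dissimilarity and a Spearman correlation between RDM upper triangles; these are assumptions for illustration, not necessarily the settings used in NISA, and the toy patterns are invented.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    # Raises ZeroDivisionError if either input has zero variance.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(x: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def rdm_upper(patterns: Sequence[Sequence[float]]) -> List[float]:
    # patterns[c] is the voxel pattern for condition c; dissimilarity = 1 - r.
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def rsa(patterns_a, patterns_b) -> float:
    # Spearman correlation between the two RDMs' upper triangles.
    return pearson(ranks(rdm_upper(patterns_a)), ranks(rdm_upper(patterns_b)))


# Toy 4-voxel patterns for four conditions (e.g. Body, Face, Place, Tool).
region = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1], [1, 3, 2, 4]]
rescaled = [[2 * v + 5 for v in p] for p in region]  # same geometry, new scale
```

Because Pearson correlation ignores per-pattern scale and offset, `rsa(region, rescaled)` is 1 up to rounding: RSA sees the two as carrying the same representational structure even though their raw values differ.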
AAAI Conference 2025 Conference Paper
Multi-view 3D human pose estimation (MHPE) is an important research task in computer vision. To maintain consistency during data collection, hardware synchronization devices are commonly used to connect cameras, ensuring that images from different views are captured simultaneously. However, synchronizing with extra devices has two clear limitations: the hardware is i) usually expensive and ii) inflexible to deploy in open outdoor scenarios. If a model could better tolerate time differences in multi-view image capture, the difficulty and cost of deployment would be greatly reduced, and MHPE would become more widespread. In this paper, we ask how to build a model that performs pose estimation directly from "weakly synchronized images" captured from multiple views, where the images are offset from each other by less than one frame. To this end, we introduce a new multi-view 3D human pose estimation task with weakly synchronized image inputs. In addition to existing well-synchronized datasets, we present the first weakly synchronized dataset, comprising 800k images. On this basis, we propose SyncDiffPose, a novel diffusion-based pose estimation model that denoises the errors in such data. Combined with simple synchronization strategies, e.g., the timer method, our approach can perform pose estimation without hardware calibration.
ICML Conference 2025 Conference Paper
Graphs, fundamental in modeling various research subjects such as computing networks, consist of nodes linked by edges. However, they typically function as components within larger structures in real-world scenarios, such as in protein-protein interactions where each protein is a graph in a larger network. This study delves into the Graph-of-Net (GON), a structure that extends the concept of traditional graphs by representing each node as a graph itself. It provides a multi-level perspective on the relationships between objects, encapsulating both the detailed structure of individual nodes and the broader network of dependencies. To learn node representations within the GON, we propose a position-aware neural network for Graph-of-Net which processes both intra-graph and inter-graph connections and incorporates additional data like node labels. Our model employs dual encoders and graph constructors to build and refine a constraint network, where nodes are adaptively arranged based on their positions, as determined by the network’s constraint system. Our model demonstrates significant improvements over baselines in empirical evaluations on various datasets.
YNIMG Journal 2025 Journal Article
AAAI Conference 2025 Conference Paper
As a crucial operator in numerous scientific and engineering computing applications, General Matrix Multiplication (GEMM) must be automatically optimized to fully utilize ever-evolving hardware architectures (e.g., GPUs and RISC-V). While Large Language Models (LLMs) can generate functionally correct code for simple tasks, they have yet to produce high-performance code. The key challenge lies in deeply understanding diverse hardware architectures and crafting prompts that effectively unleash the potential of LLMs to generate high-performance code. In this paper, we propose a novel prompt mechanism called QiMeng-GEMM, which enables LLMs to comprehend the architectural characteristics of different hardware platforms and automatically search for optimization combinations for GEMM. The core of QiMeng-GEMM is a set of informative, adaptive, and iterative meta-prompts. On this basis, a search strategy over combinations of meta-prompts iteratively generates high-performance code. Extensive experiments on 4 leading LLMs, various representative hardware platforms, and representative matrix dimensions demonstrate QiMeng-GEMM's superior performance in auto-generating optimized GEMM code. Compared with vanilla prompts, our method achieves up to 113× higher performance. Even compared with human experts, our method reaches 115% of cuBLAS performance on NVIDIA GPUs and 211% of OpenBLAS on RISC-V CPUs. Notably, while human experts often take months to optimize GEMM, our approach reduces the development cost by over 240×.
NeurIPS Conference 2025 Conference Paper
The rise of GPU-based high-performance computing (HPC) has driven the widespread adoption of parallel programming models such as CUDA. Yet the inherent complexity of parallel programming creates a demand for automated sequential-to-parallel approaches. However, data scarcity poses a significant challenge for machine learning-based sequential-to-parallel code translation. Although recent back-translation methods show promise, they still fail to ensure functional equivalence in the translated code. In this paper, we propose QiMeng-MuPa, a novel Mutual-Supervised Learning framework for Sequential-to-Parallel code translation, to address the functional equivalence issue. QiMeng-MuPa consists of two models, a Translator and a Tester. Through an iterative loop of Co-verify and Co-evolve steps, the Translator and the Tester mutually generate data for each other and improve collectively. The Tester generates unit tests to verify and filter functionally equivalent translated code, thereby evolving the Translator, while the Translator generates translated code as augmented input to evolve the Tester. Experimental results demonstrate that QiMeng-MuPa significantly enhances the performance of the base models: when applied to Qwen2.5-Coder, it not only improves Pass@1 by up to 28.91% and boosts Tester performance by 68.90%, but also outperforms the previous state-of-the-art method CodeRosetta by 1.56 and 6.92 in BLEU and CodeBLEU scores, while achieving performance comparable to DeepSeek-R1 and GPT-4.1. Our code is available at https://github.com/kcxain/mupa.
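The Co-verify step can be pictured as a filter: candidate translations are run against unit tests, and only candidates that pass every test are kept as training data for the Translator. In the minimal sketch below, Python callables stand in for translated parallel code and tests are hypothetical `(arguments, expected output)` pairs; in QiMeng-MuPa the tests come from the Tester model, whereas here they are derived from a reference function purely for illustration. All names are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

UnitTest = Tuple[tuple, Any]  # (positional arguments, expected output)


def passes_all(candidate: Callable[..., Any], tests: Sequence[UnitTest]) -> bool:
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


def co_verify(candidates: Sequence[Callable[..., Any]],
              tests: Sequence[UnitTest]) -> List[Callable[..., Any]]:
    """Keep only candidate translations that pass every unit test."""
    return [c for c in candidates if passes_all(c, tests)]


def reference(a, b):          # sequential source program (toy vector add)
    return [x + y for x, y in zip(a, b)]


def good(a, b):               # faithful "translation"
    return [a[i] + b[i] for i in range(len(a))]


def off_by_one(a, b):         # drops the last element
    return [a[i] + b[i] for i in range(len(a) - 1)]


def crashes(a, b):            # reads past the end of b -> IndexError
    return [a[i] + b[i + 1] for i in range(len(a))]


tests = [((a, b), reference(a, b)) for a, b in [([1, 2], [3, 4]), ([0], [5])]]
kept = co_verify([good, off_by_one, crashes], tests)  # only `good` survives
```

Treating exceptions as failures matters: a crashing candidate must never leak into the training data for the Translator, just like a silently wrong one.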
IJCAI Conference 2025 Conference Paper
Computation-intensive tensor operators constitute over 90% of the computations in Large Language Models (LLMs) and Deep Neural Networks. Automatically and efficiently generating high-performance tensor operators with hardware primitives is crucial for diverse and ever-evolving hardware architectures such as RISC-V, ARM, and GPUs, since manually optimized implementations take at least months to develop and lack portability. LLMs excel at generating high-level language code, but they struggle to fully comprehend hardware characteristics and to produce high-performance tensor operators. We introduce QiMeng-TensorOp, a tensor-operator auto-generation framework driven by a one-line user prompt, which enables LLMs to automatically exploit hardware characteristics to generate tensor operators with hardware primitives and to tune parameters for optimal performance across diverse hardware. Experimental results on various hardware platforms, SOTA LLMs, and typical tensor operators demonstrate that QiMeng-TensorOp effectively unleashes the computing capability of various hardware platforms and automatically generates tensor operators with superior performance. Compared with vanilla LLMs, QiMeng-TensorOp achieves up to 1291× performance improvement. Even compared with human experts, QiMeng-TensorOp reaches 251% of OpenBLAS performance on RISC-V CPUs and 124% of cuBLAS on NVIDIA GPUs. Additionally, QiMeng-TensorOp reduces development costs by 200× compared with human experts.
NeurIPS Conference 2025 Conference Paper
Previous methods for image geo-localization have typically treated the task as either classification or retrieval, often relying on black-box decisions that lack interpretability. The rise of large vision-language models (LVLMs) has enabled a rethinking of geo-localization as a reasoning-driven task grounded in visual cues. However, two major challenges persist. On the data side, existing reasoning-focused datasets are primarily based on street-view imagery, offering limited scene diversity and constrained viewpoints. On the modeling side, current approaches predominantly rely on supervised fine-tuning, which yields only marginal improvements in reasoning capabilities. To address these challenges, we propose a novel pipeline that constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. We introduce GLOBE, Group-relative policy optimization for Localizability assessment and Optimized visual-cue reasoning, yielding Bi-objective geo-Enhancement for the VLM in recognition and reasoning. GLOBE incorporates task-specific rewards that jointly enhance localizability assessment, visual-cue reasoning, and geolocation accuracy. Both qualitative and quantitative results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks, particularly in diverse visual scenes, while also generating more insightful and interpretable reasoning trajectories. The data and code are available at https://github.com/lingli1996/GLOBE.
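GLOBE builds on group-relative policy optimization (GRPO), in which each sampled response's advantage is its reward standardized against the other responses sampled for the same prompt, so no learned value function is needed. The sketch below shows that normalization with a population standard deviation; the `combined_reward` weighting of the three task-specific terms is a hypothetical placeholder, since the abstract does not give GLOBE's reward formula.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def combined_reward(localizability: float, reasoning: float, accuracy: float,
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    # Hypothetical weighted sum of the three task-specific reward terms.
    w1, w2, w3 = weights
    return w1 * localizability + w2 * reasoning + w3 * accuracy


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when all responses score the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers for one image: two good, two poor.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])  # roughly [1, -1, 1, -1]
```

Responses better than their group average get positive advantages and are reinforced; if every response in a group earns the same reward, all advantages are zero and that group contributes no gradient signal.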
YNIMG Journal 2025 Journal Article
ICRA Conference 2024 Conference Paper
Laparoscope-holding robots significantly enhance the stability and precision of visualization in minimally invasive surgeries. Most existing robots of this kind depend on visual servo systems and struggle with efficient, rapid adjustments in the field-of-view (FOV), especially when identifying organs and needles outside the FOV. This paper presents a laparoscope-holding robot system capable of employing both vision-driven and force-driven mechanisms for continuous and large-scale FOV adjustments, respectively. The system features an integrated tactile handle, enabling the reception of human-robot interaction forces during surgical navigation. We propose a hybrid control method that leverages both force and vision inputs for laparoscopic FOV adjustments. This approach integrates a virtual wrench, generated from visual information, and an interaction wrench, obtained from the tactile handle, into the robot's dynamic model, which complies with remote center of motion constraints. The interaction wrench's gain is adjusted with the gripping force on the integrated tactile handle, ensuring that unintended movements caused by accidental contacts are prevented, thus safeguarding operational safety. The proposed method eliminates the need to switch control modes, enabling simultaneous visual tracking and tactile interaction guidance. Experimental results demonstrate that the proposed method not only allows for FOV adjustments with surgical instrument guiding but also adapts well to large-scale FOV adjustment tasks.
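The safety mechanism scales the interaction wrench by the grip force on the tactile handle, so a light accidental brush produces no motion while a deliberate grip does. The sketch below assumes a dead-zone-then-linear gain and a simple sum of the vision-derived virtual wrench and the scaled interaction wrench; the thresholds, the gain shape and the summation are hypothetical, and the RCM-constrained dynamic model from the abstract is not modeled.

```python
from typing import List, Sequence


def grip_gain(grip_force: float, f_min: float = 2.0, f_max: float = 10.0,
              k_max: float = 1.0) -> float:
    """Map handle grip force (N) to an interaction-wrench gain (hypothetical).

    Below f_min (e.g. an accidental contact) the gain is zero; it then rises
    linearly and saturates at k_max once the surgeon grips firmly.
    """
    if grip_force <= f_min:
        return 0.0
    if grip_force >= f_max:
        return k_max
    return k_max * (grip_force - f_min) / (f_max - f_min)


def commanded_wrench(virtual: Sequence[float], interaction: Sequence[float],
                     grip_force: float) -> List[float]:
    # 6-D wrenches (force xyz, torque xyz): vision term plus gated human term.
    k = grip_gain(grip_force)
    return [v + k * h for v, h in zip(virtual, interaction)]


vision = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # pulls the view toward a target
hand = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0]     # force sensed at the tactile handle
```

Because both terms are always summed, no explicit mode switch is needed: visual tracking continues while the grip force smoothly decides how much the surgeon's push contributes.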
NeurIPS Conference 2024 Conference Paper
Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. Since vision-language models (VLMs) can provide essential general knowledge about unseen images, freezing the visual encoder and inserting a domain-agnostic adapter can learn domain-invariant knowledge for DAOD. However, the domain-agnostic adapter is inevitably biased toward the source domain. It discards some knowledge that is beneficial and discriminative for the unlabelled domain, i.e., domain-specific knowledge of the target domain. To solve this issue, we propose a novel Domain-Aware Adapter (DA-Ada) tailored for the DAOD task. The key idea is to exploit the domain-specific knowledge lying between the essential general knowledge and the domain-invariant knowledge. DA-Ada consists of a Domain-Invariant Adapter (DIA) for learning domain-invariant knowledge and a Domain-Specific Adapter (DSA) for injecting domain-specific knowledge recovered from the information discarded by the visual encoder. Comprehensive experiments on multiple DAOD tasks show that DA-Ada can efficiently infer a domain-aware visual encoder that boosts domain adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Ada.
IROS Conference 2024 Conference Paper
Endoscopic Submucosal Dissection (ESD) is a minimally invasive procedure designed to remove precancerous and cancerous lesions from the gastrointestinal (GI) tract. Given the GI tract's tortuous and narrow shape, along with the need for varied movements during dissection, the procedure requires highly flexible and compact instruments, making flexible continuum robots suitable candidates. In this paper, we propose a novel two-segment continuum robot system named DESectBot, with a diameter of 5.5 mm and an active bending module 48 mm long in total, while the robot's total length exceeds 1 m. We designed a novel joint combination structure for the robot, called the spatial cross-curved disk skeleton, which addresses the mechanical coupling problem between flexible robot actuators. DESectBot has six degrees of freedom, and its kinematic model has been derived and used in its closed-loop control. DESectBot was validated through a two-stage test. First, its decoupling performance was validated: when one active bending segment bends, the other segment remains almost unaffected, with a maximum variation of 1.15 degrees, demonstrating the robot's effective decoupling capability. Second, its accuracy was validated through trajectory-following experiments: the average tracking error for both trajectories is less than 2 mm, and the maximum tracking error is below 2.5 mm. Taking marking, an ESD step with a 5 mm tolerance, as an example, DESectBot has the potential to be used in ESD procedures.
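The abstract does not state which kinematic model was derived. A common starting point for continuum segments is the constant-curvature assumption, where each segment bends along a circular arc. The planar sketch below applies it to two stacked segments; treating the 48 mm bending module as two 24 mm segments, and restricting motion to a plane, are both assumptions (the real robot has six degrees of freedom in 3D).

```python
from math import cos, sin
from typing import List, Tuple


def arc_displacement(length: float, theta: float) -> Tuple[float, float]:
    """Tip displacement of one constant-curvature segment in its base frame.

    Returns (x, z): x is the in-plane deflection, z is along the base axis.
    """
    if abs(theta) < 1e-9:
        return 0.0, length              # straight segment
    r = length / theta                  # radius of curvature
    return r * (1 - cos(theta)), r * sin(theta)


def tip_position(segments: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Planar forward kinematics for serially stacked segments.

    segments: list of (length_mm, bend_angle_rad). Each segment starts tangent
    to the end of the previous one, so bend angles accumulate in `phi`.
    """
    x = z = phi = 0.0
    for length, theta in segments:
        dx, dz = arc_displacement(length, theta)
        # Rotate the local displacement by the accumulated orientation phi.
        x += dx * cos(phi) + dz * sin(phi)
        z += -dx * sin(phi) + dz * cos(phi)
        phi += theta
    return x, z, phi


straight = tip_position([(24.0, 0.0), (24.0, 0.0)])    # (0.0, 48.0, 0.0)
s_curve = tip_position([(24.0, 0.5), (24.0, -0.5)])    # tip ends parallel to base
```

A useful sanity check on the composition rule: two 24 mm segments bent by 0.5 rad each trace the same arc as one 48 mm segment bent by 1 rad, so their tip positions must agree.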
ICML Conference 2024 Conference Paper
This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge here is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images lacking visual clues, and they include no reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify how locatable street-view images are, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are used to train GeoReasoner, which is fine-tuned through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations show that GeoReasoner outperforms counterpart LVLMs by more than 25% at country-level and 38% at city-level geo-localization tasks, and surpasses StreetCLIP while requiring fewer training resources. The data and code are available at https://github.com/lingli1996/GeoReasoner.
AAAI Conference 2024 Conference Paper
Large language models (LLMs) show powerful automatic reasoning and planning capabilities, backed by a wealth of semantic knowledge about the human world. However, the grounding problem still hinders the application of LLMs in real-world environments. Existing studies fine-tune the LLM or use pre-defined behavior APIs to bridge LLMs and the environment, which not only requires huge human effort to customize for every single task but also weakens the generality of LLMs. To autonomously ground the LLM in the environment, we propose the Hypothesis, Verification, and Induction (HYVIN) framework, which automatically and progressively grounds the LLM through self-driven skill learning. HYVIN first employs the LLM to propose hypothesized sub-goals for achieving tasks and then verifies the feasibility of each hypothesis by interacting with the underlying environment. Once a hypothesis is verified, HYVIN learns generalized skills under the guidance of these successfully grounded sub-goals. These skills can then be used to accomplish more complex tasks that failed the verification phase. Evaluated on BabyAI, a well-known instruction-following task set, HYVIN achieves performance on the most challenging tasks comparable to imitation learning methods that require millions of demonstrations, demonstrating the effectiveness of the learned skills and the feasibility and efficiency of our framework.
YNIMG Journal 2024 Journal Article
AAAI Conference 2024 Conference Paper
Model-based offline reinforcement learning (RL) algorithms have emerged as a promising paradigm for offline RL. These algorithms usually learn a dynamics model from a static dataset of transitions, use the model to generate synthetic trajectories, and perform conservative policy optimization within these trajectories. However, our observations indicate that the policy optimization methods used in these model-based offline RL algorithms do not explore the learned model effectively and induce biased exploration, which ultimately impairs the performance of the algorithm. To address this issue, we propose Offline Conservative ExplorAtioN (OCEAN), a novel rollout approach to model-based offline RL. In our method, we incorporate additional exploration techniques and introduce three conservative constraints based on uncertainty estimation to mitigate the potential impact of significant dynamics-model errors resulting from exploratory transitions. Our method is a plug-in that can be combined with classical model-based RL algorithms such as MOPO, COMBO, and RAMBO. Experimental results on the D4RL MuJoCo benchmark show that OCEAN significantly improves the performance of existing algorithms.
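Uncertainty-based conservatism in model-based offline RL is often implemented in the spirit of MOPO: estimate uncertainty from the disagreement of an ensemble of learned dynamics models, subtract it from the model reward, and stop a synthetic rollout when uncertainty is too high. The sketch below shows only that generic idea; OCEAN's three specific constraints are not reproduced, and the function names, `lam` and `max_uncertainty` are hypothetical.

```python
from statistics import pstdev
from typing import Optional, Sequence


def ensemble_uncertainty(next_state_preds: Sequence[Sequence[float]]) -> float:
    """Largest per-dimension std of next-state predictions across the ensemble."""
    return max(pstdev(dim) for dim in zip(*next_state_preds))


def penalized_reward(reward: float, next_state_preds: Sequence[Sequence[float]],
                     lam: float = 1.0,
                     max_uncertainty: Optional[float] = None) -> Optional[float]:
    """Return reward - lam * uncertainty, or None to signal the rollout should stop."""
    u = ensemble_uncertainty(next_state_preds)
    if max_uncertainty is not None and u > max_uncertainty:
        return None  # exploratory transition judged too unreliable to use
    return reward - lam * u


agree = [[1.0, 2.0], [1.0, 2.0]]      # ensemble members agree
disagree = [[0.0, 0.0], [2.0, 0.0]]   # members disagree on the first dimension
```

The penalty lets exploratory transitions into poorly modeled regions still be used, but at a discount, while the hard cutoff discards those where the model is clearly guessing.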
YNIMG Journal 2024 Journal Article
IJCAI Conference 2023 Conference Paper
Evaluating the performance of low-light image enhancement (LLE) is highly subjective, which makes integrating human preferences into image enhancement a necessity. Existing methods fail to consider this and instead rely on a series of heuristic criteria, each only potentially valid, for training enhancement models. In this paper, we propose a new paradigm, i.e., aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and motivates training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself through recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project pages/ALLE.html
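To make "each pixel refines itself by recursive actions" concrete, the sketch below assumes the quadratic light-enhancement curve popularized by curve-based LLE methods such as Zero-DCE, LE(x) = x + αx(1 − x) with α in [−1, 1], applied once per step with one α chosen per step. Whether ALL-E uses exactly this curve is an assumption; the function names are hypothetical.

```python
from typing import Sequence


def apply_curve(x: float, alpha: float) -> float:
    """One quadratic adjustment step: x + alpha * x * (1 - x).

    For x in [0, 1] and alpha in [-1, 1] the result stays in [0, 1], and the
    curve is monotonic in x, so relative pixel ordering is preserved.
    """
    if not -1.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [-1, 1]")
    return x + alpha * x * (1.0 - x)


def enhance_pixel(x: float, alphas: Sequence[float]) -> float:
    # Each alpha plays the role of one action taken by the pixel's agent.
    for a in alphas:
        x = apply_curve(x, a)
    return x


dark = 0.25
brighter = enhance_pixel(dark, [1.0, 1.0])   # 0.4375 after one step, then 0.68359375
```

Applying a small, bounded curve repeatedly gives the agent fine-grained control: positive α brightens, negative α darkens, and black (0) and white (1) are always fixed points.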
AAAI Conference 2023 Conference Paper
Despite the broad application of deep reinforcement learning (RL), transferring and adapting a policy to unseen but similar environments remains a significant challenge. Recently, language-conditioned policies have been proposed to facilitate policy transfer by learning a joint representation of observation and text that captures compact, invariant information across environments. Existing language-conditioned RL methods often learn the joint representation as a simple latent layer for the given instances (episode-specific observation and text), which inevitably includes noisy or irrelevant information and causes instance-dependent spurious correlations, hurting generalization performance and training efficiency. To address this issue, we propose a conceptual reinforcement learning (CRL) framework that learns a concept-like joint representation for language-conditioned policies. The key insight is that, in human cognition, concepts are compact and invariant representations formed by extracting similarities from numerous real-world instances. In CRL, we propose a multi-level attention encoder and two mutual information constraints for learning compact and invariant concepts. Evaluated in two challenging environments, RTFM and Messenger, CRL significantly improves training efficiency (by up to 70%) and generalization to new environment dynamics (by up to 30%).
NeurIPS Conference 2023 Conference Paper
Offline meta-reinforcement learning (OMRL) uses pre-collected offline datasets to enhance an agent's generalization ability on unseen tasks. However, a context shift problem arises from the distribution discrepancy between the contexts used for training (from the behavior policy) and for testing (from the exploration policy). Context shift leads to incorrect task inference and further degrades the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) that addresses the context shift problem using only offline datasets. The key insight of CSRO is to minimize the influence of the policy on the context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on the task representation. In the meta-test phase, we introduce a non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces context shift and improves generalization, surpassing previous methods across various challenging domains.
NeurIPS Conference 2023 Conference Paper
In multi-task reinforcement learning, the modular principle, which specializes functionalities into different modules and combines them appropriately, has been widely adopted as a promising way to prevent negative transfer, i.e., performance degradation caused by conflicts between tasks. However, most existing multi-task RL methods combine shared modules only at the task level, ignoring possible conflicts within a task. In addition, these methods do not account for the fact that, without constraints, some modules may learn similar functions, which restricts the expressiveness and generalization capability of modular methods. In this paper, we propose the Contrastive Modules with Temporal Attention (CMTA) method to address these limitations. CMTA uses contrastive learning to constrain the modules to differ from each other, and uses temporal attention to combine shared modules at a finer granularity than the task level, alleviating negative transfer within a task and improving the generalization and performance of multi-task RL. We conducted experiments on Meta-World, a multi-task RL benchmark containing various robotic manipulation tasks. Experimental results show that CMTA, for the first time, outperforms learning each task individually, and achieves substantial performance improvements over the baselines.
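To make the combination step concrete, here is a minimal Python sketch of mixing module outputs with attention weights recomputed at each time step, plus a simple diversity penalty. The softmax attention, the mean pairwise cosine similarity used as a stand-in for the contrastive constraint, and all function names are illustrative assumptions, not CMTA's actual architecture or loss.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_modules(module_outputs: List[Vector], scores: Sequence[float]) -> Vector:
    """Weighted sum of module outputs at a single time step.

    Recomputing `scores` at every step lets one task use different module
    combinations over time (finer than task-level mixing).
    """
    weights = softmax(scores)
    dim = len(module_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, module_outputs)) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def diversity_penalty(module_outputs: List[Vector]) -> float:
    """Mean pairwise cosine similarity; minimizing it pushes modules apart."""
    k = len(module_outputs)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(cosine(module_outputs[i], module_outputs[j]) for i, j in pairs) / len(pairs)


outputs = [[1.0, 0.0], [0.0, 1.0]]
print(mix_modules(outputs, [0.0, 0.0]))  # [0.5, 0.5]
print(diversity_penalty(outputs))        # 0.0
```

A real implementation would produce the scores from the current state with a learned network; here they are passed in directly to keep the sketch self-contained.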
NeurIPS Conference 2023 Conference Paper
In recent years, Multi-Agent Reinforcement Learning (MARL) techniques have made significant strides in achieving high asymptotic performance on single tasks. However, model transferability across tasks has received limited exploration. Training a model from scratch for each task can be time-consuming and expensive, especially for large-scale multi-agent systems, so it is crucial to develop methods that generalize a model across tasks. Because task-independent subtasks exist across MARL tasks, a model that can decompose such subtasks from the source task could generalize to target tasks. However, ensuring that the subtasks are truly task-independent is challenging. In this paper, we propose to decompose a task into a series of generalizable subtasks (DT2GS), a novel framework that addresses this challenge with a scalable subtask encoder and an adaptive subtask semantic module. We show that these components endow subtasks with two properties critical for task independence: avoiding overfitting to the source task, and maintaining consistent yet scalable semantics across tasks. Empirical results demonstrate that DT2GS has sound zero-shot generalization capability across tasks, exhibits sufficient transferability, and outperforms existing methods on both multi-task and single-task problems.
NeurIPS Conference 2023 Conference Paper
Deep reinforcement learning (DRL) has led to a wide range of advances in sequential decision-making tasks. However, the complexity of neural network policies makes them difficult to understand and to deploy with limited computational resources. Employing compact symbolic expressions as symbolic policies is currently a promising strategy for obtaining simple and interpretable policies. Previous symbolic policy methods usually involve complex training processes and pre-trained neural network policies, which are inefficient and limit the application of symbolic policies. In this paper, we propose an efficient gradient-based learning method, Efficient Symbolic Policy Learning (ESPL), that learns a symbolic policy from scratch in an end-to-end way. We introduce a symbolic network as the search space and employ a path selector to find a compact symbolic policy. In this way, we represent the policy as a differentiable symbolic expression and train it off-policy, which further improves efficiency. In addition, whereas previous symbolic policies work only in single-task RL because of their complexity, we extend ESPL to meta-RL to generate symbolic policies for unseen tasks. Experimentally, we show that our approach generates higher-performing symbolic policies and greatly improves data efficiency in single-task RL. In meta-RL, we demonstrate that, compared with neural network policies, the proposed symbolic policy achieves higher performance and efficiency and shows potential for interpretability.
NeurIPS Conference 2023 Conference Paper
Self-supervised learning on graphs aims to learn graph representations in an unsupervised manner. Graph contrastive learning (GCL), which relies on graph augmentation to create perturbed views of anchor graphs and maximizes/minimizes similarity for positive/negative pairs, is a popular self-supervised method, but it struggles to find label-invariant augmented graphs and to determine exactly how similar sample pairs should be made. In this work, we propose an alternative self-supervised solution that (i) goes beyond the label-invariance assumption without distinguishing between positive and negative samples, (ii) calibrates the encoder to preserve not only the structural information inside each graph but also the matching information between different graphs, and (iii) learns isometric embeddings that preserve the distance between graphs, a by-product of our objective. Motivated by optimal transport theory, the scheme relies on the observation that the optimal transport plans between node representations in the output space, which measure the matching probability between two distributions, should be consistent with the plans between the corresponding graphs in the input space. The experimental findings include: (i) the plan alignment strategy significantly outperforms its counterpart that uses the transport distance; (ii) the proposed model shows superior performance using only node attributes as calibration signals, without relying on edge information; (iii) our model remains robust even under high perturbation rates; and (iv) extensive experiments on various benchmarks validate the effectiveness of the proposed method.
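The plan-alignment idea can be sketched with entropic optimal transport. The choices below are assumptions, not details from the abstract: uniform node weights, squared Euclidean node costs, Sinkhorn iterations with regularization `eps`, and the squared Frobenius distance between the input-space and output-space plans as the alignment loss. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sq_dist_cost(a: Matrix, b: Matrix) -> Matrix:
    # Pairwise squared Euclidean distances between rows (nodes) of a and b.
    return [[sum((x - y) ** 2 for x, y in zip(u, v)) for v in b] for u in a]


def sinkhorn_plan(cost: Matrix, eps: float = 0.1, iters: int = 200) -> Matrix:
    """Entropic OT plan between uniform distributions over rows and columns."""
    n, m = len(cost), len(cost[0])
    mu, nu = [1.0 / n] * n, [1.0 / m] * m
    gibbs = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(gibbs[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(gibbs[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * gibbs[i][j] * v[j] for j in range(m)] for i in range(n)]


def plan_alignment_loss(x1: Matrix, x2: Matrix, h1: Matrix, h2: Matrix, eps: float = 0.1) -> float:
    """Squared Frobenius gap between the input-space and output-space plans.

    x1, x2: node attributes of two graphs; h1, h2: their encoder outputs.
    """
    p_in = sinkhorn_plan(sq_dist_cost(x1, x2), eps)
    p_out = sinkhorn_plan(sq_dist_cost(h1, h2), eps)
    return sum((a - b) ** 2 for row_in, row_out in zip(p_in, p_out) for a, b in zip(row_in, row_out))


x1 = [[0.0], [1.0]]
x2 = [[0.1], [0.9]]
# An "encoder" that leaves node features unchanged reproduces the input plan exactly.
print(plan_alignment_loss(x1, x2, x1, x2))  # 0.0
```

Aligning plans rather than transport distances means the encoder is asked to reproduce *which* nodes match, not just the overall matching cost; very small `eps` or large costs would underflow in this plain-float version, so a log-domain Sinkhorn would be the safer choice in practice.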
NeurIPS Conference 2023 Conference Paper
Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. However, existing methods focus on reducing the domain bias of the detection backbone by inferring a discriminative visual encoder, while ignoring the domain bias in the detection head. Inspired by the strong generalization of vision-language models (VLMs), applying a VLM as a robust detection backbone followed by a domain-aware detection head is a reasonable way to learn a discriminative detector for each domain, rather than reducing the domain bias as traditional methods do. To this end, we propose a novel DAOD framework named Domain-Aware detection head with Prompt tuning (DA-Pro), which applies a learnable domain-adaptive prompt to generate a dynamic detection head for each domain. Formally, the domain-adaptive prompt consists of domain-invariant tokens, domain-specific tokens, and a domain-related textual description along with the class label. Furthermore, two constraints between the source and target domains are applied to ensure that the domain-adaptive prompt captures both domain-shared and domain-specific knowledge. A prompt ensemble strategy is also proposed to reduce the effect of prompt disturbance. Comprehensive experiments on multiple cross-domain adaptation tasks demonstrate that the domain-adaptive prompt produces an effective domain-related detection head that boosts domain-adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Pro.
JBHI Journal 2023 Journal Article
Temporal attention is the concentration of perceptual resources at a specific point in time, which can help individuals prepare and thereby improve their behavioral performance; however, its neural mechanism is not yet well understood. In this study, behavioral measurement, transcranial direct current stimulation (tDCS), and electroencephalography (EEG) were combined to explore the effects on task performance and on whole-brain functional connectivities (FCs) during temporal attention with different time intervals, after applying anodal or sham tDCS over the right posterior parietal cortex (PPC). Although anodal tDCS, compared with sham tDCS, did not significantly affect temporal attention task performance, it effectively increased long-range gamma-band FCs between the right frontal and parieto-occipital regions during temporal attention, and most of the increased FCs were in the right hemisphere, showing some hemispheric lateralization. Meanwhile, there were substantially more increased long-range FCs at short time intervals than at long time intervals, and the increased FCs at neutral long time intervals were the fewest and mainly inter-hemispheric. The current study not only further enriches the evidence for the key role of the right PPC in temporal attention but also shows that anodal tDCS can enhance the whole-brain functional connectivity architecture involving intra- and inter-hemispheric long-range FCs, which would provide ideas and references for subsequent studies of temporal attention as well as attention deficit disorder.
EAAI Journal 2023 Journal Article
NeurIPS Conference 2022 Conference Paper
Hierarchical reinforcement learning (HRL) has proven effective for tasks with sparse rewards because it can improve an agent's exploration efficiency by discovering high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering high-quality hierarchical structures remains a great challenge. Previous HRL methods can find hierarchical structures only in simple environments, since they rely mainly on the randomness of the agent's policy during exploration. In complicated environments, such a randomness-driven exploration paradigm can hardly discover high-quality hierarchical structures because of its low exploration efficiency. In this paper, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, to build high-quality hierarchical structures efficiently in complicated environments. The key insight is that the causalities among environment variables are a natural fit for modeling reachable subgoals and their dependencies, so causality is a suitable guide for building high-quality hierarchical structures. In brief, we autonomously build a hierarchy of subgoals based on causality, and use the subgoal-based policies to uncover further causality efficiently. CDHRL thus relies on causality-driven discovery instead of randomness-driven exploration to construct high-quality hierarchical structures. Results in two complex environments, 2D-Minecraft and Eden, show that CDHRL can discover high-quality hierarchical structures and significantly enhance exploration efficiency.
AAAI Conference 2022 Conference Paper
Spatio-temporal forecasting is challenging owing to the high nonlinearity of temporal dynamics and the complex location-characterized patterns in spatial domains, especially in fields like weather forecasting. Graph convolutions are usually used to model spatial dependency in meteorology because they can handle the irregular spatial distribution of sensors. In this work, we propose a novel graph-based convolution that imitates meteorological flows to capture local spatial patterns. Based on the assumption that location-characterized patterns are smooth, we propose a conditional local convolution whose kernel, shared over each node's local space, is approximated by feedforward networks that take as input local coordinate representations obtained by horizon maps into cylindrical-tangent space. The resulting unified local coordinate system preserves geographic orientation. We further propose distance and orientation scaling terms to reduce the impact of the irregular spatial distribution. The convolution is embedded in a recurrent neural network architecture to model temporal dynamics, yielding the Conditional Local Convolution Recurrent Network (CLCRN). Evaluated on real-world weather benchmark datasets, our model achieves state-of-the-art performance with clear improvements. We further analyze local pattern visualization, the choice of model framework, the advantages of horizon maps, and more. The source code is available at https://github.com/BIRD-TAO/CLCRN.
AAAI Conference 2022 Conference Paper
Low-light image enhancement (LLE) remains challenging due to the low contrast and weak visibility that prevail in single RGB images. In this paper, we respond to an intriguing learning-related question: can leveraging both accessible unpaired over/underexposed images and high-level semantic guidance improve the performance of cutting-edge LLE models? We propose an effective semantically contrastive learning paradigm for LLE (namely SCL-LLE). Going beyond existing LLE approaches, it casts image enhancement as multi-task joint learning, in which LLE is converted into three constraints (contrastive learning, semantic brightness consistency, and feature preservation) that simultaneously ensure exposure, texture, and color consistency. SCL-LLE allows the LLE model to learn from unpaired positives (normal-light) and negatives (over/underexposed), and enables it to interact with scene semantics to regularize the image enhancement network; the interaction between high-level semantic knowledge and low-level signal priors has seldom been investigated in previous methods. Trained on readily available open data, our method surpasses state-of-the-art LLE models on six independent cross-scene datasets in extensive experiments. Moreover, we discuss SCL-LLE's potential to benefit downstream semantic segmentation under extremely dark conditions. Source code: https://github.com/LingLIx/SCL-LLE.
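Learning from unpaired positives and negatives can be illustrated with a distance-ratio contrastive term. This is a minimal sketch under stated assumptions: the ratio of mean L1 distances (a form common in contrastive-regularization losses), plain feature vectors in place of network features, and the function names are all assumptions, and SCL-LLE's actual loss may differ.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def contrastive_ratio_loss(anchor: Vector, positives: List[Vector], negatives: List[Vector], eps: float = 1e-8) -> float:
    """Mean distance to positives divided by mean distance to negatives.

    anchor: features of the enhanced image; positives: normal-light features;
    negatives: over/underexposed features. The samples are unpaired, so no
    positive needs to show the same scene as the anchor.
    """
    pos = sum(l1(anchor, p) for p in positives) / len(positives)
    neg = sum(l1(anchor, n) for n in negatives) / len(negatives)
    return pos / (neg + eps)


enhanced = [0.5, 0.5]
normal_light = [[0.5, 0.6]]
badly_exposed = [[0.1, 0.0], [0.9, 1.0]]
print(contrastive_ratio_loss(enhanced, normal_light, badly_exposed))  # roughly 0.11
```

Minimizing the ratio pulls the enhanced output toward normal-light statistics while pushing it away from both over- and underexposure.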
YNIMG Journal 2022 Journal Article
YNICL Journal 2021 Journal Article
YNICL Journal 2021 Journal Article
YNIMG Journal 2021 Journal Article
YNIMG Journal 2020 Journal Article
YNICL Journal 2020 Journal Article
YNIMG Journal 2017 Journal Article
TIST Journal 2013 Journal Article
Ever-increasing design complexity and advances in technology pose great challenges for the design of modern microprocessors. One such challenge is to determine promising microprocessor configurations that meet specific design constraints, a task called Design Space Exploration (DSE). In the computer architecture community, supervised learning techniques have been applied to DSE to build regression models that predict the quality of design configurations. Supervised learning, however, requires considerable simulation cost to obtain labeled design configurations, and with limited resources it is difficult to achieve high accuracy. In this article, inspired by recent advances in semi-supervised learning and active learning, we propose the COAL approach, which exploits unlabeled design configurations to significantly improve the models. An empirical study demonstrates that COAL significantly outperforms a state-of-the-art DSE technique, reducing mean squared error by 35% to 95%, so promising architectures can be attained more efficiently.
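COAL's internals are not described here, so the sketch below only illustrates the general idea of letting two regressors decide which unlabeled configuration is worth an expensive simulation. Assumptions: configurations are numeric vectors, quality is a scalar, the two regressors are k-nearest-neighbour models with different k, and the query rule is "largest disagreement". All names are hypothetical, and this is not presented as the COAL algorithm.

```python
from typing import List, Sequence, Tuple

Config = Sequence[float]
Labeled = List[Tuple[Config, float]]


def knn_predict(labeled: Labeled, x: Config, k: int) -> float:
    # Average the simulated quality of the k nearest labeled configurations.
    ranked = sorted(labeled, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def most_disputed(labeled: Labeled, unlabeled: List[Config], k1: int = 1, k2: int = 3) -> Config:
    """Pick the unlabeled configuration on which the two regressors disagree most.

    That configuration would be the next one sent to the (expensive) simulator.
    """
    def disagreement(x: Config) -> float:
        return abs(knn_predict(labeled, x, k1) - knn_predict(labeled, x, k2))

    return max(unlabeled, key=disagreement)


labeled = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0), ([10.0], 100.0)]
unlabeled = [[1.4], [9.0]]
print(most_disputed(labeled, unlabeled))  # [9.0]
```

The configuration far from dense labeled data is chosen because the two models extrapolate differently there; a fuller semi-supervised loop would also let each regressor pseudo-label confident points for the other.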
JMLR Journal 2008 Journal Article
Ensemble learning algorithms such as boosting can achieve better performance by averaging over the predictions of some base hypotheses. Nevertheless, most existing algorithms are limited to combining only a finite number of hypotheses, and the generated ensemble is usually sparse. Thus, it is not clear whether we should construct an ensemble classifier with a larger or even an infinite number of hypotheses. In addition, constructing an infinite ensemble itself is a challenging task. In this paper, we formulate an infinite ensemble learning framework based on the support vector machine (SVM). The framework can output an infinite and nonsparse ensemble through embedding infinitely many hypotheses into an SVM kernel. We use the framework to derive two novel kernels, the stump kernel and the perceptron kernel. The stump kernel embodies infinitely many decision stumps, and the perceptron kernel embodies infinitely many perceptrons. We also show that the Laplacian radial basis function kernel embodies infinitely many decision trees, and can thus be explained through infinite ensemble learning. Experimental results show that SVM with these kernels is superior to boosting with the same base hypothesis set. In addition, SVM with the stump kernel or the perceptron kernel performs similarly to SVM with the Gaussian radial basis function kernel, but enjoys the benefit of faster parameter selection. These properties make the novel kernels favorable choices in practice.
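To see how infinitely many decision stumps can collapse into a single kernel value, the sketch below compares a closed form of the stump kernel with a brute-force average over many stumps. The closed form Δ_S − ||x − x'||_1 with Δ_S = ½ Σ_d (R_d − L_d), where [L_d, R_d] is the assumed range of feature d, is our reading of the paper and should be checked against it; the numerical comparison itself relies only on the identity ∫ sign(x − θ) sign(x' − θ) dθ = (R − L) − 2|x − x'| over θ in [L, R]. Function names are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def stump_kernel(x: Vector, xp: Vector, lower: Vector, upper: Vector) -> float:
    """Closed form Delta_S - ||x - x'||_1, with Delta_S = 0.5 * sum_d (upper_d - lower_d)."""
    delta = 0.5 * sum(u - l for l, u in zip(lower, upper))
    return delta - sum(abs(a - b) for a, b in zip(x, xp))


def stump_kernel_by_averaging(x: Vector, xp: Vector, lower: Vector, upper: Vector, n: int = 10000) -> float:
    """Approximate the same value from many decision stumps.

    For each dimension d, the stump sign(x_d - theta) is evaluated at n evenly
    spaced thresholds theta in [lower_d, upper_d]; the products of the two
    stump outputs are integrated with the midpoint rule and halved.
    Assumes every coordinate of x and x' lies inside [lower_d, upper_d].
    """
    total = 0.0
    for d, (l, u) in enumerate(zip(lower, upper)):
        step = (u - l) / n
        for i in range(n):
            theta = l + (i + 0.5) * step
            s = 1.0 if x[d] > theta else -1.0
            sp = 1.0 if xp[d] > theta else -1.0
            total += s * sp * step
    return 0.5 * total


x, xp = [0.2, 0.7], [0.5, 0.1]
lower, upper = [0.0, 0.0], [1.0, 1.0]
print(stump_kernel(x, xp, lower, upper))               # about 0.1
print(stump_kernel_by_averaging(x, xp, lower, upper))  # also about 0.1
```

The closed form costs one L1 distance per pair, which is what makes an SVM over infinitely many stumps practical.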
NeurIPS Conference 2006 Conference Paper
We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier then bounds the mislabeling cost of the ranking rule. Our framework not only allows us to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also lets us derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. In empirical comparisons on benchmark data sets, some of our newly designed algorithms show advantages over existing algorithms in both training speed and generalization performance, which demonstrates the usefulness of our framework.
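The three steps can be sketched as follows, assuming K ordered ranks 1..K, one extended example ((x, k), "is y > k?") per threshold k = 1..K−1, and unit weights (the weighted version would attach a cost to each extended example). The feature map, the perceptron used as the binary learner, and all names are hypothetical illustrations, not the paper's algorithms.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Extended = Tuple[Tuple[Vector, int], int]  # ((x, k), +1 if rank > k else -1)


def extend(examples: List[Tuple[Vector, int]], num_ranks: int) -> List[Extended]:
    """Step 1: each (x, y) with y in 1..K yields K-1 binary questions "is y > k?"."""
    return [((x, k), 1 if y > k else -1) for x, y in examples for k in range(1, num_ranks)]


def encode(x: Vector, k: int, num_ranks: int) -> List[float]:
    # Hypothetical feature map: raw features plus a one-hot indicator of threshold k.
    return list(x) + [1.0 if j == k else 0.0 for j in range(1, num_ranks)]


def train_perceptron(data: List[Extended], num_ranks: int, epochs: int = 1000) -> Callable[[Vector, int], bool]:
    """Step 2: any binary learner fits here; a plain perceptron is used for brevity."""
    w = [0.0] * (len(data[0][0][0]) + num_ranks - 1)
    for _ in range(epochs):
        mistakes = 0
        for (x, k), label in data:
            z = encode(x, k, num_ranks)
            if label * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                w = [wi + label * zi for wi, zi in zip(w, z)]
                mistakes += 1
        if mistakes == 0:  # separable data: stop once every extended example is correct
            break

    def g(x: Vector, k: int) -> bool:
        return sum(wi * zi for wi, zi in zip(w, encode(x, k, num_ranks))) > 0

    return g


def rank_rule(g: Callable[[Vector, int], bool], num_ranks: int) -> Callable[[Vector], int]:
    """Step 3: r(x) = 1 + number of thresholds k for which g says "rank > k"."""
    return lambda x: 1 + sum(1 for k in range(1, num_ranks) if g(x, k))


train = [([0.5], 1), ([1.5], 2), ([2.5], 3)]
K = 3
r = rank_rule(train_perceptron(extend(train, K), K), K)
print([r(x) for x, _ in train])  # [1, 2, 3]
```

Replacing `train_perceptron` with any other binary learner leaves steps 1 and 3 unchanged, which is the point of the reduction.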