Author name cluster

Lei Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

AAAI Conference 2026 Conference Paper

SCALAR: Scale-wise Controllable Visual Autoregressive Learning

Ryan Xu
Dongyang Jin
Yancheng Bai
Rui Lan
Xu Duan
Lei Sun
Xiangxiang Chu

Controllable image synthesis, which enables fine-grained control over generated outputs, has emerged as a key focus in visual generative modeling. However, controllable generation remains challenging for Visual Autoregressive (VAR) models due to their hierarchical, next-scale prediction style. Existing VAR-based methods often suffer from inefficient control encoding and disruptive injection mechanisms that compromise both fidelity and efficiency. In this work, we present SCALAR, a controllable generation method based on VAR, incorporating a Scale-wise Conditional Decoding mechanism. SCALAR leverages a pretrained image encoder to extract semantic control signal encodings, which are projected into scale-specific representations and injected into the corresponding layers of the VAR backbone. This design provides persistent and structurally aligned guidance throughout the generation process. Building on SCALAR, we develop SCALAR-Uni, a unified extension that aligns multiple control modalities into a shared latent space, supporting flexible multi-conditional guidance in a single model. Extensive experiments show that SCALAR achieves superior generation quality and control precision across various tasks.

IROS Conference 2025 Conference Paper

Learning-based Keypoints Detection with Topological Order on Deformable Linear Objects from Incomplete Point Clouds

Can Li
Jingyang Liu
Lei Sun

Detection of deformable linear objects (DLOs) in three-dimensional space is essential for robotic manipulation of DLOs. However, their complex deformations and high degrees of freedom make perception highly susceptible to occlusions, noise, and missing data. To address these challenges, we propose a deep learning-based method that leverages the topological properties of DLOs to robustly detect keypoints from incomplete point clouds while preserving the topological order of keypoints. Our approach initializes a sequence of keypoints that adheres to the topological structure of DLOs. Then, these ordered keypoints are refined through bidirectional sequence learning. Simulation results demonstrate that our method generates accurate, uniform, and smooth keypoint sequences under varying levels of occlusion. Compared to existing baselines, our approach achieves superior performance. Real-world experiments further validate the generalization capability of our method in unseen and challenging scenarios involving occlusion and self-occlusion while maintaining real-time performance.
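
The abstract stresses keypoint sequences that are ordered end to end along the DLO and evenly spaced. As a generic illustration of what "uniform" means for an ordered sequence (not the learned detector described above), the sketch below resamples an already-ordered 3D polyline at equal arc-length intervals. The function name and interface are hypothetical.

```python
import math

def resample_uniform(points, k):
    """Resample an ordered 3D polyline to k points evenly spaced by arc length.

    Hypothetical helper: `points` must already be in topological order
    (one end of the DLO to the other); this is post-processing only.
    """
    if k < 2 or len(points) < 2:
        raise ValueError("need at least two input points and k >= 2")
    # cumulative arc length at each input vertex
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]
    out, seg = [], 0
    for i in range(k):
        target = total * i / (k - 1)
        # advance to the segment that contains the target arc length
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        p, q = points[seg], points[seg + 1]
        # linear interpolation inside the segment
        out.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return out
```

Because the input order is preserved, the output keeps the topological order of the keypoints while equalizing their spacing.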

ICRA Conference 2024 Conference Paper

Bio-Inspired Pupal-Mode Actuator with Ultra-Crossing Capability for Soft Robots

Zhenxing Wang
Xiao He
Yuhang Zhang
Cheng Zhang
Lei Sun
Zhidong Wang
Shun Xu
Hao Liu

Robot-assisted Natural Orifice Transluminal Endoscopic Surgery (NOTES) represents a paradigm shift in surgical practice, significantly minimizing patient morbidity. However, the variable inner diameter of the luminal tracts and the inter-luminal crossings within them pose challenges for effective robotic intervention. Inspired by the motion of a chrysalis during its transformation, we designed an innovative pupal-mode actuator for NOTES robots. By manipulating its internal air chambers, the actuator can replicate wriggle-like movements. Through experimental analysis, we obtained the constitutive characteristics of this actuator. Subsequently, an innovative gastric endoscopy robot was developed based on the actuator and tested in a phantom. The results of the task simulations confirm that the pupal-mode actuator can reduce resistance and enhance the safety of endoscopic intervention.

TCS Journal 2024 Journal Article

Constructions of rotation symmetric Boolean functions satisfying almost all cryptographic criteria

Lei Sun
Zexia Shi
Jian Liu
Fang-Wei Fu

Constructing Boolean functions with various cryptographic properties has always been an important challenge in cryptography. This paper proposes systematic constructions of even-variable rotation symmetric Boolean functions satisfying almost all cryptographic criteria, that is, resiliency, optimal algebraic degree, the strict avalanche criterion, high nonlinearity, nonexistence of nonzero linear structures, and good global avalanche characteristics. Moreover, some of the constructions also have high algebraic immunity. This is the first time that Boolean functions with such cryptographic properties have been obtained; they can be considered good candidates for the design of real-life encryption schemes.
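
Two of the listed criteria can be checked directly from a truth table. The sketch below, assuming the truth table is indexed so that bit i of the integer x is the i-th input variable, uses the standard Walsh-Hadamard characterization (f is m-resilient iff its Walsh coefficients vanish for every mask of weight at most m) and the usual definition of the strict avalanche criterion. It is a generic checker, not the paper's construction, and the function names are illustrative.

```python
def walsh_spectrum(tt):
    """Fast Walsh-Hadamard transform: W[a] = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * b for b in tt]  # map f(x)=0 -> +1, f(x)=1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    return w

def resiliency_order(tt):
    """Largest m with W[a] = 0 for every mask a of weight <= m (-1 if unbalanced)."""
    n = len(tt).bit_length() - 1
    w = walsh_spectrum(tt)
    for m in range(n + 1):
        if any(w[a] != 0 for a in range(len(tt)) if bin(a).count("1") == m):
            return m - 1
    return n  # unreachable: Parseval's identity forces some W[a] != 0

def satisfies_sac(tt):
    """SAC: flipping any single input bit flips f(x) for exactly half of all x."""
    n = len(tt).bit_length() - 1
    for i in range(n):
        flips = sum(tt[x] ^ tt[x ^ (1 << i)] for x in range(len(tt)))
        if 2 * flips != len(tt):
            return False
    return True
```

For example, the 3-variable parity function x0 ⊕ x1 ⊕ x2 is 2-resilient but fails the SAC, while the bent function x0x1 ⊕ x2x3 satisfies the SAC but is unbalanced; the constructions above aim to satisfy such criteria simultaneously.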

TMLR Journal 2024 Journal Article

Meta-Learning under Task Shift

Lei Sun
Yusuke Tanaka
Tomoharu Iwata

A common assumption in meta-learning is that meta-training and meta-test tasks are drawn from the same distribution. However, this assumption is often not fulfilled. Under such task shift, standard meta-learning algorithms do not work as desired since their unbiasedness is no longer maintained. In this paper, we propose a new meta-learning method called Importance Weighted Meta-Learning (IWML), which preserves unbiasedness even under task shift. Our approach uses both labeled meta-training datasets and unlabeled datasets in tasks obtained from the meta-test task distribution to assign weights to each meta-training task. These weights are determined by the ratio of meta-test and meta-training task densities. Our method enables the model to focus more on the meta-training tasks that closely align with meta-test tasks during the meta-training process. We meta-learn neural network-based models by minimizing the expected weighted meta-training error, which is an unbiased estimator of the expected error over meta-test tasks. The task density ratio is estimated using kernel density estimation, where the distance between tasks is measured by the maximum mean discrepancy. Our empirical evaluation on few-shot classification datasets demonstrates a significant improvement of IWML over existing approaches.
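
The weighting step described above can be sketched in a few lines: each task is a set of samples, the squared maximum mean discrepancy (MMD) between two tasks serves as a squared distance, and a Gaussian kernel density estimate over tasks gives p_test(T) and p_train(T). This is a minimal sketch under assumed choices (a Gaussian sample kernel with width `sigma`, a KDE bandwidth `bandwidth`, the biased MMD estimator, and hypothetical function names), not the authors' implementation.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return max(kxx + kyy - 2 * kxy, 0.0)

def task_kde(task, tasks, bandwidth, sigma=1.0):
    """Unnormalized Gaussian KDE over tasks, with MMD as the task distance.

    The kernel's normalizing constant is omitted; it is identical for the
    meta-test and meta-training estimates, so it cancels in the ratio.
    """
    return sum(math.exp(-mmd2(task, t, sigma) / (2 * bandwidth ** 2))
               for t in tasks) / len(tasks)

def importance_weights(train_tasks, test_tasks, bandwidth=0.5, sigma=1.0):
    """w_i = p_test(T_i) / p_train(T_i) for each meta-training task T_i."""
    return [task_kde(t, test_tasks, bandwidth, sigma) / task_kde(t, train_tasks, bandwidth, sigma)
            for t in train_tasks]

def weighted_meta_loss(weights, task_losses):
    """Importance-weighted average of per-task meta-training losses."""
    return sum(w * l for w, l in zip(weights, task_losses)) / len(task_losses)
```

Since each meta-training task contributes a zero-distance term to its own training-density estimate, the denominator is always positive. Meta-training tasks that resemble the meta-test tasks receive weights above those of dissimilar tasks.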

IJCAI Conference 2024 Conference Paper

MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models

Kanxue Li
Baosheng Yu
Qi Zheng
Yibing Zhan
Yuhui Zhang
Tianle Zhang
Yijun Yang
Yue Chen

Foundation models have demonstrated significant emergent abilities, holding great promise for enhancing embodied agents' reasoning and planning capacities. However, the absence of a comprehensive benchmark for evaluating embodied agents with multimodal observations in complex environments remains a notable gap. In this paper, we present MuEP, a comprehensive Multimodal benchmark for Embodied Planning. MuEP facilitates the evaluation of multimodal and multi-turn interactions of embodied agents in complex scenes, incorporating fine-grained evaluation metrics that provide insights into the performance of embodied agents throughout each task. Furthermore, we evaluate embodied agents with recent state-of-the-art foundation models, including large language models (LLMs) and large multimodal models (LMMs), on the proposed benchmark. Experimental results show that foundation models based on textual representations of environments usually outperform their visual counterparts, suggesting a gap in embodied planning abilities with multimodal observations. We also find that control language generation is an indispensable ability beyond common-sense knowledge for accurate embodied task completion. We hope the proposed MuEP benchmark can contribute to the advancement of embodied AI with foundation models.

IROS Conference 2024 Conference Paper

Outlier-Robust Geometric Perception: A Novel Thresholding-Based Estimator with Intra-Class Variance Maximization

Lei Sun

Geometric perception problems are fundamental tasks in robotics and computer vision. In real-world applications, they inevitably encounter outliers, which prevent traditional algorithms from making correct estimates. In this paper, we present a novel general-purpose robust estimator, TIVM (Thresholding with Intra-class Variance Maximization), that can collaborate with standard non-minimal solvers to efficiently reject outliers in geometric perception problems. First, we introduce the technique of intra-class variance maximization to design a dynamic 2-group thresholding method on the measurement residuals, aiming to distinctively separate inliers from outliers. Then, we develop an iterative framework that robustly optimizes the model by approaching the pure-inlier group, using a multi-layered dynamic thresholding strategy as a subroutine, in which a self-adaptive mechanism for layer-number tuning is further employed to minimize the number of user-defined parameters. We validate the proposed estimator on 3 classic geometric perception problems: rotation averaging, point cloud registration, and category-level perception. Experiments show that it is robust against 70-90% outliers and typically converges in only 3-15 iterations, much faster than state-of-the-art robust solvers such as RANSAC, GNC, and ADAPT. Another highlight is that our estimator retains approximately the same level of robustness even when the inlier-noise statistics of the problem are fully unknown.
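
To make "2-group thresholding on the measurement residuals" concrete, the sketch below uses the classic Otsu criterion, which chooses the split of sorted residuals that maximizes the between-class variance of the two groups, inside a simple threshold-and-refit loop. This is a swapped-in, generic technique: it does not reproduce TIVM's intra-class variance maximization objective, its multi-layered thresholding, or its self-adaptive layer tuning. The toy non-minimal solver (the mean of 1-D samples) and all names are assumptions for illustration.

```python
def otsu_threshold(residuals):
    """Return the largest residual of the low group in the Otsu 2-group split."""
    r = sorted(residuals)
    n, total = len(r), sum(r)
    best_t, best_score, left_sum = r[-1], -1.0, 0.0
    for k in range(1, n):  # low group = r[:k], high group = r[k:]
        left_sum += r[k - 1]
        m1 = left_sum / k
        m2 = (total - left_sum) / (n - k)
        score = (k / n) * ((n - k) / n) * (m1 - m2) ** 2  # between-class variance
        if score > best_score:
            best_score, best_t = score, r[k - 1]
    return best_t

def robust_mean(samples, iters=20):
    """Toy robust estimator: refit on the low-residual group until it stops changing."""
    est = sum(samples) / len(samples)  # non-minimal solver on all data
    for _ in range(iters):
        res = [abs(x - est) for x in samples]
        t = otsu_threshold(res)
        inliers = [x for x, e in zip(samples, res) if e <= t]
        new = sum(inliers) / len(inliers)
        if new == est:
            break
        est = new
    return est
```

On samples clustered around 1.0 with three gross outliers, the first split already isolates the clustered values, and the refit converges to their mean within a few iterations.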

AAAI Conference 2023 Conference Paper

A Question-Answering Approach to Key Value Pair Extraction from Form-Like Document Images

Kai Hu
Zhuoyuan Wu
Zhuoyao Zhong
Weihong Lin
Lei Sun
Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images. Specifically, KVPFormer first identifies key entities from all entities in an image with a Transformer encoder, then takes these key entities as questions and feeds them into a Transformer decoder to predict their corresponding answers (i.e., value entities) in parallel. To achieve higher answer prediction accuracy, we further propose a coarse-to-fine answer prediction approach, which first extracts multiple answer candidates for each identified question in the coarse stage and then selects the most likely one among these candidates in the fine stage. In this way, the learning difficulty of answer prediction is effectively reduced and the prediction accuracy is improved. Moreover, we introduce a spatial compatibility attention bias into the self-attention/cross-attention mechanism of KVPFormer to better model the spatial interactions between entities. With these new techniques, our proposed KVPFormer achieves state-of-the-art results on the FUNSD and XFUND datasets, outperforming the previous best-performing method by 7.2% and 13.2% in F1 score, respectively.
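
The abstract mentions adding a spatial compatibility bias inside attention. A generic way to inject such a bias is to add a matrix B to the attention logits before the softmax, i.e. softmax(QK^T / sqrt(d) + B)V. The sketch below shows that mechanism with a hypothetical distance-based bias between entity box centers; KVPFormer's actual spatial compatibility features are not specified here and are not reproduced.

```python
import math

def biased_attention(Q, K, V, B):
    """softmax(Q K^T / sqrt(d) + B) V for lists of vectors; B[i][j] is an additive bias."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + B[i][j]
                  for j, k in enumerate(K)]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in logits]
        z = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / z for c in range(len(V[0]))])
    return out

def distance_bias(centers, scale=1.0):
    """Hypothetical spatial bias: penalize attention between entities whose centers are far apart."""
    return [[-scale * math.dist(p, q) for q in centers] for p in centers]
```

A strongly negative bias effectively removes a key from consideration, while a zero bias leaves standard scaled dot-product attention unchanged.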

ICLR Conference 2022 Conference Paper

On the Connection between Local Attention and Dynamic Depth-wise Convolution

Qi Han 0007
Zejia Fan
Qi Dai 0001
Lei Sun
Ming-Ming Cheng
Jiaying Liu 0001
Jingdong Wang 0001

Vision Transformer (ViT) attains state-of-the-art performance in visual recognition, and the variant, Local Vision Transformer, makes further improvements. The major component in Local Vision Transformer, local attention, performs the attention separately over small local windows. We rephrase local attention as a channel-wise locally-connected layer and analyze it from two network regularization manners, sparse connectivity and weight sharing, as well as dynamic weight computation. We point out that local attention resembles depth-wise convolution and its dynamic variants in sparse connectivity: there is no connection across channels, and each position is connected to the positions within a small local window. The main differences lie in (i) weight sharing - depth-wise convolution shares connection weights (kernel weights) across spatial positions and attention shares the connection weights across channels, and (ii) dynamic weight computation manners - local attention is based on dot-products between pairwise positions in the local window, and dynamic convolution is based on linear projections conducted on the center representation or the globally pooled representation. The connection between local attention and dynamic depth-wise convolution is empirically verified by the ablation study about weight sharing and dynamic weight computation in Local Vision Transformer and (dynamic) depth-wise convolution. We empirically observe that the models based on depth-wise convolution and the dynamic variants with lower computation complexity perform on-par with or slightly better than Swin Transformer, an instance of Local Vision Transformer, for ImageNet classification, COCO object detection and ADE semantic segmentation. Code is available at https://github.com/Atten4Vis/DemystifyLocalViT.
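
The comparison above can be illustrated in one dimension. In the sketch below, both operators connect output position i of channel c only to inputs of the same channel within a window of size K (sparse connectivity). Depth-wise convolution uses one static kernel per channel shared across positions, while local attention computes weights per position from dot products and shares them across channels. The single head, identity query/key/value projections, and border handling are simplifying assumptions, not the configurations studied in the paper.

```python
import math

def depthwise_conv1d(x, w):
    """x: C x L input; w: C x K kernels (K odd). Static per-channel weights,
    shared across positions; zero padding at the borders."""
    C, L, K = len(x), len(x[0]), len(w[0])
    r = K // 2
    return [[sum(w[c][k] * x[c][i + k - r] for k in range(K) if 0 <= i + k - r < L)
             for i in range(L)] for c in range(C)]

def local_attention1d(x, K):
    """Single-head local attention with identity Q/K/V projections.
    Weights are computed dynamically per position and shared across channels."""
    C, L = len(x), len(x[0])
    r = K // 2
    out = [[0.0] * L for _ in range(C)]
    for i in range(L):
        window = [j for j in range(i - r, i + r + 1) if 0 <= j < L]  # valid positions only
        q = [x[c][i] for c in range(C)]  # query: feature vector at position i
        logits = [sum(q[c] * x[c][j] for c in range(C)) / math.sqrt(C) for j in window]
        m = max(logits)
        exps = [math.exp(s - m) for s in logits]
        z = sum(exps)
        attn = [e / z for e in exps]
        for c in range(C):  # the same weights aggregate every channel separately
            out[c][i] = sum(a * x[c][j] for a, j in zip(attn, window))
    return out
```

Perturbing an input position changes outputs only at positions whose window contains it, for both operators; the difference lies in where the weights come from and what they are shared across.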

AIIM Journal 2021 Journal Article

Estimating sparse functional connectivity networks via hyperparameter-free learning model

Lei Sun
Yanfang Xue
Yining Zhang
Lishan Qiao
Limei Zhang
Mingxia Liu

Functional connectivity networks (FCNs) provide a potential way to understand brain organizational patterns and diagnose neurological diseases. Researchers have proposed many methods for FCN construction, the most classic of which is Pearson's correlation (PC). Despite its simplicity and popularity, PC always results in dense FCNs, so a thresholding strategy is usually needed in practice to sparsify the estimated FCNs before network analysis, which in turn raises the problem of threshold parameter selection. As an alternative to PC, sparse representation (SR) can directly generate sparse FCNs due to the ℓ1 regularizer in the estimation model. However, as with the thresholding scheme used in PC, it is also challenging to determine suitable values for the regularization parameter in SR. To circumvent the difficulty of parameter selection in these traditional methods, we propose a hyperparameter-free method for FCN construction based on the global representation among fMRI time courses. Interestingly, the proposed method can automatically generate sparse FCNs without any thresholding or regularization parameters. To verify its effectiveness, we conduct experiments to identify subjects with mild cognitive impairment (MCI) and autism spectrum disorder (ASD) from normal controls (NCs) based on the estimated FCNs. Experimental results on two benchmark databases demonstrate that the classification performance achieved by the proposed scheme is comparable to that of four conventional methods.
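
The baseline pipeline that motivates this work (Pearson's correlation followed by thresholding) can be sketched as follows. The proportional thresholding rule (keep the strongest `density` fraction of edges) is one common choice and is assumed here; `density` is exactly the kind of free parameter the proposed method avoids. Time courses are assumed to be non-constant, and the function names are illustrative.

```python
import math

def pearson_fcn(ts):
    """ts: N regions x T samples (non-constant time courses) -> N x N Pearson matrix."""
    z = []
    for x in ts:
        m = sum(x) / len(x)
        d = [v - m for v in x]
        s = math.sqrt(sum(v * v for v in d))
        z.append([v / s for v in d])  # centered, unit-norm time course
    n = len(ts)
    return [[sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)] for i in range(n)]

def threshold_fcn(W, density):
    """Keep off-diagonal edges whose |r| is among the top `density` fraction; zero the rest.
    Ties at the cut-off value are all kept."""
    n = len(W)
    vals = sorted((abs(W[i][j]) for i in range(n) for j in range(i + 1, n)), reverse=True)
    keep = max(1, round(density * len(vals)))
    cut = vals[keep - 1]
    return [[W[i][j] if i != j and abs(W[i][j]) >= cut else 0.0 for j in range(n)]
            for i in range(n)]
```

Changing `density` changes which edges survive, and hence the network fed to later analysis.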

IJCAI Conference 2020 Conference Paper

P-KDGAN: Progressive Knowledge Distillation with GANs for One-class Novelty Detection

Zhiwei Zhang
Shifeng Chen
Lei Sun

One-class novelty detection aims to identify anomalous instances that do not conform to the expected normal instances. In this paper, Generative Adversarial Networks (GANs) based on an encoder-decoder-encoder pipeline are used for detection and achieve state-of-the-art performance. However, deep neural networks are too over-parameterized to deploy on resource-limited devices. Therefore, Progressive Knowledge Distillation with GANs (P-KDGAN) is proposed to learn compact and fast novelty detection networks. P-KDGAN is a novel attempt to connect two standard GANs through a designed distillation loss that transfers knowledge from the teacher to the student. The progressive learning of knowledge distillation is a two-step approach that continuously improves the performance of the student GAN and achieves better performance than single-step methods. In the first step, the student GAN learns basic knowledge entirely from the teacher, guided by the pre-trained teacher GAN with fixed weights. In the second step, joint fine-training is adopted for the knowledgeable teacher and student GANs to further improve performance and stability. Experimental results on CIFAR-10, MNIST, and FMNIST show that our method improves the performance of the student GAN by 2.44%, 1.77%, and 1.73% when compressing the computation at ratios of 24.45:1, 311.11:1, and 700:1, respectively.

EAAI Journal 2020 Journal Article

Robust nonnegative matrix factorization with local coordinate constraint for image clustering

Siyuan Peng
Wee Ser
Badong Chen
Lei Sun
Zhiping Lin

Nonnegative matrix factorization (NMF) has attracted increasing attention in data mining and machine learning. However, existing NMF methods have some limitations. For example, some NMF methods suffer seriously from noisy data contaminated by outliers, or fail to preserve the geometric information of the data and to guarantee a sparse parts-based representation. To overcome these issues, this paper proposes a robust and sparse NMF method called correntropy-based dual graph regularized nonnegative matrix factorization with local coordinate constraint (LCDNMF). Specifically, LCDNMF incorporates the geometrical information of both the data manifold and the feature manifold, together with the local coordinate constraint, into a correntropy-based objective function. The half-quadratic optimization technique is used to solve the nonconvex optimization problem of LCDNMF, yielding multiplicative update rules. Furthermore, several properties of LCDNMF are analyzed, including convergence, the relation with the gradient descent method, robustness, and computational complexity. Clustering experiments on six real-world image datasets demonstrate the effectiveness and robustness of the proposed LCDNMF method in comparison with several state-of-the-art methods.
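
For orientation, the sketch below implements the standard Lee-Seung multiplicative updates for plain Frobenius-norm NMF, the kind of baseline that LCDNMF extends. It does not include correntropy, the dual graph regularizers, or the local coordinate constraint; the small `eps` guarding the denominators, the random initialization, and the helper names are assumptions.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def nmf(X, rank, iters=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||X - W H||_F^2 with W, H >= 0."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(hr, ar, br)]
             for hr, ar, br in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(wr, ar, br)]
             for wr, ar, br in zip(W, num, den)]
    return W, H

def frob_error(X, W, H):
    WH = matmul(W, H)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr)) ** 0.5
```

Because every update multiplies by a ratio of nonnegative quantities, the factors stay nonnegative without explicit projection; LCDNMF keeps this multiplicative form while changing the objective.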

TCS Journal 2018 Journal Article

Constructions of balanced odd-variable rotation symmetric Boolean functions with optimal algebraic immunity and high nonlinearity

Lei Sun
Fang-Wei Fu

Rotation symmetric Boolean functions have been used as components of different cryptosystems. In this paper, two classes of balanced rotation symmetric Boolean functions having optimal algebraic immunity on an odd number of variables are constructed. We give a lower bound on the algebraic degree of the first class of functions, and prove that the n-variable functions in the second class have optimal algebraic degree if n ≠ 2^m + 1 for m > 2. Moreover, it is shown that both classes of functions have much better nonlinearity than all the previously obtained rotation symmetric Boolean functions with optimal algebraic immunity, and behave well against fast algebraic attacks, at least for small numbers of input variables.
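
Rotation symmetry and nonlinearity are easy to verify for small n from a truth table. The sketch below, assuming bit i of the index x is the i-th variable, checks invariance under a cyclic shift of the inputs and computes nonlinearity as 2^(n-1) - max|W_f(a)|/2 from the Walsh spectrum. It is a generic checker with illustrative names, not the paper's constructions.

```python
def walsh_spectrum(tt):
    """Fast Walsh-Hadamard transform: W[a] = sum_x (-1)^(f(x) XOR a.x)."""
    w = [1 - 2 * b for b in tt]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                x, y = w[j], w[j + h]
                w[j], w[j + h] = x + y, x - y
        h *= 2
    return w

def is_rotation_symmetric(tt):
    """True if f is invariant under cyclically shifting its n input bits."""
    n = len(tt).bit_length() - 1
    mask = (1 << n) - 1

    def rotate(x):
        return ((x << 1) | (x >> (n - 1))) & mask

    # invariance under a one-position shift implies invariance under every shift
    return all(tt[x] == tt[rotate(x)] for x in range(len(tt)))

def nonlinearity(tt):
    """NL(f) = 2^(n-1) - max_a |W[a]| / 2, the distance to the nearest affine function."""
    return len(tt) // 2 - max(abs(v) for v in walsh_spectrum(tt)) // 2
```

For instance, the balanced 3-variable majority function is rotation symmetric with nonlinearity 2, whereas f(x) = x0 is not rotation symmetric.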

JMLR Journal 2016 Journal Article

Mutual Information Based Matching for Causal Inference with Observational Data

Lei Sun
Alexander G. Nikolaev

This paper presents an information theory-driven matching methodology for making causal inference from observational data. The paper adopts a "potential outcomes framework" view on evaluating the strength of cause-effect relationships: the population-wide average effects of binary treatments are estimated by comparing two groups of units, the treated and the untreated (control). To reduce the bias in such treatment effect estimation, one has to compose a control group in such a way that, across the compared groups of units, treatment is independent of the units' covariates. This requirement gives rise to a subset selection (matching) problem. This paper presents models and algorithms that solve the matching problem by minimizing the mutual information (MI) between the covariates and the treatment variable. Such a formulation becomes tractable thanks to the derived optimality conditions that tackle the non-linearity of the sample-based MI function. Computational experiments with mixed integer-programming formulations and four matching algorithms demonstrate the utility of MI-based matching for causal inference studies. The algorithmic developments culminate in a matching heuristic that allows for balancing the compared groups in polynomial (close to linear) time, thus allowing for treatment effect estimation with large data sets.
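
The matching objective can be illustrated with a discrete covariate: the empirical mutual information between covariate and treatment is zero exactly when the covariate distribution is the same in both groups. The sketch below computes that quantity and pairs it with a hypothetical greedy selection rule that adds, one at a time, the control unit that lowers MI the most. It is not the paper's mixed integer-programming formulations or its near-linear-time heuristic.

```python
import math
from collections import Counter

def mutual_information(units):
    """units: list of (covariate_value, treatment) pairs -> empirical MI in nats."""
    n = len(units)
    joint = Counter(units)
    px = Counter(x for x, _ in units)
    pt = Counter(t for _, t in units)
    # p(x,t) / (p(x) p(t)) = c * n / (count(x) * count(t))
    return sum(c / n * math.log(c * n / (px[x] * pt[t])) for (x, t), c in joint.items())

def greedy_match(treated_x, control_x, k):
    """Hypothetical greedy rule: pick k controls, each time adding the one that
    minimizes MI between covariate and treatment in the selected sample."""
    chosen, pool = [], list(control_x)
    for _ in range(k):
        best = min(range(len(pool)), key=lambda i: mutual_information(
            [(x, 1) for x in treated_x] + [(x, 0) for x in chosen + [pool[i]]]))
        chosen.append(pool.pop(best))
    return chosen
```

With treated covariates a, a, b, b and a control pool skewed toward b, the greedy rule selects two a and two b controls, which drives the sample MI to zero.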

IROS Conference 2005 Conference Paper

The design and fabrication of a flexible three-dimensional force sensor skin

Jian Him Shan
Tao Mei 0003
Lei Sun
Deyi Kong
Zhengyong Zhang
Lin Ni
Max Q. -H. Meng
Jia Ru Chu

Obtaining shear and normal stress information on non-planar surfaces has long been a significant challenge. This paper describes a new version of a flexible tactile sensor skin for three-dimensional force measurement. The sensor array has been fabricated by MEMS technology. The sensor skin, which can be bent by 90°, includes 4×4 force sensor cells. Each cell is hybrid-integrated onto a flexible printed circuit board and includes an E-shaped diaphragm (50 µm thick and 4000 µm wide). Each cell exhibits a sensitivity of 228 mV/N to normal force and 34 mV/N to shear force in the designed force range of 2 N. Design analysis, fabrication processes, and experimental results are presented in this paper.
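
The reported sensitivities translate directly into expected signal levels. The sketch below assumes a linear response up to 2 N in magnitude (our reading of "designed force range of 2N"; the abstract does not state linearity) and converts between force and cell output. The constant and function names are illustrative.

```python
# Assumption: each cell responds linearly for forces up to 2 N in magnitude.
NORMAL_MV_PER_N = 228.0  # sensitivity to normal force, from the abstract
SHEAR_MV_PER_N = 34.0    # sensitivity to shear force, from the abstract
FULL_RANGE_N = 2.0

def output_mv(force_n, sensitivity_mv_per_n):
    """Predicted cell output in mV for a force in N, under the linearity assumption."""
    if abs(force_n) > FULL_RANGE_N:
        raise ValueError("force outside the designed 2 N range")
    return force_n * sensitivity_mv_per_n

def force_from_mv(signal_mv, sensitivity_mv_per_n):
    """Invert the linear model to estimate force (N) from a measured output (mV)."""
    return signal_mv / sensitivity_mv_per_n
```

Under this assumption, full-scale outputs are 456 mV for normal force and 68 mV for shear force, so the shear channel yields roughly 6.7 times less signal per newton.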