Arrow Research search

Author name cluster

Jun Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

59 papers
2 author rows

Possible papers

59

TMLR Journal 2026 Journal Article

MiniGPT-Med: A Unified Vision-Language Model for Radiology Image Understanding

  • Asma Alkhaldi
  • Raneem Alnajim
  • Layan Alabdullatef
  • Rawan Alyahya
  • Jun Chen
  • Deyao Zhu
  • Ahmed Z. Alsinan
  • Mohamed Elhoseiny

Recent advances in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in the refinement of diagnostic procedures. However, existing studies have been limited in terms of functional coverage. This study introduces MiniGPT-Med, a vision-language model adapted from MiniGPT-v2 for medical applications through domain-specific fine-tuning on medical datasets. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm the superior performance of MiniGPT-Med in disease detection, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance in medical report generation, with substantial gains in BERT-Sim over both specialist and generalist baselines, improving by 17 and 12 points, respectively. MiniGPT-Med promises to become a unified Vision-Language model for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.

AAAI Conference 2026 Conference Paper

Robust Pedestrian Detection with Uncertain Modality

  • Qian Bie
  • Xiao Wang
  • Bin Yang
  • Zhixi Yu
  • Jun Chen
  • Xin Xu

Existing cross-modal pedestrian detection (CMPD) methods employ complementary information from the RGB and thermal-infrared (TIR) modalities to detect pedestrians in 24-hour surveillance systems: RGB captures rich pedestrian details under daylight, while TIR excels at night. However, TIR captures primarily the person's silhouette, neglecting critical texture details essential for detection. Near-infrared (NIR) imaging, in contrast, captures texture under low-light conditions, effectively alleviating the performance issues of RGB and the detail loss of TIR, thereby reducing missed detections. To this end, we construct a new Triplet RGB–NIR–TIR (TRNT) dataset comprising 8,281 pixel-aligned image triplets, establishing a comprehensive foundation for algorithmic research. However, due to the variable nature of real-world scenarios, imaging devices may not always capture all three modalities simultaneously. This results in input data with unpredictable combinations of modalities, which challenges existing CMPD methods: they fail to extract robust pedestrian information under arbitrary input combinations, leading to significant performance degradation. To address these challenges, we propose the Adaptive Uncertainty-aware Network (AUNet) for accurately discriminating modal availability and fully utilizing the available information under uncertain inputs. Specifically, we introduce Unified Modality Validation Refinement (UMVR), which includes an uncertainty-aware router to validate modal availability and a semantic refinement step to ensure the reliability of information within each modality. Furthermore, we design a Modality-Aware Interaction (MAI) module that adaptively activates or deactivates its internal interaction mechanisms according to the UMVR output, enabling effective fusion of complementary information from the available modalities. AUNet enables accurate modality validation and robust inference without fixed modality pairings, facilitating the effective fusion of RGB, NIR, and TIR information across diverse inputs.

AAAI Conference 2026 Conference Paper

Zero-Reference Joint Low-Light Enhancement and Deblurring via Visual Autoregressive Modeling with VLM-Derived Modulation

  • Wei Dong
  • Han Zhou
  • Junwei Lin
  • Jun Chen

Real-world dark images commonly exhibit not only low visibility and contrast but also complex noise and blur, posing significant restoration challenges. Existing methods often rely on paired data or fail to model dynamic illumination and blur characteristics, leading to poor generalization. To tackle this, we propose a generative framework based on visual autoregressive (VAR) modeling, guided by perceptual priors from the vision-language model (VLM). Specifically, to supply informative conditioning cues for VAR models, we deploy an adaptive curve estimation scheme to modulate the diverse illumination based on VLM-derived visibility scores. In addition, we integrate dynamic and spatial-frequency-aware Rotary Positional Encodings (SF-RoPE) into VAR to enhance its ability to model structures degraded by blur. Furthermore, we propose a recursive phase-domain modulation strategy that mitigates blur-induced artifacts in the phase domain via bounded iterative refinement guided by VLM-assessed blur scores. Our framework is fully unsupervised and achieves state-of-the-art performance on benchmark datasets.

AAAI Conference 2025 Conference Paper

Balancing Privacy and Performance: A Many-in-One Approach for Image Anonymization

  • Xuemei Jia
  • Jiawei Du
  • Hui Wei
  • Ruinian Xue
  • Zheng Wang
  • Hongyuan Zhu
  • Jun Chen

The effective utilization of data through Deep Neural Networks (DNNs) has profoundly influenced various aspects of society. The growing demand for high-quality, particularly personalized, data has spurred research efforts to prevent data leakage and protect privacy in recent years. Early privacy-preserving methods primarily relied on instance-wise modifications, such as erasing or obfuscating essential features for de-identification. However, this approach highlights an inherent trade-off: minimal modification offers insufficient privacy protection, while excessive modification significantly degrades task performance. In this paper, we propose a novel Recombining for Obfuscation (FRO) approach to address this trade-off. Unlike existing methods that generate one anonymized instance by perturbing the original data on a one-to-one basis, our FRO approach generates an anonymized instance by reassembling mixed ID-related features from multiple original data sources on a many-in-one basis. Instead of introducing additional noise for de-identification, our approach leverages the existing non-polluted features from other instances to anonymize data. Extensive experiments on identity identification tasks demonstrate that FRO outperforms previous state-of-the-art methods, not only in utility performance but also in visual anonymization.

JBHI Journal 2025 Journal Article

Bidirectional Prototype-Guided Consistency Constraint for Semi-Supervised Fetal Ultrasound Image Segmentation

  • Chongwen Lyu
  • Kai Han
  • Lu Liu
  • Jun Chen
  • Lele Ma
  • Zheng Pang
  • Zhe Liu

Fetal ultrasound (US) image segmentation plays an important role in fetal development assessment, maternal pregnancy management, and intrauterine surgery planning. However, obtaining large-scale, accurately annotated fetal US imaging data is time-consuming and labor-intensive, posing challenges to the application of deep learning in this field. To address this challenge, we propose a semi-supervised fetal US image segmentation method based on bidirectional prototype-guided consistency constraint (BiPCC). BiPCC utilizes the prototype to bridge labeled and unlabeled data and establishes interaction between them. Specifically, the model generates pseudo-labels using prototypes from labeled data and then utilizes these pseudo-labels to generate pseudo-prototypes for segmenting the labeled data inversely, thereby achieving bidirectional consistency. Additionally, uncertainty-based cross-supervision is incorporated to provide additional supervision signals, thereby enhancing the quality of pseudo-labels. Extensive experiments on two fetal US datasets demonstrate that BiPCC outperforms state-of-the-art methods for semi-supervised fetal US segmentation. Furthermore, experimental results on two additional medical segmentation datasets exhibit BiPCC's outstanding generalization capability for diverse medical image segmentation tasks. Our proposed method offers a novel insight for semi-supervised fetal US image segmentation and holds promise for further advancing the development of intelligent healthcare.

AAAI Conference 2025 Conference Paper

Error Analysis Affected by Heavy-Tailed Gradients for Non-Convex Pairwise Stochastic Gradient Descent

  • Jun Chen
  • Hong Chen
  • Bin Gu
  • Guodong Liu
  • Yingjie Wang
  • Weifu Li

In recent years, a growing number of works have studied the generalization properties of stochastic gradient descent (SGD) from the perspective of algorithmic stability. However, few of them are devoted to simultaneously studying generalization and optimization in the non-convex setting, especially for pairwise SGD with heavy-tailed gradient noise. This paper considers the impact of heavy-tailed gradient noise obeying a sub-Weibull distribution on the stability-based learning guarantees of non-convex pairwise SGD by investigating its generalization and optimization jointly. Specifically, based on two novel pairwise uniform model stability tools, we first bound the generalization error of pairwise SGD in the general non-convex setting after bridging the quantitative relationships between stability and generalization error. Then, we consider the practical heavy-tailed sub-Weibull gradient noise condition to establish a refined generalization bound without the bounded gradient condition. Finally, sharper error bounds for generalization and optimization are built by introducing the gradient dominance condition. Comparing these results reveals that sub-Weibull gradient noise brings some positive dependencies on the heavy-tailed strength for generalization and optimization. Furthermore, we extend our analysis to the corresponding pairwise minibatch SGD and derive the first stability-based near-optimal generalization and optimization bounds, which are consistent with many empirical observations.

ICLR Conference 2025 Conference Paper

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

  • Zewei Zhang
  • Huan Liu
  • Jun Chen
  • Xiangyu Xu

In this paper, we introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing. Unlike existing methods that struggle with accumulated perturbations and often result in distortions, GoodDrag introduces an AlDD framework that alternates between drag and denoising operations within the diffusion process, effectively improving the fidelity of the result. We also propose an information-preserving motion supervision operation that maintains the original features of the starting point for precise manipulation and artifact reduction. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The source code and data are available at https://gooddrag.github.io.

ICML Conference 2025 Conference Paper

How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

  • Jun Chen
  • Hong Chen
  • Yonghua Yu
  • Yiming Ying

In recent years, contrastive learning has achieved state-of-the-art performance in self-supervised representation learning. Many previous works have attempted to provide a theoretical understanding of the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of failure is called labeling error) due to the strength and randomness of common augmentation strategies such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, we apply a data dimensionality reduction method (e.g., singular value decomposition, SVD) to the original data to reduce false positive samples, and establish both theoretical and empirical evaluations. Moreover, we find that SVD acts as a double-edged sword: it may also degrade downstream classification accuracy due to the reduced connectivity of the augmentation graph. Based on the above observations, we suggest using a moderate embedding dimension (such as $512$ or $1024$ in our experiments), data inflation, weak augmentation, and SVD, so as to ensure large graph connectivity and small labeling error and thereby improve model performance.
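The SVD step described in this abstract is standard enough to sketch. Below is a minimal illustration of rank-k reduction via truncated SVD; the function name, toy data, and chosen rank are ours for illustration, not details from the paper:

```python
import numpy as np

def svd_reduce(X, k):
    """Keep only the top-k singular directions of a data matrix.

    X: (n_samples, n_features) array; k: target rank.
    Returns the rank-k reconstruction of X, discarding low-variance
    directions where augmentation noise tends to concentrate.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k]

# Toy usage: 100 samples in 20 dimensions, reduced to rank 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X_red = svd_reduce(X, 5)
print(X_red.shape)  # (100, 20) -- same shape, but rank 5
```

The reconstruction keeps the original feature dimensionality while restricting the data to a 5-dimensional subspace, which is the sense in which the abstract's "dimensionality reduction" trades labeling error against augmentation-graph connectivity.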

AAAI Conference 2025 Conference Paper

Improving Retrieval Augmented Language Model with Self-Reasoning

  • Yuan Xia
  • Jingbo Zhou
  • Zhenhui Shi
  • Jun Chen
  • Haifeng Huang

The Retrieval-Augmented Language Model (RALM) has demonstrated remarkable performance on knowledge-intensive tasks by integrating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly in terms of reliability and traceability. Specifically, irrelevant document retrieval may result in unhelpful responses or even degrade the performance of LLMs, while the lack of appropriate citations in outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories through three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We evaluate our framework across four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate its superiority. Our method outperforms existing state-of-the-art models and achieves performance comparable with GPT-4, using only 2,000 training samples.

JBHI Journal 2025 Journal Article

LiMT: A Multi-Task Liver Image Benchmark Dataset

  • Zhe Liu
  • Kai Han
  • Siqi Ma
  • Yan Zhu
  • Jun Chen
  • Chongwen Lyu
  • Xinyi Qiu
  • Chengxuan Qian

Computer-aided diagnosis (CAD) technology can assist clinicians in evaluating liver lesions and intervening with treatment in time. Although CAD technology has advanced in recent years, the application scope of existing datasets remains relatively limited, typically supporting only single tasks, which has somewhat constrained the development of CAD technology. To address this limitation, we construct a multi-task liver dataset (LiMT) for liver and tumor segmentation, multi-label lesion classification, and lesion detection, based on arterial phase-enhanced computed tomography (CT). It offers an exploratory setting in which correlations between tasks can be studied without the heterogeneity that arises when training on separate task-specific datasets. The dataset includes CT volumes from 150 different cases, comprising four types of liver diseases as well as normal cases. Each volume has been carefully annotated and calibrated by experienced clinicians. This public multi-task dataset may become a valuable resource for the medical imaging research community. In addition, this paper not only provides relevant baseline experimental results but also reviews existing datasets and methods related to liver-related tasks. Our dataset is available at https://drive.google.com/drive/folders/1l9HRK13uaOQTNShf5pwgSz3OTanWjkag?usp=sharing.

AAAI Conference 2025 Conference Paper

Low-Light Image Enhancement via Generative Perceptual Priors

  • Han Zhou
  • Wei Dong
  • Xiaohong Liu
  • Yulun Zhang
  • Guangtao Zhai
  • Jun Chen

Although significant progress has been made in enhancing visibility, retrieving texture details, and mitigating noise in Low-Light (LL) images, the challenge persists in applying current Low-Light Image Enhancement (LLIE) methods to real-world scenarios, primarily due to the diverse illumination conditions encountered. Furthermore, the quest for generating enhancements that are visually realistic and attractive remains an underexplored realm. In response to these challenges, we present a novel LLIE framework with the guidance of Generative Perceptual Priors (GPP-LLIE) derived from vision-language models (VLMs). Specifically, we first propose a pipeline that guides VLMs to assess multiple visual attributes of the LL image and quantify the assessment to output the global and local perceptual priors. Subsequently, to incorporate these generative perceptual priors to benefit LLIE, we introduce a transformer-based backbone in the diffusion process, and develop a new layer normalization (GPP-LN) and an attention mechanism (LPP-Attn) guided by global and local perceptual priors. Extensive experiments demonstrate that our model outperforms current SOTA methods on paired LL datasets and exhibits superior generalization on real-world data.

NeurIPS Conference 2025 Conference Paper

Unbiased Prototype Consistency Learning for Multi-Modal and Multi-Task Object Re-Identification

  • Zhongao Zhou
  • Bin Yang
  • Wenke Huang
  • Jun Chen
  • Mang Ye

In the object re-identification (ReID) task, both cross-modal and multi-modal retrieval methods have achieved notable progress. However, existing approaches are designed for retrieval with a specific modality and category (person or vehicle), lacking generalizability to others. Acquiring multiple task-specific models results in wasteful allocation of both training and deployment resources. To address the practical requirements for unified retrieval, we introduce Multi-Modal and Multi-Task object ReID ($\rm {M^3T}$-ReID). The $\rm {M^3T}$-ReID task aims to use a unified model to simultaneously achieve retrieval tasks across different modalities and different categories. Specifically, to tackle the challenges of modality distribution divergence and category semantics discrepancy posed by $\rm {M^3T}$-ReID, we design a novel Unbiased Prototype Consistency Learning (UPCL) framework, which consists of two main modules: Unbiased Prototypes-guided Modality Enhancement (UPME) and Cluster Prototype Consistency Regularization (CPCR). UPME leverages modality-unbiased prototypes to simultaneously enhance cross-modal shared features and multi-modal fused features. Additionally, CPCR regulates discriminative semantics learning with category-consistent information through prototype clustering. Under the collaborative operation of these two modules, our model can simultaneously learn robust cross-modal shared and multi-modal fused feature spaces, while also exhibiting strong category-discriminative capabilities. Extensive experiments on the multi-modal datasets RGBNT201 and RGBNT100 demonstrate that our UPCL framework achieves exceptional performance on $\rm {M^3T}$-ReID. The code is available at https://github.com/ZhouZhongao/UPCL.

NeurIPS Conference 2025 Conference Paper

Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding

  • Xiaoqian Shen
  • Wenxuan Zhang
  • Jun Chen
  • Mohamed Elhoseiny

Understanding and reasoning over long videos pose significant challenges for large video language models (LVLMs) due to the difficulty of processing intensive video tokens beyond the context window and retaining long-term sequential information. Retrieval-Augmented Generation (RAG) has demonstrated effectiveness in processing long context for Large Language Models (LLMs); however, applying RAG to long video faces challenges such as disrupted temporal dependencies and inclusion of irrelevant information that can hinder accurate reasoning. To address these limitations, we propose Vgent, a novel \textbf{graph-based retrieval-reasoning-augmented generation framework} to enhance LVLMs for long video understanding. Our approach introduces two key innovations: (i) it represents videos by structured graphs, preserving semantic relationships across video clips to improve retrieval effectiveness; (ii) it introduces an intermediate reasoning step to mitigate the reasoning limitations of LVLMs, leveraging structured verification to reduce retrieval noise and facilitate the explicit aggregation of relevant information across clips, resulting in more accurate and context-aware responses. We comprehensively evaluate our framework with various open-source LVLMs on three long-video understanding benchmarks. Our approach yields an overall performance improvement of $3.0\%\sim 5.4\%$ over base models on MLVU, and outperforms state-of-the-art video RAG methods by $8.6\%$. Our code is publicly available at https://xiaoqian-shen.github.io/Vgent.

AAAI Conference 2024 Conference Paper

A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

  • Mengmeng Wang
  • Jiazheng Xing
  • Boyuan Jiang
  • Jun Chen
  • Jianbiao Mei
  • Xingxing Zuo
  • Guang Dai
  • Jingdong Wang

Recently, the rise of large-scale vision-language pretrained models like CLIP, coupled with the technology of Parameter-Efficient Fine-Tuning (PEFT), has attracted substantial attention in video action recognition. Nevertheless, prevailing approaches tend to prioritize strong supervised performance at the expense of compromising the models' generalization capabilities during transfer. In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability. First, to enhance the individual modality architectures, we introduce multimodal adapters to both the visual and text branches. Specifically, we design a novel visual TED-Adapter that performs global Temporal Enhancement and local temporal Difference modeling to improve the temporal representation capabilities of the visual encoder. Moreover, we adopt text encoder adapters to strengthen the learning of semantic label information. Second, we design a multi-task decoder with a rich set of supervisory signals, including the original contrastive learning head, a cross-modal classification head, a cross-modal masked language modeling head, and a visual classification head. This multi-task decoder adeptly satisfies the need for strong supervised performance within a multimodal framework. Experimental results validate the efficacy of our approach, demonstrating exceptional performance in supervised learning while maintaining strong generalization in zero-shot scenarios.

JBHI Journal 2024 Journal Article

A Weakly Supervised Segmentation Network Embedding Cross-Scale Attention Guidance and Noise-Sensitive Constraint for Detecting Tertiary Lymphoid Structures of Pancreatic Tumors

  • Bingxue Wang
  • Liwen Zou
  • Jun Chen
  • Yingying Cao
  • Zhenghua Cai
  • Yudong Qiu
  • Liang Mao
  • Zhongqiu Wang

The presence of tertiary lymphoid structures (TLSs) on pancreatic pathological images is an important prognostic indicator of pancreatic tumors. Therefore, TLSs detection on pancreatic pathological images plays a crucial role in diagnosis and treatment for patients with pancreatic tumors. However, fully supervised detection algorithms based on deep learning usually require a large number of manual annotations, which is time-consuming and labor-intensive. In this paper, we aim to detect TLSs in a few-shot manner by proposing a weakly supervised segmentation network. We first obtain lymphocyte density maps by combining a pretrained model for nuclei segmentation and a domain adversarial network for lymphocyte nuclei recognition. Then, we establish a cross-scale attention guidance mechanism by jointly learning coarse-scale features from the original histopathology images and fine-scale features from our designed lymphocyte density attention. A noise-sensitive constraint is introduced via an embedded signed distance function loss in the training procedure to reduce tiny prediction errors. Experimental results on two collected datasets demonstrate that our proposed method significantly outperforms state-of-the-art segmentation-based algorithms in terms of TLSs detection accuracy. Additionally, we apply our method to study the congruent relationship between the density of TLSs and peripancreatic vascular invasion and obtain some clinically statistical results.

TMLR Journal 2024 Journal Article

ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions

  • Deyao Zhu
  • Jun Chen
  • Kilichbek Haydarov
  • Xiaoqian Shen
  • Wenxuan Zhang
  • Mohamed Elhoseiny

Asking insightful questions is crucial for acquiring knowledge and expanding our understanding of the world. However, the importance of questioning has been largely overlooked in AI research, where models have been primarily developed to answer questions. With the recent advances of large language models (LLMs) like ChatGPT, we discover their capability to ask high-quality questions when provided with a suitable prompt. This discovery presents a new opportunity to develop an automatic questioning system. In this paper, we introduce ChatCaptioner, a novel automatic-questioning method deployed in image captioning. Here, ChatGPT is prompted to ask a series of informative questions about images to BLIP-2, a strong vision question-answering model. In ChatCaptioner, we investigate whether two AI models, unable to individually describe images in detail, can collaborate through an automated, visually guided dialogue to generate a better and more enriched image description than a single AI model. We conduct human-subject evaluations on common image caption datasets such as COCO, Conceptual Caption, and WikiArt, and compare ChatCaptioner with BLIP-2 as well as ground truth. Our results demonstrate that ChatCaptioner's captions are significantly more informative, receiving three times as many votes from human evaluators as BLIP-2 alone for providing the most image information. Additionally, ChatCaptioner identifies 53% more objects within the image than BLIP-2 alone, as measured by WordNet synset matching.

IJCAI Conference 2024 Conference Paper

Cross-Scale Domain Adaptation with Comprehensive Information for Pansharpening

  • Meiqi Gong
  • Hao Zhang
  • Hebaixu Wang
  • Jun Chen
  • Jun Huang
  • Xin Tian
  • Jiayi Ma

Deep learning-based pansharpening methods typically use simulated data at the reduced-resolution scale for training. This limits their performance when the trained model is generalized to the full-resolution scale, owing to incomplete utilization of the information in panchromatic (PAN) images at the full-resolution scale and low generalization ability. In this paper, we adopt two targeted strategies to address these two problems. On the one hand, we introduce a cross-scale comprehensive information capture module, which improves the information utilization of the original PAN image through fully-supervised reconstruction. On the other hand, we pioneer a domain adaptation strategy to tackle the problem of low generalization across different scales. Considering the intrinsic domain gap between different scales, we leverage the maximum mean discrepancy loss and the inherent pixel-level correlations between features at different scales to reduce the scale variance, thus boosting the generalization ability of our model. Experiments on various satellites demonstrate the superiority of our method over the state of the art in terms of information retention. Our code is publicly available at https://github.com/Meiqi-Gong/SDIPS.

NeurIPS Conference 2024 Conference Paper

ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction

  • Wei Dong
  • Han Zhou
  • Yulun Zhang
  • Xiaohong Liu
  • Jun Chen

Exposure Correction (EC) aims to recover proper exposure conditions for images captured under over-exposure or under-exposure scenarios. While existing deep learning models have shown promising results, few have fully embedded Retinex theory into their architecture, highlighting a gap in current methodologies. Additionally, the balance between high performance and efficiency remains an under-explored problem for the exposure correction task. Inspired by Mamba, which demonstrates powerful and highly efficient sequence modeling, we introduce a novel framework based on \textbf{Mamba} for \textbf{E}xposure \textbf{C}orrection (\textbf{ECMamba}) with dual pathways, each dedicated to the restoration of the reflectance and the illumination map, respectively. Specifically, we first derive the Retinex theory and train a Retinex estimator capable of mapping inputs into two intermediary spaces, each approximating the target reflectance and illumination map, respectively. This setup facilitates the refined restoration process of the subsequent \textbf{E}xposure \textbf{C}orrection \textbf{M}amba \textbf{M}odule (\textbf{ECMM}). Moreover, we develop a novel \textbf{2D S}elective \textbf{S}tate-space layer guided by \textbf{Retinex} information (\textbf{Retinex-SS2D}) as the core operator of \textbf{ECMM}. This architecture incorporates an innovative 2D scanning strategy based on deformable feature aggregation, thereby enhancing both efficiency and effectiveness. Extensive experimental results and comprehensive ablation studies demonstrate the outstanding performance and the importance of each component of our proposed ECMamba. Code is available at \url{https://github.com/LowlevelAI/ECMamba}.

AAAI Conference 2024 Conference Paper

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

  • Zilin Wang
  • Haolin Zhuang
  • Lu Li
  • Yinmin Zhang
  • Junjie Zhong
  • Jun Chen
  • Yu Yang
  • Boshi Tang

This paper presents an Exploratory 3D Dance generation framework, E3D2, designed to address the exploration capability deficiency in existing music-conditioned 3D dance generation models. Current models often generate monotonous and simplistic dance sequences that misalign with human preferences because they lack exploration capabilities. The E3D2 framework involves a reward model trained from automatically-ranked dance demonstrations, which then guides the reinforcement learning process. This approach encourages the agent to explore and generate high-quality and diverse dance movement sequences. The soundness of the reward model is both theoretically and experimentally validated. Empirical experiments demonstrate the effectiveness of E3D2 on the AIST++ dataset.

NeurIPS Conference 2024 Conference Paper

How Does Black-Box Impact the Learning Guarantee of Stochastic Compositional Optimization?

  • Jun Chen
  • Hong Chen
  • Bin Gu

The stochastic compositional optimization (SCO) problem constitutes a class of optimization problems characterized by an objective function with a compositional form, including tasks with known derivatives, such as AUC maximization, and derivative-free tasks exemplified by black-box vertical federated learning (VFL). From the learning theory perspective, the learning guarantees of SCO algorithms with known derivatives have been studied in the literature. However, the potential impacts of the derivative-free setting on the learning guarantees of SCO remain unclear and merit further investigation. This paper aims to reveal these impacts by developing a theoretical analysis for two derivative-free algorithms, black-box SCGD and SCSC. Specifically, we first provide sharper generalization upper bounds for convex SCGD and SCSC based on a new stability analysis framework more effective than prior work under some milder conditions, which is further developed for the non-convex case using the almost co-coercivity property of smooth functions. Then, we derive the learning guarantees of three black-box variants of non-convex SCGD and SCSC with additional optimization analysis. Comparing these results, we theoretically uncover that better gradient estimation brings a tighter learning guarantee and that a larger proportion of unknown gradients may lead to a stronger dependence on the gradient estimation quality. Finally, our analysis is applied to two SCO algorithms, FOO-based vertical VFL and VFL-CZOFO, to build the first learning guarantees for VFL that align with the findings on SCGD and SCSC.

JMLR Journal 2024 Journal Article

Learning Discretized Neural Networks under Ricci Flow

  • Jun Chen
  • Hanwen Chen
  • Mengmeng Wang
  • Guang Dai
  • Ivor W. Tsang
  • Yong Liu

In this paper, we study Discretized Neural Networks (DNNs) composed of low-precision weights and activations, which suffer from either infinite or zero gradients due to the non-differentiable discrete function during training. Most training-based DNNs in such scenarios employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. discrete values. However, the use of STE introduces the problem of gradient mismatch, arising from perturbations in the approximated gradient. To address this problem, this paper reveals that this mismatch can be interpreted as a metric perturbation in a Riemannian manifold, viewed through the lens of duality theory. Building on information geometry, we construct the Linearly Nearly Euclidean (LNE) manifold for DNNs, providing a background for addressing perturbations. By introducing a partial differential equation on metrics, i.e., the Ricci flow, we establish the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation. In contrast to previous perturbation theories with convergence rates in fractional powers, the metric perturbation under the Ricci flow exhibits exponential decay in the LNE manifold. Experimental results across various datasets demonstrate that our method achieves superior and more stable performance for DNNs compared to other representative training-based methods.

NeurIPS Conference 2024 Conference Paper

Minimum Entropy Coupling with Bottleneck

  • M. Reza Ebrahimi
  • Jun Chen
  • Ashish Khisti

This paper investigates a novel lossy compression framework operating under logarithmic loss, designed to handle situations where the reconstruction distribution diverges from the source distribution. This framework is especially relevant for applications that require joint compression and retrieval, and in scenarios involving distributional shifts due to processing. We show that the proposed formulation extends the classical minimum entropy coupling framework by integrating a bottleneck, allowing for controlled variability in the degree of stochasticity in the coupling. We explore the decomposition of the Minimum Entropy Coupling with Bottleneck (MEC-B) into two distinct optimization problems: Entropy-Bounded Information Maximization (EBIM) for the encoder, and Minimum Entropy Coupling (MEC) for the decoder. Through extensive analysis, we provide a greedy algorithm for EBIM with guaranteed performance, and characterize the optimal solution near functional mappings, yielding significant theoretical insights into the structural complexity of this problem. Furthermore, we illustrate the practical application of MEC-B through experiments in Markov Coding Games (MCGs) under rate limits. These games simulate a communication scenario within a Markov Decision Process, where an agent must transmit a compressed message from a sender to a receiver through its actions. Our experiments highlight the trade-offs between MDP rewards and receiver accuracy across various compression rates, showcasing the efficacy of our method compared to conventional compression baselines.

AAAI Conference 2024 Conference Paper

SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes

  • Boshi Tang
  • Zhiyong Wu
  • Xixin Wu
  • Qiaochu Huang
  • Jun Chen
  • Shun Lei
  • Helen Meng

Graph neural networks (GNNs) have exhibited impressive performance in modeling graph data as exemplified in various applications. Recently, the GNN calibration problem has attracted increasing attention, especially in cost-sensitive scenarios. Previous work has gained empirical insights into the issue and devised effective approaches for it, but theoretical support still falls short. In this work, we shed light on the relationship between GNN calibration and nodewise similarity via theoretical analysis. A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels. At the global level, the Mahalanobis distance between the current node and class prototypes is integrated to implicitly consider similarity between the current node and all nodes in the same class. At the local level, the similarity of node representation movement dynamics, quantified by nodewise homophily and relative degree, is considered. Informed by the application of nodewise movement patterns in analyzing nodewise behavior on the over-smoothing problem, we empirically present a possible relationship between over-smoothing and the GNN calibration problem. Experimentally, we discover a correlation between nodewise similarity and model calibration improvement, in alignment with our theoretical results. Additionally, we conduct extensive experiments investigating different design factors and demonstrate the effectiveness of our proposed SimCalib framework for GNN calibration by achieving state-of-the-art performance on 14 out of 16 benchmarks.

JBHI Journal 2023 Journal Article

CCS-Net: Cascade Detection Network With the Convolution Kernel Switch Block and Statistics Optimal Anchors Block in Hypopharyngeal Cancer MRI

  • Shuo Zhang
  • Yang Miao
  • Jun Chen
  • Xiwei Zhang
  • Lei Han
  • Zehao Huang
  • Ning Pei
  • Haibin Liu

Magnetic resonance imaging (MRI) is a common diagnostic method for hypopharyngeal cancer (HPC). It is a challenge to automatically detect HPC tumors and swollen lymph nodes (HPC risk areas) from MRI slices because of the small size and irregular shape of HPC risk areas. Herein, we propose a cascade detection network with a Convolution Kernel Switch (CKS) Block and a Statistics Optimal Anchors (SOA) Block for HPC MRI (CCS-Net). The CKS Block can adaptively switch standard convolution to deformable convolution in appropriate layers to detect irregular objects more efficiently without taking up too many computing resources. The SOA Block can automatically generate the optimal anchors based on the size distribution of objects. Our method achieves strong detection performance and outperforms other methods on the HPC dataset (more than 1800 T2 MRI slices), achieving the highest AP50 of 78.90%. Experiments show that the proposed network can be the basis of a computer-aided diagnosis utility that helps achieve faster and more accurate diagnostic decisions for HPC.

NeurIPS Conference 2023 Conference Paper

Fine-Grained Theoretical Analysis of Federated Zeroth-Order Optimization

  • Jun Chen
  • Hong Chen
  • Bin Gu
  • Hao Deng

The federated zeroth-order optimization (FedZO) algorithm enjoys the advantages of both zeroth-order optimization and federated learning, and has shown exceptional performance on black-box attack and softmax regression tasks. However, there is no generalization analysis for FedZO, and its known convergence rate is slower than that of the corresponding first-order optimization setting. This paper aims to establish systematic theoretical assessments of FedZO by developing the analysis technique of on-average model stability. We establish the first generalization error bound of FedZO under the Lipschitz continuity and smoothness conditions. Then, refined generalization and optimization bounds are provided by replacing the bounded-gradient assumption with heavy-tailed gradient noise and utilizing the second-order Taylor expansion for gradient approximation. With the help of a new error decomposition strategy, our theoretical analysis is also extended to the asynchronous case. For FedZO, our fine-grained analysis fills the theoretical gap in the generalization guarantees and polishes the convergence characterization of the computing algorithm.

NeurIPS Conference 2023 Conference Paper

On the choice of Perception Loss Function for Learned Video Compression

  • Sadaf Salehkalaibar
  • Truong Buu Phan
  • Jun Chen
  • Wei Yu
  • Ashish Khisti

We study causal, low-latency, sequential video compression when the output is subjected to both a mean squared-error (MSE) distortion loss as well as a perception loss to target realism. Motivated by prior approaches, we consider two different perception loss functions (PLFs). The first, PLF-JD, considers the joint distribution (JD) of all the video frames up to the current one, while the second metric, PLF-FMD, considers the framewise marginal distributions (FMD) between the source and reconstruction. Using information theoretic analysis and deep-learning based experiments, we demonstrate that the choice of PLF can have a significant effect on the reconstruction, especially at low bit rates. In particular, while the reconstruction based on PLF-JD can better preserve the temporal correlation across frames, it also imposes a significant penalty in distortion compared to PLF-FMD and further makes it more difficult to recover from errors made in the earlier output frames. Although the choice of PLF decisively affects reconstruction quality, we also demonstrate that it may not be essential to commit to a particular PLF during encoding and the choice of PLF can be delegated to the decoder. In particular, encoded representations generated by training a system to minimize the MSE (without requiring either PLF) can be near universal and can generate close to optimal reconstructions for either choice of PLF at the decoder. We validate our results using (one-shot) information-theoretic analysis, a detailed study of the rate-distortion-perception tradeoff of the Gauss-Markov source model, as well as deep-learning based experiments on moving MNIST and KTH datasets.

AAAI Conference 2023 Conference Paper

On the Stability and Generalization of Triplet Learning

  • Jun Chen
  • Hong Chen
  • Xue Jiang
  • Bin Gu
  • Weifu Li
  • Tieliang Gong
  • Feng Zheng

Triplet learning, i.e. learning from triplet data, has attracted much attention in computer vision tasks with an extremely large number of categories, e.g., face recognition and person re-identification. Albeit with rapid progress in designing and applying triplet learning algorithms, the theoretical understanding of their generalization performance is still lacking. To fill this gap, this paper investigates the generalization guarantees of triplet learning by leveraging stability analysis. Specifically, we establish the first general high-probability generalization bound for the triplet learning algorithm satisfying uniform stability, and then obtain excess risk bounds of the order O(log(n)/√n) for both stochastic gradient descent (SGD) and regularized risk minimization (RRM), where 2n is approximately equal to the number of training samples. Moreover, an optimistic generalization bound in expectation as fast as O(1/n) is derived for RRM in a low noise case via the on-average stability analysis. Finally, our results are applied to triplet metric learning to characterize its theoretical underpinning.

AAAI Conference 2023 Conference Paper

Stability-Based Generalization Analysis for Mixtures of Pointwise and Pairwise Learning

  • Jiahuan Wang
  • Jun Chen
  • Hong Chen
  • Bin Gu
  • Weifu Li
  • Xin Tang

Recently, some mixture algorithms of pointwise and pairwise learning (PPL) have been formulated by employing the hybrid error metric of “pointwise loss + pairwise loss” and have shown empirical effectiveness on feature selection, ranking and recommendation tasks. However, to the best of our knowledge, the learning theory foundation of PPL has not been touched in the existing works. In this paper, we try to fill this theoretical gap by investigating the generalization properties of PPL. After extending the definitions of algorithmic stability to the PPL setting, we establish the high-probability generalization bounds for uniformly stable PPL algorithms. Moreover, explicit convergence rates of stochastic gradient descent (SGD) and regularized risk minimization (RRM) for PPL are stated by developing the stability analysis technique of pairwise learning. In addition, the refined generalization bounds of PPL are obtained by replacing uniform stability with on-average stability.

NeurIPS Conference 2023 Conference Paper

SUBP: Soft Uniform Block Pruning for 1×N Sparse CNNs Multithreading Acceleration

  • Jingyang Xiang
  • Siqi Li
  • Jun Chen
  • Guang Dai
  • Shipeng Bai
  • Yukai Ma
  • Yong Liu

The study of sparsity in Convolutional Neural Networks (CNNs) has become widespread to compress and accelerate models in environments with limited resources. By constraining N consecutive weights along the output channel to be group-wise non-zero, the recent network with 1×N sparsity has received tremendous popularity for its three outstanding advantages: 1) a large amount of storage space saved by a Block Sparse Row matrix; 2) excellent performance at a high sparsity; 3) significant speedups on CPUs with Advanced Vector Extensions. Recent work requires selecting and fine-tuning 1×N sparse weights based on dense pre-trained weights, leading to problems such as expensive training cost and memory access, sub-optimal model quality, as well as unbalanced workload across threads (different sparsity across output channels). To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch. Specifically, our approach tends to repeatedly allow pruned blocks to regrow to the network based on block angular redundancy and importance sampling in a uniform manner throughout the training process. It not only makes the model less dependent on pre-training and reduces the model redundancy and the risk of pruning important blocks permanently, but also achieves a balanced workload. Empirically, on ImageNet, comprehensive experiments across various CNN architectures show that our SUBP consistently outperforms existing 1×N and structured sparsity methods based on pre-trained models or training from scratch. Source codes and models are available at https://github.com/JingyangXiang/SUBP.

JBHI Journal 2022 Journal Article

JAS-GAN: Generative Adversarial Network Based Joint Atrium and Scar Segmentations on Unbalanced Atrial Targets

  • Jun Chen
  • Guang Yang
  • Habib Khan
  • Heye Zhang
  • Yanping Zhang
  • Shu Zhao
  • Raad Mohiaddin
  • Tom Wong

Automated and accurate segmentations of left atrium (LA) and atrial scars from late gadolinium-enhanced cardiac magnetic resonance (LGE CMR) images are in high demand for quantifying atrial scars. The previous quantification of atrial scars relies on a two-phase segmentation for LA and atrial scars due to their large volume difference (unbalanced atrial targets). In this paper, we propose an inter-cascade generative adversarial network, namely JAS-GAN, to segment the unbalanced atrial targets from LGE CMR images automatically and accurately in an end-to-end way. Firstly, JAS-GAN investigates an adaptive attention cascade to automatically correlate the segmentation tasks of the unbalanced atrial targets. The adaptive attention cascade mainly models the inclusion relationship of the two unbalanced atrial targets, where the estimated LA acts as the attention map to adaptively focus on the small atrial scars roughly. Then, an adversarial regularization is applied to the segmentation tasks of the unbalanced atrial targets for making a consistent optimization. It mainly forces the estimated joint distribution of LA and atrial scars to match the real ones. We evaluated the performance of our JAS-GAN on a 3D LGE CMR dataset with 192 scans. Compared with the state-of-the-art methods, our proposed approach yielded better segmentation performance (Average Dice Similarity Coefficient (DSC) values of 0.946 and 0.821 for LA and atrial scars, respectively), which indicated the effectiveness of our proposed approach for segmenting unbalanced atrial targets.

AAAI Conference 2022 Conference Paper

REMOTE: Reinforced Motion Transformation Network for Semi-supervised 2D Pose Estimation in Videos

  • Xianzheng Ma
  • Hossein Rahmani
  • Zhipeng Fan
  • Bin Yang
  • Jun Chen
  • Jun Liu

Existing approaches for 2D pose estimation in videos often require a large number of dense annotations, which are costly and labor intensive to acquire. In this paper, we propose a semi-supervised REinforced MOtion Transformation nEtwork (REMOTE) to leverage a few labeled frames and temporal pose variations in videos, which enables effective learning of 2D pose estimation in sparsely annotated videos. Specifically, we introduce a Motion Transformer (MT) module to perform cross frame reconstruction, aiming to learn motion dynamic knowledge in videos. Besides, a novel reinforcement learning-based Frame Selection Agent (FSA) is designed within our framework, which is able to harness informative frame pairs on the fly to enhance the pose estimator under our cross reconstruction mechanism. We conduct extensive experiments that show the efficacy of our proposed REMOTE framework.

IJCAI Conference 2021 Conference Paper

A Novel Sequence-to-Subgraph Framework for Diagnosis Classification

  • Jun Chen
  • Quan Yuan
  • Chao Lu
  • Haifeng Huang

Text-based diagnosis classification is a critical problem in AI-enabled healthcare studies, which assists clinicians in making correct decisions and lowering the rate of diagnostic errors. Previous studies follow the routine of sequence-based deep learning models in the NLP literature to deal with clinical notes. However, recent studies find that structural information is important in clinical contents and greatly impacts the predictions. In this paper, a novel sequence-to-subgraph framework is introduced to process clinical texts for classification, which changes the paradigm of managing texts. Moreover, a new classification model under the framework is proposed that incorporates a subgraph convolutional network and a hierarchical diagnostic attentive network to extract the layered structural features of clinical texts. The evaluation conducted on both real-world English and Chinese datasets shows that the proposed method outperforms the state-of-the-art deep learning based diagnosis classification models.

JBHI Journal 2021 Journal Article

Adaptive Stimulation Profiles Modulation for Foot Drop Correction Using Functional Electrical Stimulation: A Proof of Concept Study

  • Yurong Li
  • Xu Yang
  • Yuezhu Zhou
  • Jun Chen
  • Min Du
  • Yuan Yang

Functional electrical stimulation (FES) provides an effective way for foot drop (FD) correction. To overcome the redundant and blind stimulation problems in the state-of-the-art methods, this study proposes a closed-loop scheme for an adaptive electromyography (EMG)-modulated stimulation profile. The developed method detects real-time angular velocity during walking. It provides feedback to a long short-term memory (LSTM) neural network for predicting synchronous tibialis anterior (TA) EMG. Based on the prediction, it modulates the stimulation intensity, taking into account the subject-specific dead zone and saturation of the electrically evoked activation. The proposed method is tested on ten able-bodied participants and six FD subjects as proof of concept. The experimental results show that the proposed method can successfully induce the dorsiflexion of the ankle joint, and generate an activation pattern similar to a natural gait, with a mean Correlation Coefficient of 0.9021. Thus, the proposed method has the potential to help patients regain a normal gait.

NeurIPS Conference 2021 Conference Paper

Universal Rate-Distortion-Perception Representations for Lossy Compression

  • George Zhang
  • Jingjing Qian
  • Jun Chen
  • Ashish Khisti

In the context of lossy compression, Blau & Michaeli (2019) adopt a mathematical notion of perceptual quality and define the information rate-distortion-perception function, generalizing the classical rate-distortion tradeoff. We consider the notion of universal representations in which one may fix an encoder and vary the decoder to achieve any point within a collection of distortion and perception constraints. We prove that the corresponding information-theoretic universal rate-distortion-perception function is operationally achievable in an approximate sense. Under MSE distortion, we show that the entire distortion-perception tradeoff of a Gaussian source can be achieved by a single encoder of the same rate asymptotically. We then characterize the achievable distortion-perception region for a fixed representation in the case of arbitrary distributions, and identify conditions under which the aforementioned results continue to hold approximately. This motivates the study of practical constructions that are approximately universal across the RDP tradeoff, thereby alleviating the need to design a new encoder for each objective. We provide experimental results on MNIST and SVHN suggesting that on image compression tasks, the operational tradeoffs achieved by machine learning models with a fixed encoder suffer only a small penalty when compared to their variable encoder counterparts.

NeurIPS Conference 2020 Conference Paper

Temporal Positive-unlabeled Learning for Biomedical Hypothesis Generation via Risk Estimation

  • Uchenna Akujuobi
  • Jun Chen
  • Mohamed Elhoseiny
  • Michael Spranger
  • Xiangliang Zhang

Understanding the relationships between biomedical terms like viruses, drugs, and symptoms is essential in the fight against diseases. Many attempts have been made to introduce the use of machine learning to the scientific process of hypothesis generation (HG), which refers to the discovery of meaningful implicit connections between biomedical terms. However, most existing methods fail to truly capture the temporal dynamics of scientific term relations and also assume unobserved connections to be irrelevant (i.e., in a positive-negative (PN) learning setting). To break these limits, we formulate this HG problem as a future connectivity prediction task on a dynamic attributed graph via positive-unlabeled (PU) learning. Then, the key is to capture the temporal evolution of node pair (term pair) relations from just the positive and unlabeled data. We propose a variational inference model to estimate the positive prior, and incorporate it in the learning of node pair embeddings, which are then used for link prediction. Experimental results on real-world biomedical term relationship datasets and case study analyses on a COVID-19 dataset validate the effectiveness of the proposed model.

IJCAI Conference 2020 Conference Paper

The Graph-based Mutual Attentive Network for Automatic Diagnosis

  • Quan Yuan
  • Jun Chen
  • Chao Lu
  • Haifeng Huang

Automatic diagnosis has been suffering from the problem of inadequate reliable corpora to train a trustworthy predictive model. Besides, most of the previous deep learning based diagnosis models adopt sequence learning techniques (CNN or RNN), which make it difficult to extract the complex structural information, e.g., graph structure, between the critical medical entities. In this paper, we propose to build the diagnosis model based on high-standard EMR documents from real hospitals to improve the accuracy and the credibility of the resulting model. Meanwhile, we introduce the Graph Convolutional Network into the model, which alleviates the sparse feature problem and facilitates the extraction of structural information for diagnosis. Moreover, we propose the mutual attentive network to enhance the representation of inputs towards better model performance. The evaluation conducted on real EMR documents demonstrates that the proposed model is more accurate compared to the previous sequence learning based diagnosis models. The proposed model has been integrated into the information systems of hundreds of primary health care facilities in China to assist physicians in the diagnostic process.

IJCAI Conference 2020 Conference Paper

When Pedestrian Detection Meets Nighttime Surveillance: A New Benchmark

  • Xiao Wang
  • Jun Chen
  • Zheng Wang
  • Wu Liu
  • Shin'ichi Satoh
  • Chao Liang
  • Chia-Wen Lin

Pedestrian detection at nighttime is a crucial and frontier problem in surveillance, but has not been well explored by the computer vision and artificial intelligence communities. Most existing methods detect pedestrians under favorable lighting conditions (e.g., daytime) and achieve promising performances. In contrast, they often fail under unstable lighting conditions (e.g., nighttime). Night is a critical time for criminal suspects to act in the field of security. The existing nighttime pedestrian detection dataset is captured by a car camera, specially designed for autonomous driving scenarios. A dataset for the nighttime surveillance scenario is still vacant. There are vast differences between autonomous driving and surveillance, including viewpoint and illumination. In this paper, we build a novel pedestrian detection dataset from the nighttime surveillance aspect: NightSurveillance. As a benchmark dataset for pedestrian detection at nighttime, we compare the performances of state-of-the-art pedestrian detectors, and the results reveal that these methods cannot solve all the challenging problems of NightSurveillance. We believe that NightSurveillance can further advance the research of pedestrian detection, especially in the field of surveillance security at nighttime.

AAAI Conference 2018 Conference Paper

Curve-Structure Segmentation From Depth Maps: A CNN-Based Approach and Its Application to Exploring Cultural Heritage Objects

  • Yuhang Lu
  • Jun Zhou
  • Jing Wang
  • Jun Chen
  • Karen Smith
  • Colin Wilder
  • Song Wang

Motivated by the important archaeological application of exploring cultural heritage objects, in this paper we study the challenging problem of automatically segmenting curve structures that are very weakly stamped or carved on an object surface in the form of a highly noisy depth map. Different from most classical low-level image segmentation methods that are known to be very sensitive to noise and occlusions, we propose a new supervised learning algorithm based on a Convolutional Neural Network (CNN) to implicitly learn and utilize more curve geometry and pattern information for addressing this challenging problem. More specifically, we first propose a Fully Convolutional Network (FCN) to estimate the skeleton of curve structures and, at each skeleton pixel, a scale value is estimated to reflect the local curve width. Then we propose a dense prediction network to refine the estimated curve skeletons. Based on the estimated scale values, we finally develop an adaptive thresholding algorithm to achieve the final segmentation of curve structures. In the experiment, we validate the performance of the proposed method on a dataset of depth images scanned from unearthed pottery sherds dating to the Woodland period of Southeastern North America.

TIST Journal 2018 Journal Article

RelationLines

  • Wei Chen
  • Jing Xia
  • Xumeng Wang
  • Yi Wang
  • Jun Chen
  • Liang Chang

The increased accessibility of urban sensor data and the popularity of social network applications is enabling the discovery of crowd mobility and personal communication patterns. However, studying the egocentric relationships of an individual can be very challenging because available data may refer to direct contacts, such as phone calls between individuals, or indirect contacts, such as paired location presence. In this article, we develop methods to integrate three facets extracted from heterogeneous urban data (timelines, calls, and locations) through a progressive visual reasoning and inspection scheme. Our approach uses a detect-and-filter scheme such that, prior to visual refinement and analysis, a coarse detection is performed to extract the target individual and construct the timeline of the target. It then detects spatio-temporal co-occurrences or call-based contacts to develop the egocentric network of the individual. The filtering stage is enhanced with a line-based visual reasoning interface that facilitates a flexible and comprehensive investigation of egocentric relationships and connections in terms of time, space, and social networks. The integrated system, RelationLines, is demonstrated using a dataset that contains taxi GPS data, cell-based mobility data, mobile calling data, microblog data, and point-of-interest (POI) data from a city with millions of citizens. We examine the effectiveness and efficiency of our system with three case studies and user review.

AAAI Conference 2015 Conference Paper

A Personalized Interest-Forgetting Markov Model for Recommendations

  • Jun Chen
  • Chaokun Wang
  • Jianmin Wang

Intelligent item recommendation is a key issue in AI research which enables recommender systems to be more “human-minded” when generating recommendations. However, one of the major features of human memory, forgetting, has barely been discussed as regards recommender systems. In this paper, we considered people’s forgetting of interest when performing personalized recommendations, and brought forward a personalized framework to integrate the interest-forgetting property with a Markov model. Multiple implementations of the framework were investigated and compared. The experimental evaluation showed that our methods could significantly improve the accuracy of item recommendation, which verified the importance of considering interest-forgetting in recommendations.

AAAI Conference 2015 Conference Paper

CrowdMR: Integrating Crowdsourcing with MapReduce for AI-Hard Problems

  • Jun Chen
  • Chaokun Wang
  • Yiyuan Bai

Large-scale distributed computing has made available the resources necessary to solve “AI-hard” problems. As a result, it becomes feasible to automate the processing of such problems, but accuracy is not very high due to the conceptual difficulty of these problems. In this paper, we integrated crowdsourcing with MapReduce to provide a scalable innovative human-machine solution to AI-hard problems, which is called CrowdMR. In CrowdMR, the majority of problem instances are automatically processed by machine while the troublesome instances are redirected to human via crowdsourcing. The results returned from crowdsourcing are validated in the form of CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) before adding to the output. An incremental scheduling method was brought forward to combine the results from machine and human in a “pay-as-you-go” way.

AAAI Conference 2015 Conference Paper

Will You “Reconsume” the Near Past? Fast Prediction on Short-Term Reconsumption Behaviors

  • Jun Chen
  • Chaokun Wang
  • Jianmin Wang

Short-term reconsumption behaviors, i.e., “reconsuming” the near past, account for a large proportion of people’s activities every day and everywhere. In this paper, we first derived four generic features which influence people’s short-term reconsumption behaviors. These features were extracted with respect to the different roles in the process of reconsumption behaviors, i.e., users, items and interactions. Then, we brought forward two fast algorithms with the linear and the quadratic kernels to predict whether a user will perform a short-term reconsumption at a specific time given the context. The experimental results show that our proposed algorithms are more accurate in the prediction tasks compared with the baselines. Meanwhile, the time complexity of online prediction of our algorithms is O(1), which enables fast prediction in real-world scenarios. The prediction contributes to more intelligent decision-making, e.g., potential revisited-customer identification, personalized recommendation, and information re-finding.

ICRA Conference 2014 Conference Paper

Automated microrobotic characterization of cell-cell communication

  • Jun Liu 0007
  • Vinayakumar Siragam
  • Zheng Gong
  • Jun Chen
  • Clement Leung
  • Zhe Lu
  • Changhai Ru
  • Shaorong Xie

Most mammalian cells (e.g., cancer cells and cardiomyocytes) adhere to a culturing surface. Compared to robotic injection of suspended cells (e.g., embryos and oocytes), fewer attempts have been made to automate the injection of adherent cells due to their smaller size, highly irregular morphology, small thickness (a few micrometers), and large thickness variation across cells. This paper presents a recently developed robotic system for automated microinjection of adherent cells. The system is embedded with several new capabilities: automatically locating micropipette tips; robustly detecting the contact of the micropipette tip with the cell culturing surface and directly with the cell membrane; and precisely compensating for accumulated positioning errors. These new capabilities make it practical to perform adherent-cell microinjection via mouse clicks in front of a computer monitor, on hundreds to thousands of cells per experiment (vs. a few to tens of cells in the state of the art). System operation speed, success rate, and cell viability rate were quantitatively evaluated based on robotic microinjection of over 4,000 cells. This paper also reports the use of the new robotic system to perform cell-cell communication studies with large sample sizes. The gap junction function in a cardiac muscle cell line (HL-1 cells) was, for the first time, quantified with the system.

ICRA Conference 2014 Conference Paper

Simultaneous projection mapping using high-frame-rate depth vision

  • Jun Chen
  • Takashi Yamamoto
  • Tadayoshi Aoyama
  • Takeshi Takaki
  • Idaku Ishii

In this paper, we report on the development of a projection mapping system that can project RGB light patterns enhanced for three-dimensional (3-D) scenes, using a GPU-based high-frame-rate (HFR) vision system synchronized with HFR projectors. Our system can acquire 512×512 depth images in real time at 500 fps. The depth image processing is accelerated by a GPU board that parallelizes a gray-code structured-light method using infrared (IR) light patterns projected from an IR projector. Using the computed depth images, suitable RGB light patterns are generated in real time for enhanced application tasks. They are projected from an RGB projector as augmented information onto a 3-D scene with pixel-wise correspondence, even when the 3-D scene is time-varying. Experimental results from enhanced application tasks for time-varying 3-D scenes, such as (1) depth-based color mapping and (2) an augmented reality (AR) spirit level, confirm the efficacy of our system.
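The core decoding step of a gray-code structured-light method, as named in the abstract above, can be sketched as follows. This is a generic textbook sketch, not the paper's GPU implementation: each pixel observes one bit per projected pattern, and the resulting Gray code is converted to a binary stripe index, which (after calibration, omitted here) maps to depth.

```python
# Generic gray-code structured-light decoding sketch (not the paper's
# GPU kernel). Gray codes are used because adjacent stripes differ in
# exactly one bit, making decoding robust at stripe boundaries.

def gray_to_binary(g):
    """Convert a Gray-coded integer to its plain binary index."""
    b = g
    while g:
        g >>= 1
        b ^= g
    return b

def decode_pixel(bits):
    """bits: one observed bit per projected pattern, most significant
    first. Returns the stripe index seen at this pixel."""
    g = 0
    for bit in bits:
        g = (g << 1) | bit
    return gray_to_binary(g)
```

In a real pipeline this runs per pixel per frame, which is exactly the kind of independent, data-parallel work a GPU accelerates well.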

IROS Conference 2013 Conference Paper

Fast 3-D shape measurement using blink-dot projection

  • Jun Chen
  • Qingyi Gu
  • Hao Gao
  • Tadayoshi Aoyama
  • Takeshi Takaki
  • Idaku Ishii

We propose a novel dot-pattern-projection three-dimensional (3-D) shape measurement method that can measure 3-D displacements of blink dots projected onto a measured object accurately even when it moves rapidly or is observed from a camera as moving rapidly. In our method, blinking dot patterns, in which each dot changes its size at different timings corresponding to its identification (ID) number, are projected from a projector at a high frame rate. 3-D shapes can be obtained without any miscorrespondence of the projected dots between frames by simultaneous tracking and identification of multiple dots projected onto a measured 3-D object in a camera view. Our method is implemented on a field-programmable gate array (FPGA)-based high-frame-rate (HFR) vision platform that can track and recognize as much as 15×15 blink-dot pattern in a 512×512 image in real time at 1000 fps, synchronized with an HFR projector. We demonstrate the performance of our system by showing real-time 3-D measurement results when our system is mounted on a parallel link manipulator as a sensing head.
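The identification idea described above, where each dot changes size at a timing unique to its ID, can be reduced to a one-line sketch. This is an illustrative simplification under the assumption that a dot's ID is read off from the frame slot in which its observed size peaks within one blink cycle; the paper's FPGA tracker is of course more involved.

```python
# Illustrative sketch of blink-dot identification (assumed scheme, not
# the paper's FPGA pipeline): over one blink cycle, each dot enlarges in
# a unique frame slot, so the peak-size frame index serves as its ID.

def dot_id_from_sizes(sizes):
    """sizes: observed size of one tracked dot, per frame, over a cycle.
    Returns the index of the peak frame, interpreted as the dot's ID."""
    return max(range(len(sizes)), key=lambda i: sizes[i])
```

Because IDs come from temporal coding rather than spatial neighborhoods, dots stay distinguishable between frames even under fast motion, which is what rules out miscorrespondence.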

ICRA Conference 2012 Conference Paper

Analysis of dynamics and planar motion strategies of a swimming microorganism - Giardia lamblia

  • Jun Chen
  • Scott C. Lenaghan
  • Mingjun Zhang

We studied the dynamics associated with planar swimming in the microorganism Giardia lamblia. Giardia parasitizes the small intestine of humans and other animals, and has evolved a robust attachment and swimming mechanism to survive this harsh environment, providing potential bio-inspiration for microrobot design. In this paper, a 2-D dynamic model of flagella-body-fluid interaction is developed to analyze the actuation of the flagellum, energy supply and dissipation, and thrust along the flagellum. We find that, to achieve the observed flagellar motion, the required actuation bending moment decreases in magnitude from the proximal to the distal end, and that energy only needs to be supplied to the proximal half of the flagellum. The supplied energy is dissipated to the fluid continuously along the flagellum, with almost linearly increasing magnitude towards the distal end. Consistently, thrust comes mainly from the posterior portion of the flagellum. We also analyzed the kinematics of the flagella; the characteristics of forward and turning motion are revealed through simulation. These results may inform gait planning and actuation for energy-efficient propulsion in swimming microrobot design.