Author name cluster

Bin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers

2 author rows

EAAI Journal 2026 Journal Article

A multi-scale adaptive frequency domain network for few-shot current sensor fault diagnosis

Bin Chen
Hongmei Li
Haonan Zhao
Peng Zhang
Jiandong Huang

Addressing the challenges of sample scarcity and inter-domain distribution shift in current sensor fault diagnosis for Permanent Magnet Synchronous Motor (PMSM) systems within real industrial scenarios, this paper proposes a Multi-Scale Adaptive Frequency Domain Network (MSAFD-Net). This method addresses the issues of insufficient feature representation and distribution shifts across operating conditions in few-shot current sensor fault diagnosis. The approach constructs a Multi-Scale Adaptive Frequency Modulator (MS-AFM) to extract sensitive features across frequency bands and designs a Frequency Adaptive Fusion module (FAF-Module) for the dynamic enhancement of multi-frequency-domain information. Furthermore, a joint optimization strategy combining prototype learning and Maximum Mean Discrepancy (MMD) is employed to achieve robust alignment of frequency-domain features between the source and target domains. Experimental results demonstrate that MSAFD-Net attains significantly superior diagnostic performance compared to existing methods across multiple scenarios, including constant speed with variable load, different rotational speeds, dynamic disturbances, and cross-operating condition transfer, exhibiting excellent generalization capability and engineering applicability.
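The Maximum Mean Discrepancy (MMD) term named in this abstract can be illustrated with a minimal sketch. This is not the MSAFD-Net implementation. It only shows a biased estimate of squared MMD between two small sets of feature vectors. The function names, the Gaussian kernel, and the bandwidth `sigma` are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel between two feature vectors; the kernel choice is an assumption.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd_squared(source, target, sigma=1.0):
    # Biased estimate of squared MMD: E[k(s, s')] + E[k(t, t')] - 2 E[k(s, t)].
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Hypothetical feature sets: "near" resembles the source domain, "far" is shifted.
source = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
near = [[0.05, 0.0], [0.0, 0.05], [-0.05, -0.05]]
far = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]

print(mmd_squared(source, near), mmd_squared(source, far))
```

A smaller value means the two feature sets are harder to tell apart under the chosen kernel, which is why such a term is commonly minimized to align source and target domains.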


AAAI Conference 2026 Conference Paper

OmniBench: A Comprehensive Benchmark Integrating Real-World, Time-sensitive, and Multi-Hop Questions with a Multi-Dimensional Hybrid Evaluation Framework

Wenjie Wang
Yufeng Jiang
Ge Sun
Chenghang Dong
Zheng Jun
Li Mengjie
Lixin Chen
Huan Wang

Recently, with the increasing capabilities of Large Language Models (LLMs), AI applications have gradually emerged to solve various problems in people's daily lives, so accurately measuring their performance and reliability is paramount. However, existing benchmarks predominantly rely on closed-ended, multiple-choice or short-answer question formats. While useful for assessment, these formats exhibit a significant gap compared to the diverse and open-ended nature of questions posed by real-world users. To bridge this gap, we produce OmniBench, a comprehensive open-domain benchmark. OmniBench is uniquely composed of authentic, user-generated questions harvested from real-world interactions on various websites and applications, covering 16 rigorously defined knowledge domains and 5 crucial user intents derived from a large-scale analysis of the mass corpus. Crucially, we propose three automated data construction pipelines that enable the continuous and periodic updating of the benchmark dataset. This approach not only ensures that the questions can keep up with current events, but also effectively mitigates the critical issue of data contamination prevalent in static benchmarks. Moreover, a multi-dimensional hybrid evaluation framework named OmniEval is proposed for evaluating the responses. This framework combines diverse metrics and evaluation methods to capture nuanced aspects of answer performance. Extensive validation demonstrates that this evaluation framework exhibits strong alignment with human judgments, ensuring the reliability of the benchmark results.


AAAI Conference 2026 Conference Paper

Towards Efficient Low-rate Image Compression with Frequency-aware Diffusion Prior Refinement

Yichong Xia
Yimin Zhou
Jinpeng Wang
Bin Chen

Recent advancements in diffusion-based generative priors have enabled visually plausible image compression at extremely low bit rates. However, existing approaches suffer from slow sampling processes and suboptimal bit allocation due to fragmented training paradigms. In this work, we propose Accelerate Diffusion-based Image Compression via Consistency Prior Refinement (DiffCR), a novel compression framework for efficient and high-fidelity image reconstruction. At the heart of DiffCR is a Frequency-aware Skip Estimation (FaSE) module that refines the epsilon-prediction prior from a pre-trained latent diffusion model and aligns it with compressed latents at different timesteps via Frequency Decoupling Attention (FDA). Furthermore, a lightweight consistency estimator enables fast two-step decoding by preserving the semantic trajectory of diffusion sampling. Without updating the backbone diffusion model, DiffCR achieves substantial bitrate savings (27.2% BD-rate(LPIPS) and 65.1% BD-rate(PSNR)) and over 10 times speed-up compared to SOTA diffusion-based compression baselines.


EAAI Journal 2025 Journal Article

Adaptively multi-modal contrastive fusion network for molecular properties prediction

Wenyan Tang
Meng Li
Yi Zhan
Bin Chen

Molecular property prediction has become the mainstream approach for revealing the underlying mechanisms of biomedical systems with molecular representations. Existing prediction methods based on deep learning typically learn features from molecules in a single modality or with a simple fusion scheme, failing to consider the inconsistency, complexity, and relationships inherent in multi-modal data. To solve this issue, an adaptively multi-modal contrastive fusion network (AMCFNet) is proposed to adaptively extract the complementary features from interaction and consensus between multi-modal representations for molecular property prediction of breast cancer. The proposed model begins with a two-stream feature extractor module, which learns both one-dimensional (1D) and two-dimensional (2D) molecular representations simultaneously. The core of the network is the adaptively contrastive fusion module, which contrastively learns features between similar and different molecules with consensus scores and can adaptively allocate weights to fuse semantic and structural information while avoiding cognitive gaps caused by inconsistencies within multi-modal data. Additionally, the final complementary molecular representation is derived by integrating 1D, 2D, and fused 1D-2D features to enhance the prediction of molecular properties in breast cancer. The proposed AMCFNet model is evaluated on five estrogen receptor alpha (ERα) and five compound public datasets, consistently outperforming state-of-the-art baselines, including single- and multi-modal methodologies, in classification and regression tasks of molecular property prediction.


EAAI Journal 2025 Journal Article

An effective exploration method based on N-step updated Dirichlet distribution and Dempster–Shafer theory for deep reinforcement learning

Fanghui Huang
Yixin He
Yu Zhang
Bin Chen
Lina Yang

Deep reinforcement learning (DRL) has been regarded as a promising approach for solving decision-making problems. However, enhancing the agent's exploration ability remains an extremely challenging issue for existing methods, especially under sparse rewards. Faced with this challenge, we propose a novel efficient exploration method that jointly considers the uncertainty of the environment and the uncertainty of the Q function, so as to improve the agent's exploration efficiency. Specifically, we first construct an exploration policy using an n-step updated Dirichlet distribution to implement adaptive exploration of the environment, which reduces the agent's uncertainty about the environment and achieves globally efficient exploration. Next, a state–action basic probability assignment (BPA) is constructed based on the Dempster–Shafer theory. On this basis, an interval Q function is designed by combining the BPA and the belief interval, which can effectively characterize the uncertainty of the Q function to achieve deep exploration. Then, the proposed method is applied to two classic DRL algorithms, deep Q-network (DQN) and double DQN (DDQN), yielding two novel algorithms. Finally, on a series of tasks with sparse external rewards, experimental results show that our proposed algorithms outperform several state-of-the-art DRL algorithms in terms of exploration efficiency.


AAAI Conference 2025 Conference Paper

Efficient Self-Supervised Video Hashing with Selective State Spaces

Jinpeng Wang
Niu Lian
Jun Li
Yuting Wang
Yan Feng
Bin Chen
Yongbing Zhang
Shu-Tao Xia

Self-supervised video hashing (SSVH) is a practical task in video indexing and retrieval. Although Transformers are predominant in SSVH for their impressive temporal modeling capabilities, they often suffer from computational and memory inefficiencies. Drawing inspiration from Mamba, an advanced state-space model, we explore its potential in SSVH to achieve a better balance between efficacy and efficiency. We introduce S5VH, a Mamba-based video hashing model with an improved self-supervised learning paradigm. Specifically, we design bidirectional Mamba layers for both the encoder and decoder, which are effective and efficient in capturing temporal relationships thanks to the data-dependent selective scanning mechanism with linear complexity. In our learning strategy, we transform global semantics in the feature space into semantically consistent and discriminative hash centers, followed by a center alignment loss as a global learning signal. Our self-local-global (SLG) paradigm significantly improves learning efficiency, leading to faster and better convergence. Extensive experiments demonstrate S5VH's improvements over state-of-the-art methods, superior transferability, and scalable advantages in inference efficiency.


AAAI Conference 2025 Conference Paper

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

Jintong Hu
Bin Xia
Bin Chen
Wenming Yang
Lei Zhang

Implicit neural representations (INRs) have revolutionized arbitrary-scale super-resolution (ASSR) by modeling images as continuous functions. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features. In this paper, we propose a novel ASSR method named GaussianSR that overcomes this limitation through 2D Gaussian Splatting (2DGS). Unlike traditional methods that treat pixels as discrete points, GaussianSR represents each pixel as a continuous Gaussian field. The encoded features are simultaneously refined and upsampled by rendering the mutually stacked Gaussian fields. As a result, long-range dependencies are established to enhance representation ability. In addition, a classifier is developed to dynamically assign Gaussian kernels to all pixels to further improve flexibility. All components of GaussianSR (i.e. encoder, classifier, Gaussian kernels, and decoder) are jointly learned end-to-end. Experiments demonstrate that GaussianSR achieves superior ASSR performance with fewer parameters than existing methods while enjoying interpretable and content-aware feature aggregations.


NeurIPS Conference 2025 Conference Paper

Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs

Hao Fang
Changle Zhou
Jiawei Kong
Kuofeng Gao
Bin Chen
Tao Liang
Guojun Ma
Shu-Tao Xia

Large Vision-Language Models (LVLMs) are susceptible to hallucinations, where generated responses seem semantically plausible yet exhibit little or no relevance to the input image. Previous studies reveal that this issue primarily stems from LVLMs' over-reliance on language priors while disregarding the visual information during decoding. To alleviate this issue, we introduce a novel Conditional Pointwise Mutual Information (C-PMI) calibrated decoding strategy, which adaptively strengthens the mutual dependency between generated texts and input images to mitigate hallucinations. Unlike existing methods solely focusing on text token sampling, we propose to jointly model the contributions of visual and textual tokens to C-PMI, formulating hallucination mitigation as a bi-level optimization problem aimed at maximizing mutual information. To solve it, we design a token purification mechanism that dynamically regulates the decoding process by sampling text tokens remaining maximally relevant to the given image, while simultaneously refining image tokens most pertinent to the generated response. Extensive experiments across various benchmarks reveal that the proposed method significantly reduces hallucinations in LVLMs while preserving decoding efficiency.
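The quantity behind this abstract can be shown with a small worked sketch. For a candidate token y, text context x, and image v, the conditional pointwise mutual information is log p(y | v, x) - log p(y | x). The code below is not the paper's bi-level optimization or its token purification mechanism. It only computes that quantity from two hypothetical next-token distributions and uses it to rescore tokens. The function names, the weight `lam`, and the example probabilities are assumptions.

```python
import math

def conditional_pmi(p_with_image, p_text_only):
    # Per-token C-PMI: log p(y | v, x) - log p(y | x). All probabilities must be > 0.
    return [math.log(p) - math.log(q) for p, q in zip(p_with_image, p_text_only)]

def calibrate(p_with_image, p_text_only, lam=1.0):
    # Illustrative rescoring: add lam * C-PMI to the image-conditioned log-probability,
    # then renormalize with a numerically stable softmax.
    cpmi = conditional_pmi(p_with_image, p_text_only)
    scores = [math.log(p) + lam * c for p, c in zip(p_with_image, cpmi)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical distributions over three candidate tokens.
p_with_image = [0.5, 0.3, 0.2]   # conditioned on image and text
p_text_only = [0.7, 0.1, 0.2]    # conditioned on text only (language prior)

print(conditional_pmi(p_with_image, p_text_only))
print(calibrate(p_with_image, p_text_only))
```

In this toy case the first token is favored mainly by the language prior, so its score drops, while the second token, whose probability rises once the image is seen, becomes the top choice.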


NeurIPS Conference 2025 Conference Paper

Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior

Yulin Li
Haokun GUI
Ziyang Fan
Junjie Wang
Bin Kang
Bin Chen
Zhuotao Tian

Recent advances in Video Large Language Models (VLLMs) have achieved remarkable video understanding capabilities, yet face critical efficiency bottlenecks due to quadratic computational growth with the lengthy visual token sequences of long videos. While existing keyframe sampling methods can improve temporal modeling efficiency, they introduce additional computational cost before feature encoding, and the binary frame selection paradigm is found to be suboptimal. Therefore, in this work, we propose Dynamic Token compression via LLM-guided Keyframe prior (DyToK), a training-free paradigm that enables dynamic token compression by harnessing VLLMs' inherent attention mechanisms. Our analysis reveals that VLLM attention layers naturally encode query-conditioned keyframe priors, by which DyToK dynamically adjusts per-frame token retention ratios, prioritizing semantically rich frames while suppressing redundancies. Extensive experiments demonstrate that DyToK achieves state-of-the-art efficiency-accuracy tradeoffs. DyToK shows plug-and-play compatibility with existing compression methods, such as VisionZip and FastV, attaining 2.5x faster inference while preserving accuracy across multiple VLLMs, such as LLaVA-OneVision and Qwen2.5-VL. Code and models will be made publicly available.


AAAI Conference 2025 Conference Paper

OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

Junjie Wang
Bin Chen
Bin Kang
Yulin Li
Weizhi Xian
Yichi Chen
Yong Xu

Open-vocabulary detection aims to detect objects from novel categories beyond the base categories on which the detector is trained. However, existing open-vocabulary detectors trained on base category data tend to assign higher confidence to trained categories and confuse novel categories with the background. To resolve this, we propose OV-DQUO, an Open-Vocabulary DETR with Denoising text Query training and open-world Unknown Objects supervision. Specifically, we introduce a wildcard matching method. This method enables the detector to learn from pairs of unknown objects recognized by the open-world detector and text embeddings with general semantics, mitigating the confidence bias between base and novel categories. Additionally, we propose a denoising text query training strategy. It synthesizes foreground and background query-box pairs from open-world unknown objects to train the detector through contrastive learning, enhancing its ability to distinguish novel objects from the background. We conducted extensive experiments on the OV-COCO and OV-LVIS benchmarks, achieving new state-of-the-art results of 45.6 AP50 and 39.3 mAP on novel categories, respectively.


IJCAI Conference 2025 Conference Paper

Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning

Yaohua Zha
Tao Dai
Hang Guo
Yanzi Wang
Bin Chen
Ke Chen
Shu-Tao Xia

Point clouds, as a primary representation of 3D data, can be categorized into scene domain point clouds and object domain point clouds. Point cloud self-supervised learning (SSL) has become a mainstream paradigm for learning 3D representations. However, existing point cloud SSL primarily focuses on learning domain-specific 3D representations within a single domain, neglecting the complementary nature of cross-domain knowledge, which limits the learning of 3D representations. In this paper, we propose to learn a comprehensive Point cloud Mixture-of-Domain-Experts model (Point-MoDE) via a block-to-scene pre-training strategy. Specifically, we first propose a mixture-of-domain-expert model consisting of scene domain experts and multiple shared object domain experts. Furthermore, we propose a block-to-scene pre-training strategy, which leverages the features of point blocks in the object domain to regress their initial positions in the scene domain through object-level block mask reconstruction and scene-level block position regression. By integrating the complementary knowledge between object and scene, this strategy simultaneously facilitates the learning of both object-domain and scene-domain representations, leading to a more comprehensive 3D representation. Extensive experiments in downstream tasks demonstrate the superiority of our model.


NeurIPS Conference 2024 Conference Paper

BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping

Taolin Zhang
Jinpeng Wang
Hang Guo
Tao Dai
Bin Chen
Shu-Tao Xia

Adaptation of pretrained vision-language models such as CLIP to various downstream tasks has attracted great interest in recent research. Previous works have proposed a variety of test-time adaptation (TTA) methods to achieve strong generalization without any knowledge of the target domain. However, existing training-required TTA approaches like TPT necessitate entropy minimization that involves large computational overhead, while training-free methods like TDA overlook the potential for information mining from the test samples themselves. In this paper, we break down the design of existing popular training-required and training-free TTA methods and bridge the gap between them within our framework. Specifically, we maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples. The historical samples are filtered from the testing data stream and serve to extract useful information from the target distribution, while the boosting samples are drawn from regional bootstrapping and capture the knowledge of the test sample itself. We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets, showcasing its applicability in real-world situations.


EAAI Journal 2024 Journal Article

Boosting cluster tree with reciprocal nearest neighbors scoring

Wen-Bo Xie
Zhen Liu
Bin Chen
Jaideep Srivastava

Clustering plays a pivotal role in knowledge processing, knowledge bases, and expert systems, enabling AI systems to acquire knowledge effectively. Hierarchical clustering, in particular, offers an intelligent approach to represent knowledge hierarchically by transforming raw data into one or multiple tree-shaped components. However, a notable difficulty arises when attempting to pinpoint appropriate representative points within lower levels of the cluster tree. These points are of paramount importance, as they serve as the roots for subsequent aggregation within the upper levels of the cluster tree. Traditional hierarchical clustering algorithms have relied on rudimentary techniques to select these representative points, which may not provide an adequate representation. Consequently, the resulting cluster tree often falls short in terms of empirical performance. To address this shortcoming, we propose an innovative hierarchical clustering algorithm in this paper. The proposed algorithm is designed to efficiently identify the representative point within each sub-minimum-spanning-tree during the construction of the cluster tree, which is achieved by topology-based scoring of the reciprocal nearest data points. Rigorous testing on UCI datasets has demonstrated the superior clustering accuracy (measured by Rand Index and Normalized Mutual Information) of our proposed algorithm compared to other benchmark algorithms. Further analysis reveals that our algorithm has an O(n log n) time complexity and an O(log n) space complexity, indicating its scalability and efficiency in handling large-scale data with minimal time and storage costs. Importantly, our algorithm’s ability to process up to two million data points on a standard personal computer underscores its cost-effectiveness.
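The notion of reciprocal nearest neighbors used in this abstract can be made concrete with a short sketch. Two points are reciprocal nearest neighbors when each is the other's nearest neighbor. The code below is a brute-force illustration only. It does not reproduce the paper's scoring scheme or its stated complexity, and the function names and sample points are assumptions.

```python
import math

def nearest_neighbor(points, i):
    # Index of the point closest to points[i] (Euclidean distance), excluding i itself.
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_dist:
            best, best_dist = j, d
    return best

def reciprocal_pairs(points):
    # Pairs (i, j) with i < j where i and j are each other's nearest neighbor.
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return sorted((i, j) for i, j in enumerate(nn) if i < j and nn[j] == i)

# Hypothetical 2D data: two tight pairs and one point in between.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.0)]
print(reciprocal_pairs(points))
```

The point at index 2 has a nearest neighbor (index 1), but that neighbor prefers index 0, so index 2 belongs to no reciprocal pair. This brute-force search is quadratic in the number of points and is meant only to show the definition.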


NeurIPS Conference 2024 Conference Paper

COSMIC: Compress Satellite Image Efficiently via Diffusion Compensation

Ziyuan Zhang
Han Qiu
Maosen Zhang
Jun Liu
Bin Chen
Tianwei Zhang
Hewu Li

With the rapidly increasing number of satellites in space and their enhanced capabilities, the amount of earth observation images collected by satellites is exceeding the transmission limits of satellite-to-ground links. Although existing learned image compression solutions achieve remarkable performance by using a sophisticated encoder to extract informative features for compression and a decoder for reconstruction, it is still hard to directly deploy those complex encoders on current satellites' embedded GPUs, which have limited computing capability and power supply, to compress images in orbit. In this paper, we propose COSMIC, a simple yet effective learned compression solution to transmit satellite images. We first design a lightweight encoder (i.e., reducing FLOPs by 2.5~5X) on the satellite to achieve a high image compression ratio and save satellite-to-ground link capacity. Then, for reconstruction on the ground, to deal with the degradation in feature extraction ability caused by simplifying the encoder, we propose a diffusion-based model to compensate for image details when decoding. Our insight is that a satellite's earth observation photos are not just images but multi-modal data with a natural Text-to-Image pairing, since they are collected with rich sensor data (e.g., coordinates, timestep, etc.) that can be used as the condition for diffusion generation. Extensive experiments show that COSMIC outperforms state-of-the-art baselines on both perceptual and distortion metrics.


IJCAI Conference 2024 Conference Paper

GladCoder: Stylized QR Code Generation with Grayscale-Aware Denoising Process

Yuqiu Xie
Bolin Jiang
Jiawei Li
Naiqi Li
Bin Chen
Tao Dai
Yuang Peng
Shu-Tao Xia

Traditional QR codes consist of a grid of black-and-white square modules, which lack aesthetic appeal and meaning for human perception. This has motivated recent research to beautify the visual appearance of QR codes. However, there is a trade-off between the visual quality and scanning-robustness of the image, so the outputs of previous works are simple and of low quality in order to ensure scanning-robustness. In this paper, we introduce a novel approach, GladCoder, to generate stylized QR codes that are personalized, natural, and text-driven. Its pipeline includes a Depth-guided Aesthetic QR code Generator (DAG) to improve the quality of the image foreground, and a GrayscaLe-Aware Denoising (GLAD) process to enhance scanning-robustness. The overall pipeline is based on diffusion models, which allow users to create stylized QR images from a textual prompt that describes the image and a textual input to be encoded. Experiments demonstrate that our method can generate stylized QR codes with appealing perceptual details while maintaining robust scanning reliability in real-world applications.


AAAI Conference 2024 Conference Paper

GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

Yuting Wang
Jinpeng Wang
Bin Chen
Ziyun Zeng
Shu-Tao Xia

Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos containing pertinent moments in a database. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead. To solve the efficiency problem of PRVR methods, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer which models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. The generated representations then contain multi-scale clip information, achieving implicit clip modeling. In addition, PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space denser and richer in semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer.
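The idea of focusing each frame on its adjacent frames can be sketched as a Gaussian window applied to frame-to-frame attention weights. This is an illustrative assumption about how such a constraint might look, not GMMFormer's actual formulation. The function names, the multiplicative form, and the `sigma` values are hypothetical.

```python
import math

def gaussian_window(num_frames, sigma):
    # window[i][j] decays with the temporal distance between frames i and j.
    return [[math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
             for j in range(num_frames)]
            for i in range(num_frames)]

def constrain_attention(attention, sigma):
    # Multiply each attention row by the Gaussian window and renormalize to sum to 1.
    window = gaussian_window(len(attention), sigma)
    constrained = []
    for a_row, w_row in zip(attention, window):
        weighted = [a * w for a, w in zip(a_row, w_row)]
        total = sum(weighted)
        constrained.append([v / total for v in weighted])
    return constrained

# Hypothetical uniform attention over 5 frames, constrained at two scales.
uniform = [[0.2] * 5 for _ in range(5)]
narrow = constrain_attention(uniform, sigma=0.5)  # short clips: mostly the frame itself
wide = constrain_attention(uniform, sigma=5.0)    # long clips: close to uniform

print([round(v, 3) for v in narrow[2]])
print([round(v, 3) for v in wide[2]])
```

Using several `sigma` values side by side is one way to obtain representations at multiple temporal scales, in the spirit of the multi-scale clip information mentioned in the abstract.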


AAMAS Conference 2024 Conference Paper

HLG: Bridging Human Heuristic Knowledge and Deep Reinforcement Learning for Optimal Agent Performance

Bin Chen
Zehong Cao

Training an optimal policy in deep reinforcement learning (DRL) remains a significant challenge due to the pitfalls of inefficient sampling in dynamic environments with sparse rewards. In this paper, we propose a Human Local Guide (HLG) that incorporates high-level human knowledge and local policies to guide DRL agents to achieve optimal performance. HLG deploys the heuristic rules from human knowledge in differential decision trees and then injects them into neural networks, which can continuously improve the suboptimal global policy until it reaches the optimal level. HLG includes action guides based on a policy-switching mechanism and adaptive action guides inspired by an approximate policy evaluation scheme through a perturbation model to further optimise the policy. HLG outperforms PPO and PROLONET with at least 25% improvement in training efficiency and exploration capability on MinGrid environments with sparse reward signals. This implies that HLG has significant potential to continuously assist the DRL agent in achieving an optimal policy in dynamic and complex environments.


AAAI Conference 2024 Short Paper

Interpreting Temporal Knowledge Graph Reasoning (Student Abstract)

Bin Chen
Kai Yang
Wenxin Tai
Zhangtao Cheng
Leyuan Liu
Ting Zhong
Fan Zhou

Temporal knowledge graph reasoning is an essential task that holds immense value in diverse real-world applications. Existing studies mainly focus on leveraging structural and sequential dependencies, excelling in tasks like entity and link prediction. However, they confront a notable interpretability gap in their predictions, a pivotal facet for comprehending model behavior. In this study, we propose an innovative method, LSGAT, which not only exhibits remarkable precision in entity predictions but also enhances interpretability by identifying pivotal historical events influencing event predictions. LSGAT enables concise explanations for prediction outcomes, offering valuable insights into the otherwise enigmatic "black box" reasoning process. Through an exploration of the implications of the most influential events, it facilitates a deeper understanding of the underlying mechanisms governing predictions.


IJCAI Conference 2024 Conference Paper

Invertible Residual Rescaling Models

Jinmin Li
Tao Dai
Yaohua Zha
Yilu Luo
Longfei Lu
Bin Chen
Zhi Wang
Shu-Tao Xia

Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, thus hindering the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, we propose IRRM to build a deep network, which contains several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, RDM allows rich low-frequency information to be bypassed by skip connections and forces models to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with far fewer parameters and lower complexity. In particular, our IRRM achieves PSNR gains of at least 0.3 dB over HCFlow and IRN in x4 rescaling while using only 60% of the parameters and 50% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.


NeurIPS Conference 2024 Conference Paper

LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

Yaohua Zha
Naiqi Li
Yanzi Wang
Tao Dai
Hang Guo
Bin Chen
Zhi Wang
Zhihao Ouyang

The pre-trained point cloud model based on Masked Point Modeling (MPM) has exhibited substantial improvements across various tasks. However, these models heavily rely on the Transformer, leading to quadratic complexity and a limited decoder, hindering their practical application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM, emphasizing the idea that redundancy reduction is crucial for point cloud analysis. To this end, we propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder. Our encoder replaces self-attention with our local aggregation layers to achieve an elegant balance between performance and efficiency. Considering the varying information density between masked and unmasked patches in the decoder inputs of MPM, we introduce a locally constrained Mamba-based decoder. This decoder ensures linear complexity while maximizing the perception of point cloud geometry information from unmasked patches with higher information density. Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency. In particular, compared to the Transformer-based model, our LCM-based Point-MAE model achieves improvements of 1.84%, 0.67%, and 0.60% on the three variants of ScanObjectNN while reducing parameters by 88% and computation by 73%. The code is available at https://github.com/zyh16143998882/LCM.


NeurIPS Conference 2024 Conference Paper

Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts

Hang Guo
Tao Dai
Yuanchao Bai
Bin Chen
Xudong Ren
Zexuan Zhu
Shu-Tao Xia

Designing single-task image restoration models for specific degradation has seen great success in recent years. To achieve generalized image restoration, all-in-one methods have recently been proposed and shown potential for multiple restoration tasks using one single model. Despite the promising results, the existing all-in-one paradigm still suffers from high computational costs as well as limited generalization on unseen degradations. In this work, we introduce an alternative solution to improve the generalization of image restoration models. Drawing inspiration from recent advancements in Parameter Efficient Transfer Learning (PETL), we aim to tune only a small number of parameters to adapt pre-trained restoration models to various tasks. However, current PETL methods fail to generalize across varied restoration tasks due to their homogeneous representation nature. To this end, we propose AdaptIR, a Mixture-of-Experts (MoE) with orthogonal multi-branch design to capture local spatial, global spatial, and channel representation bases, followed by adaptive base combination to obtain heterogeneous representation for different degradations. Extensive experiments demonstrate that our AdaptIR achieves stable performance on single-degradation tasks and excels in hybrid-degradation tasks, while training only 0.6% of the parameters for 8 hours.

NeurIPS Conference 2024 Conference Paper

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation

Hang Guo
Tao Dai
Zhihao Ouyang
Taolin Zhang
Yaohua Zha
Bin Chen
Shu-Tao Xia

Recent advances in diffusion-based Large Restoration Models (LRMs) have significantly improved photo-realistic image restoration by leveraging the internal knowledge embedded within model weights. However, existing LRMs often suffer from the hallucination dilemma, i.e., producing incorrect contents or textures when dealing with severe degradations, due to their heavy reliance on limited internal knowledge. In this paper, we propose an orthogonal solution called the Retrieval-augmented Framework for Image Restoration (ReFIR), which incorporates retrieved images as external knowledge to extend the knowledge boundary of existing LRMs in generating details faithful to the original scene. Specifically, we first introduce the nearest neighbor lookup to retrieve content-relevant high-quality images as reference, after which we propose the cross-image injection to modify existing LRMs to utilize high-quality textures from retrieved images. Thanks to the additional external knowledge, our ReFIR can handle the hallucination challenge well and facilitate faithful results. Extensive experiments demonstrate that ReFIR can achieve not only high-fidelity but also realistic restoration results. Importantly, our ReFIR requires no training and is adaptable to various LRMs.

AAAI Conference 2024 Short Paper

Shallow Diffusion for Fast Speech Enhancement (Student Abstract)

Yue Lei
Bin Chen
Wenxin Tai
Ting Zhong
Fan Zhou

Recently, the field of Speech Enhancement has witnessed the success of diffusion-based generative models. However, these diffusion-based methods typically take multiple iterations to generate high-quality samples, leading to high computational costs and inefficiency. In this paper, we propose SDFEN (Shallow Diffusion for Fast spEech eNhancement), a novel approach for addressing the inefficiency problem while enhancing the quality of generated samples by reducing the iterative steps in the reverse process of the diffusion method. Specifically, we introduce the shallow diffusion strategy, initiating the reverse process with an adaptive time step to accelerate inference. In addition, a dedicated noisy predictor is further proposed to guide the adaptive selection of the time step. Experimental results demonstrate the superiority of the proposed SDFEN in effectiveness and efficiency.

AAAI Conference 2024 Conference Paper

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

Yaohua Zha
Huizhen Ji
Jinmin Li
Rongsheng Li
Tao Dai
Bin Chen
Zhi Wang
Shu-Tao Xia

Learning 3D representation plays a critical role in masked autoencoder (MAE) based pre-training methods for point cloud, including single-modal and cross-modal based MAE. Specifically, although cross-modal MAE methods learn strong 3D representations via the auxiliary of other modal knowledge, they often suffer from heavy computational burdens and heavily rely on massive cross-modal data pairs that are often unavailable, which hinders their applications in practice. Instead, single-modal methods with solely point clouds as input are preferred in real applications due to their simplicity and efficiency. However, such methods easily suffer from limited 3D representations with global random mask input. To learn compact 3D representations, we propose a simple yet effective Point Feature Enhancement Masked Autoencoders (Point-FEMAE), which mainly consists of a global branch and a local branch to capture latent semantic features. Specifically, to learn more compact features, a share-parameter Transformer encoder is introduced to extract point features from the global and local unmasked patches obtained by global random and local block mask strategies, followed by a specific decoder to reconstruct. Meanwhile, to further enhance features in the local branch, we propose a Local Enhancement Module with local patch convolution to perceive fine-grained local context at larger scales. Our method significantly improves the pre-training efficiency compared to cross-modal alternatives, and extensive downstream experiments underscore the state-of-the-art effectiveness, particularly outperforming our baseline (Point-MAE) by 5.16%, 5.00%, and 5.04% in three variants of ScanObjectNN, respectively. Code is available at https://github.com/zyh16143998882/AAAI24-PointFEMAE.

ICLR Conference 2024 Conference Paper

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Yang Jin
Kun Xu 0005
Li-Wei Chen
Chao Liao
Jianchao Tan
Quzhe Huang
Bin Chen
Chengru Song

Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment of vision and language heavily constrains the model's potential. In this paper, we break through this limitation by representing both vision and language in a unified form. Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language that LLM can read. The resulting visual tokens encompass high-level semantics worthy of a word and also support dynamic sequence length varying from the image. Coupled with this tokenizer, the presented foundation model called LaVIT can handle both image and text indiscriminately under the same generative learning paradigm. This unification empowers LaVIT to serve as an impressive generalist interface to understand and generate multi-modal content simultaneously. Extensive experiments further showcase that it outperforms the existing models by a large margin on massive vision-language tasks. Our code and models are available at https://github.com/jy0205/LaVIT.

AAAI Conference 2024 Conference Paper

Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

Taolin Zhang
Sunan He
Tao Dai
Zhi Wang
Bin Chen
Shu-Tao Xia

In recent years, vision-language pre-training frameworks have made significant progress in natural language processing and computer vision, achieving remarkable performance improvement on various downstream tasks. However, when extended to point cloud data, existing works mainly focus on building task-specific models, and fail to extract universal 3D vision-language embeddings that generalize well. We carefully investigate three common tasks in semantic 3D scene understanding, and derive key insights into the development of a pre-training model. Motivated by these observations, we propose a vision-language pre-training framework 3DVLP (3D vision-language pre-training with object contrastive learning), which transfers flexibly on 3D vision-language downstream tasks. 3DVLP takes visual grounding as the proxy task and introduces Object-level IoU-guided Detection (OID) loss to obtain high-quality proposals in the scene. Moreover, we design Object-level Cross-Contrastive alignment (OCC) task and Object-level Self-Contrastive learning (OSC) task to align the objects with descriptions and distinguish different objects in the scene, respectively. Extensive experiments verify the excellent performance of 3DVLP on three 3D vision-language tasks, reflecting its superiority in semantic 3D scene understanding. Code is available at https://github.com/iridescentttt/3DVLP.

JBHI Journal 2023 Journal Article

A Domain Generative Graph Network for EEG-Based Emotion Recognition

Yun Gu
Xinyue Zhong
Cheng Qu
Chuanjun Liu
Bin Chen

Emotion is a human attitude experience and corresponding behavioral response to objective things. Effective emotion recognition is important for the intelligence and humanization of brain-computer interface (BCI). Although deep learning has been widely used in emotion recognition in recent years, emotion recognition based on electroencephalography (EEG) is still a challenging task in practical applications. Herein, we propose a novel hybrid model that employs generative adversarial networks to generate potential representations of EEG signals while combining graph convolutional neural networks and long short-term memory networks to recognize emotions from EEG signals. Experimental results on DEAP and SEED datasets show that the proposed model achieves promising emotion classification performance compared with the state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Contrastive Masked Autoencoders for Self-Supervised Video Hashing

Yuting Wang
Jinpeng Wang
Bin Chen
Ziyun Zeng
Shu-Tao Xia

Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations for videos without ground-truth supervision, facilitating large-scale video retrieval efficiency and attracting increasing research attention. The success of SSVH lies in the understanding of video content and the ability to capture the semantic relation among unlabeled videos. Typically, state-of-the-art SSVH methods consider these two points in a two-stage training pipeline, where they firstly train an auxiliary network by instance-wise mask-and-predict tasks and secondly train a hashing model to preserve the pseudo-neighborhood structure transferred from the auxiliary network. This consecutive training strategy is inflexible and also unnecessary. In this paper, we propose a simple yet effective one-stage SSVH method called ConMH, which incorporates video semantic information and video similarity relationship understanding in a single stage. To capture video semantic information for better hashing learning, we adopt an encoder-decoder structure to reconstruct the video from its temporal-masked frames. Particularly, we find that a higher masking ratio helps video understanding. Besides, we fully exploit the similarity relationship between videos by maximizing agreement between two augmented views of a video, which contributes to more discriminative and robust hash codes. Extensive experiments on three large-scale video datasets (i.e., FCVID, ActivityNet and YFCC) indicate that ConMH achieves state-of-the-art results. Code is available at https://github.com/huangmozhi9527/ConMH.

AAAI Conference 2023 Conference Paper

FSR: A General Frequency-Oriented Framework to Accelerate Image Super-resolution Networks

Jinmin Li
Tao Dai
Mingyan Zhu
Bin Chen
Zhi Wang
Shu-Tao Xia

Deep neural networks (DNNs) have witnessed remarkable achievement in image super-resolution (SR), and plenty of DNN-based SR models with elaborated network designs have recently been proposed. However, existing methods usually require substantial computations by operating in spatial domain. To address this issue, we propose a general frequency-oriented framework (FSR) to accelerate SR networks by considering data characteristics in frequency domain. Our FSR mainly contains dual feature aggregation module (DFAM) to extract informative features in both spatial and transform domains, followed by a four-path SR-Module with different capacities to super-resolve in the frequency domain. Specifically, DFAM further consists of a transform attention block (TABlock) and a spatial context block (SCBlock) to extract global spectral information and local spatial information, respectively, while SR-Module is a parallel network container that contains four to-be-accelerated branches. Furthermore, we propose an adaptive weight strategy for a trade-off between image details recovery and visual quality. Extensive experiments show that our FSR can save FLOPs by almost 40% while reducing inference time by 50% for other SR methods (e.g., FSRCNN, CARN, SRResNet and RCAN). Code is available at https://github.com/THU-Kingmin/FSR.

AAAI Conference 2023 Conference Paper

Learned Distributed Image Compression with Multi-Scale Patch Matching in Feature Domain

Yujun Huang
Bin Chen
Shiyu Qin
Jiawei Li
Yaowei Wang
Tao Dai
Shu-Tao Xia

Beyond achieving higher compression efficiency over classical image compression codecs, deep image compression is expected to be improved with additional side information, e.g., another image from a different perspective of the same scene. To better utilize the side information under the distributed compression scenario, the existing method only implements patch matching at the image domain to solve the parallax problem caused by the difference in viewing points. However, the patch matching at the image domain is not robust to the variance of scale, shape, and illumination caused by the different viewing angles, and cannot make full use of the rich texture information of the side information image. To resolve this issue, we propose Multi-Scale Feature Domain Patch Matching (MSFDPM) to fully utilize side information at the decoder of the distributed image compression model. Specifically, MSFDPM consists of a side information feature extractor, a multi-scale feature domain patch matching module, and a multi-scale feature fusion network. Furthermore, we reuse inter-patch correlation from the shallow layer to accelerate the patch matching of the deep layer. Finally, we find that our patch matching in a multi-scale feature domain further improves compression rate by about 20% compared with the patch matching method at image domain.

EAAI Journal 2023 Journal Article

Optimization of high-performance concrete mix ratio design using machine learning

Bin Chen
Lei Wang
Zongbao Feng
Yang Liu
Xianguo Wu
Yawei Qin
Lingyu Xia

High-durability concrete is required in extremely cold or ocean environments, making the design of concrete mixes highly important and complicated. In this study, a hybrid intelligent framework for multi-objective optimization based on random forest (RF) and the non-dominated sorting genetic algorithm version II (NSGA-II) is developed to efficiently predict concrete durability and optimize the concrete mix ratio. The relative dynamic elastic modulus of concrete after 300 freeze–thaw cycles and the chloride ion permeability coefficient at 28 days are defined as the standard measures of durability. The concrete mix ratio is taken as the influencing parameter, and orthogonal test data and engineering practice data are collected as the datasets. The proposed framework is applied to a realistic expressway project in a cold region of China. The results demonstrate that (1) a hybrid intelligent framework based on RF-NSGA-II can effectively predict concrete durability and optimize the mix ratio. (2) The developed RF model has an excellent regression learning ability: the goodness of fit (R²) of concrete durability reaches 0.9503 and 0.9551, respectively, with root mean square error (RMSE) values of only 0.096 and 0.043 and mean absolute percentage error (MAPE) values of 2.54% and 2.17%. (3) After optimization, the concrete durability reaches a high standard, with a frost resistance of >95% and a chloride ion permeability coefficient of <3×10⁻⁸ cm²/s, at a unit volume cost of only 376.77 yuan. Hence, the proposed framework can be used to effectively optimize the concrete mix design and provide guidance for similar projects.

IJCAI Conference 2023 Conference Paper

Sph2Pob: Boosting Object Detection on Spherical Images with Planar Oriented Boxes Methods

Xinyuan Liu
Hang Xu
Bin Chen
Qiang Zhao
Yike Ma
Chenggang Yan
Feng Dai

Object detection on panoramic/spherical images has developed rapidly in the past few years, where the IoU calculator is a fundamental part of various detector components, i.e., Label Assignment, Loss and NMS. Due to the low efficiency and non-differentiability of spherical Unbiased IoU, spherical approximate IoU methods have been proposed recently. We find that the key of these approximate methods is to map spherical boxes to planar boxes. However, there exist two problems in these methods: (1) they do not eliminate the influence of panoramic image distortion; (2) they break the original pose between bounding boxes. These problems lead to the low accuracy of these methods. Taking the two problems into account, we propose a new sphere-plane boxes transform, called Sph2Pob. Based on the Sph2Pob, we propose (1) a differentiable IoU, Sph2Pob-IoU, for spherical boxes with low time-cost and high accuracy and (2) an agent Loss, Sph2Pob-Loss, for spherical detection with high flexibility and expansibility. Extensive experiments verify the effectiveness and generality of our approaches, and Sph2Pob-IoU and Sph2Pob-Loss together boost the performance of spherical detectors. The source code is available at https://github.com/AntXinyuan/sph2pob.

AAAI Conference 2022 Conference Paper

Contrastive Quantization with Code Memory for Unsupervised Image Retrieval

Jinpeng Wang
Ziyun Zeng
Bin Chen
Tao Dai
Shu-Tao Xia

The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems. To alleviate the reliance on expensive annotations, unsupervised deep hashing becomes an important research problem. This paper provides a novel solution to unsupervised deep quantization, namely Contrastive Quantization with Code Memory (MeCoQ). Different from existing reconstruction-based strategies, we learn unsupervised binary descriptors by contrastive learning, which can better capture discriminative visual semantics. Besides, we uncover that codeword diversity regularization is critical to prevent contrastive learning-based quantization from model degeneration. Moreover, we introduce a novel quantization code memory module that boosts contrastive learning with lower feature drift than conventional feature memories. Extensive experiments on benchmark datasets show that MeCoQ outperforms state-of-the-art methods. Code and configurations are publicly released.

AAAI Conference 2022 Conference Paper

Is Discourse Role Important for Emotion Recognition in Conversation?

Donovan Ong
Jian Su
Bin Chen
Anh Tuan Luu
Ashok Narendranath
Yue Li
Shuqi Sun
Yingzhan Lin

A conversation is a sequence of utterances, where each utterance plays a specific discourse role while expressing a particular emotion. This paper proposes a novel method to exploit latent discourse role information of an utterance to determine the emotion it conveys in a conversation. Specifically, we use a variant of the Variational-Autoencoder (VAE) to model the context-aware latent discourse roles of each utterance in an unsupervised way. The latent discourse role representation further equips the utterance representation with a salient clue for more accurate emotion recognition. Our experiments show that our proposed method beats the best-reported performances on three public Emotion Recognition in Conversation datasets. This proves that the discourse role information of an utterance plays an important role in the emotion recognition task, which no previous work has studied.

AAAI Conference 2022 Conference Paper

Unbiased IoU for Spherical Image Object Detection

Feng Dai
Bin Chen
Hang Xu
Yike Ma
Xiaodong Li
Bailan Feng
Peng Yuan
Chenggang Yan

As one of the fundamental components of object detection, intersection-over-union (IoU) calculations between two bounding boxes play an important role in sample selection, NMS operation and evaluation of object detection algorithms. This procedure is well-defined and solved for planar images, while it is challenging for spherical ones. Some existing methods utilize planar bounding boxes to represent spherical objects. However, they are biased due to the distortions of spherical objects. Others use spherical rectangles as unbiased representations, but they adopt excessive approximate algorithms when computing the IoU. In this paper, we propose an unbiased IoU as a novel evaluation criterion for spherical image object detection, which is based on the unbiased representations and utilizes an unbiased analytical method for IoU calculation. This is the first time that the absolutely accurate IoU calculation is applied to the evaluation criterion, thus object detection algorithms can be correctly evaluated for spherical images. With the unbiased representation and calculation, we also present Spherical CenterNet, an anchor-free object detection algorithm for spherical images. The experiments show that our unbiased IoU gives accurate results and the proposed Spherical CenterNet achieves better performance on one real-world and two synthetic spherical object detection datasets than existing methods.

EAAI Journal 2021 Journal Article

Concurrent multi-process graph-based design component synthesis: Framework and algorithm

Bin Chen
Jie Hu
Jin Qi
Weixing Chen

Facing today’s increasingly complex and high-demanded design missions, the abundant and multifarious design components distributed in different disciplines and locations should be fully considered and elaborately synthesized. However, this involves a large amount of data processing workload which heavily restrains the application and development of the traditional graph-based synthesis methods. Therefore, a concurrent multi-process graph-based design component synthesis method is proposed to break the bottleneck. With this method, the heavy workload can be dynamically and efficiently decentralized and shared in a group of processes working simultaneously and concurrently. As an application, a software prototype is presented, and the design component synthesis of a biochemical heating system is completed with it.

AAAI Conference 2021 Conference Paper

Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval

Jinpeng Wang
Bin Chen
Qiang Zhang
Zaiqiao Meng
Shangsong Liang
Shutao Xia

Deep quantization methods have shown high efficiency on large-scale image retrieval. However, current models heavily rely on ground-truth information, hindering the application of quantization in label-hungry scenarios. A more realistic demand is to learn from inexhaustible uploaded images that are associated with informal tags provided by amateur users. Though such sketchy tags do not obviously reveal the labels, they actually contain useful semantic information for supervising deep quantization. To this end, we propose Weakly-Supervised Deep Hyperspherical Quantization (WSDHQ), which is the first work to learn deep quantization from weakly tagged images. Specifically, 1) we use word embeddings to represent the tags and enhance their semantic information based on a tag correlation graph. 2) To better preserve semantic information in quantization codes and reduce quantization error, we jointly learn semantics-preserving embeddings and a supervised quantizer on the hypersphere by employing a well-designed fusion layer and tailor-made loss functions. Extensive experiments show that WSDHQ can achieve state-of-the-art performance in weakly-supervised compact coding.

AAAI Conference 2020 Conference Paper

Adversarial Attack on Deep Product Quantization Network for Image Retrieval

Yan Feng
Bin Chen
Tao Dai
Shu-Tao Xia

Deep product quantization network (DPQN) has recently received much attention in fast image retrieval tasks due to its efficiency of encoding high-dimensional visual features especially when dealing with large-scale datasets. Recent studies show that deep neural networks (DNNs) are vulnerable to input with small and maliciously designed perturbations (a.k.a. adversarial examples). This phenomenon raises the concern of security issues for DPQN in the testing/deploying stage as well. However, little effort has been devoted to investigating how adversarial examples affect DPQN. To this end, we propose product quantization adversarial generation (PQ-AG), a simple yet effective method to generate adversarial examples for product quantization based retrieval systems. PQ-AG aims to generate imperceptible adversarial perturbations for query images to form adversarial queries, whose nearest neighbors from a targeted product quantization model are not semantically related to those from the original queries. Extensive experiments show that our PQ-AG successfully creates adversarial examples to mislead targeted product quantization retrieval models. Besides, we found that our PQ-AG significantly degrades retrieval performance in both white-box and black-box settings.

IJCAI Conference 2018 Conference Paper

Fast Factorization-free Kernel Learning for Unlabeled Chunk Data Streams

Yi Wang
Nan Xue
Xin Fan
Jiebo Luo
Risheng Liu
Bin Chen
Haojie Li
Zhongxuan Luo

Data stream analysis aims at extracting discriminative information for classification from continuously incoming samples. It is extremely challenging to detect novel data while updating the model in an efficient and stable fashion, especially for the chunk data. This paper proposes a fast factorization-free kernel learning method to unify novelty detection and incremental learning for unlabeled chunk data streams in one framework. The proposed method constructs a joint reproducing kernel Hilbert space from known class centers by solving a linear system in kernel space. Naturally, unlabeled data can be detected and classified among multi-classes by a single decision model. And projecting samples into the discriminative feature space turns out to be the product of two small-sized kernel matrices without needing such time-consuming factorization like QR-decomposition or singular value decomposition. Moreover, the insertion of a novel class can be treated as the addition of a new orthogonal basis to the existing feature space, resulting in fast and stable updating schemes. Both theoretical analysis and experimental validation on real-world datasets demonstrate that the proposed methods learn chunk data streams with significantly lower computational costs and comparable or superior accuracy than the state of the art.

AAMAS Conference 2016 Conference Paper

Collaborative Human Task Assignment for Open Systems (Extended Abstract)

Bin Chen
Adam Eck
Leen-Kiat Soh

Through gathering information, acting autonomously, learning, and behaving socially, intelligent agents provide useful interfaces between complex systems and human users. For example, agents can interact with people to discover their preferences, skills, and expertise, then find suitable tasks that exploit the users’ abilities. We describe modeling environmental openness and human learning in a multiagent system for a human collaborative task assignment problem.

TIST Journal 2013 Journal Article

Random walks down the mention graphs for event coreference resolution

Bin Chen
Jian Su
Chew Lim Tan

Event coreference is an important task in event extraction and other natural language processing tasks. Despite its importance, it was merely discussed in previous studies. In this article, we present a global coreference resolution system dedicated to various sophisticated event coreference phenomena. First, seven resolvers are utilized to resolve different event and object coreference mention pairs with a new instance selection strategy and new linguistic features. Second, a global solution—a modified random walk partitioning—is employed for the chain formation. Being the first attempt to apply the random walk model for coreference resolution, the revised model utilizes a sampling method, termination criterion, and stopping probability to greatly improve the effectiveness of random walk model for event coreference resolution. Last but not least, the new model facilitates a convenient way to incorporate sophisticated linguistic constraints and preferences, the related object mention graph, as well as pronoun coreference information not used in previous studies for effective chain formation. In total, these techniques impose more than 20% F-score improvement over the baseline system.

YNIMG Journal 2008 Journal Article

Integrated SENSE DTI with correction of susceptibility- and eddy current-induced geometric distortions

Trong-Kha Truong
Bin Chen
Allen W. Song

Diffusion tensor imaging (DTI) is vulnerable to geometric distortions caused by subject-dependent susceptibility effects and diffusion-weighting direction-dependent eddy currents. Although the introduction of sensitivity encoding (SENSE) has reduced the overall distortions for the same imaging matrix size, this benefit is offset by the increasing demand for higher spatial resolution. Thus, significant distortions remain or are exacerbated in high-resolution SENSE DTI acquisitions. While the susceptibility-induced distortions cause global spatial misregistration, the direction-dependent eddy current-induced distortions cause misregistration among different diffusion-weighted images, leading to errors in the derivation of the diffusion tensor in virtually all voxels, and consequently in resulting diffusion parameters as well as in fiber tracking. Here, we apply a comprehensive approach that corrects for both susceptibility- and eddy current-induced distortions to high-resolution SENSE DTI acquisitions, and demonstrate its effectiveness, efficiency, and reliability in vivo as well as its advantages over a twice-refocused spin-echo sequence. This method should find increased use in modern DTI experiments where SENSE acquisitions are commonly used.

YNIMG Journal 2006 Journal Article

Correction for direction-dependent distortions in diffusion tensor imaging using matched magnetic field maps

Bin Chen
Hua Guo
Allen W. Song

Diffusion tensor imaging (DTI) has seen increased usage in clinical and basic science research in the past decade. By assessing the water diffusion anisotropy within biological tissues, e.g., the brain, researchers can infer different fiber structures important for neural pathways. A typical DTI data set contains at least one base image and six diffusion-weighted images along non-collinear encoding directions. The resultant images can then be combined to derive the three principal axes of the diffusion tensor and their respective cross terms, which can in turn be used to compute fractional anisotropy (FA) maps, apparent diffusion coefficient (ADC) maps, and to construct axonal fibers. The above operations all assume that DTI images along different diffusion-weighting directions for the same brain register to each other without spatial distortions. This assumption is generally false, as the large diffusion-weighting gradients would usually induce eddy currents to generate diffusion-weighting direction-dependent field gradients, leading to mis-registration within the DTI data set. Traditional methods for correcting magnetic field-induced distortions do not usually take into account these direction-dependent eddy currents unique to DTI, and they are usually time-consuming because multiple phase images need to be acquired. In this report, we describe our theory and implementation of an efficient and effective method to correct for the main field and eddy current-induced direction-dependent distortions for DTI images under a unified framework to facilitate the daily practice of DTI acquisitions.
