Arrow Research search

Author name cluster

Chen Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

58 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

A novel physics-constrained deep learning framework for the inverse design of assembly contact interfaces

  • Lifei Chen
  • Qiyin Lin
  • Mingjun Qiu
  • Chen Wang
  • Tao Wang
  • Hao Guan
  • Qiyuan Xie
  • Yuge Jiao

Assembly contact interface characteristics critically influence the performance of precision mechanical systems. Traditional design methods relying on iterative finite element analysis are computationally expensive, while existing deep learning approaches often neglect physical constraints and the complex effects of assembly processes. To address these limitations, this paper proposes a physics-constrained deep learning framework for the inverse design of assembly interfaces. Specifically, we introduce a novel network architecture which integrates multi-source inputs including target contact pressure, assembly parameters, and service conditions. To enforce physical consistency, a differentiable loss function incorporating the impenetrability condition is developed. Furthermore, an optimized learning rate scheduling strategy is implemented to enhance model convergence. Comprehensive ablation and comparative experiments demonstrate that our method outperforms conventional approaches in both accuracy and physical plausibility. When applied to an aero-engine flange structure, the framework enables rapid inverse design of interface morphology, reducing maximum contact pressure by 15.67% and increasing the effective contact area by 45.23% compared to traditional designs. This work provides a robust solution for assembly interface design and advances the application of physics-constrained deep learning in complex engineering systems.

EAAI Journal 2026 Journal Article

Balance divergence for knowledge distillation

  • Yafei Qi
  • Chen Wang
  • Zhaoning Zhang
  • Yaping Liu
  • Yongmin Zhang

Knowledge distillation (KD) represents a fundamental artificial intelligence (AI) technique for model compression and optimization. In computer vision AI applications, most KD methods use Kullback–Leibler (KL) divergence to align teacher–student output probabilities, but often neglect crucial negative aspects of teacher “dark knowledge” by underweighting low-probability signals. This limitation leads to suboptimal logit mimicry and unbalanced knowledge transfer to the student network. In this paper, we investigate the impact of this imbalance and propose a novel method, named Balance Divergence Distillation (BDD). By introducing a compensatory operation using reverse KL divergence, our method can improve the modeling of the extremely small values in the negative from the teacher and preserve the learning capacity for the positive. Furthermore, we test the impact of different temperature coefficient adjustments, which can lead to further balance in knowledge transfer. The evaluation results demonstrate that our method achieves accuracy improvements of 1% to 3% for lightweight student networks over standard KD methods on both Canadian Institute for Advanced Research 100 classes (CIFAR-100) and ImageNet datasets. Additionally, when applied to semantic segmentation, our approach enhances the student by 4.55% in mean Intersection over Union (mIoU) compared to the baseline on the Cityscapes dataset. These experiments confirm that our method provides a simple yet highly effective solution that can be seamlessly integrated with various KD frameworks across different vision tasks.
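The idea of compensating forward KL with reverse KL can be illustrated with a small sketch. Forward KL weights each class by the teacher probability, so low-probability classes contribute little; reverse KL weights by the student probability instead. The weighted sum below, the `alpha` weight, the two temperatures, and all function names are assumptions made for illustration only; the abstract does not specify the exact BDD formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def balanced_kd_loss(teacher_logits, student_logits,
                     t_forward=4.0, t_reverse=1.0, alpha=0.5):
    """Hypothetical balanced loss: alpha * forward KL + (1 - alpha) * reverse KL.

    The weighting and the separate temperatures are illustrative assumptions,
    not the formulation from the paper.
    """
    # Forward KL: KL(teacher || student), dominated by high-probability classes.
    forward = kl(softmax(teacher_logits, t_forward),
                 softmax(student_logits, t_forward))
    # Reverse KL: KL(student || teacher), penalizes student mass where the
    # teacher assigns very small probability.
    reverse = kl(softmax(student_logits, t_reverse),
                 softmax(teacher_logits, t_reverse))
    return alpha * forward + (1.0 - alpha) * reverse
```

With identical teacher and student logits both terms vanish; any mismatch yields a positive loss.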

AAAI Conference 2026 Conference Paper

Benchmarking LLMs for Political Science: A United Nations Perspective

  • Yueqing Liang
  • Liangwei Yang
  • Chen Wang
  • Congying Xia
  • Rui Meng
  • Xiongxiao Xu
  • Haoran Wang
  • Ali Payani

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stake political decision-making remains largely unexplored. This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process—drafting, voting, and discussing—and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. To the best of our knowledge, this is the first benchmark to systematically evaluate LLMs in UN decision-making, contributing to the growing intersection of AI and political science.

AAAI Conference 2026 Conference Paper

Instance Generation for Meta-Black-Box Optimization Through Latent Space Reverse Engineering

  • Chen Wang
  • Yue-Jiao Gong
  • Zhiguang Cao
  • Zeyuan Ma

To relieve the intensive human expertise required to design optimization algorithms, recent Meta-Black-Box Optimization (MetaBBO) research leverages the generalization strength of meta-learning to train neural network-based algorithm design policies over a predefined training problem set, which automates the adaptability of the low-level optimizers on unseen problem instances. Currently, a common choice of training problem set in existing MetaBBOs is the well-known benchmark suite CoCo-BBOB. Although this choice facilitates MetaBBO development, problem instances in CoCo-BBOB are more or less limited in diversity, raising the risk of overfitting of MetaBBOs, which might further result in poor generalization. In this paper, we propose an instance generation approach, termed LSRE, which can generate diverse training problem instances for MetaBBOs to learn more generalizable policies. LSRE first trains an autoencoder which maps high-dimensional problem features into a 2-dimensional latent space. Uniform-grid sampling in this latent space leads to hidden representations of problem instances with sufficient diversity. By leveraging a genetic-programming approach to search function formulas with minimal L2-distance to these hidden representations, LSRE reverse engineers a diversified problem set, termed Diverse-BBO. We validate the effectiveness of LSRE by training various MetaBBOs on Diverse-BBO and observe their generalization performance on both synthetic and realistic scenarios. Extensive experimental results underscore the superiority of Diverse-BBO to existing training set choices in MetaBBOs. Further ablation studies not only demonstrate the effectiveness of design choices in LSRE, but also reveal interesting insights on instance diversity and MetaBBO's generalization.
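The uniform-grid sampling and L2-distance matching steps described above can be sketched as follows. The latent bounds, grid resolution, and the helper names `uniform_grid_2d` and `closest_candidate` are hypothetical; the autoencoder and the genetic-programming search are not reproduced here, and candidate encodings are simply assumed to be given as 2D points.

```python
def uniform_grid_2d(low=-1.0, high=1.0, points_per_axis=5):
    """Return evenly spaced (x, y) targets covering a square latent region."""
    if points_per_axis < 2:
        raise ValueError("points_per_axis must be at least 2")
    step = (high - low) / (points_per_axis - 1)
    axis = [low + i * step for i in range(points_per_axis)]
    return [(x, y) for x in axis for y in axis]

def l2_distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def closest_candidate(target, candidate_encodings):
    """Index of the candidate whose latent encoding is nearest to the target.

    Stands in for the search objective only; an actual search would generate
    and encode new function formulas rather than pick from a fixed list.
    """
    return min(range(len(candidate_encodings)),
               key=lambda i: l2_distance(target, candidate_encodings[i]))
```

Each grid point acts as a target hidden representation; a search procedure would then look for a function whose encoding lands as close as possible to it.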

AAAI Conference 2026 Conference Paper

Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models

  • Meng Cao
  • Pengfei Hu
  • Yingyao Wang
  • Jihao Gu
  • Haoran Tang
  • Haoze Zhao
  • Chen Wang
  • Jiahua Dong

Recent advancements in Large Video Language Models (LVLMs) have highlighted their potential for multi-modal understanding, yet evaluating their factual grounding in videos remains a critical unsolved challenge. To address this gap, we introduce Video SimpleQA, the first comprehensive benchmark tailored for factuality evaluation in video contexts. Our work differs from existing video benchmarks through the following key features: 1) Knowledge required: demanding integration of external knowledge beyond the video’s explicit narrative; 2) Multi-hop fact-seeking question: Each question involves multiple explicit facts and requires strict factual grounding without hypothetical or subjective inferences. We include per-hop single-fact-based sub-QAs alongside final QAs to enable fine-grained, step-by-step evaluation; 3) Short-form definitive answer: Answers are crafted as unambiguous and definitively correct in a short format with minimal scoring variance; 4) Temporal grounded required: Requiring answers to rely on one or more temporal segments in videos, rather than single frames. We extensively evaluate 33 state-of-the-art LVLMs and summarize key findings as follows: 1) Current LVLMs exhibit notable deficiencies in factual adherence, with the best-performing model o3 merely achieving an F-score of 66.3%; 2) Most LVLMs are overconfident in what they generate, with self-stated confidence exceeding actual accuracy; 3) Retrieval-Augmented Generation demonstrates consistent improvements at the cost of additional inference time overhead; 4) Multi-hop QA demonstrates substantially degraded performance compared to single-hop sub-QAs, with first-hop object/event recognition emerging as the primary bottleneck. We position Video SimpleQA as the cornerstone benchmark for video factuality assessment, aiming to steer LVLM development toward verifiable grounding in real-world contexts.

YNIMG Journal 2025 Journal Article

Dynamic changes in brain function during sleep deprivation: Increased occurrence of non-stationary states indicates the extent of cognitive impairment

  • Ziliang Xu
  • Chaozong Ma
  • Chen Wang
  • Fan Guo
  • Minwen Zheng
  • Peng Fang
  • Yuanqiang Zhu

OBJECTIVE: The brain networks are inherently dynamic, constantly adjusting and reorganizing over time; therefore, the cognitive impairment caused by sleep deprivation (SD) should also exhibit dynamism. However, previous studies on SD that have provided valuable insights predominantly rely on static functional connectivity (FC) analysis. Hence, this study aims to employ dynamical FC (DFC) analysis to capture the dynamic changes in cognitive impairment during SD. METHODS: The data from 32 subjects, encompassing resting state and psychomotor vigilance task (PVT) functional magnetic resonance imaging data collected at five different timepoints (22:00, 00:00, 02:00, 04:00 and 06:00) during a whole night were acquired. Dynamic functional connectivity (DFC) analysis was employed to assess alterations in brain states across the five timepoints, resulting in the identification of three distinct DFC states. RESULTS: After conducting ANOVA analysis, significant changes were observed in the fraction rate of state 1 (non-stationary state) across five timepoints in both resting and task conditions. The transition time corresponding to state 1 consistently showed an increase over time. Furthermore, task condition-related DFC metrics, particularly those associated with state 1, exhibited significant correlations with PVT metrics across five timepoints as well as their changes. CONCLUSIONS: The collective findings suggest that cognitive impairment resulting from sleep deprivation is a dynamic process, with state 1-related indicators exerting the most significant influence on cognition.

NeurIPS Conference 2025 Conference Paper

First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training

  • Lai Wei
  • Yuting Li
  • Chen Wang
  • Yue Wang
  • Linghe Kong
  • Weiran Huang
  • Lichao Sun

Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL), which require expensive and manually annotated multi-modal data, an ultimately unsustainable resource. This limitation has motivated a growing interest in unsupervised paradigms as a third stage of post-training after SFT and RL. While recent efforts have explored this direction, their methods are complex and difficult to iterate. To address this, we propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs, enabling continual self-improvement without any external supervision. The training method of MM-UPT builds upon GRPO, replacing traditional reward signals with a self-rewarding mechanism based on majority voting over multiple sampled responses. Our experiments demonstrate that this training method effectively improves the reasoning ability of Qwen2.5-VL-7B (e.g., 66.3% → 72.9% on MathVista, 62.9% → 68.7% on We-Math), using standard datasets without ground truth labels. To further explore scalability, we extend our framework to a data self-generation setting, designing two strategies that prompt the MLLM to synthesize new training samples on its own. Additional experiments show that combining these synthetic data with the unsupervised training method can also boost performance, highlighting a promising approach for scalable self-improvement. Overall, MM-UPT offers a new paradigm for autonomous enhancement of MLLMs, serving as a critical third step after initial SFT and RL in the absence of external supervision. Our code is available at https://github.com/waltonfuture/MM-UPT.
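A self-rewarding signal based on majority voting can be sketched in a few lines. This is an illustration under assumptions, not the MM-UPT implementation: the binary reward, the function names, and the use of already-extracted final answers as strings are all hypothetical, and the second function shows only the group-wise standardization commonly used in GRPO-style training.

```python
from collections import Counter

def majority_vote_rewards(answers):
    """Give reward 1.0 to sampled answers that match the most common answer.

    `answers` holds the final answers extracted from several responses
    sampled for the same prompt; no ground-truth label is used.
    """
    majority_answer, _ = Counter(answers).most_common(1)[0]
    return [1.0 if answer == majority_answer else 0.0 for answer in answers]

def group_normalized_advantages(rewards, eps=1e-6):
    """Standardize rewards within one sampled group (GRPO-style)."""
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = variance ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]
```

If all sampled answers agree, every reward equals 1.0 and the normalized advantages are all zero, so such a prompt contributes no learning signal in this sketch.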

AAAI Conference 2025 Conference Paper

Geometry-Aware 3D Salient Object Detection Network

  • Chen Wang
  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Yuchao Dai

Point cloud salient object detection has attracted the attention of researchers in recent years. Since existing works do not fully utilize the geometry context of 3D objects, blurry boundaries are generated when segmenting objects with complex backgrounds. In this paper, we propose a geometry-aware 3D salient object detection network that explicitly clusters points into superpoints to enhance the geometric boundaries of objects, thereby segmenting complete objects with clear boundaries. Specifically, we first propose a simple yet effective superpoint partition module to cluster points into superpoints. In order to improve the quality of superpoints, we present a point cloud class-agnostic loss to learn discriminative point features for clustering superpoints from the object. After obtaining superpoints, we then propose a geometry enhancement module that utilizes superpoint-point attention to aggregate geometric information into point features for predicting the salient map of the object with clear boundaries. Extensive experiments show that our method achieves new state-of-the-art performance on the PCSOD dataset.

IROS Conference 2025 Conference Paper

iWalker: Imperative Visual Planning for Walking Humanoid Robot

  • Xiao Lin
  • Yuhao Huang
  • Taimeng Fu
  • Xiaobin Xiong
  • Chen Wang

Humanoid robots, designed to operate in human-centric environments, serve as a fundamental platform for a broad range of tasks. Although humanoid robots have been extensively studied for decades, a majority of existing humanoid robots still heavily rely on complex modular frameworks, leading to inflexibility and potential compounded errors from independent sensing, planning, and acting components. In response, we propose an end-to-end humanoid sense-plan-act walking system, enabling vision-based obstacle avoidance and footstep planning for whole body balancing simultaneously. We designed two imperative learning (IL)-based bilevel optimizations for model-predictive step planning and whole body balancing, respectively, to achieve self-supervised learning for humanoid robot walking. This enables the robot to learn from arbitrary unlabeled data, improving its adaptability and generalization capabilities. We refer to our method as iWalker and demonstrate its effectiveness in both simulated and real-world environments, representing a significant advancement toward autonomous humanoid robots.

ICLR Conference 2025 Conference Paper

Language Imbalance Driven Rewarding for Multilingual Self-improving

  • Wen Yang
  • Junhong Wu
  • Chen Wang
  • Chengqing Zong
  • Jiajun Zhang

Large Language Models (LLMs) have achieved state-of-the-art performance across numerous tasks. However, these advancements have predominantly benefited "first-class" languages such as English and Chinese, leaving many other languages underrepresented. This imbalance, while limiting broader applications, generates a natural preference ranking between languages, offering an opportunity to bootstrap the multilingual capabilities of LLM in a self-improving manner. Thus, we propose Language Imbalance Driven Rewarding, where the inherent imbalance between dominant and non-dominant languages within LLMs is leveraged as a reward signal. Iterative DPO training demonstrates that this approach not only enhances LLM performance in non-dominant languages but also improves the dominant language's capacity, thereby yielding an iterative reward signal. Fine-tuning Meta-Llama-3-8B-Instruct over two iterations of this approach results in continuous improvements in multilingual performance across instruction-following and arithmetic reasoning tasks, evidenced by an average improvement of 7.46% win rate on the X-AlpacaEval leaderboard and 13.9% accuracy on the MGSM benchmark. This work serves as an initial exploration, paving the way for multilingual self-improvement of LLMs.

NeurIPS Conference 2025 Conference Paper

MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization

  • Zeyuan Ma
  • Yue-Jiao Gong
  • Hongshu Guo
  • Wenjie Qiu
  • Sijie Ma
  • Hongqiao Lian
  • Jiajun Zhan
  • Kaixu Chen

Meta-Black-Box Optimization (MetaBBO) streamlines the automation of optimization algorithm design through meta-learning. It typically employs a bi-level structure: the meta-level policy undergoes meta-training to reduce the manual effort required in developing algorithms for low-level optimization tasks. The original MetaBox (2023) provided the first open-source framework for reinforcement learning-based single-objective MetaBBO. However, its relatively narrow scope no longer keeps pace with the swift advancement in this field. In this paper, we introduce MetaBox-v2 (https://github.com/MetaEvo/MetaBox) as a milestone upgrade with four novel features: 1) a unified architecture supporting RL, evolutionary, and gradient-based approaches, by which we reproduce 23 up-to-date baselines; 2) efficient parallelization schemes, which reduce the training/testing time by 10-40x; 3) a comprehensive benchmark suite of 18 synthetic/realistic tasks (1900+ instances) spanning single-objective, multi-objective, multi-model, and multi-task optimization scenarios; 4) plentiful and extensible interfaces for custom analysis/visualization and integrating to external optimization tools/benchmarks. To show the utility of MetaBox-v2, we carry out a systematic case study that evaluates the built-in baselines in terms of the optimization performance, generalization ability and learning efficiency. Valuable insights are concluded from thorough and detailed analysis for practitioners and those new to the field.

AAAI Conference 2025 Conference Paper

Nearly Tight Bounds for Exploration in Streaming Multi-Armed Bandits with Known Optimality Gap

  • Nikolai Karpov
  • Chen Wang

We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap $\Delta_{[2]}$. Here, and throughout, the optimality gap $\Delta_{[i]}$ is defined as the mean reward gap between the best and the $i$-th best arms. A recent line of results have shown that if there is no known $\Delta_{[2]}$, a pass complexity of $\tilde{\Theta}(\log(1/\Delta_{[2]}))$ is necessary and sufficient to obtain the *worst-case optimal* $O(n/\Delta_{[2]}^2)$ sample complexity with a single-arm memory. However, our understanding of multi-pass algorithms with known $\Delta_{[2]}$ is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., $O(\sum_{i=2}^{n} \frac{1}{\Delta_{[i]}^2} \log n)$ arm pulls, with a sublinear memory size. In this work, we show that the "right answer" for the question is $\tilde{\Theta}(\log n)$ passes. We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of $o(n/\mathrm{polylog}(n))$ arms -- and $O(\sum_{i=2}^{n} \frac{1}{\Delta_{[i]}^2} \log n)$ arm pulls has to make $\Omega(\log n/\log\log n)$ passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of $\Delta_{[2]}$, finds the best arm with $O(\sum_{i=2}^{n} \frac{1}{\Delta_{[i]}^2} \log n)$ arm pulls and a *single arm* memory.

NeurIPS Conference 2025 Conference Paper

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

  • Chen Wang
  • Chuhao Chen
  • Yiming Huang
  • Zhiyang Dou
  • Yuan Liu
  • Jiatao Gu
  • Lingjie Liu

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for physics-grounded image-to-video generation with physical parameters and force control. At its core is a generative physics network that learns the distribution of physical dynamics across four materials (elastic, sand, plasticine, and rigid) via a diffusion model conditioned on physics parameters and applied forces. We represent physical dynamics as 3D point trajectories and train on a large-scale synthetic dataset of 550K animations generated by physics simulators. We enhance the diffusion model with a novel spatiotemporal attention block that emulates particle interactions and incorporates physics-based constraints during training to enforce physical plausibility. Experiments show that PhysCtrl generates realistic, physics-grounded motion trajectories which, when used to drive image-to-video models, yield high-fidelity, controllable videos that outperform existing methods in both visual quality and physical plausibility. Our code, model and data will be made publicly available upon publication.

IROS Conference 2025 Conference Paper

Physics-Informed LSTM for Shape and Contact Force Prediction of a Flexible Surgical Robot

  • Feng Ju
  • Chen Wang
  • Yingying Wang
  • Yuxing Wang
  • Liping Ding

Real-time morphological perception and precise end force feedback prediction of surgical robots constitute critical technical elements for ensuring safety and efficacy in complex interventional procedures such as Endoscopic Retrograde Cholangiopancreatography (ERCP). In this paper, we design a miniature flexible surgical robot (FSR) with a nested spring structure and propose a physics-informed deep learning approach to simultaneously predict both the FSR's shape and 2D contact forces at its end-effector. The physical constraints were derived from a quasi-static model of the FSR, which is capable of characterizing persistent environmental interactions. Our method eliminates the need for end-effector sensors, not only ensuring high accuracy in both shape and contact force predictions but also maintaining consistent predictive performance under continuous environmental interactions. Experimental validation of the method revealed a high consistency between predicted values and reference data, achieving a 34.97% improvement in computational speed and a maximum prediction accuracy enhancement of 71.64% compared to conventional LSTM approaches.

EAAI Journal 2025 Journal Article

Predictor-based state-constrained bipartite formation control for nonlinear multi-agent systems with disturbances

  • Yang Yang
  • Xiao Wu
  • Hongyan Yu
  • Chen Wang

Constrained states and transient performance are critical in bipartite formation control of multi-agent systems. In this paper, a predictor-based state-constrained bipartite formation control strategy is proposed for a class of nonlinear multi-agent systems with unknown external disturbances. A dual barrier Lyapunov function is proposed to constrain both prediction errors and predictor-based surface errors. With the help of the dual barrier Lyapunov function, a barrier Lyapunov function-based predictor is then constructed. It alleviates oscillations generated by overlarge adaptive gains in identification of state-constrained followers. With a barrier Lyapunov function-based predictor, barrier Lyapunov function neural networks are developed to approximate unknown dynamics of a multi-agent system with state constraints. A barrier Lyapunov function neural network nonlinear disturbance observer is designed for compensating for generalized disturbances including disturbances as well as barrier Lyapunov function neural networks’ identification errors. From analysis, it is proven that the multi-agent system achieves bipartite formation and states of followers satisfy the constrained condition. The effectiveness of the strategy with state constraints is verified via two simulation examples. Compared with the traditional approaches, the proposed strategy reduces the integrated absolute formation error by 31% and the integrated absolute error of identification by 63% during the simulation process.

AAAI Conference 2025 Conference Paper

Real-Time Neural Denoising with Render-Aware Knowledge Distillation

  • Mengxun Kong
  • Jie Guo
  • Chen Wang
  • Ye Yuan
  • Yanwen Guo

Real-time Monte Carlo (MC) ray tracing with low sampling rates demands a denoising algorithm that adeptly balances the trade-off between quality and efficiency. Previous works have paid much attention to designing delicate denoising architectures while ignoring model compression. In this work, we present a render-aware knowledge distillation (RAKD) framework, specifically designed for Monte Carlo denoising. We meticulously delineate the Knowledge Distillation (KD) process within RAKD, emphasizing three pivotal techniques: the strategic incorporation of an auxiliary unlabeled dataset, the integration of adversarial learning through a generative adversarial network (GAN), and the application of parameter transfer for robust model initialization. These approaches are harmoniously combined to distill knowledge effectively, enabling our student model to adeptly strike a balance between preserving high-frequency details and reducing low-frequency noise. Finally, our results demonstrate that RAKD achieves state-of-the-art quality while upholding real-time performance, successfully tackling the computational constraints faced by resource-limited devices.

IROS Conference 2025 Conference Paper

SLU-DQN: A Model for Anticipatory Steam Detection for Steamer-Filling in Baijiu Intelligent Distillation Systems

  • Jia Yu
  • Jiankun Ren
  • Hanwen Liang
  • Chen Wang
  • Lizhe Qi
  • Yunquan Sun

The true implementation of the Anticipatory Steam Detection for Steamer-Filling (ASDSF) process in baijiu intelligent distillation systems, which involves predicting and precisely spreading distillers’ grains before steam emerges, remains a critical unresolved challenge. In this study, we introduce the SLU model, which utilizes SwinLSTM as the core feature extraction module and adopts a U-shaped structure. This model achieves spatiotemporal feature extraction and dynamic change prediction. It is further enhanced by integrating a U-Net module for multi-scale feature fusion and optimized through a Deep Q-Network (DQN)-based decision-making process. The SLU-DQN model, specifically designed for anticipatory material spreading planning in the baijiu Steamer-Filling (SF) distillation system, predicts future steam emission areas. Finally, both quantitative and qualitative experimental results demonstrate the excellent performance of the SLU-DQN model in solving the ASDSF problem. The model achieved 91.1% reward accuracy, an F1-Score of 91% for material spreading point prediction, an MSE of 19.02, and an SSIM of 95.8%. These results not only highlight the model’s superior accuracy in predicting future steam emission areas but also provide a significant technical breakthrough for intelligent baijiu distillation systems, filling a crucial gap in the field.

NeurIPS Conference 2025 Conference Paper

Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking

  • Changlun Li
  • Yao SHI
  • Chen Wang
  • Qiqi Duan
  • Runke RUAN
  • Weijie Huang
  • Haonan Long
  • Lijun Huang

Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, inadvertently enabling LLMs to "time travel" by leveraging future information embedded in their training corpora, thus resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLMs in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data (specifically data published after each model’s pretraining cutoff) to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions (including ticker-level analysis, investment decision-making, portfolio management, and risk control) reveal significant practical challenges. Notably, even cutting-edge models such as DeepSeek-V3 and Claude-3.7-Sonnet incur net trading losses within the DeepFund real-time evaluation environment, underscoring the present limitations of LLMs for active fund management. Our code is available at https://github.com/HKUSTDial/DeepFund.

NeurIPS Conference 2025 Conference Paper

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

  • Lei Yang
  • Xinyu Zhang
  • Jun Li
  • Chen Wang
  • Jiaqi Ma
  • Zhiying Song
  • Tong Zhao
  • Ziying Song

Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby enhancing the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged; however, these datasets primarily focus on cameras and LiDAR, neglecting 4D Radar—a sensor used in single-vehicle autonomous driving to provide robust perception in adverse weather conditions. In this paper, to bridge the gap created by the absence of 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data encompasses sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as various typical challenging scenarios. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes across five categories. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. Furthermore, we provide comprehensive benchmarks across these three sub-datasets.

TMLR Journal 2025 Journal Article

Zero-1-to-G: Taming Pretrained 2D Diffusion Model for Direct 3D Generation

  • Xuyi Meng
  • Chen Wang
  • Jiahui Lei
  • Kostas Daniilidis
  • Jiatao Gu
  • Lingjie Liu

Recent advances in 2D image generation have achieved remarkable quality, largely driven by the capacity of diffusion models and the availability of large-scale datasets. However, direct 3D generation is still constrained by the scarcity and lower fidelity of 3D datasets. In this paper, we introduce Zero-1-to-G, a novel approach that addresses this problem by enabling direct single-view generation on Gaussian splats using pretrained 2D diffusion models. Our key insight is that Gaussian splats, a 3D representation, can be decomposed into multi-view images encoding different attributes. This reframes the challenging task of direct 3D generation within a 2D diffusion framework, allowing us to leverage the rich priors of pretrained 2D diffusion models. To incorporate 3D awareness, we introduce cross-view and cross-attribute attention layers, which capture complex correlations and enforce 3D consistency across generated splats. This makes Zero-1-to-G the first direct image-to-3D generative model to effectively utilize pretrained 2D diffusion priors, enabling efficient training and improved generalization to unseen objects. Extensive experiments on both synthetic and in-the-wild datasets demonstrate superior performance in 3D object generation, offering a new approach to high-quality 3D generation.

YNIMG Journal 2024 Journal Article

Adaptive node feature extraction in graph-based neural networks for brain diseases diagnosis using self-supervised learning

  • Youbing Zeng
  • Jiaying Lin
  • Zhuoshuo Li
  • Zehui Xiao
  • Chen Wang
  • Xinting Ge
  • Cheng Wang
  • Gui Huang

Electroencephalography (EEG) has demonstrated significant value in diagnosing brain diseases. In particular, brain networks have gained prominence as they offer additional valuable insights by establishing connections between EEG signal channels. While brain connections are typically delineated by channel signal similarity, a consistent and reliable strategy for ascertaining node characteristics is lacking. Conventional node features such as temporal and frequency domain properties of EEG signals prove inadequate for capturing the extensive EEG information. In our investigation, we introduce a novel adaptive method for extracting node features from EEG signals utilizing a distinctive task-induced self-supervised learning technique. By amalgamating these extracted node features with fundamental edge features constructed using Pearson correlation coefficients, we show that the proposed approach can function as a plug-in module that can be integrated into many common GNNs (e.g., GCN, GraphSAGE, GAT) as a replacement for the node feature selection module. Comprehensive experiments are then conducted to demonstrate the consistently superior performance and high generality of the proposed method over other feature selection methods in various brain disorder prediction tasks, such as depression, schizophrenia, and Parkinson's disease. Furthermore, compared to other node features, our approach unveils profound spatial patterns through graph pooling and structural learning, shedding light on pivotal brain regions influencing various brain disorder predictions based on derived features.
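As a rough illustration of edge features built from Pearson correlation coefficients between EEG channels, the sketch below computes a channel-by-channel adjacency matrix with NumPy. It is not the authors' implementation: the function name `pearson_adjacency`, the use of absolute correlation values, and the `threshold` parameter are assumptions made for the example.

```python
import numpy as np

def pearson_adjacency(eeg, threshold=0.0):
    """Build edge weights from channel-wise Pearson correlation.

    eeg: array of shape (n_channels, n_samples).
    threshold: hypothetical cut-off; edges with |r| below it are zeroed.
    """
    corr = np.corrcoef(eeg)        # (n_channels, n_channels) correlation matrix
    adj = np.abs(corr)             # assumption: use correlation magnitude as weight
    adj[adj < threshold] = 0.0     # drop weak connections
    np.fill_diagonal(adj, 0.0)     # no self-loops
    return adj

# Synthetic stand-in for EEG data: 4 channels, 256 samples each.
rng = np.random.default_rng(0)
signals = rng.standard_normal((4, 256))
signals[1] = signals[0] * 2.0 + 1.0   # channel 1 is a linear copy of channel 0
adj = pearson_adjacency(signals, threshold=0.3)
```

The resulting matrix could serve as the weighted adjacency input of a graph neural network, with node features supplied separately.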

IJCAI Conference 2024 Conference Paper

AI-Olympics: Exploring the Generalization of Agents through Open Competitions

  • Chen Wang
  • Yan Song
  • Shuai Wu
  • Sa Wu
  • Ruizhi Zhang
  • Shu Lin
  • Haifeng Zhang

Between 2021 and 2023, AI-Olympics, a series of online AI competitions, was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world, while competing against an opponent. This paper provides a brief overview of the competition series and highlights notable findings. We aim to contribute insights to the field of multi-agent decision-making and explore the generalization of agents through engineering efforts.

AAAI Conference 2024 Conference Paper

Collaborative Tooth Motion Diffusion Model in Digital Orthodontics

  • Yeying Fan
  • Guangshun Wei
  • Chen Wang
  • Shaojie Zhuang
  • Wenping Wang
  • Yuanfeng Zhou

Tooth motion generation is an essential task in digital orthodontic treatment for precise and quick dental healthcare, which aims to generate the whole intermediate tooth motion process given the initial pathological and target ideal tooth alignments. Most prior works for multi-agent motion planning problems usually result in complex solutions. Moreover, the occlusal relationship between upper and lower teeth is often overlooked. In this paper, we propose a collaborative tooth motion diffusion model. The critical insight is to remodel the problem as a diffusion process. In this sense, we model the whole tooth motion distribution with a diffusion model and transform the planning problem into a sampling process from this distribution. We design a tooth latent representation to provide accurate conditional guides consisting of two key components: the tooth frame represents the position and posture, and the tooth latent shape code represents the geometric morphology. Subsequently, we present a collaborative diffusion model to learn the multi-tooth motion distribution based on inter-tooth and occlusal constraints, which are implemented by graph structure and new loss functions, respectively. Extensive qualitative and quantitative experiments demonstrate the superiority of our framework in the application of orthodontics compared with state-of-the-art methods.

TCS Journal 2024 Journal Article

Decision algorithms for reversibility of 1D cellular automata under reflective boundary conditions

  • Junchi Ma
  • Chen Wang
  • Weilin Chen
  • Defu Lin
  • Chao Wang

Reversibility is one of the most significant properties of cellular automata (CA). In this paper, we focus on the reversibility of one-dimensional finite CA under reflective boundary conditions (RBC). We present two algorithms for deciding the reversibility of one-dimensional CA under RBC. Both algorithms work not only for linear rules but also for non-linear rules. The first algorithm determines what we call the “strict reversibility” of CA. The second algorithm computes what we call the “reversibility function” of CA. Reversibility functions are proved to be periodic. Based on the algorithms, we list some experimental results for one-dimensional CA under RBC and analyse some features of this family of CA.
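To make the notion of reversibility concrete, the following brute-force sketch checks whether an elementary (radius-1, binary) CA rule is injective on all configurations of a given length. It is not either of the paper's decision algorithms, and it assumes one particular reading of the reflective boundary: the missing neighbour of an end cell is taken to be the mirror image of its existing neighbour. The function names `step` and `is_reversible` are hypothetical.

```python
from itertools import product

def step(config, rule):
    """Apply one step of an elementary CA rule (Wolfram number 0..255).

    Assumed reflective boundary: the ghost cell beyond each end mirrors
    the cell on the other side of the boundary cell.
    """
    n = len(config)
    if n < 2:
        raise ValueError("this sketch needs at least two cells")
    out = []
    for i in range(n):
        left = config[i - 1] if i > 0 else config[1]
        right = config[i + 1] if i < n - 1 else config[n - 2]
        idx = (left << 2) | (config[i] << 1) | right   # neighbourhood as 3-bit index
        out.append((rule >> idx) & 1)                  # look up the rule table bit
    return tuple(out)

def is_reversible(rule, n):
    """A finite CA is reversible iff its global map is a bijection."""
    images = {step(c, rule) for c in product((0, 1), repeat=n)}
    return len(images) == 2 ** n

# Under the assumed boundary, the answer can depend on the length n.
results = {n: is_reversible(150, n) for n in (3, 4)}
```

The exhaustive check costs time exponential in n, which is why dedicated decision algorithms matter; it is useful only for validating small cases.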

AAMAS Conference 2024 Conference Paper

Detecting Anomalous Agent Decision Sequences Based on Offline Imitation Learning

  • Chen Wang
  • Sarah Erfani
  • Tansu Alpcan
  • Christopher Leckie

Anomaly detection in decision-making sequences is a challenging problem due to the complexity of normality representation learning, the sequential nature of the task, and the difficulty of real-world implementation. In this work, we propose extracting two behaviour features, action optimality and sequential association, to detect anomalous behaviour. Our offline imitation learning model is an adaptation of behavioural cloning with a transformer policy network, where we modify the training process to learn a Q function and a state value function from normal trajectories.

NeurIPS Conference 2024 Conference Paper

LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation

  • Bowen Li
  • Zhaoyu Li
  • Qiwei Du
  • Jinqi Luo
  • Wenshan Wang
  • Yaqi Xie
  • Simon Stepputtis
  • Chen Wang

Recent years have witnessed the rapid development of Neuro-Symbolic (NeSy) AI systems, which integrate symbolic reasoning into deep neural networks. However, most of the existing benchmarks for NeSy AI fail to provide long-horizon reasoning tasks with complex multi-agent interactions. Furthermore, they are usually constrained by fixed and simplistic logical rules over limited entities, making them far from real-world complexities. To address these crucial gaps, we introduce LogiCity, the first simulator based on customizable first-order logic (FOL) for an urban-like environment with multiple dynamic agents. LogiCity models diverse urban elements using semantic and spatial concepts, such as IsAmbulance(X) and IsClose(X, Y). These concepts are used to define FOL rules that govern the behavior of various agents. Since the concepts and rules are abstractions, they can be universally applied to cities with any agent compositions, facilitating the instantiation of diverse scenarios. Besides, a key feature of LogiCity is its support for user-configurable abstractions, enabling customizable simulation complexities for logical reasoning. To explore various aspects of NeSy AI, LogiCity introduces two tasks: one features long-horizon sequential decision-making, and the other focuses on one-step visual reasoning, varying in difficulty and agent behaviors. Our extensive evaluation reveals the advantage of NeSy frameworks in abstract reasoning. Moreover, we highlight the significant challenges of handling more complex abstractions in long-horizon multi-agent scenarios or under high-dimensional, imbalanced data. With its flexible design, various features, and newly raised challenges, we believe LogiCity represents a pivotal step forward in advancing the next generation of NeSy AI. All the code and data are open-sourced at our website.

NeurIPS Conference 2024 Conference Paper

Map It Anywhere: Empowering BEV Map Prediction using Large-scale Public Datasets

  • Cherie Ho
  • Jiaye Zou
  • Omar Alama
  • Sai M. Kumar
  • Benjamin Chiang
  • Taneesh Gupta
  • Chen Wang
  • Nikhil Keetha

Top-down Bird's Eye View (BEV) maps are a popular perception representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we display the ease of automatically collecting a 1.2 million FPV & BEV pair dataset encompassing diverse geographies, landscapes, environmental factors, camera models & capture scenarios. We further train a simple camera model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance far exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing & testing generalizable BEV perception, paving the way for more robust autonomous navigation. Website: mapitanywhere.github.io

ICRA Conference 2024 Conference Paper

Real-Time Estimation for the Swimming Direction of Robotic Fish Based on IMU Sensors

  • Shikun Li
  • Yufan Zhai
  • Chen Wang
  • Guangming Xie

An increasing number of underwater robots inspired by Carangidae are being developed, which are characterized by high efficiency and flexibility. However, estimating the swimming direction of these robotic fish is challenging due to the constant swinging of the head during movement, which complicates precise control. In this study, we installed two low-cost inertial measurement unit (IMU) sensors separately on the head and tail parts of a double-joint robotic fish and presented a method for accurately and timely estimating the swimming direction. Firstly, we effectively compensated for the yaw angle drift of the IMU sensors through a fused Kalman Filter. Furthermore, we propose the Anti-Shake Estimation (ASE) algorithm to calculate the real-time swimming direction using filtered yaw angles at a high updating rate of 100 Hz. Finally, we applied the method to swimming direction feedback control for evaluation and comparison. The results show that our ASE method performs better than other existing methods in straight-line swimming experiments. The experiment of S-curve swimming also demonstrates the effectiveness of our method in complex missions.

EAAI Journal 2024 Journal Article

Reference-based super-resolution reconstruction of remote sensing images based on a coarse-to-fine feature matching transformer

  • Chen Wang
  • Fuzhen Zhu
  • Bing Zhu
  • Qi Zhang
  • Hongbin Ma

Remote sensing image super-resolution reconstruction technology mines the deep details of remote sensing images and has been widely used in fields such as intelligent and precise agriculture, intelligent transportation, and earth surface object recognition. To obtain more detailed information, we design a reference-based super-resolution reconstruction network. Firstly, the coarse-to-fine feature matching strategy is adopted for the features of the input image. Global coarse matching is performed on the center patch of each block, and then pixel-level local fine matching is performed on the edge patches of the block. This both reduces the amount of computation and improves the matching accuracy. A threshold is set to determine whether the feature matching results meet the criteria for feature transfer. Finally, different scale features are fused through several convolutional layers and sampling operations, obtaining the reconstruction features after a fourfold increase in resolution. The ultimate super-resolution image is generated through a decoder. We have performed training and testing on remote sensing datasets. Compared to the current state-of-the-art methods, our proposed method produces visually more detailed results and outperforms other methods in terms of objective evaluation metrics.

JBHI Journal 2024 Journal Article

Time-Frequency-Space EEG Decoding Model Based on Dense Graph Convolutional Network for Stroke

  • Jiancai Leng
  • Han Li
  • Weiyou Shi
  • Licai Gao
  • Chengyan Lv
  • Chen Wang
  • Fangzhou Xu
  • Yang Zhang

Stroke, a sudden cerebrovascular ailment resulting from brain tissue damage, has prompted the use of motor imagery (MI)-based Brain-Computer Interface (BCI) systems in stroke rehabilitation. However, analyzing EEG signals from stroke patients is challenging because of their low signal-to-noise ratio and high variability. Therefore, we propose a novel approach that combines the modified S-transform (MST) and a dense graph convolutional network (DenseGCN) algorithm to enhance the MI-BCI performance across time, frequency, and space domains. MST is a time-frequency analysis method that efficiently concentrates energy in EEG signals, while DenseGCN is a deep learning model that uses EEG feature maps from each layer as inputs for subsequent layers, facilitating feature reuse and hyper-parameter optimization. Our approach outperforms conventional networks, achieving a peak classification accuracy of 90.22% and an average information transfer rate (ITR) of 68.52 bits per minute. Moreover, we conduct an in-depth analysis of the event-related desynchronization/event-related synchronization (ERD/ERS) phenomenon in the deep-level EEG features of stroke patients. Our experimental results confirm the feasibility and efficacy of the proposed approach for MI-BCI rehabilitation systems.

NeurIPS Conference 2024 Conference Paper

United We Stand, Divided We Fall: Fingerprinting Deep Neural Networks via Adversarial Trajectories

  • Tianlong Xu
  • Chen Wang
  • Gaoyang Liu
  • Yang Yang
  • Kai Peng
  • Wei Liu

In recent years, deep neural networks (DNNs) have witnessed extensive applications, and protecting their intellectual property (IP) is thus crucial. As a non-invasive way for model IP protection, model fingerprinting has become popular. However, existing single-point based fingerprinting methods are highly sensitive to the changes in the decision boundary, and may suffer from the misjudgment of the resemblance of sparse fingerprinting, yielding high false positives of innocent models. In this paper, we propose ADV-TRA, a more robust fingerprinting scheme that utilizes adversarial trajectories to verify the ownership of DNN models. Benefiting from its intrinsically progressive adversarial level, the trajectory is capable of tolerating a greater degree of alteration in decision boundaries. We further design novel schemes to generate a surface trajectory that involves a series of fixed-length trajectories with dynamically adjusted step sizes. Such a design enables more unique and reliable fingerprinting with relatively low querying costs. Experiments on three datasets against four types of removal attacks show that ADV-TRA exhibits superior performance in distinguishing between infringing and innocent models, outperforming the state-of-the-art comparisons.

EAAI Journal 2023 Journal Article

Ada-CCFNet: Classification of multimodal direct immunofluorescence images for membranous nephropathy via adaptive weighted confidence calibration fusion network

  • Ruili Wang
  • Xueyu Liu
  • Fang Hao
  • Xing Chen
  • Xinyu Li
  • Chen Wang
  • Dan Niu
  • Ming Li

In the pathological diagnosis of early, late and non-membranous nephropathy, direct immunofluorescence is highly likely to present potentially specific lesions, while it is often overlooked due to the difficulty of screening with naked eyes. With the progress of deep learning, such methods have shown powerful abilities in detecting potential lesions. In this paper, we propose an adaptive weighted confidence calibration fusion framework (Ada-CCFNet) consisting of a preprocessing module, an adaptive weighted confidence calibration fusion (Ada-CCF) module and a classification module for diagnosis of membranous nephropathy by classifying the multimodal direct immunofluorescence images. In the preprocessing module, we use the well-known U-Net to segment individual glomeruli and standardize their luminance appearance by the average luminance difference method, allowing the subsequent modules to focus more on the diseased glomerular region. Subsequently, in the Ada-CCF module, six confidence calibration methods are utilized for two main direct immunofluorescence images, IgG and C3, and the comprehensive calibration scores are obtained based on the adaptive weighted fusion of six confidence calibration methods to obtain a more reliable confidence level, in which the adaptive weights are related to expected calibration error reductions. For the classification module, the weighted probability scores of IgG and C3 are jointly fed into the module to achieve the classification by random forest. Experimental results showed that Ada-CCFNet achieves a classification accuracy of 73.52%, surpassing the methods of using single IgG or C3 images and positive grade indicator by 8.24%, 8.94% and 22.76%, and outperforming the compared methods in the classification of membranous nephropathy.

EAAI Journal 2023 Journal Article

Automatic waste detection with few annotated samples: Improving waste management efficiency

  • Wei Zhou
  • Lei Zhao
  • Hongpu Huang
  • Yuzhi Chen
  • Sixuan Xu
  • Chen Wang

Automatic waste detection in natural environments exhibits a great potential to improve the efficiency and reduce the labor cost of waste management. Recent deep learning-based waste detectors rely heavily on substantial annotated samples for training, but annotating sufficient samples for various categories of waste is labor-intensive and time-consuming. To address this issue, this paper simulates the visual system of human beings and develops a few-shot waste detection framework. To make the proposed framework more suitable for waste detection, a waste proposal module using a comprehensive feature fusion manner is designed to allow the features of support images to fully interact with those of query images, guiding the framework to generate more potential region proposals containing waste. Also, a waste classification module using a soft attention mechanism and foreground mask is designed to alleviate the issue of spatial misalignment and achieve fine-grained classification of waste-related proposals. The proposed framework is a general detection framework which can flexibly detect various categories of waste with few labeled samples (i.e., less than 30 instances per category). Experimental results show that the proposed framework achieves a mean average precision of 31.16% over 12 waste categories when only few samples (i.e., 30 instances per category) are provided, surpassing a state-of-the-art few-shot detector named AFDNet by 1.68%. This data scale-insensitive nature allows humans to reduce the effort and time required for laborious waste image collection and annotation, significantly increasing the flexibility of automatic waste detection and boosting the efficiency of waste management.

YNIMG Journal 2023 Journal Article

Cognitive impairment after sleep deprivation: The role of precuneus related connectivity on the intra-individual variability changes

  • Ziliang Xu
  • Yingjuan Chang
  • Chen Wang
  • Fan Guo
  • Minwen Zheng
  • Peng Fang
  • Yuanqiang Zhu

OBJECTIVE: Intra-individual variability (IIV) in cognitive performance is thought to reflect the efficiency with which attentional resources are allocated in different circumstances requiring cognitive control. IIV in cognitive performance is associated with the strength of the negative correlation between task-positive network and default mode network (DMN) activity. In this study, we investigated the impact of sleep deprivation (SD) on functional connectivity (FC) between the DMN and psychomotor vigilance task-related network (PVT-RN), and its relationship with IIV in cognitive performance. METHODS: Two analyses, network-level independent component analysis (NL-ICA) and region-level (RL)-ICA, were employed to compare the coefficient of variation (CV) of the PVT between normal sleep and SD conditions across 67 healthy participants. RESULTS: After SD, in NL-ICA, the FC between the PVT-RN and DMN was positively correlated with the CV of the PVT, as well as the changes therein, compared with normal sleep. Using a mask derived from the DMN and PVT-RN, the RL-ICA revealed that 12 edges/connections between DMN and PVT independent components were associated with the CV of the PVT, with nine of these connections involving the precuneus. CONCLUSIONS: These findings suggest that the precuneus may play a crucial role in the interactions of various brain functions during the PVT, with the connections between the precuneus and frontoparietal and somatosensory networks being significantly altered after SD. Moreover, following SD, weakened negative FC between the precuneus and bilateral inferior parietal lobule may disrupt the balance between cognitive and executive control functions, leading to a decline in cognitive performance.
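The coefficient of variation used above as the measure of intra-individual variability is the standard deviation of a participant's reaction times divided by their mean. The sketch below shows that arithmetic on made-up reaction times; the function name and the choice of the sample (n - 1) standard deviation are assumptions, since the abstract does not state which estimator was used.

```python
import statistics

def coefficient_of_variation(reaction_times_ms):
    """CV = standard deviation / mean of one participant's reaction times."""
    mean = statistics.fmean(reaction_times_ms)
    sd = statistics.stdev(reaction_times_ms)   # assumption: sample SD (n - 1)
    return sd / mean

# Hypothetical PVT reaction times in milliseconds for two sessions.
rested = [200.0, 300.0, 400.0]
deprived = [200.0, 300.0, 700.0]
cv_rested = coefficient_of_variation(rested)
cv_deprived = coefficient_of_variation(deprived)
```

Because the CV is normalised by the mean, it allows variability to be compared between sessions whose average reaction times differ.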

EAAI Journal 2023 Journal Article

MIANet: Multi-level temporal information aggregation in mixed-periodicity time series forecasting tasks

  • Sheng Wang
  • Xi Chen
  • Dongliang Ma
  • Chen Wang
  • Yong Wang
  • Honggang Qi
  • Gongjian Zhou
  • Qingli Li

Regular human activities generate a large number of time series with mixed periodicity that can reflect human behavior patterns and the societal working mechanism. When forecasting these time series, nonlinear neural networks often encounter some limitations, such as utilizing mixed-periodic patterns, balancing multi-level information, incorporating future vision, forecasting delays and scale insensitivity, which affect the forecasting accuracy. To address these problems, we propose the Multi-level Information Aggregation Network (MIANet), a novel neural network with four key characteristics: (i) a novel folded recurrent structure that dynamically updates the local and mini-local information at a global range in a compact manner; (ii) a new recurrent unit called Folded Convolution Aggregation Temporal Memory (FCATM) that extracts and aggregates neighbor-trends in local and mini-local data; (iii) a fusing decoder structure that promotes the sharing of forward–backward future information and adaptively adjusts relationships among adjacent points; and (iv) a new Skip-Autoregressive (SAR) linear strategy that addresses scale sensitivity issues. The SAR can be embedded as a plug-and-play component into other deep learning (DL) models. Compared with other baseline methods, MIANet obtains statistically significant improvements on six real-world datasets, as demonstrated by conducting two-sample t-tests, indicating that the MIANet can be applied to various predictive scenarios, such as road occupancy, electricity consumption, pedestrian flow and urban noise.

NeurIPS Conference 2023 Conference Paper

Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity

  • Weichao Mao
  • Haoran Qiu
  • Chen Wang
  • Hubertus Franke
  • Zbigniew Kalbarczyk
  • Ravishankar Iyer
  • Tamer Basar

Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descents with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings.

NeurIPS Conference 2023 Conference Paper

PanoGRF: Generalizable Spherical Radiance Fields for Wide-baseline Panoramas

  • Zheng Chen
  • Yan-Pei Cao
  • Yuan-Chen Guo
  • Chen Wang
  • Ying Shan
  • Song-Hai Zhang

Achieving an immersive experience enabling users to explore virtual environments with six degrees of freedom (6DoF) is essential for various applications such as virtual reality (VR). Wide-baseline panoramas are commonly used in these applications to reduce network bandwidth and storage requirements. However, synthesizing novel views from these panoramas remains a key challenge. Although existing neural radiance field methods can produce photorealistic views under narrow-baseline and dense image captures, they tend to overfit the training views when dealing with wide-baseline panoramas due to the difficulty in learning accurate geometry from sparse 360° views. To address this problem, we propose PanoGRF, Generalizable Spherical Radiance Fields for Wide-baseline Panoramas, which constructs spherical radiance fields incorporating 360° scene priors. Unlike generalizable radiance fields trained on perspective images, PanoGRF avoids the information loss from panorama-to-perspective conversion and directly aggregates geometry and appearance features of 3D sample points from each panoramic view based on spherical projection. Moreover, as some regions of the panorama are only visible from one view while invisible from others under wide baseline settings, PanoGRF incorporates 360° monocular depth priors into spherical depth estimation to improve the geometry features. Experimental results on multiple panoramic datasets demonstrate that PanoGRF significantly outperforms state-of-the-art generalizable view synthesis methods for wide-baseline panoramas (e.g., OmniSyn) and perspective images (e.g., IBRNet, NeuRay).

EAAI Journal 2023 Journal Article

Physics-informed few-shot deep learning for elastoplastic constitutive relationships

  • Chen Wang
  • You-quan He
  • Hong-ming Lu
  • Jian-guo Nie
  • Jian-sheng Fan

Elastoplastic modeling is essential for accurately predicting material behavior in various engineering applications. However, existing approaches to developing intelligent models for elastoplasticity face limitations due to the scarcity of available data, particularly in cases where experiments cannot support full training of these models. To address this challenge, we propose a physics-informed few-shot learning framework that incorporates classic elastoplasticity theory as prior knowledge. Instead of directly solving specific constitutive ordinary differential equations, we utilize general mechanical inequations as physical regularization to confine the optimization space. Our framework is model-agnostic in the uniaxial scenario and free from complicated numerical implementation, preserving the advantages of the data-driven paradigm in terms of efficiency and simplicity. In the multiaxial scenario, the framework can automatically calibrate multiple interwoven material parameters of the underlying constitutive model, facilitating the generation of ample multiaxial data for downstream learning. To further improve convergence, we introduce a novel training strategy. Numerical experiments with only 32 pieces of training data validate the effectiveness of the developed framework. The trained uniaxial model exhibits exceptional generalization capabilities, achieving 44.9% higher accuracy than the purely data-driven model. For multiaxial relationships, our framework demonstrates remarkable efficiency in calibrating material constants compared to conventional machine or manual methods, and the trained model accurately reproduces highly nonlinear multiaxial elastoplastic curves. Our work addresses the pressing need for accurate elastoplastic modeling in the absence of large datasets and provides a promising new solution that can significantly improve generalization capabilities.

NeurIPS Conference 2023 Conference Paper

Streaming Algorithms and Lower Bounds for Estimating Correlation Clustering Cost

  • Sepehr Assadi
  • Vihan Shah
  • Chen Wang

Correlation clustering is a fundamental optimization problem at the intersection of machine learning and theoretical computer science. Motivated by applications to big data processing, recent years have witnessed a flurry of results on this problem in the streaming model. In this model, the algorithm needs to process the input $n$-vertex graph by making one or few passes over the stream of its edges and using a limited memory, much smaller than the input size. All previous work on streaming correlation clustering has focused on semi-streaming algorithms with $\Omega(n)$ memory, whereas in this work, we study streaming algorithms with a much smaller memory requirement of only $\text{polylog}(n)$ bits. This stringent memory requirement is in the same spirit as classical streaming algorithms that, instead of recovering a full solution to the problem (which can be prohibitively large with such small memory, as is the case in our problem), aimed to learn certain statistical properties of their inputs. In our case, this translates to determining the “(correlation) clusterability” of input graphs, or more precisely, estimating the cost of the optimal correlation clustering solution. As our main result, we present two novel algorithms that in only $\text{polylog}(n)$ space are able to estimate the optimal correlation clustering cost up to some constant multiplicative factor plus some extra additive error. One of the algorithms outputs a $3$-multiplicative approximation plus $o(n^2)$ additive approximation, and the other one improves the additive error further down at the cost of increasing the multiplicative factor to some large constant. We then present new lower bounds that justify this mix of both multiplicative and additive error approximation in our algorithms.
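For readers unfamiliar with the quantity being estimated, the sketch below computes the correlation clustering cost in the usual disagreement-minimisation setting: every pair of vertices is either a positive pair (an edge) or a negative pair (a non-edge), and a clustering pays one unit for each positive pair it separates and each negative pair it groups together. This is an offline, exhaustive illustration on a tiny graph, not a streaming algorithm and not the paper's method; the function names are hypothetical.

```python
from itertools import combinations, product

def clustering_cost(n, positive_edges, labels):
    """Count disagreements of a clustering given as a label per vertex."""
    pos = {frozenset(e) for e in positive_edges}
    cost = 0
    for u, v in combinations(range(n), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in pos
        if same_cluster != is_positive:   # separated edge or merged non-edge
            cost += 1
    return cost

def optimal_cost(n, positive_edges):
    """Exhaustive minimum over all labelings; feasible only for tiny n."""
    return min(
        clustering_cost(n, positive_edges, labels)
        for labels in product(range(n), repeat=n)
    )

# A triangle plus an isolated vertex clusters perfectly; a 3-vertex path does not.
triangle_cost = optimal_cost(4, [(0, 1), (1, 2), (0, 2)])
path_cost = optimal_cost(3, [(0, 1), (1, 2)])
```

The exhaustive search enumerates n^n labelings, which makes clear why estimating this cost from a single pass in polylogarithmic space is a non-trivial goal.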

NeurIPS Conference 2023 Conference Paper

VoxDet: Voxel Learning for Novel Instance Detection

  • Bowen Li
  • Jiashun Wang
  • Yaoyu Hu
  • Chen Wang
  • Sebastian Scherer

Detecting unseen instances based on multi-view templates is a challenging problem due to its open-world nature. Traditional methodologies, which primarily rely on 2D representations and matching techniques, are often inadequate in handling pose variations and occlusions. To solve this, we introduce VoxDet, a pioneering 3D geometry-aware framework that fully utilizes the strong 3D voxel representation and reliable voxel matching mechanism. VoxDet first proposes a template voxel aggregation (TVA) module, effectively transforming multi-view 2D images into 3D voxel features. By leveraging associated camera poses, these features are aggregated into a compact 3D template voxel. In novel instance detection, this voxel representation demonstrates heightened resilience to occlusion and pose variations. We also discover that a 3D reconstruction objective helps to pre-train the 2D-3D mapping in TVA. Second, to quickly align with the template voxel, VoxDet incorporates a Query Voxel Matching (QVM) module. The 2D queries are first converted into their voxel representation with the learned 2D-3D mapping. We find that since the 3D voxel representations encode the geometry, we can first estimate the relative rotation and then compare the aligned voxels, leading to improved accuracy and efficiency. In addition to the method, we also introduce the first instance detection benchmark, RoboTools, where 20 unique instances are video-recorded with camera extrinsics. RoboTools also provides 24 challenging cluttered scenarios with more than 9k box annotations. Exhaustive experiments are conducted on the demanding LineMod-Occlusion, YCB-video, and RoboTools benchmarks, where VoxDet outperforms various 2D baselines remarkably with faster speed. To the best of our knowledge, VoxDet is the first to incorporate implicit 3D knowledge for 2D novel instance detection tasks.

NeurIPS Conference 2022 Conference Paper

A Mean-Field Game Approach to Cloud Resource Management with Function Approximation

  • Weichao Mao
  • Haoran Qiu
  • Chen Wang
  • Hubertus Franke
  • Zbigniew Kalbarczyk
  • Ravishankar Iyer
  • Tamer Basar

Reinforcement learning (RL) has gained increasing popularity for resource management in cloud services such as serverless computing. As self-interested users compete for shared resources in a cluster, the multi-tenancy nature of serverless platforms necessitates multi-agent reinforcement learning (MARL) solutions, which often suffer from severe scalability issues. In this paper, we propose a mean-field game (MFG) approach to cloud resource management that is scalable to a large number of users and applications and incorporates function approximation to deal with the large state-action spaces in real-world serverless platforms. Specifically, we present an online natural actor-critic algorithm for learning in MFGs compatible with various forms of function approximation. We theoretically establish its finite-time convergence to the regularized Nash equilibrium under linear function approximation and softmax parameterization. We further implement our algorithm using both linear and neural-network function approximations, and evaluate our solution on an open-source serverless platform, OpenWhisk, with real-world workloads from production traces. Experimental results demonstrate that our approach is scalable to a large number of users and significantly outperforms various baselines in terms of function latency and resource utilization efficiency.

JBHI Journal 2022 Journal Article

DPProm: A Two-Layer Predictor for Identifying Promoters and Their Types on Phage Genome Using Deep Learning

  • Chen Wang
  • Junyin Zhang
  • Li Cheng
  • Jiawei Wu
  • Minfeng Xiao
  • Junfeng Xia
  • Yannan Bin

With the number of phage genomes increasing, it is urgent to develop new bioinformatics methods for phage genome annotation. A promoter is a DNA region that is important for gene transcriptional regulation. In the post-genomic era, the availability of data makes it possible to establish robust computational models for promoter identification. In this work, we introduce DPProm, a two-layer model composed of DPProm-1L and DPProm-2L, to predict promoters and their types for phages. On the first layer, the model DPProm-1L, a dual-channel deep neural network ensemble method fusing multi-view features (sequence feature and handcrafted feature), is proposed to identify whether a DNA sequence is a promoter or non-promoter. The sequence feature is extracted with a convolutional neural network (CNN), and the handcrafted feature is the combination of free energy, GC content, cumulative skew, and Z curve features. On the second layer, DPProm-2L, based on a CNN, is trained to predict the promoters' types (host or phage). To enable prediction on whole genomes, the DPProm model is combined with a novel sequence data processing workflow, which contains sliding window and merging sequences modules. Experimental results show that DPProm outperforms the state-of-the-art methods and effectively decreases the false positive rate on whole genome prediction. Furthermore, we provide a user-friendly web interface at http://bioinfo.ahu.edu.cn/DPProm. We expect that DPProm can serve as a useful tool for identification of promoters and their types.
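As a rough illustration of two of the handcrafted features named in the abstract above, the sketch below computes GC content and a cumulative GC skew for a DNA string. The function names and the per-base skew definition (a running sum of +1 for G and -1 for C) are assumptions chosen for illustration; the abstract does not say how DPProm defines or normalizes these features.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running sum over the sequence: +1 for each G, -1 for each C.

    This is one common per-base definition, assumed here for illustration.
    """
    skew = 0
    profile = []
    for base in seq.upper():
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        profile.append(skew)
    return profile
```

In a sliding-window workflow such as the one the abstract mentions, features like these would presumably be computed per window, though the window length and step are not given here.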

JBHI Journal 2022 Journal Article

Interpreting Depression From Question-Wise Long-Term Video Recording of SDS Evaluation

  • Wanqing Xie
  • Lizhong Liang
  • Yao Lu
  • Chen Wang
  • Jihong Shen
  • Hui Luo
  • Xiaofeng Liu

The Self-Rating Depression Scale (SDS) questionnaire has frequently been used for efficient preliminary depression screening. However, this uncontrolled self-administered measure can easily be affected by careless or deceptive answers, producing results that differ from the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expression (FE) and actions play a vital role in clinician-administered evaluation, while FE and action are underexplored for self-administered evaluations. In this work, we collect a novel dataset of 200 subjects to evidence the validity of self-rating questionnaires with their corresponding question-wise video recording. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for the long-term variable-length video, which is also conditioned on the questionnaire results and the answering time. Specifically, we resort to a hierarchical model which utilizes a 3D CNN for local temporal pattern exploration and a redundancy-aware self-attention (RAS) scheme for question-wise global feature aggregation. Targeting redundant long-term FE video processing, our RAS is able to effectively exploit the correlations of each video clip within a question set to emphasize the discriminative information and eliminate the redundancy based on feature pair-wise affinity. Then, the question-wise video feature is concatenated with the questionnaire scores for final depression detection. Our thorough evaluations also show the validity of fusing SDS evaluation and its video recording, and the superiority of our framework to the conventional state-of-the-art temporal modeling methods.

NeurIPS Conference 2022 Conference Paper

Single-pass Streaming Lower Bounds for Multi-armed Bandits Exploration with Instance-sensitive Sample Complexity

  • Sepehr Assadi
  • Chen Wang

Motivated by applications to process massive datasets, we study streaming algorithms for pure exploration in Stochastic Multi-Armed Bandits (MABs). This problem was first formulated by Assadi and Wang [STOC 2020] as follows: A collection of $n$ arms with unknown rewards are arriving one by one in a stream, and the algorithm is only allowed to store a limited number of arms at any point. The goal is to find the arm with the largest reward while minimizing the number of arm pulls (sample complexity) and the maximum number of stored arms (space complexity). Assuming $\Delta_{[2]}$ is known, Assadi and Wang designed an algorithm that uses a memory of just one arm and still achieves the sample complexity of $O(n/\Delta_{[2]}^2)$ which is worst-case optimal even for non-streaming algorithms; here $\Delta_{[i]}$ is the gap between the rewards of the best and the $i$-th best arms. In this paper, we extend this line of work to stochastic MABs in the streaming model with the instance-sensitive sample complexity, i.e., the sample complexity of $O(\sum_{i=2}^{n} \frac{1}{\Delta_{[i]}^2}\log\log{(\frac{1}{\Delta_{[i]}})})$, similar in spirit to Karnin et al. [ICML 2013] and Jamieson et al. [COLT 2014] in the classical setting. We devise strong negative results under this setting: our results show that any streaming algorithm under a single pass has to use either asymptotically higher sample complexity than the instance-sensitive bound, or a memory of $\Omega(n)$ arms, even if the parameter $\Delta_{[2]}$ is known. In fact, the lower bound holds under much stronger assumptions, including the random order streams or the knowledge of all gap parameters $\{\Delta_{[i]}\}_{i=2}^n$. We complement our lower bounds by proposing a new algorithm that uses a memory of a single arm and achieves the instance-optimal sample complexity when all the strong assumptions hold simultaneously. Our results are developed based on a novel arm-trapping lemma. This generic complexity result shows that any algorithm to trap the index of the best arm among $o(n)$ indices (but not necessarily to find it) has to use $\Theta(n/\Delta_{[2]}^2)$ sample complexity. This result is not restricted to the streaming setting, and to the best of our knowledge, this is the first result that captures the sample-space trade-off for `trapping' arms in multi-armed bandits, and it can be of independent interest.

EAAI Journal 2021 Journal Article

A comprehensive survey on 2D multi-person pose estimation methods

  • Chen Wang
  • Feng Zhang
  • Shuzhi Sam Ge

Human pose estimation is a fundamental yet challenging computer vision task that has been studied by many researchers around the world in recent years. As a basic task in computer vision, multi-person pose estimation is the core component for many practical applications. This paper extensively reviews recent works on multi-person pose estimation. Specifically, we illustrate and analyze popular methods in detail and compare their pros and cons to fill in the gaps existing in other surveys. In addition, the commonly used datasets, evaluation metrics, and open-source systems are also introduced. Finally, we summarize the development of multi-person pose estimation frameworks and discuss the research trends.

IROS Conference 2020 Conference Paper

Autonomous Obstacle Avoidance for UAV based on Fusion of Radar and Monocular Camera

  • Hang Yu
  • Fan Zhang 0031
  • Panfeng Huang
  • Chen Wang
  • Yuanhao Li 0002

UAVs face many challenges in autonomous obstacle avoidance in large outdoor scenarios, specifically the long communication distance from ground stations. The computing power of onboard computers is limited, and unknown obstacles cannot be accurately detected. In this paper, an autonomous obstacle avoidance scheme based on the fusion of millimeter wave radar and monocular camera is proposed. The visual detection is designed to detect unknown obstacles and is more robust than traditional algorithms. Then extended Kalman filter (EKF) data fusion is used to build exact real 3D coordinates of the obstacles. Finally, an efficient path planning algorithm is used to obtain the path to avoid obstacles. Based on the theoretical design, an experimental platform is built to verify the UAV autonomous obstacle avoidance scheme proposed in this paper. The experiment results show that the proposed scheme can not only detect different kinds of unknown obstacles, but also takes up very little computing resources when running on an onboard computer. The outdoor flight experiment shows the feasibility of the proposed scheme.

NeurIPS Conference 2020 Conference Paper

Supervised Contrastive Learning

  • Prannay Khosla
  • Piotr Teterwak
  • Chen Wang
  • Aaron Sarna
  • Yonglong Tian
  • Phillip Isola
  • Aaron Maschinot
  • Ce Liu

Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions, and is more stable to hyperparameter settings such as optimizers and data augmentations. In reduced data settings, it outperforms cross-entropy significantly. Our loss function is simple to implement and reference TensorFlow code is released at https://t.ly/supcon.
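To make the idea of pulling same-class embeddings together concrete, here is a minimal NumPy sketch of a supervised contrastive loss over one batch. It is not the released TensorFlow reference code: the function name `supcon_loss`, the default temperature of 0.1, and the choice to average the log-probabilities over positives outside the logarithm are assumptions made for illustration.

```python
import numpy as np


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss for one batch (illustrative sketch).

    features: (n, d) array of embeddings; rows are L2-normalized here.
    labels:   (n,) array of class ids.
    Anchors with no other sample of the same class are skipped.
    """
    z = np.asarray(features, dtype=np.float64)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    labels = np.asarray(labels)
    n = z.shape[0]

    logits = z @ z.T / temperature
    # Subtract the row maximum for numerical stability; log-probs are unchanged.
    logits = logits - logits.max(axis=1, keepdims=True)

    not_self = ~np.eye(n, dtype=bool)
    # Denominator: every other sample in the batch, excluding the anchor itself.
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))

    # Positives: other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())
```

With labels that agree with the geometry of the embeddings the loss is close to zero, and it grows when same-label samples lie far apart. If no anchor in the batch has a positive, the mean is taken over an empty array and the result is NaN, so a real training loop would need to handle that case.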

IROS Conference 2020 Conference Paper

SwingBot: Learning Physical Features from In-hand Tactile Exploration for Dynamic Swing-up Manipulation

  • Chen Wang
  • Shaoxiong Wang
  • Branden Romero
  • Filipe Veiga
  • Edward H. Adelson

Several robot manipulation tasks are extremely sensitive to variations of the physical properties of the manipulated objects. One such task is manipulating objects by using gravity or arm accelerations, increasing the importance of mass, center of mass, and friction information. We present SwingBot, a robot that is able to learn the physical features of a held object through tactile exploration. Two exploration actions (tilting and shaking) provide the tactile information used to create a physical feature embedding space. With this embedding, SwingBot is able to predict the swing angle achieved by a robot performing dynamic swing-up manipulations on a previously unseen object. Using these predictions, it is able to search for the optimal control parameters for a desired swing-up angle. We show that with the learned physical features our end-to-end self-supervised learning pipeline is able to substantially improve the accuracy of swinging up unseen objects. We also show that objects with similar dynamics are closer to each other on the embedding space and that the embedding can be disentangled into values of specific physical properties.

IJCAI Conference 2019 Conference Paper

A Compliance Checking Framework for DNN Models

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Growing awareness towards ethical use of machine learning (ML) models has created a surge in the development of fair models. Existing work in this regard assumes the presence of sensitive attributes in the data and hence can build classifiers whose decisions remain agnostic to such attributes. However, in real-world settings, the end-user of the ML model is unaware of the training data; besides, building custom models is not always feasible. Moreover, a pre-trained model with high accuracy on a certain dataset cannot be assumed to be fair. Unknown biases in the training data are the true culprit for unfair models (i.e., disparate performance for groups in the dataset). In this preliminary research, we propose a different lens for building fair models by enabling the user with tools to discover blind spots and biases in a pre-trained model and augment them with corrective measures.

IJCAI Conference 2019 Conference Paper

Adversarial Examples for Graph Data: Deep Insights into Attack and Defense

  • Huijun Wu
  • Chen Wang
  • Yuriy Tyshetskiy
  • Andrew Docherty
  • Kai Lu
  • Liming Zhu

Graph deep learning models, such as graph convolutional networks (GCN), achieve state-of-the-art performance for tasks on graph data. However, similar to other deep learning models, graph deep learning models are susceptible to adversarial attacks. Compared with non-graph data, the discrete nature of the graph connections and features provides unique challenges and opportunities for adversarial attacks and defenses. In this paper, we propose techniques for both an adversarial attack and a defense against adversarial attacks. Firstly, we show that the problem of discrete graph connections and the discrete features of common datasets can be handled by using the integrated gradient technique that accurately determines the effect of changing selected features or edges while still benefiting from parallel computations. In addition, we show that an adversarially manipulated graph using a targeted attack statistically differs from un-manipulated graphs. Based on this observation, we propose a defense approach which can detect and recover a potential adversarial perturbation. Our experiments on a number of datasets show the effectiveness of the proposed techniques.

IJCAI Conference 2019 Conference Paper

DeepCU: Integrating both Common and Unique Latent Information for Multimodal Sentiment Analysis

  • Sunny Verma
  • Chen Wang
  • Liming Zhu
  • Wei Liu

Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. The recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information by utilizing neural networks, or the unique information by modeling low-rank representation of the tensor. However, both kinds of information are essential as they render inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract the common information from the multi-mode representations. Furthermore, we propose unique networks to obtain the modality-specific information that enhances the generalization performance of our multimodal system. Finally, we integrate these two aspects of information via a fusion layer and propose a novel multimodal data fusion architecture, which we call DeepCU (Deep network with both Common and Unique latent information). The proposed DeepCU consolidates the two networks for joint utilization and discovery of all-important latent information. Comprehensive experiments are conducted to demonstrate the effectiveness of utilizing both common and unique information discovered by DeepCU on multiple real-world datasets. The source code of the proposed DeepCU is available at https://github.com/sverma88/DeepCU-IJCAI19.

AAAI Conference 2018 Short Paper

Discriminative Semi-Supervised Feature Selection via Rescaled Least Squares Regression-Supplement

  • Guowen Yuan
  • Xiaojun Chen
  • Chen Wang
  • Feiping Nie
  • Liping Jing

In this paper, we propose a Discriminative Semi-Supervised Feature Selection (DSSFS) method. In this method, a dragging technique is introduced to the Rescaled Linear Square Regression in order to enlarge the distances between different classes. An iterative method is proposed to simultaneously learn the regression coefficients, -draggings matrix and predicting the unknown class labels. Experimental results show the superiority of DSSFS.

TIST Journal 2018 Journal Article

illiad

  • Nikhil Muralidhar
  • Chen Wang
  • Nathan Self
  • Marjan Momtazpour
  • Kiyoshi Nakayama
  • Ratnesh Sharma
  • Naren Ramakrishnan

Cyber-physical systems (CPSs) are today ubiquitous in urban environments. Such systems now serve as the backbone to numerous critical infrastructure applications, from smart grids to IoT installations. Scalable and seamless operation of such CPSs requires sophisticated tools for monitoring the time series progression of the system, dynamically tracking relationships, and issuing alerts about anomalies to operators. We present an online monitoring system (illiad) that models the state of the CPS as a function of its relationships between constituent components, using a combination of model-based and data-driven strategies. In addition to accurate inference for state estimation and anomaly tracking, illiad also exploits the underlying network structure of the CPS (wired or wireless) for state estimation purposes. We demonstrate the application of illiad to two diverse settings: a wireless sensor motes application and an IEEE 33-bus microgrid.

AAAI Conference 2018 Conference Paper

Kernel Cross-Correlator

  • Chen Wang
  • Le Zhang
  • Lihua Xie
  • Junsong Yuan

The cross-correlator plays a significant role in many visual perception tasks, such as object detection and tracking. Beyond the linear cross-correlator, this paper proposes a kernel cross-correlator (KCC) that breaks traditional limitations. First, by introducing the kernel trick, the KCC extends the linear cross-correlation to non-linear space, which is more robust to signal noises and distortions. Second, the connection to the existing works shows that KCC provides a unified solution for correlation filters. Third, KCC is applicable to any kernel function and is not limited to circulant structure on training data, thus it is able to predict affine transformations with customized properties. Last, by leveraging the fast Fourier transform (FFT), KCC eliminates direct calculation of kernel vectors, and thus achieves better performance while keeping a reasonable computational cost. Comprehensive experiments on visual tracking and human activity recognition using wearable devices demonstrate its robustness, flexibility, and efficiency. The source codes of both experiments are released at https://github.com/wang-chen/KCC.
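For context, the linear baseline that the abstract above starts from can be sketched in a few lines: circular cross-correlation of two real 1D signals computed with the FFT, with the peak location giving the relative shift. This is only the plain linear cross-correlator, not the KCC itself, and the function names are hypothetical.

```python
import numpy as np


def circular_cross_correlation(x, y):
    """Circular cross-correlation c[k] = sum_n x[n] * y[(n + k) mod N], via the FFT."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1D arrays of equal length")
    # For real inputs, correlation in the signal domain equals conj(X) * Y in frequency.
    spectrum = np.conj(np.fft.fft(x)) * np.fft.fft(y)
    return np.real(np.fft.ifft(spectrum))


def estimate_shift(x, y):
    """Index of the correlation peak, i.e. the circular shift that maps x onto y."""
    return int(np.argmax(circular_cross_correlation(x, y)))
```

For example, if `y = np.roll(x, 3)` for a non-periodic signal `x`, `estimate_shift(x, y)` returns 3.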

AAAI Conference 2017 Short Paper

Extracting Highly Effective Features for Supervised Learning via Simultaneous Tensor Factorization

  • Sunny Verma
  • Wei Liu
  • Chen Wang
  • Liming Zhu

Real world data is usually generated over multiple time periods associated with multiple labels, which can be represented as multiple labeled tensor sequences. These sequences are linked together, sharing some common features while exhibiting their own unique features. Conventional tensor factorization techniques are limited to extracting either common or unique features, but not both simultaneously. However, both types of features are important in many machine learning systems as they inherently affect the systems’ performance. In this paper, we propose a novel supervised tensor factorization technique which simultaneously extracts ordered common and unique features. Classification results using features extracted by our method on the CIFAR-10 database achieve significantly better performance than other factorization methods, illustrating the effectiveness of the proposed technique.

EAAI Journal 2016 Journal Article

Novel continuous function prediction model using an improved Takagi–Sugeno fuzzy rule and its application based on chaotic time series

  • Feng Guo
  • Lin Lin
  • Chen Wang

A novel continuous function prediction model (CFPM) is proposed to resolve prediction problems whose input and output are both continuous functions (CFs). CFPM can simplify sample space reconstruction by using the coefficients of CFs, and use an improved Takagi–Sugeno (TS) fuzzy rule to predict the output CF by optimizing the tendency of input CFs. The improved TS fuzzy rule handles each input CF as a consequent parameter and can obtain the nonlinear tendency. After a learning process using opinion-leader-based particle swarm optimization, the output CF is determined. In data prediction based on chaotic time series, a CF can either be obtained directly or be fitted by discrete data points, thus the prediction range is enlarged because more discrete data points can be generated once the output CF is determined. Two experiments and three cases based on chaotic time series are performed to validate CFPM. The Mackey–Glass chaotic time series is used to validate CFPM, while the NN3 time series is used to evaluate CFPM performance. The cases on exhaust gas temperature (EGT), EGT margin and delta EGT are used to show that CFPM is valuable in health status prediction for a particular aircraft engine in the practical engineering field.

ICRA Conference 2014 Conference Paper

A system for automated counting of fetal and maternal red blood cells in clinical KB test

  • Ji Ge
  • Zheng Gong
  • J. Chen
  • Jun Liu 0007
  • J. Nguyen
  • Z. Y. Yang
  • Chen Wang
  • Yu Sun 0001

The Kleihauer-Betke test (KBT) is a widely used method for measuring fetal-maternal hemorrhage (FMH) in maternal care. In hospitals, KBT is performed by a certified technologist to count a minimum of 2,000 fetal and maternal red blood cells (RBCs) on a blood smear. Manual counting is inherently inconsistent and subjective. This paper presents a system for automated counting and distinguishing fetal and maternal RBCs on clinical KB slides. A custom-adapted hardware platform is used for KB slide scanning and image capturing. Spatial-color pixel classification with spectral clustering is proposed to separate overlapping cells. Optimal clustering number and total cell number are obtained through maximizing cluster validity index. To accurately identify fetal RBCs from maternal RBCs, multiple features including cell size, shape, gradient and saturation difference are used in supervised learning to generate feature vectors, to tackle cell color, shape and contrast variations across clinical KB slides. The results show that the automated system is capable of completing the counting of over 60,000 cells (vs. 2,000 by technologists) within 5 minutes (vs. 15 minutes by technologists). The counting results are highly accurate and correlate strongly with those from benchmarking flow cytometry measurement.

TCS Journal 2008 Journal Article

On approximate optimal dual power assignment for biconnectivity and edge-biconnectivity

  • Chen Wang
  • Myung-Ah Park
  • James Willson
  • Yongxi Cheng
  • Andras Farago
  • Weili Wu

Topology control is one of the major approaches to achieve energy efficiency as well as fault tolerance in wireless networks. In this paper, we study the dual power assignment problem for 2-edge connectivity and 2-vertex connectivity in the symmetric graphical model. The problem has arisen from the following practical origin. In a wireless ad hoc network where each node can switch its transmission power between high-level and low-level, how can we establish a fault-tolerant connected network topology in the most energy-efficient way? Specifically, the objective is to minimize the number of nodes assigned with high power and yet achieve 2-edge connectivity or 2-vertex connectivity. Note that to achieve a minimum number of high-power nodes is harder than an optimization problem in the same model whose objective is to minimize the total power cost. We first address these two optimization problems (2-edge connectivity and 2-vertex connectivity version) under the general graph model. Due to the NP-hardness, we propose an approximation algorithm, called prioritized edge selection algorithm, which achieves a 4-ratio approximation for 2-edge connectivity. After that, we modify the algorithm to solve the problem for 2-vertex connectivity and also achieve the same approximation ratio. We also show that the 4-ratio is tight for our algorithms in both cases.