Author name cluster

Jian Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

54 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

  • Jian Wu
  • Hang Yu
  • Bingchang Liu
  • Yang Wenjie
  • Peng Di
  • Jianguo Li
  • Yue Zhang

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for fine-tuning risks introducing noise and degrading performance. Strategic data selection is thus crucial, requiring a method that is both accurate and efficient. Existing approaches, categorized as similarity-based and direct optimization methods, struggle to simultaneously achieve these goals. In this paper, we introduce LAMDAS (LLM as an implicit classifier for domain-specific Data Selection), a novel approach that leverages the pre-trained LLM itself as an implicit classifier, thereby bypassing explicit feature engineering and computationally intensive optimization process. LAMDAS reframes data selection as a one-class classification problem, identifying candidate data that "belongs" to the target domain defined by a small reference dataset. Extensive experimental results demonstrate that LAMDAS not only exceeds the performance of full-data training using a fraction of the data but also outperforms nine state-of-the-art (SOTA) baselines under various scenarios. Furthermore, LAMDAS achieves the most compelling balance between performance gains and computational efficiency compared to all evaluated baselines.

JBHI Journal 2026 Journal Article

RetinexDA: Progressive Disentanglement Domain Adaptation for Unsupervised Cross-Modality Medical Image Segmentation

  • Yixuan Wu
  • Mingze Yin
  • Zitai Kong
  • Jintai Chen
  • Jian Wu
  • Honghao Gao
  • Hongxia Xu

Deep neural networks have achieved strong performance in medical image segmentation when the training and testing data share similar appearance characteristics. However, this assumption is rarely satisfied in practical clinical scenarios, where imaging protocols, scanner vendors, and modality physics differ substantially, resulting in severe performance degradation when the model is deployed to new environments. To address this challenge, we propose RetinexDA, a novel unsupervised domain adaptation framework that explicitly decomposes a medical image into domain-invariant structural and domain-specific appearance representations. This Retinex-inspired formulation preserves essential anatomical details while mitigating modality-dependent variations. Furthermore, we introduce Disentangled Knowledge Distillation (DKD) to ensure mutual semantic alignment between the structure–appearance decomposition in pixel space and the encoded features in latent space, strengthening fine-grained segmentation capability. In addition, a Bézier-curve domain bridging strategy is developed to generate smoothly transitioned intermediate samples across domains, improving adaptation robustness under large modality discrepancies. Extensive experiments on abdominal CT and cardiac MRI segmentation tasks demonstrate that RetinexDA surpasses state-of-the-art unsupervised domain adaptation approaches, showing strong potential for scalable and reliable clinical deployment.

NeurIPS Conference 2025 Conference Paper

3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks

  • Xiaotang Gai
  • Jiaxiang Liu
  • Yichen Li
  • Zijie Meng
  • Jian Wu
  • Zuozhu Liu

Medical Visual Question Answering (Med-VQA) holds significant potential for clinical decision support, yet existing efforts primarily focus on 2D imaging with limited task diversity. This paper presents 3D-RAD, a large-scale dataset designed to advance 3D Med-VQA using radiology CT scans. The 3D-RAD dataset encompasses six diverse VQA tasks: anomaly detection, image observation, medical computation, existence detection, static temporal diagnosis, and longitudinal temporal diagnosis. It supports both open- and closed-ended questions while introducing complex reasoning challenges, including computational tasks and multi-stage temporal analysis, to enable comprehensive benchmarking. Extensive evaluations demonstrate that existing vision-language models (VLMs), especially medical VLMs, exhibit limited generalization, particularly in multi-temporal tasks, underscoring the challenges of real-world 3D diagnostic reasoning. To drive future advancements, we release a high-quality training set 3D-RAD-T of 136,195 expert-aligned samples, showing that fine-tuning on this dataset could significantly enhance model performance. Our dataset and code, aiming to catalyze multimodal medical AI research and establish a robust foundation for 3D medical visual understanding, are publicly available.

ICML Conference 2025 Conference Paper

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

  • Dongya Jia
  • Zhuo Chen 0006
  • Jiawei Chen
  • Chenpeng Du
  • Jian Wu
  • Jian Cong
  • Xiaobin Zhuang
  • Chumin Li 0002

Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face challenges with excessive computational loads or suboptimal outcomes. In this work, we propose Diffusion Transformer Autoregressive Modeling (DiTAR), a patch-based autoregressive framework combining a language model with a diffusion transformer. This approach significantly enhances the efficacy of autoregressive models for continuous tokens and reduces computational demands. DiTAR utilizes a divide-and-conquer strategy for patch generation, where the language model processes aggregated patch embeddings, and the diffusion transformer subsequently generates the next patch based on the output of the language model. For inference, we propose defining temperature as the time point of introducing noise during the reverse diffusion ODE to balance diversity and determinism. We also show in the extensive scaling analysis that DiTAR has superb scalability. In zero-shot speech generation, DiTAR achieves state-of-the-art performance in robustness, speaker similarity, and naturalness.

IJCAI Conference 2025 Conference Paper

Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression

  • Chunlai Dong
  • Haochao Ying
  • Qibo Qiu
  • Jinhong Wang
  • Danny Chen
  • Jian Wu

Ordinal regression bridges regression and classification by assigning objects to ordered classes. While human experts rely on discriminative patch-level features for decisions, current approaches are limited by the availability of only image-level ordinal labels, overlooking fine-grained patch-level characteristics. In this paper, we propose a Dual-level Fuzzy Learning with Patch Guidance framework, named DFPG, that learns precise feature-based grading boundaries from ambiguous ordinal labels, with patch-level supervision. Specifically, we propose patch-labeling and filtering strategies to enable the model to focus on patch-level features exclusively with only image-level ordinal labels available. We further design a dual-level fuzzy learning module, which leverages fuzzy logic to quantitatively capture and handle label ambiguity from both patch-wise and channel-wise perspectives. Extensive experiments on various image ordinal regression datasets demonstrate the superiority of our proposed method, further confirming its ability in distinguishing samples from difficult-to-classify categories. The code is available at https://github.com/ZJUMAI/DFPG-ord.

IJCAI Conference 2025 Conference Paper

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment

  • Songtao Jiang
  • Yan Zhang
  • Ruizhe Chen
  • Tianxiang Hu
  • Yeying Jin
  • Qinglin He
  • Yang Feng
  • Jian Wu

Multimodal large language models (MLLMs) have achieved remarkable success across various tasks. However, separate training of visual and textual encoders often results in a misalignment of the modality. Such misalignment may lead models to generate content that is absent from the input image, a phenomenon referred to as hallucination. These inaccuracies severely undermine the trustworthiness of MLLMs in real-world applications. Despite attempts to optimize text preferences to mitigate this issue, our initial investigation indicates that the trustworthiness of MLLMs remains inadequate. Specifically, these models tend to provide preferred answers even when the input image is heavily distorted. Analysis of visual token attention also indicates that the model focuses primarily on the surrounding context rather than the key object referenced in the question. These findings highlight a misalignment between the modalities, where answers inadequately leverage input images. Motivated by our findings, we propose Modality-Fair Preference Optimization (MFPO), which comprises three components: the construction of a multimodal preference dataset in which dispreferred images differ from originals solely in key regions; an image reward loss function encouraging the model to generate answers better aligned with the input images; and an easy-to-hard iterative alignment strategy to stabilize joint modality training. Extensive experiments on three trustworthiness benchmarks demonstrate that MFPO significantly enhances the trustworthiness of MLLMs. In particular, it enables the 7B models to attain trustworthiness levels on par with, or even surpass, those of the 13B, 34B, and larger models.

JBHI Journal 2025 Journal Article

Multi-Scale Spatio-Temporal Transformer-Based Imbalanced Longitudinal Learning for Glaucoma Forecasting From Irregular Time Series Images

  • Xikai Yang
  • Jian Wu
  • Xi Wang
  • Yuchen Yuan
  • Jinpeng Li
  • Guangyong Chen
  • Ning Li Wang
  • Pheng-Ann Heng

Glaucoma is one of the major eye diseases that leads to progressive optic nerve fiber damage and irreversible blindness, afflicting millions of individuals. Glaucoma forecast is a good solution to early screening and intervention of potential patients, which is helpful to prevent further deterioration of the disease. It leverages a series of historical fundus images of an eye and forecasts the likelihood of glaucoma occurrence in the future. However, the irregular sampling nature and the imbalanced class distribution are two challenges in the development of disease forecasting approaches. To this end, we introduce the Multi-scale Spatio-temporal Transformer Network (MST-former) based on the transformer architecture tailored for sequential image inputs, which can effectively learn representative semantic information from sequential images on both temporal and spatial dimensions. Specifically, we employ a multi-scale structure to extract features at various resolutions, which can largely exploit rich spatial information encoded in each image. Besides, we design a time distance matrix to scale time attention in a non-linear manner, which could effectively deal with the irregularly sampled data. Furthermore, we introduce a temperature-controlled Balanced Softmax Cross-entropy loss to address the class imbalance issue. Extensive experiments on the Sequential fundus Images for Glaucoma Forecast (SIGF) dataset demonstrate the superiority of the proposed MST-former method, achieving an AUC of 96.6% for glaucoma forecasting. Besides, our method shows excellent generalization capability on the Alzheimer's Disease Neuroimaging Initiative (ADNI) MRI dataset, with an accuracy of 88.2% for mild cognitive impairment and Alzheimer's disease prediction, outperforming the compared method by a large margin. A series of ablation studies further verify the contribution of our proposed components in addressing the irregularly sampled and class-imbalanced problems.
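
The class-imbalance loss named in this abstract can be illustrated with a minimal sketch. It assumes the common Balanced Softmax form, in which each logit is shifted by the log of its class prior before the usual cross-entropy, and it assumes that the temperature simply scales that shift. The abstract does not give the exact formulation, so the function name `balanced_softmax_ce` and the role of `temperature` are illustrative assumptions, not the authors' implementation.

```python
import math

def balanced_softmax_ce(logits, label, class_counts, temperature=1.0):
    """Cross-entropy on logits shifted by temperature-scaled log class priors.

    Assumption: temperature multiplies the log-prior shift; temperature = 0
    recovers plain softmax cross-entropy.
    """
    total = sum(class_counts)
    shifted = [z + temperature * math.log(n / total)
               for z, n in zip(logits, class_counts)]
    # Numerically stable log-sum-exp over the shifted logits.
    m = max(shifted)
    log_norm = m + math.log(sum(math.exp(s - m) for s in shifted))
    return log_norm - shifted[label]

# Hypothetical two-class problem with a 90/10 imbalance and uninformative logits.
plain = balanced_softmax_ce([0.0, 0.0], 1, [90, 10], temperature=0.0)
balanced = balanced_softmax_ce([0.0, 0.0], 1, [90, 10], temperature=1.0)
print(plain, balanced)
```

With uninformative logits, the shifted loss penalizes a minority-class example more heavily than plain cross-entropy does, which pushes the model to raise minority-class logits during training.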

AAAI Conference 2025 Conference Paper

ProtCLIP: Function-Informed Protein Multi-Modal Learning

  • Hanjing Zhou
  • Mingze Yin
  • Wei Wu
  • Mingyang Li
  • Kun Fu
  • Jintai Chen
  • Jian Wu
  • Zheng Wang

The multi-modality pre-training paradigm that aligns protein sequences and biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data and the lack of an effective function-informed pre-training paradigm. To address these issues, this paper curates a large-scale protein-text paired dataset called ProtAnno with a property-driven sampling strategy, and introduces a novel function-informed protein pre-training paradigm. Specifically, the sampling strategy determines the selection probability based on the sample confidence and property coverage, balancing data quality and data quantity in the face of large-scale noisy data. Furthermore, motivated by the significance of the protein-specific functional mechanism, the proposed paradigm explicitly models protein static and dynamic functional segments by two segment-wise pre-training objectives, injecting fine-grained information in a function-informed manner. Leveraging all these innovations, we develop ProtCLIP, a multi-modality foundation model that comprehensively represents function-aware protein embeddings. On 22 different protein benchmarks within 5 types, including protein functionality classification, mutation effect prediction, cross-modal transformation, semantic similarity inference and protein-protein interaction prediction, our ProtCLIP consistently achieves SOTA performance, with remarkable improvements of 75% on average in five cross-modal transformation benchmarks, 59.9% in GO-CC and 39.7% in GO-BP protein function prediction. The experimental results verify the extraordinary potential of ProtCLIP serving as the protein multi-modality foundation model.

AAAI Conference 2025 Conference Paper

Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer

  • Mingze Yin
  • Hanjing Zhou
  • Yiheng Zhu
  • Jialu Wu
  • Wei Wu
  • Mingyang Li
  • Kun Fu
  • Zheng Wang

Antibodies defend our health by binding to antigens with high specificity and potentiality, primarily relying on the Complementarity-Determining Region (CDR). Yet, current experimental methods of discovering new antibody CDRs are heavily time-consuming. Computational design could alleviate this burden; especially, protein language models have proven quite beneficial in many recent studies. However, most existing models solely focus on antibody potentiality and struggle to encapsulate the diverse range of plausible CDR candidates, limiting their effectiveness in real-world scenarios as binding is only one factor in the multitude of drug-forming criteria. In this paper, we introduce PG-AbD, a framework uniting Generative Flow Networks (GFlowNets) and pretrained Protein Language Models (PLMs) to successfully generate highly potent, diverse and novel antibody candidates. We innovatively construct a Products of Experts (PoE) composed by the global-distribution-modeling PLM and the local-distribution-modeling Potts Model to serve as the reward function of GFlowNet. The joint training paradigm is introduced, where PoE is trained by contrastive divergence with the negative samples generated by GFlowNet, and then guides GFlowNet to sample diverse antibody candidates. We evaluate PG-AbD on extensive antibody design benchmarks. It significantly outperforms existing methods in diversity (13.5% on RabDab, 31.1% on SabDab) while maintaining optimal potential and novelty. Generated antibodies are also found to form stable, regular 3D structures with their corresponding antigens, demonstrating the great potential of PG-AbD to accelerate real-world antibody discovery.

NeurIPS Conference 2025 Conference Paper

TranSUN: A Preemptive Paradigm to Eradicate Retransformation Bias Intrinsically from Regression Models in Recommender Systems

  • Jiahao Yu
  • Haozhuang Liu
  • Yeqiu Yang
  • Lu Chen
  • Jian Wu
  • Yuning Jiang
  • Bo Zheng

Regression models are crucial in recommender systems. However, the retransformation bias problem has been conspicuously neglected within the community. While many works in other fields have devised effective bias correction methods, all of them are post-hoc cures applied externally to the model, facing practical challenges when applied to real-world recommender systems. Hence, we propose a preemptive paradigm to eradicate the bias intrinsically from the models via minor model refinement. Specifically, a novel TranSUN method is proposed with a joint bias learning manner to offer theoretically guaranteed unbiasedness under empirically superior convergence. It is further generalized into a novel generic regression model family, termed Generalized TranSUN (GTS), which not only offers more theoretical insights but also serves as a generic framework for flexibly developing various bias-free models. Comprehensive experimental results demonstrate the superiority of our methods across data from various domains, which have been successfully deployed in two real-world industrial recommendation scenarios, i.e., product and short video recommendation scenarios in the Guess What You Like business domain on the homepage of the Taobao App (a leading e-commerce platform with DAU > 300M), to serve the major online traffic.
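
The retransformation bias this abstract refers to can be seen in a few lines. The sketch below is a generic illustration of the bias itself, not of TranSUN: it assumes a model fitted in log space whose prediction is mapped back with `exp`, and it uses made-up target values.

```python
import math

# Hypothetical positive targets (for example, watch time); illustrative values only.
y = [1.0, 2.0, 4.0, 8.0, 100.0]

# The best constant predictor under squared error in log space is mean(log y).
mean_log = sum(math.log(v) for v in y) / len(y)

naive = math.exp(mean_log)   # retransformed prediction: the geometric mean
target = sum(y) / len(y)     # what an unbiased predictor of E[y] should return

print(f"naive retransform: {naive:.2f}, true mean: {target:.2f}")
```

By Jensen's inequality the retransformed value never exceeds the arithmetic mean, so the naive prediction systematically underestimates the expected target whenever the targets vary.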

IJCAI Conference 2024 Conference Paper

AI-Enhanced Virtual Reality in Medicine: A Comprehensive Survey

  • Yixuan Wu
  • Kaiyuan Hu
  • Danny Z. Chen
  • Jian Wu

With the rapid advance of computer graphics and artificial intelligence technologies, the ways we interact with the world have undergone a transformative shift. Virtual Reality (VR) technology, aided by artificial intelligence (AI), has emerged as a dominant interaction media in multiple application areas, thanks to its advantage of providing users with immersive experiences. Among those applications, medicine is considered one of the most promising areas. In this paper, we present a comprehensive examination of the burgeoning field of AI-enhanced VR applications in medical care and services. By introducing a systematic taxonomy, we meticulously classify the pertinent techniques and applications into three well-defined categories based on different phases of medical diagnosis and treatment: Visualization Enhancement, VR-related Medical Data Processing, and VR-assisted Intervention. This categorization enables a structured exploration of the diverse roles that AI-powered VR plays in the medical domain, providing a framework for a more comprehensive understanding and evaluation of these technologies. To our best knowledge, this work is the first systematic survey of AI-powered VR systems in medical settings, laying a foundation for future research in this interdisciplinary domain.

AAAI Conference 2024 Conference Paper

Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning

  • Yi Cheng
  • Renjun Hu
  • Haochao Ying
  • Xing Shi
  • Jian Wu
  • Wei Lin

Until recently, the question of the effective inductive bias of deep models on tabular data has remained unanswered. This paper investigates the hypothesis that arithmetic feature interaction is necessary for deep tabular learning. To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. This is attributed to its parallel additive and multiplicative attention operators and prompt-based optimization, which facilitate the separation of tabular samples in an extended space with arithmetically-engineered features. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer, suggesting it has established a strong inductive bias for deep learning on tabular data. Code is available at https://github.com/aigc-apps/AMFormer.

NeurIPS Conference 2024 Conference Paper

Bridge-IF: Learning Inverse Protein Folding with Markov Bridges

  • Yiheng Zhu
  • Jialu Wu
  • Qiuyi Li
  • Jiahuan Yan
  • Mingze Yin
  • Wei Wu
  • Mingyang Li
  • Jieping Ye

Inverse protein folding is a fundamental task in computational protein design, which aims to design protein sequences that fold into the desired backbone structures. While the development of machine learning algorithms for this task has seen significant success, the prevailing approaches, which predominantly employ a discriminative formulation, frequently encounter the error accumulation issue and often fail to capture the extensive variety of plausible sequences. To fill these gaps, we propose Bridge-IF, a generative diffusion bridge model for inverse folding, which is designed to learn the probabilistic dependency between the distributions of backbone structures and protein sequences. Specifically, we harness an expressive structure encoder to propose a discrete, informative prior derived from structures, and establish a Markov bridge to connect this prior with native sequences. During the inference stage, Bridge-IF progressively refines the prior sequence, culminating in a more plausible design. Moreover, we introduce a reparameterization perspective on Markov bridge models, from which we derive a simplified loss function that facilitates more effective training. We also modulate protein language models (PLMs) with structural conditions to precisely approximate the Markov bridge process, thereby significantly enhancing generation performance while maintaining parameter-efficient training. Extensive experiments on well-established benchmarks demonstrate that Bridge-IF predominantly surpasses existing baselines in sequence recovery and excels in the design of plausible proteins with high foldability. The code is available at https://github.com/violet-sto/Bridge-IF.

NeurIPS Conference 2024 Conference Paper

Enhancing Semi-Supervised Learning via Representative and Diverse Sample Selection

  • Qian Shao
  • Jiangrui Kang
  • Qiyuan Chen
  • Zepeng Li
  • Hongxia Xu
  • Yiwen Cao
  • Jiajuan Liang
  • Jian Wu

Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, which reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabeled data to improve performance. However, we observe that how to select samples for labelling also significantly impacts performance, particularly under extremely low-budget settings. The sample selection task in SSL has been under-explored for a long time. To fill in this gap, we propose a Representative and Diverse Sample Selection approach (RDSS). By adopting a modified Frank-Wolfe algorithm to minimise a novel criterion $\alpha$-Maximum Mean Discrepancy ($\alpha$-MMD), RDSS samples a representative and diverse subset for annotation from the unlabeled data. We demonstrate that minimizing $\alpha$-MMD enhances the generalization ability of low-budget learning. Experimental results show that RDSS consistently improves the performance of several popular SSL frameworks and outperforms the state-of-the-art sample selection approaches used in Active Learning (AL) and Semi-Supervised Active Learning (SSAL), even with constrained annotation budgets. Our code is available at [RDSS](https://github.com/YanhuiAILab/RDSS).
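
To make the selection criterion more concrete, here is a minimal sketch of the standard squared Maximum Mean Discrepancy with a Gaussian kernel. It is the ordinary MMD, not the $\alpha$-MMD variant the abstract introduces (whose definition is not given here), and it does not include the Frank-Wolfe optimization. The helper names `rbf` and `mmd2`, the kernel width `gamma`, and the toy data are all assumptions for illustration.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd2(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

# Hypothetical unlabeled pool: ten points on a line.
pool = [(float(i),) for i in range(10)]
spread = [pool[1], pool[4], pool[8]]    # candidate subset covering the pool
clumped = [pool[0], pool[1], pool[2]]   # candidate subset crowded at one end

print(mmd2(spread, pool), mmd2(clumped, pool))
```

On this toy pool the spread-out subset has the smaller discrepancy to the full pool, which matches the intuition that a low MMD favors subsets that are both representative and diverse.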

JBHI Journal 2024 Journal Article

Improving Needle Tip Tracking and Detection in Ultrasound-Based Navigation System Using Deep Learning-Enabled Approach

  • Hui Che
  • Jiaxin Qin
  • Yao Chen
  • Zihan Ji
  • Yibo Yan
  • Jing Yang
  • Qi Wang
  • Chaofeng Liang

Ultrasound-guided percutaneous interventions have numerous advantages over traditional techniques. Accurate needle placement in the target anatomy is crucial for successful intervention, and reliable visual information is essential to achieve this. However, previous studies have revealed several challenges, such as the variability in needle echogenicity and the common misalignment of the ultrasound beam and the needle. Advanced techniques have been developed to optimize needle visualization, including hardware-based and image-processing-based methods. This paper proposes a novel strategy of integrating ultrasound-based deep learning approaches into an optical navigation system to enhance needle visualization and improve tip positioning accuracy. Both the tracking and detection algorithms are optimized utilizing optical tracking information. The information is introduced into the tracking network to define the search patch update strategy and form a trajectory reference to correct tracking results. In the detection network, the original image is processed according to the needle insertion position and current position given by the optical localization system to locate a coarse region, and the depth-score criterion is adopted to optimize detection results. Extensive experiments demonstrate that our approach achieves promising tip tracking and detection performance with tip localization errors of 1.11 $\pm$ 0.59 mm and 1.17 $\pm$ 0.70 mm, respectively. Moreover, we establish a paired dataset consisting of ultrasound images and their corresponding spatial tip coordinates acquired from the optical tracking system and conduct real puncture experiments to verify the effectiveness of the proposed methods. Our approach significantly improves needle visualization and provides physicians with visual guidance for posture adjustment.

IJCAI Conference 2024 Conference Paper

Personalized Heart Disease Detection via ECG Digital Twin Generation

  • Yaojun Hu
  • Jintai Chen
  • Lianting Hu
  • Dantong Li
  • Jiahuan Yan
  • Haochao Ying
  • Huiying Liang
  • Jian Wu

Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development. The code can be found at https://github.com/huyjj/LAVQ-Editor.

JBHI Journal 2024 Journal Article

Polygonal Approximation Learning for Convex Object Segmentation in Biomedical Images With Bounding Box Supervision

  • Wenhao Zheng
  • Jintai Chen
  • Kai Zhang
  • Jiahuan Yan
  • Jinhong Wang
  • Yi Cheng
  • Bang Du
  • Danny Z. Chen

As a common and critical medical image analysis task, deep learning based biomedical image segmentation is hindered by the dependence on costly fine-grained annotations. To alleviate this data dependence, in this article, a novel approach, called Polygonal Approximation Learning (PAL), is proposed for convex object instance segmentation with only bounding-box supervision. The key idea behind PAL is that the detection model for convex objects already contains the necessary information for segmenting them since their convex hulls, which can be generated approximately by the intersection of bounding boxes, are equivalent to the masks representing the objects. To extract the essential information from the detection model, a repeated detection approach is employed on biomedical images where various rotation angles are applied and a dice loss with the projection of the rotated detection results is utilized as a supervised signal in training our segmentation model. In biomedical imaging tasks involving convex objects, such as nuclei instance segmentation, PAL outperforms the known models (e.g., BoxInst) that rely solely on box supervision. Furthermore, PAL achieves comparable performance with mask-supervised models including Mask R-CNN and Cascade Mask R-CNN. Interestingly, PAL also demonstrates remarkable performance on non-convex object instance segmentation tasks, for example, surgical instrument and organ instance segmentation.
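
The geometric idea in this abstract, that intersecting bounding boxes taken at several rotation angles approximates a convex object's hull, can be sketched as follows. This illustrates only the geometry: it computes boxes directly from sampled object points rather than from a detection model, and it omits the dice-loss training described above. The function names and the sampled disk are hypothetical.

```python
import math

def rotated_box_extents(points, angles_deg):
    """For each angle, record the bounding box of the points in the rotated frame."""
    extents = []
    for a in angles_deg:
        c, s = math.cos(math.radians(a)), math.sin(math.radians(a))
        us = [c * x + s * y for x, y in points]    # coordinate along rotated x-axis
        vs = [-s * x + c * y for x, y in points]   # coordinate along rotated y-axis
        extents.append((c, s, min(us), max(us), min(vs), max(vs)))
    return extents

def inside_all_boxes(p, extents, eps=1e-9):
    """True if point p lies inside every rotated bounding box."""
    x, y = p
    for c, s, umin, umax, vmin, vmax in extents:
        u, v = c * x + s * y, -s * x + c * y
        if not (umin - eps <= u <= umax + eps and vmin - eps <= v <= vmax + eps):
            return False
    return True

# Hypothetical convex object: boundary samples of the unit disk.
disk = [(math.cos(math.radians(k)), math.sin(math.radians(k))) for k in range(360)]
one_box = rotated_box_extents(disk, [0])
many_boxes = rotated_box_extents(disk, [0, 15, 30, 45, 60, 75])

corner = (0.95, 0.95)   # inside the axis-aligned box but outside the disk
print(inside_all_boxes(corner, one_box), inside_all_boxes(corner, many_boxes))
```

A single axis-aligned box keeps the corner point, while adding rotated boxes cuts it away, so the intersection tightens toward the object's convex hull as more angles are used.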

NeurIPS Conference 2023 Conference Paper

Fast Model DeBias with Machine Unlearning

  • Ruizhe Chen
  • Jianfei Yang
  • Huimin Xiong
  • Jianhong Bai
  • Tianxiang Hu
  • Jin Hao
  • Yang Feng
  • Joey Tianyi Zhou

Recent discoveries have revealed that deep neural networks might behave in a biased manner in many real-world scenarios. For instance, deep networks trained on a large-scale face recognition dataset CelebA tend to predict blonde hair for females and black hair for males. Such biases not only jeopardize the robustness of models but also perpetuate and amplify social biases, which is especially concerning for automated decision-making processes in healthcare, recruitment, etc., as they could exacerbate unfair economic and social inequalities among different groups. Existing debiasing methods suffer from high costs in bias labeling or model re-training, while also exhibiting a deficiency in terms of elucidating the origins of biases within the model. To this respect, we propose a fast model debiasing method (FMD) which offers an efficient approach to identify, evaluate and remove biases inherent in trained models. The FMD identifies biased attributes through an explicit counterfactual concept and quantifies the influence of data samples with influence functions. Moreover, we design a machine unlearning-based strategy to efficiently and effectively remove the bias in a trained model with a small counterfactual dataset. Experiments on the Colored MNIST, CelebA, and Adult Income datasets demonstrate that our method achieves superior or competing classification accuracies compared with state-of-the-art retraining-based methods while attaining significantly fewer biases and requiring much less debiasing cost. Notably, our method requires only a small external dataset and updating a minimal amount of model parameters, without the requirement of access to training data that may be too large or unavailable in practice.

NeurIPS Conference 2023 Conference Paper

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

  • Zikai Xiao
  • Zihan Chen
  • Songshang Liu
  • Hualiang Wang
  • Yang Feng
  • Jin Hao
  • Joey Tianyi Zhou
  • Jian Wu

Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.

IJCAI Conference 2023 Conference Paper

MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation

  • Yiheng Zhu
  • Zhenqiu Ouyang
  • Ben Liao
  • Jialu Wu
  • Yixuan Wu
  • Chang-Yu Hsieh
  • Tingjun Hou
  • Jian Wu

Molecular de novo design is a critical yet challenging task in scientific fields, aiming to design novel molecular structures with desired property profiles. Significant progress has been made by resorting to generative models for graphs. However, limited attention is paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of the molecular graphs and generate complex molecules of larger size that we shall demonstrate to be difficult for most existing models. The primary challenge to hierarchical generation is the non-differentiable issue caused by the generation of intermediate discrete coarsened graph structures. To sidestep this issue, we cast the tricky hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms based on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, implying its high capacity to model data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymer) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.

IJCAI Conference 2023 Conference Paper

Robust Image Ordinal Regression with Controllable Image Generation

  • Yi Cheng
  • Haochao Ying
  • Renjun Hu
  • Jinhong Wang
  • Wenhao Zheng
  • Xiao Zhang
  • Danny Chen
  • Jian Wu

Image ordinal regression has been mainly studied along the line of exploiting the order of categories. However, the issues of class imbalance and category overlap that are very common in ordinal regression were largely overlooked. As a result, the performance on minority categories is often unsatisfactory. In this paper, we propose a novel framework called CIG based on controllable image generation to directly tackle these two issues. Our main idea is to generate extra training samples with specific labels near category boundaries, and the sample generation is biased toward the less-represented categories. To achieve controllable image generation, we seek to separate structural and categorical information of images based on structural similarity, categorical similarity, and reconstruction constraints. We evaluate the effectiveness of our new CIG approach in three different image ordinal regression scenarios. The results demonstrate that CIG can be flexibly integrated with off-the-shelf image encoders or ordinal regression models to achieve improvement, and further, the improvement is more significant for minority categories.

NeurIPS Conference 2023 Conference Paper

Sample-efficient Multi-objective Molecular Optimization with GFlowNets

  • Yiheng Zhu
  • Jialu Wu
  • Chaowen Hu
  • Jiahuan Yan
  • kim hsieh
  • Tingjun Hou
  • Jian Wu

Many crucial scientific problems involve designing novel molecules with desired properties, which can be formulated as a black-box optimization problem over the discrete chemical space. In practice, multiple conflicting objectives and costly evaluations (e.g., wet-lab experiments) make the diversity of candidates paramount. Computational methods have achieved initial success but still struggle with considering diversity in both objective and search space. To fill this gap, we propose a multi-objective Bayesian optimization (MOBO) algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN) as an acquisition function optimizer, with the purpose of sampling a diverse batch of candidate molecular graphs from an approximate Pareto front. Using a single preference-conditioned hypernetwork, HN-GFN learns to explore various trade-offs between objectives. We further propose a hindsight-like off-policy strategy to share high-performing molecules among different preferences in order to speed up learning for HN-GFN. We empirically illustrate that HN-GFN has adequate capacity to generalize over preferences. Moreover, experiments in various real-world MOBO settings demonstrate that our framework predominantly outperforms existing methods in terms of candidate quality and sample efficiency. The code is available at https://github.com/violet-sto/HN-GFN.

AAAI Conference 2023 Conference Paper

T2G-FORMER: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction

  • Jiahuan Yan
  • Jintai Chen
  • Yixuan Wu
  • Danny Z. Chen
  • Jian Wu

Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction. However, the heterogeneous nature of tabular features makes such features relatively independent, and developing effective methods to promote tabular feature interaction remains an open problem. In this paper, we propose a novel Graph Estimator, which automatically estimates the relations among tabular features and builds graphs by assigning edges between related features. Such relation graphs organize independent tabular features into a kind of graph data such that interaction of nodes (tabular features) can be conducted in an orderly fashion. Based on our proposed Graph Estimator, we present a bespoke Transformer network tailored for tabular learning, called T2G-Former, which processes tabular data by performing tabular feature interaction guided by the relation graphs. A specific Cross-level Readout collects salient features predicted by the layers in T2G-Former across different levels, and attains global semantics for final prediction. Comprehensive experiments show that our T2G-Former achieves superior performance among DNNs and is competitive with non-deep Gradient Boosted Decision Tree models. The code and detailed results are available at https://github.com/jyansir/t2g-former.

NeurIPS Conference 2023 Conference Paper

Towards Distribution-Agnostic Generalized Category Discovery

  • Jianhong Bai
  • Zuozhu Liu
  • Hualiang Wang
  • Ruizhe Chen
  • Lianrui Mu
  • Xiaomeng Li
  • Joey Tianyi Zhou
  • Yang Feng

Data imbalance and open-ended distribution are two intrinsic characteristics of the real visual world. Though encouraging progress has been made in tackling each challenge separately, few works are dedicated to combining them towards real-world scenarios. While several previous works have focused on classifying close-set samples and detecting open-set samples during testing, it's still essential to be able to classify unknown subjects as human beings. In this paper, we formally define a more realistic task as distribution-agnostic generalized category discovery (DA-GCD): generating fine-grained predictions for both close- and open-set classes in a long-tailed open-world setting. To tackle the challenging problem, we propose a Self-Balanced Co-Advice contrastive framework (BaCon), which consists of a contrastive-learning branch and a pseudo-labeling branch, working collaboratively to provide interactive supervision to resolve the DA-GCD task. In particular, the contrastive-learning branch provides reliable distribution estimation to regularize the predictions of the pseudo-labeling branch, which in turn guides contrastive learning through self-balanced knowledge transfer and a proposed novel contrastive loss. We compare BaCon with state-of-the-art methods from two closely related fields: imbalanced semi-supervised learning and generalized category discovery. The effectiveness of BaCon is demonstrated with superior performance over all baselines and comprehensive analysis across various datasets. Our code is publicly available.

AAAI Conference 2022 System Paper

A Synthetic Prediction Market for Estimating Confidence in Published Work

  • Sarah Rajtmajer
  • Christopher Griffin
  • Jian Wu
  • Robert Fraleigh
  • Laxmaan Balaji
  • Anna Squicciarini
  • Anthony Kwasnica
  • David Pennock

Explainably estimating confidence in published scholarly work offers opportunity for faster and more robust scientific progress. We develop a synthetic prediction market to assess the credibility of published claims in the social and behavioral sciences literature. We demonstrate our system and detail our findings using a collection of known replication projects. We suggest that this work lays the foundation for a research agenda that creatively uses AI for peer review.

AAAI Conference 2022 Conference Paper

DANets: Deep Abstract Networks for Tabular Data Classification and Regression

  • Jintai Chen
  • Kuanlun Liao
  • Yao Wan
  • Danny Z. Chen
  • Jian Wu

Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (ABSTLAY), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress the trained ABSTLAY, thus reducing the computational complexity by a clear margin in the reference phase. A special basic block is built using ABSTLAYs, and we construct a family of Deep Abstract Networks (DANETs) for tabular data classification and regression by stacking such blocks. In DANETs, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our ABSTLAY and DANETs are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANET as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.

JBHI Journal 2022 Journal Article

Discriminative Cervical Lesion Detection in Colposcopic Images With Global Class Activation and Local Bin Excitation

  • Tingting Chen
  • Xuechen Liu
  • Ruiwei Feng
  • Wenzhe Wang
  • Chunnv Yuan
  • Weiguo Lu
  • Haizhen He
  • Honghao Gao

Accurate cervical lesion detection (CLD) methods using colposcopic images are highly demanded in computer-aided diagnosis (CAD) for automatic diagnosis of High-grade Squamous Intraepithelial Lesions (HSIL). However, compared to natural scene images, the specific characteristics of colposcopic images, such as low contrast, visual similarity, and ambiguous lesion boundaries, pose difficulties to accurately locating HSIL regions and also significantly impede the performance improvement of existing CLD approaches. To tackle these difficulties and better capture cervical lesions, we develop novel feature enhancing mechanisms from both global and local perspectives, and propose a new discriminative CLD framework, called CervixNet, with a Global Class Activation (GCA) module and a Local Bin Excitation (LBE) module. Specifically, the GCA module learns discriminative features by introducing an auxiliary classifier, and guides our model to focus on HSIL regions while ignoring noisy regions. It globally facilitates the feature extraction process and helps boost feature discriminability. Further, our LBE module excites lesion features in a local manner, and allows the lesion regions to be enhanced at a finer granularity by explicitly modelling the inter-dependencies among bins of proposal features. Extensive experiments on 9888 clinical colposcopic images verify the superiority of our method (AP$_{.75}$ = 20.45) over state-of-the-art models on four widely used metrics.

IJCAI Conference 2022 Conference Paper

Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation

  • Zui Chen
  • Yansen Jing
  • Shengcheng Yuan
  • Yifei Xu
  • Jian Wu
  • Hang Zhao

A synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best restores a sound timbre is an important yet complicated problem, i.e., the synthesizer parameter estimation problem. We proposed a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method achieved not only SOTA but also the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.

JBHI Journal 2021 Journal Article

A Deep Learning Approach for Colonoscopy Pathology WSI Analysis: Accurate Segmentation and Classification

  • Ruiwei Feng
  • Xuechen Liu
  • Jintai Chen
  • Danny Z. Chen
  • Honghao Gao
  • Jian Wu

Colorectal cancer (CRC) is one of the most life-threatening malignancies. Colonoscopy pathology examination can identify cells of early-stage colon tumors in small tissue image slices. However, such examination is time-consuming and exhausting on high resolution images. In this paper, we present a new framework for colonoscopy pathology whole slide image (WSI) analysis, including lesion segmentation and tissue diagnosis. Our framework contains an improved U-shape network with a VGG net as backbone, and two schemes for training and inference, respectively (the training scheme and inference scheme). Based on the characteristics of colonoscopy pathology WSI, we introduce a specific sampling strategy for sample selection and a transfer learning strategy for model training in our training scheme. Besides, we propose a specific loss function, class-wise DSC loss, to train the segmentation network. In our inference scheme, we apply a sliding-window based sampling strategy for patch generation and diploid ensemble (data ensemble and model ensemble) for the final prediction. We use the predicted segmentation mask to generate the classification probability for the likelihood of WSI being malignant. To our best knowledge, DigestPath 2019 is the first challenge and the first public dataset available on colonoscopy tissue screening and segmentation, and our proposed framework yields good performance on this dataset. Our new framework achieved a DSC of 0.7789 and AUC of 1 on the online test dataset, and we won 2nd place in the DigestPath 2019 Challenge (task 2). Our code is available at https://github.com/bhfs9999/colonoscopy_tissue_screen_and_segmentation.
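As a rough illustration of the Dice similarity coefficient (DSC) that the abstract reports and builds its loss on, the sketch below computes a per-class DSC over flattened label maps and averages it into a loss. The abstract does not define the "class-wise DSC loss", so the averaging rule, the smoothing term `eps`, and the function names here are assumptions made for illustration, not the authors' formulation.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int],
                     cls: int, eps: float = 1e-7) -> float:
    """Dice similarity coefficient for one class over flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Pixels where both maps agree on this class.
    intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    pred_count = sum(1 for p in pred if p == cls)
    target_count = sum(1 for t in target if t == cls)
    # eps (an assumption) avoids division by zero when the class is absent.
    return (2.0 * intersection + eps) / (pred_count + target_count + eps)


def class_wise_dice_loss(pred: Sequence[int], target: Sequence[int],
                         classes: Sequence[int]) -> float:
    """One minus the unweighted mean of per-class DSC (assumed averaging)."""
    scores = [dice_coefficient(pred, target, c) for c in classes]
    return 1.0 - sum(scores) / len(scores)
```

A real segmentation loss would operate on soft probabilities so that it stays differentiable; hard labels are used here only to keep the arithmetic easy to follow.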

JBHI Journal 2021 Journal Article

Deep Adaptive Blending Network for 3D Magnetic Resonance Image Denoising

  • Yi Xu
  • Kang Han
  • Yongming Zhou
  • Jian Wu
  • Xin Xie
  • Wei Xiang

The visual quality of magnetic resonance images (MRIs) is crucial for clinical diagnosis and scientific research. The main source of quality degradation is the noise generated during MRI acquisition. Although denoising MRI by deep learning methods shows great superiority compared with traditional methods, the deep learning methods reported to date in the literature cannot simultaneously leverage long-range and hierarchical information, and cannot adequately utilize the similarity in 3D MRI. In this paper, we address the two issues by proposing a deep adaptive blending network (DABN) characterized by a large receptive field residual dense block and an adaptive blending method. We first propose the large receptive field residual dense block that can capture long-range information and fuse hierarchical features simultaneously. Then we propose the adaptive blending method that produces denoised pixels by adaptively filtering 3D MRI, which explicitly utilizes the similarity in 3D MRI. Residual is also considered as a compensating item after adaptive filtering. The blending adaptive filter and residual are predicted by a network consisting of several large receptive field residual dense blocks. Experimental results show that the proposed DABN outperforms state-of-the-art denoising methods in both clinical and simulated MRI data.

IJCAI Conference 2021 Conference Paper

Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment

  • Wenzhe Wang
  • Mengdan Zhang
  • Runnan Chen
  • Guanyu Cai
  • Penghao Zhou
  • Pai Peng
  • Xiaowei Guo
  • Jian Wu

Multi-modal cues presented in videos are usually beneficial for the challenging video-text retrieval task on internet-scale datasets. Recent video retrieval methods take advantage of multi-modal cues by aggregating them to holistic high-level semantics for matching with text representations in a global view. In contrast to this global alignment, the local alignment of detailed semantics encoded within both multi-modal cues and distinct phrases is still not well conducted. Thus, in this paper, we leverage the hierarchical video-text alignment to fully explore the detailed diverse characteristics in multi-modal cues for fine-grained alignment with local semantics from phrases, as well as to capture a high-level semantic correspondence. Specifically, multi-step attention is learned for progressively comprehensive local alignment and a holistic transformer is utilized to summarize multi-modal cues for global alignment. With hierarchical alignment, our model outperforms state-of-the-art methods on three public video retrieval datasets.

IJCAI Conference 2021 Conference Paper

Electrocardio Panorama: Synthesizing New ECG views with Self-supervision

  • Jintai Chen
  • Xiangshang Zheng
  • Hongyun Yu
  • Danny Z. Chen
  • Jian Wu

Multi-lead electrocardiogram (ECG) provides clinical information of heartbeats from several fixed viewpoints determined by the lead positioning. However, it is often not satisfactory to visualize ECG signals in these fixed and limited views, as some clinically useful information is represented only from a few specific ECG viewpoints. For the first time, we propose a new concept, Electrocardio Panorama, which allows visualizing ECG signals from any queried viewpoints. To build Electrocardio Panorama, we assume that an underlying electrocardio field exists, representing locations, magnitudes, and directions of ECG signals. We present a Neural electrocardio field Network (Nef-Net), which first predicts the electrocardio field representation by using a sparse set of one or few input ECG views and then synthesizes Electrocardio Panorama based on the predicted representations. Specifically, to better disentangle electrocardio field information from viewpoint biases, a new Angular Encoding is proposed to process viewpoint angles. Also, we propose a self-supervised learning approach called Standin Learning, which helps model the electrocardio field without direct supervision. Further, with very few modifications, Nef-Net can synthesize ECG signals from scratch. Experiments verify that our Nef-Net performs well on Electrocardio Panorama synthesis, and outperforms the previous work on the auxiliary tasks (ECG view transformation and ECG synthesis from scratch). The codes and the division labels of cardiac cycles and ECG deflections on Tianchi ECG and PTB datasets are available at https://github.com/WhatAShot/Electrocardio-Panorama.

JBHI Journal 2021 Journal Article

KerNet: A Novel Deep Learning Approach for Keratoconus and Sub-Clinical Keratoconus Detection Based on Raw Data of the Pentacam HR System

  • Ruiwei Feng
  • Zhe Xu
  • Xiangshang Zheng
  • Heping Hu
  • Xiuming Jin
  • Danny Z. Chen
  • Ke Yao
  • Jian Wu

Keratoconus is one of the most severe corneal diseases, which is difficult to detect at the early stage (i.e., sub-clinical keratoconus) and possibly results in vision loss. In this paper, we propose a novel end-to-end deep learning approach, called KerNet, which processes the raw data of the Pentacam HR system (consisting of five numerical matrices) to detect keratoconus and sub-clinical keratoconus. Specifically, we propose a novel convolutional neural network, called KerNet, containing five branches as the backbone with a multi-level fusion architecture. The five branches receive five matrices separately and capture effectively the features of different matrices by several cascaded residual blocks. The multi-level fusion architecture (i.e., low-level fusion and high-level fusion) moderately takes into account the correlation among five slices and fuses the extracted features for better prediction. Experimental results show that: (1) our novel approach outperforms state-of-the-art methods on an in-house dataset, by ~1% for keratoconus detection accuracy and ~4% for sub-clinical keratoconus detection accuracy; (2) the attention maps visualized by Grad-CAM show that our KerNet places more attention on the inferior temporal part for sub-clinical keratoconus, which has been proved as the identifying regions for ophthalmologists to detect sub-clinical keratoconus in previous clinical studies. To our best knowledge, we are the first to propose an end-to-end deep learning approach utilizing raw data obtained by the Pentacam HR system for keratoconus and subclinical keratoconus detection. Further, the prediction performance and the clinical significance of our KerNet are well evaluated and proved by two clinical experts. Our code is available at https://github.com/upzheng/Keratoconus.

AAAI Conference 2021 Conference Paper

To Choose or to Fuse? Scale Selection for Crowd Counting

  • Qingyu Song
  • Changan Wang
  • Yabiao Wang
  • Ying Tai
  • Chengjie Wang
  • Jilin Li
  • Jian Wu
  • Jiayi Ma

In this paper, we address the large scale variation problem in crowd counting by taking full advantage of the multiscale feature representations in a multi-level network. We implement such an idea by keeping the counting error of a patch as small as possible with a proper feature level selection strategy, since a specific feature level tends to perform better for a certain range of scales. However, without scale annotations, it is sub-optimal and error-prone to manually assign the predictions for heads of different scales to specific feature levels. Therefore, we propose a Scale-Adaptive Selection Network (SASNet), which automatically learns the internal correspondence between the scales and the feature levels. Instead of directly using the predictions from the most appropriate feature level as the final estimation, our SASNet also considers the predictions from other feature levels via weighted average, which helps to mitigate the gap between discrete feature levels and continuous scale variation. Since the heads in a local patch share roughly the same scale, we conduct the adaptive selection strategy in a patch-wise style. However, pixels within a patch contribute different counting errors due to the varying degrees of learning difficulty. Thus, we further propose a Pyramid Region Awareness Loss (PRA Loss) to recursively select the hardest sub-regions within a patch until reaching the pixel level. With awareness of whether the parent patch is over-estimated or under-estimated, the fine-grained optimization with the PRA Loss for these region-aware hard pixels helps to alleviate the inconsistency problem between training target and evaluation metric. The state-of-the-art results on four datasets demonstrate the superiority of our approach. The code will be available at: https://github.com/TencentYoutuResearch/CrowdCounting-SASNet.

UAI Conference 2019 Conference Paper

Practical Multi-fidelity Bayesian Optimization for Hyperparameter Tuning

  • Jian Wu
  • Saul Toscano-Palmerin
  • Peter I. Frazier
  • Andrew Gordon Wilson

Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives — for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations — values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.

NeurIPS Conference 2019 Conference Paper

Practical Two-Step Lookahead Bayesian Optimization

  • Jian Wu
  • Peter Frazier

Expected improvement and other acquisition functions widely used in Bayesian optimization use a "one-step" assumption: they value objective function evaluations assuming no future evaluations will be performed. Because we usually evaluate over multiple steps, this assumption may leave substantial room for improvement. Existing theory gives acquisition functions looking multiple steps in the future but calculating them requires solving a high-dimensional continuous-state continuous-action Markov decision process (MDP). Fast exact solutions of this MDP remain out of reach of today's methods. As a result, previous two- and multi-step lookahead Bayesian optimization algorithms are either too expensive to implement in most practical settings or resort to heuristics that may fail to fully realize the promise of two-step lookahead. This paper proposes a computationally efficient algorithm that provides an accurate solution to the two-step lookahead Bayesian optimization problem in seconds to at most several minutes of computation per batch of evaluations. The resulting acquisition function provides increased query efficiency and robustness compared with previous two- and multi-step lookahead methods in both single-threaded and batch experiments. This unlocks the value of two-step lookahead in practice. We demonstrate the value of our algorithm with extensive experiments on synthetic test functions and real-world problems.
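For readers unfamiliar with the "one-step" acquisition functions the abstract contrasts against, here is a minimal sketch of the standard closed-form expected improvement under a Gaussian posterior, written for minimization. It is the textbook one-step baseline, not the two-step lookahead algorithm proposed in the paper; the function name and argument names are illustrative assumptions.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """One-step expected improvement at a point, for minimization.

    mu, sigma: posterior mean and standard deviation of the objective there.
    best: lowest objective value observed so far.
    """
    improvement = best - mu
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf
```

The value depends only on the posterior at a single candidate point, which is the sense in which it ignores any evaluations that will follow.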

IJCAI Conference 2018 Conference Paper

Sequential Recommender System based on Hierarchical Attention Networks

  • Haochao Ying
  • Fuzhen Zhuang
  • Fuzheng Zhang
  • Yanchi Liu
  • Guandong Xu
  • Xing Xie
  • Hui Xiong
  • Jian Wu

With a large amount of user activity data accumulated, it is crucial to exploit user sequential behavior for sequential recommendations. Conventionally, user general taste and recent demand are combined to promote recommendation performance. However, existing methods often neglect that user long-term preferences keep evolving over time, and building a static representation for user general taste may not adequately reflect this dynamic character. Moreover, they integrate user-item or item-item interactions in a linear way, which limits the capability of the model. To this end, in this paper, we propose a novel two-layer hierarchical attention network, which takes the above properties into account, to recommend the next item a user might be interested in. Specifically, the first attention layer learns user long-term preferences based on the historical purchased item representation, while the second one outputs the final user representation by coupling user long-term and short-term preferences. The experimental study demonstrates the superiority of our method compared with other state-of-the-art ones.

NeurIPS Conference 2017 Conference Paper

Bayesian Optimization with Gradients

  • Jian Wu
  • Matthias Poloczek
  • Andrew Wilson
  • Peter Frazier

Bayesian optimization has shown success in global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. dKG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the dKG acquisition function and its gradient using a novel fast discretization-free technique. We show dKG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.

ICRA Conference 2017 Conference Paper

Delving deeper into convolutional neural networks for camera relocalization

  • Jian Wu
  • Liwei Ma
  • Xiaolin Hu

Convolutional Neural Networks (CNNs) have been applied to camera relocalization, which is to infer the pose of the camera given a single monocular image. However, there are still many open problems for camera relocalization with CNNs. We delve into the CNNs for camera relocalization. First, a variant of Euler angles named Euler6 is proposed to represent orientation. Second, a data augmentation method named pose synthesis is designed to reduce sparsity of poses in the whole pose space to cope with overfitting in training. Third, a multi-task CNN named BranchNet is proposed to deal with the complex coupling of orientation and translation. The network consists of several shared convolutional layers and splits into two branches which predict orientation and translation, respectively. Experiments on the 7Scenes dataset show that incorporating these techniques one by one into an existing model PoseNet always leads to better results. Together these techniques reduce the orientation error by 15.9% and the translation error by 38.3% compared to the state-of-the-art model Bayesian PoseNet. We implement BranchNet on an Intel NUC mobile platform and reach a speed of 43 fps, which meets the real-time requirement of many robotic applications.

JBHI Journal 2016 Journal Article

A Wearable System for Recognizing American Sign Language in Real-Time Using IMU and Surface EMG Sensors

  • Jian Wu
  • Lu Sun
  • Roozbeh Jafari

A sign language recognition system translates signs performed by deaf individuals into text/speech in real time. Inertial measurement unit and surface electromyography (sEMG) are both useful modalities to detect hand/arm gestures. They are able to capture signs and the fusion of these two complementary sensor modalities will enhance system performance. In this paper, a wearable system for recognizing American Sign Language (ASL) in real time is proposed, fusing information from an inertial sensor and sEMG sensors. An information gain-based feature selection scheme is used to select the best subset of features from a broad range of well-established features. Four popular classification algorithms are evaluated for 80 commonly used ASL signs on four subjects. The experimental results show 96.16% and 85.24% average accuracies for intra-subject and intra-subject cross session evaluation, respectively, with the selected feature subset and a support vector machine classifier. The significance of adding sEMG for ASL recognition is explored and the best channel of sEMG is highlighted.
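The information gain-based feature selection mentioned in the abstract can be sketched as ranking features by how much they reduce label entropy. The abstract gives no details of the scheme, so this sketch assumes features have already been discretized into bins, and the function names and the top-k selection rule are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Label entropy minus the conditional entropy given a discretized feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(features: Dict[str, Sequence[Hashable]],
                 labels: Sequence[Hashable], k: int) -> List[str]:
    """Names of the k features with the highest information gain (assumed rule)."""
    ranked = sorted(features,
                    key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]
```

Continuous sensor features such as IMU or sEMG statistics would need a binning step before this applies; that step is left out here.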

NeurIPS Conference 2016 Conference Paper

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

  • Jian Wu
  • Peter Frazier

In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g., when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm, the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms, both on synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.

IS Journal 2005 Journal Article

DartGrid II: a semantic grid platform for ITS

  • Zhaohui Wu
  • Shuiguang Deng
  • Jian Wu
  • Huajun Chen
  • Shuming Tang
  • Haijun Gao

Intelligent transportation systems offer an alternative approach to solving many problems by implementing advances in information, Internet, communication, and cybernetics technologies. Grid computing can support traffic data semantization, resource sharing, ITS subsystem cooperation, and global-scale distributed computing that connects all kinds of resources. We are currently using grid technology to build DartGrid II, a semantic ITS platform to support resource sharing, service flow management, and cross-domain cooperation.

NeurIPS Conference 1988 Conference Paper

Neural Analog Diffusion-Enhancement Layer and Spatio-Temporal Grouping in Early Vision

  • Allen Waxman
  • Michael Seibert
  • Robert Cunningham
  • Jian Wu

A new class of neural network aimed at early visual processing is described; we call it a Neural Analog Diffusion-Enhancement Layer or "NADEL." The network consists of two levels which are coupled through feedforward and shunted feedback connections. The lower level is a two-dimensional diffusion map which accepts visual features as input, and spreads activity over larger scales as a function of time. The upper layer is periodically fed the activity from the diffusion layer and locates local maxima in it (an extreme form of contrast enhancement) using a network of local comparators. These local maxima are fed back to the diffusion layer using an on-center/off-surround shunting anatomy. The maxima are also available as output of the network. The network dynamics serves to cluster features on multiple scales as a function of time, and can be used in a variety of early visual processing tasks such as: extraction of corners and high curvature points along edge contours, line end detection, gap filling in contours, generation of fixation points, perceptual grouping on multiple scales, correspondence and path impletion in long-range apparent motion, and building 2-D shape representations that are invariant to location, orientation, scale, and small deformation on the visual field.
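
The diffusion-then-maxima idea can be sketched in a few lines. The example below is a simplified one-dimensional stand-in, not the network from the paper: it uses an explicit discrete diffusion step with reflecting boundaries, finds strict local maxima by comparing each cell with its two neighbors, and omits the shunting feedback entirely. The function names and the `rate` parameter are assumptions made for illustration.

```python
def diffuse(activity, rate=0.25, steps=1):
    """Apply explicit 1-D diffusion steps with reflecting boundaries.

    Each step moves a fraction of activity toward neighboring cells,
    so isolated features spread over larger scales as steps increase.
    """
    a = list(activity)
    for _ in range(steps):
        padded = [a[0]] + a + [a[-1]]  # reflect the edge values
        a = [
            a[i] + rate * (padded[i] + padded[i + 2] - 2 * a[i])
            for i in range(len(a))
        ]
    return a


def local_maxima(activity):
    """Indices whose activity strictly exceeds both neighbors (local comparators)."""
    return [
        i
        for i in range(1, len(activity) - 1)
        if activity[i] > activity[i - 1] and activity[i] > activity[i + 1]
    ]


# Two nearby point features on a 9-cell line.
features = [0, 0, 0, 1, 0, 1, 0, 0, 0]

fine_scale = local_maxima(diffuse(features, steps=0))    # each feature is its own maximum
coarse_scale = local_maxima(diffuse(features, steps=2))  # the two features merge into one
```

With no diffusion the two features are separate maxima; after two steps they merge into a single maximum midway between them, which is the kind of time-dependent, multi-scale grouping the abstract describes.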