Arrow Research search

Author name cluster

Bowen He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

NeurIPS Conference 2025 Conference Paper

AdaTS: Learning Adaptive Time Series Representations via Dynamic Soft Contrasts

  • Denizhan Kara
  • Tomoyoshi Kimura
  • Jinyang Li
  • Bowen He
  • Yizhuo Chen
  • Yigong Hu
  • Hongjue Zhao
  • Shengzhong Liu

Learning robust representations from unlabeled time series is crucial, and contrastive learning offers a promising avenue. However, existing contrastive learning approaches for time series often struggle to define meaningful similarities, tending to overlook inherent physical correlations and diverse, sequence-varying non-stationarity. This limits their representational quality and real-world adaptability. To address these limitations, we introduce AdaTS, a novel adaptive soft contrastive learning strategy. AdaTS offers a compute-efficient solution centered on dynamic instance-wise and temporal assignments to enhance time series representations, specifically by: (i) leveraging Time-Frequency Coherence for robust physics-guided similarity measurement; (ii) preserving relative instance similarities through ordinal consistency learning; and (iii) dynamically adapting to sequence-specific non-stationarity with dynamic temporal assignments. AdaTS is designed as a pluggable module for standard contrastive frameworks, achieving up to 13.7% accuracy improvements across diverse time series datasets and three state-of-the-art contrastive frameworks while enhancing robustness against label scarcity. The code will be publicly available upon acceptance.
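The "soft assignment" idea in the abstract can be illustrated with a generic soft contrastive loss: instead of one-hot positive/negative targets, each anchor is matched against a distribution over the other instances derived from some similarity measure. This NumPy sketch is a minimal generic illustration, not the authors' AdaTS objective; the `similarity` matrix stands in for whatever measure (e.g. their Time-Frequency Coherence) supplies the soft targets.

```python
import numpy as np

def soft_contrastive_loss(embeddings, similarity, tau=0.1):
    """Cross-entropy between soft similarity targets and embedding affinities.

    Hard contrastive losses use one-hot targets (a single positive per
    anchor); a soft variant replaces them with a distribution over the
    other instances derived from a chosen similarity measure.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = emb @ emb.T / tau                      # cosine affinities
    np.fill_diagonal(logits, -np.inf)               # exclude self-pairs
    sim = similarity.astype(float).copy()
    np.fill_diagonal(sim, -np.inf)
    t = sim - sim.max(axis=1, keepdims=True)
    targets = np.exp(t) / np.exp(t).sum(axis=1, keepdims=True)  # soft assignments
    m = logits.max(axis=1, keepdims=True)           # stable log-softmax
    log_probs = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    # targets is exactly 0 on the diagonal, where log_probs is -inf; mask it out.
    safe = np.where(np.isfinite(log_probs), log_probs, 0.0)
    return float(-(targets * safe).sum(axis=1).mean())

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                      # a batch of 8 embeddings
sim = rng.uniform(size=(8, 8))
sim = (sim + sim.T) / 2                             # symmetric similarity scores
print(soft_contrastive_loss(emb, sim))              # a positive scalar loss
```

Setting `similarity` to a one-hot positive mask recovers the usual InfoNCE-style target, which is what makes this formulation a pluggable replacement in standard contrastive frameworks.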

AAAI Conference 2024 Conference Paper

Improving Audio-Visual Segmentation with Bidirectional Generation

  • Dawei Hao
  • Yuxin Mao
  • Bowen He
  • Xiaodong Han
  • Yuchao Dai
  • Yiran Zhong

The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the contribution of each modality is implicitly or explicitly modeled. Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. This framework establishes robust correlations between an object's visual characteristics and its associated sound, thereby enhancing the performance of AVS. To achieve this, we employ a visual-to-audio projection component that reconstructs audio features from object segmentation masks and minimizes reconstruction errors. Moreover, recognizing that many sounds are linked to object movements, we introduce an implicit volumetric motion estimation module to handle temporal dynamics that may be challenging to capture using conventional optical flow methods. To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark. As a result, we establish a new state-of-the-art performance level on the AVS benchmark, particularly excelling in the challenging MS3 subset, which involves segmenting multiple sound sources. Code is released at: https://github.com/OpenNLPLab/AVS-bidirectional.
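The visual-to-audio reconstruction objective described above can be sketched in a few lines: project features of the segmented object back into audio-feature space and penalize the reconstruction error. This is a minimal stand-in, not the paper's architecture; the linear projection `P` and the feature dimensions are hypothetical placeholders for the learned network.

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_DIM, AUDIO_DIM = 32, 16    # hypothetical feature sizes, for illustration only

# Hypothetical linear visual-to-audio projection; the paper uses a learned
# network here, this stand-in only illustrates the reconstruction objective.
P = rng.normal(size=(MASK_DIM, AUDIO_DIM)) * 0.1

def reconstruction_loss(mask_features, audio_features):
    """Project segmentation-mask features into audio space and score them.

    Minimizing this mean-squared error pushes the visual branch to carry
    enough information to recover the sound of the segmented object.
    """
    predicted_audio = mask_features @ P              # visual-to-audio projection
    return float(np.mean((predicted_audio - audio_features) ** 2))

masks = rng.normal(size=(4, MASK_DIM))               # features of 4 object masks
audio = rng.normal(size=(4, AUDIO_DIM))              # paired audio features
print(reconstruction_loss(masks, audio))             # a nonnegative scalar
```

In training, this loss would be added to the segmentation loss so that the mask predictions are regularized by their ability to "generate" the audio side.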

AAAI Conference 2023 Conference Paper

Q-functionals for Value-Based Continuous Control

  • Samuel Lobel
  • Sreehari Rammohan
  • Bowen He
  • Shangqun Yu
  • George Konidaris

We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
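The core mechanism, a state mapped to basis-function weights, then many sampled actions scored in parallel, can be sketched as follows. This is a toy illustration under assumed simplifications: a 1-D action space, a cosine Fourier basis, and a random linear map `W` standing in for the trained Q-functional network.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, NUM_BASIS, ACTION_SAMPLES = 4, 8, 256

def fourier_basis(actions):
    """Evaluate cosine Fourier basis functions phi_i(a) for 1-D actions in [-1, 1]."""
    orders = np.arange(NUM_BASIS)
    return np.cos(np.pi * orders[None, :] * actions[:, None])  # shape (N, NUM_BASIS)

# Hypothetical stand-in for the Q-functional network: a single linear map
# from state to basis weights; the paper trains a deep network here.
W = rng.normal(size=(STATE_DIM, NUM_BASIS)) * 0.1

def select_action(state):
    """Turn a state into a function over actions, then maximize it by sampling."""
    weights = state @ W                                # state-dependent weights w_i(s)
    actions = rng.uniform(-1.0, 1.0, ACTION_SAMPLES)   # candidate actions
    q_values = fourier_basis(actions) @ weights        # Q(s, a) for all candidates at once
    return actions[np.argmax(q_values)]                # greedy among the samples

state = rng.normal(size=STATE_DIM)
print(select_action(state))                            # a scalar action in [-1, 1]
```

Because scoring all candidates is a single matrix product, the same machinery supports softmax sampling or Monte-Carlo averages over Q-values at essentially the same cost as the argmax shown here.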

ICLR Conference 2023 Conference Paper

Toeplitz Neural Network for Sequence Modeling

  • Zhen Qin 0003
  • Xiaodong Han
  • Weixuan Sun
  • Bowen He
  • Dong Li 0033
  • Dongxu Li 0003
  • Yuchao Dai
  • Lingpeng Kong

Sequence modeling has important applications in natural language processing and computer vision. Recently, transformer-based models have shown strong performance on various sequence modeling tasks, relying on attention to capture pairwise token relations and on position embedding to inject positional information. While showing good performance, transformer models are inefficient to scale to long input sequences, mainly due to the quadratic space-time complexity of attention. To overcome this inefficiency, we propose to model sequences with a relative-position-encoded Toeplitz matrix and use a Toeplitz matrix-vector product trick to reduce the space-time complexity of sequence modeling to log-linear. A lightweight sub-network called the relative position encoder is proposed to generate relative position coefficients with a fixed budget of parameters, enabling the proposed Toeplitz neural network to deal with varying sequence lengths. In addition, despite being trained on 512-token sequences, our model can extrapolate to input sequences of up to 14K tokens at inference with consistent performance. Extensive experiments on autoregressive and bidirectional language modeling, image modeling, and the challenging Long-Range Arena benchmark show that our method achieves better performance than its competitors on most downstream tasks while being significantly faster.
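The log-linear cost comes from a standard trick: a Toeplitz matrix (constant along each diagonal, so its entries depend only on the relative position i - j) can be embedded in a circulant matrix of twice the size, whose matrix-vector product is a circular convolution computable with the FFT in O(n log n). The sketch below demonstrates this trick in isolation; the decaying `coeffs` stand in for the relative position coefficients that the paper's relative position encoder would generate.

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Multiply a Toeplitz matrix by a vector in O(n log n) via circulant embedding.

    col is the first column T[:, 0]; row is the first row T[0, :]
    (so col[0] == row[0]). The Toeplitz matrix is embedded in a 2n x 2n
    circulant matrix, whose matvec is a circular convolution done by FFT.
    """
    n = len(x)
    c = np.concatenate([col, [0.0], row[:0:-1]])   # first column of the embedding
    xp = np.concatenate([x, np.zeros(n)])          # zero-pad the vector to length 2n
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(xp)).real
    return y[:n]                                   # top half is the Toeplitz product

# Check against a dense matvec on a small relative-position-style matrix,
# where entry (i, j) depends only on the offset i - j.
n = 6
coeffs = 0.5 ** np.abs(np.arange(-(n - 1), n))     # coefficients decay with distance
col = coeffs[n - 1:]                               # offsets 0 .. n-1
row = coeffs[n - 1::-1]                            # offsets 0 .. -(n-1)
T = np.array([[coeffs[n - 1 + i - j] for j in range(n)] for i in range(n)])
x = np.random.default_rng(0).normal(size=n)
print(np.allclose(toeplitz_matvec(col, row, x), T @ x))  # → True
```

Since only the 2n - 1 coefficients (not the full n x n matrix) are ever materialized, memory is linear in sequence length, and generating coefficients from a small sub-network lets the same model handle any n.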

AAAI Conference 2019 Conference Paper

A Domain Generalization Perspective on Listwise Context Modeling

  • Lin Zhu
  • Yihong Chen
  • Bowen He

As one of the most popular techniques for solving the ranking problem in information retrieval, learning-to-rank (LETOR) has received considerable attention in both academia and industry due to its importance in a wide variety of data mining applications. However, most existing LETOR approaches learn a single global ranking function to handle all queries, ignoring the substantial differences that exist between queries. In this paper, we propose a domain generalization strategy to tackle this problem. We propose Query-Invariant Listwise Context Modeling (QILCM), a novel neural architecture which eliminates the detrimental influence of inter-query variability by learning query-invariant latent representations, such that the ranking system generalizes better to unseen queries. We evaluate our techniques on benchmark datasets, demonstrating that QILCM outperforms previous state-of-the-art approaches by a substantial margin.