Author name cluster

Kai Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers

2 author rows

EAAI Journal 2026 Journal Article

A cascaded border-aware network for visual tracking

Qun Li
Haijun Zhang
Kai Yang
Zhili Zhou

Details DOI

JBHI Journal 2026 Journal Article

BioMedGPT: An Open Multimodal Large Language Model for BioMedicine

Yizhen Luo
Jiahuan Zhang
Siqi Fan
Kai Yang
Massimo Hong
Yushuai Wu
Mu Qiao
Zaiqing Nie

Recent advances in large language models (LLMs) like ChatGPT have shed light on the development of knowledgeable and versatile AI research assistants in various scientific domains. However, they fall short in biomedical applications due to a lack of proprietary biomedical knowledge and deficiencies in handling biological sequences for molecules and proteins. To address these issues, we present BioMedGPT, a multimodal large language model for assisting biomedical research. We first incorporate domain expertise into LLMs by incremental pre-training on large-scale biomedical literature. Then, we harmonize 2D molecular graphs, protein sequences, and natural language within a unified, parameter-efficient fusion architecture by fine-tuning on multimodal question-answering datasets. Through comprehensive experiments, we show that BioMedGPT performs on par with human experts in comprehending biomedical documents and answering research questions. It also exhibits promising capability in analyzing intricate functions and properties of novel molecules and proteins, surpassing state-of-the-art LLMs by 17.1% and 49.8% absolute gains respectively in ROUGE-L on molecule and protein question-answering.

Details DOI

EAAI Journal 2026 Journal Article

Real-time prediction of temperature field of thermal fatigue-damaged thermos-compression bonding electrode based on digital twin data and improved generative adversarial network model

Kai Chen
Kai Yang
Zuoen Deng
Jiadui Chen
Haisong Huang
Jingwei Yang

Details DOI

AAAI Conference 2026 Conference Paper

RoS-Guard: Robust and Scalable Online Change Detection with Delay-Optimal Guarantees

Zelin Zhu
Yancheng Huang
Kai Yang

Online change detection (OCD) aims to rapidly identify change points in streaming data and is critical in applications such as power system monitoring, wireless network sensing, and financial anomaly detection. Existing OCD methods typically assume precise system knowledge, which is unrealistic due to estimation errors and environmental variations. Moreover, existing OCD methods often struggle with efficiency in large-scale systems. To overcome these challenges, we propose RoS-Guard, a robust and optimal OCD algorithm tailored for linear systems with uncertainty. Through a tight relaxation and reformulation of the OCD optimization problem, RoS-Guard employs neural unrolling to enable efficient parallel computation via GPU acceleration. The algorithm provides theoretical guarantees on performance, including expected false alarm rate and worst-case average detection delay. Extensive experiments validate the effectiveness of RoS-Guard and demonstrate significant computational speedup in large-scale system scenarios.

PDF Details DOI

EAAI Journal 2025 Journal Article

A noise-robust framework for multi-step wind power forecasting via self-supervised joint optimization

Xiaodong Huang
Gangliang Li
Chengfeng Chen
Kai Yang
Shouqiang Liu

Details DOI

ECAI Conference 2025 Conference Paper

A Self-Adaptive Frequency Domain Network for Continuous Intraoperative Hypotension Prediction

Xian Zeng
Tianze Xu
Kai Yang
Jie Sun
Youran Wang
Jun Xu
Mucheng Ren

Intraoperative hypotension (IOH) is strongly associated with postoperative complications, including postoperative delirium and increased mortality, making its early prediction crucial in perioperative care. While several artificial intelligence-based models have been developed to provide IOH warnings, existing methods face limitations in incorporating both time and frequency domain information, capturing short- and long-term dependencies, and handling noise sensitivity in biosignal data. To address these challenges, we propose a novel Self-Adaptive Frequency Domain Network (SAFDNet). Specifically, SAFDNet integrates an adaptive spectral block, which leverages Fourier analysis to extract frequency-domain features and employs self-adaptive thresholding to mitigate noise. Additionally, an interactive attention block is introduced to capture both long-term and short-term dependencies in the data. Extensive internal and external validations on two large-scale real-world datasets demonstrate that SAFDNet achieves up to 97.3% AUROC in IOH early warning, outperforming state-of-the-art models. Furthermore, SAFDNet exhibits robust predictive performance and low sensitivity to noise, making it well-suited for practical clinical applications.
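The general idea of Fourier-domain thresholding mentioned in the abstract above can be illustrated with a minimal sketch. This is not SAFDNet's adaptive spectral block: the function names and the fixed `rel_threshold` rule (keep components whose magnitude is at least a fraction of the strongest one) are assumptions made for illustration, whereas the paper describes a self-adaptive threshold.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse of dft(); returns the real part of the reconstruction."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_denoise(x, rel_threshold=0.5):
    """Zero out frequency components weaker than rel_threshold * peak magnitude.

    rel_threshold is a hypothetical fixed hyperparameter, not a learned one.
    """
    X = dft(x)
    mags = [abs(c) for c in X]
    cutoff = rel_threshold * max(mags)
    X_kept = [c if m >= cutoff else 0 for c, m in zip(X, mags)]
    return idft(X_kept)


# Example: a sine wave corrupted by a small high-frequency alternating term.
n = 32
clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [c + 0.1 * (-1) ** t for t, c in enumerate(clean)]
denoised = spectral_denoise(noisy)
```

In this toy case the alternating term lives entirely in one weak frequency bin, so thresholding removes it and the sine wave is recovered almost exactly.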

Details

AAMAS Conference 2025 Conference Paper

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Zeyuan Liu
Kai Yang
Jiafei Lyu
Xiu Li

Distribution shift is a major obstacle in offline reinforcement learning (RL). While existing conservative offline RL algorithms perform well in learning in-distribution policies, they often fail to generalize to unseen actions. To address this issue, we propose leveraging knowledge derived from the gradient fields of the dataset’s density to refine and adjust the original actions. Building on this, we introduce the Conservative Denoising Score-based Algorithm (CDSA), which utilizes score-based diffusion models to estimate the gradients of the dataset density and generates action correction subcomponents to refine the actions. This approach enables more accurate and efficient decision-making during the testing phase in Markov Decision Process (MDP) environments. By decoupling conservatism constraints from the policy, our method is broadly applicable to various offline RL algorithms. Experiments demonstrate that our approach significantly enhances baseline performance on D4RL datasets and exhibits plug-and-play compatibility with different pre-trained offline RL policies.

PDF

ICML Conference 2025 Conference Paper

Contextures: Representations from Contexts

Runtian Zhai
Kai Yang
Burak Varici
Che-Ping Tsai
J. Zico Kolter
Pradeep Ravikumar

Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory. It shows that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. Specifically, we show that many popular methods aim to approximate the top-d singular functions of the expectation operator induced by the context, in which case we say that the representation learns the contexture. We demonstrate the generality of the contexture theory by proving that representation learning within various learning paradigms – supervised, self-supervised, and manifold learning – can all be studied from such a perspective. We prove that representations that learn the contexture are optimal on those tasks that are compatible with the context. One important implication of our theory is that once the model is large enough to approximate the top singular functions, scaling up the model size yields diminishing returns, so further improvement requires better contexts. To this end, we study how to evaluate a context without knowing the downstream tasks. We propose a metric and show by experiments that it correlates well with the actual performance of the encoder on many real datasets.

Details

NeurIPS Conference 2025 Conference Paper

From Sequence to Structure: Uncovering Substructure Reasoning in Transformers

Xinnan Dai
Kai Yang
Jay Revolinsky
Kai Guo
Aoran Wang
Bohang Zhang
Jiliang Tang

Recent studies suggest that large language models (LLMs) possess the capability to solve graph reasoning tasks. Notably, even when graph structures are embedded within textual descriptions, LLMs can still effectively answer related questions. This raises a fundamental question: How can a decoder-only Transformer architecture understand underlying graph structures? To address this, we start with the substructure extraction task, interpreting the inner mechanisms inside the transformers and analyzing the impact of the input queries. Specifically, through both empirical results and theoretical analysis, we present Induced Substructure Filtration (ISF), a perspective that captures the substructure identification in the multi-layer transformers. We further validate the ISF process in LLMs, revealing consistent internal dynamics across layers. Building on these insights, we explore the broader capabilities of Transformers in handling diverse graph types. Specifically, we introduce the concept of thinking in substructures to efficiently extract complex composite patterns, and demonstrate that decoder-only Transformers can successfully extract substructures from attributed graphs, such as molecular graphs. Together, our findings offer a new insight on how sequence-based Transformers perform the substructure extraction task over graph data.

PDF Details

AAAI Conference 2025 Conference Paper

GPU-Accelerated Parallel Bilevel Optimization for Robust 6G ISAC

Xingdi Chen
Kai Yang

This paper initiates the first exploratory study to investigate the robust integrated sensing and communication (ISAC) systems under channel estimation errors from the perspective of GPU-Accelerated bilevel optimization. Within this framework, the upper-level problem is dedicated to simultaneously optimizing communication and sensing objectives, quantified respectively by weighted sum rate and Cramér-Rao lower bound, while the lower-level problem considers the channel uncertainties. We then propose an efficient algorithm that can find a set of Pareto optimal solutions with different trade-offs among communication rates and sensing accuracy. The theoretical analysis regarding the convergence rate has also been provided. Furthermore, we design a bilevel optimization inspired deep neural network architecture that can be realized efficiently on GPU platform. Experiments have been conducted to evaluate the performances of proposed methods. In particular, the proposed GPU-accelerated parallel bilevel optimization can accelerate the convergence speed by up to 50 times compared to conventional gradient-based methods. This characteristic renders it especially suitable for real-time applications, exemplified by the demanding requirements of robust ISAC in upcoming 6G networks.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning

Yangkun Chen
Kai Yang
Jian Tao
Jiafei Lyu

Recently, deep Multi-Agent Reinforcement Learning (MARL) has demonstrated its potential to tackle complex cooperative tasks, pushing the boundaries of AI in collaborative environments. However, the efficiency of these systems is often compromised by inadequate sample utilization and a lack of diversity in learning strategies. To enhance MARL performance, we introduce a novel sample reuse approach that dynamically adjusts policy updates based on observation novelty. Specifically, we employ a Random Network Distillation (RND) network to gauge the novelty of each agent's current state, assigning additional sample update opportunities based on the uniqueness of the data. We name our method Multi-Agent Novelty-GuidEd sample Reuse (MANGER). This method increases sample efficiency while promoting exploration and diverse agent behaviors. Our evaluations confirm substantial improvements in MARL effectiveness in complex cooperative scenarios such as Google Research Football and super-hard StarCraft II micromanagement tasks.

PDF Details DOI

EAAI Journal 2024 Journal Article

A two-stage reinforcement learning-based approach for multi-entity task allocation

Aicheng Gong
Kai Yang
Jiafei Lyu
Xiu Li

Details DOI

IJCAI Conference 2024 Conference Paper

BATON: Aligning Text-to-Audio Model Using Human Preference Feedback

Huan Liao
Haonan Han
Kai Yang
Tianjiao Du
Rui Yang
Qinmei Xu
Zunnan Xu
Jingquan Liu

With the development of AI-Generated Content (AIGC), text-to-audio models are gaining widespread attention. However, it is challenging for these models to generate audio aligned with human preference due to the inherent information density of natural language and limited model understanding ability. To alleviate this issue, we formulate the BATON, the first framework specifically designed to enhance the alignment between generated audio and text prompt using human preference feedback. Our BATON comprises three key stages: Firstly, we curated a dataset containing both prompts and the corresponding generated audio, which was then annotated based on human feedback. Secondly, we introduced a reward model using the constructed dataset, which can mimic human preference by assigning rewards to input text-audio pairs. Finally, we employed the reward model to fine-tune an off-the-shelf text-to-audio model. The experiment results demonstrate that our BATON can significantly improve the generation quality of the original text-to-audio models, concerning audio integrity, temporal relationship, and alignment with human preference. Project page is available at https://baton2024.github.io.

PDF Details DOI

ICML Conference 2024 Conference Paper

Do Efficient Transformers Really Save Computation?

Kai Yang
Jan Ackermann
Zhenyu He 0012
Guhao Feng
Bohang Zhang
Yunzhen Feng
Qiwei Ye
Di He 0001

As transformer-based language models are trained on increasingly large datasets and with vast numbers of parameters, finding more efficient alternatives to the standard Transformer has become very valuable. While many efficient Transformers and Transformer alternatives have been proposed, none provide theoretical guarantees that they are a suitable replacement for the standard Transformer. This makes it challenging to identify when to use a specific model and what directions to prioritize for further investigation. In this paper, we aim to understand the capabilities and limitations of efficient Transformers, specifically the Sparse Transformer and the Linear Transformer. We focus on their reasoning capability as exhibited by Chain-of-Thought (CoT) prompts and follow previous works to model them as Dynamic Programming (DP) problems. Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size. Nonetheless, we identify a class of DP problems for which these models can be more efficient than the standard Transformer. We confirm our theoretical results through experiments on representative DP tasks, adding to the understanding of efficient Transformers’ practical strengths and weaknesses.

Details

EAAI Journal 2024 Journal Article

Expanding the defect image dataset of composite material coating with enhanced image-to-image translation

Xinrui Tao
Hanjun Gao
Kai Yang
Qiong Wu

Details DOI

ICML Conference 2024 Conference Paper

Exploration and Anti-Exploration with Distributional Random Network Distillation

Kai Yang
Jian Tao
Jiafei Lyu
Xiu Li 0001

Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the “bonus inconsistency” issue within RND, pinpointing its primary limitation. To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at https://github.com/yk7333/DRND.

Details

EAAI Journal 2024 Journal Article

Image inpainting algorithm based on inference attention module and two-stage network

Yuantao Chen
Runlong Xia
Kai Yang
Ke Zou

Details DOI

AAAI Conference 2024 Short Paper

Improving IP Geolocation With Target-Centric IP Graph (Student Abstract)

Kai Yang
Jiayang Li
Wenxin Tai
Zhenhui Li
Ting Zhong
Guangqiang Yin
Yong Wang

Accurate IP geolocation is indispensable for location-aware applications. While recent advances based on router-centric IP graphs are considered cutting-edge, one challenge remains: the prevalence of sparse IP graphs (14.24% with fewer than 10 nodes, 9.73% isolated) limits graph learning. To mitigate this issue, we designate the target host as the central node and aggregate multiple last-hop routers to construct the target-centric IP graph, instead of relying solely on the router with the smallest last-hop latency as in previous works. Experiments on three real-world datasets show that our method significantly improves the geolocation accuracy compared to existing baselines.

PDF Details DOI

AAAI Conference 2024 Short Paper

Interpreting Temporal Knowledge Graph Reasoning (Student Abstract)

Bin Chen
Kai Yang
Wenxin Tai
Zhangtao Cheng
Leyuan Liu
Ting Zhong
Fan Zhou

Temporal knowledge graph reasoning is an essential task that holds immense value in diverse real-world applications. Existing studies mainly focus on leveraging structural and sequential dependencies, excelling in tasks like entity and link prediction. However, they confront a notable interpretability gap in their predictions, a pivotal facet for comprehending model behavior. In this study, we propose an innovative method, LSGAT, which not only exhibits remarkable precision in entity predictions but also enhances interpretability by identifying pivotal historical events influencing event predictions. LSGAT enables concise explanations for prediction outcomes, offering valuable insights into the otherwise enigmatic "black box" reasoning process. Through an exploration of the implications of the most influential events, it facilitates a deeper understanding of the underlying mechanisms governing predictions.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Provably Convergent Federated Trilevel Learning

Yang Jiao
Kai Yang
Tiancheng Wu
Chengtao Jian
Jianwei Huang

Trilevel learning, also called trilevel optimization (TLO), has been recognized as a powerful modelling tool for hierarchical decision process and widely applied in many machine learning applications, such as robust neural architecture search, hyperparameter optimization, and domain adaptation. Tackling TLO problems has presented a great challenge due to their nested decision-making structure. In addition, existing works on TLO face the following key challenges: 1) they all focus on the non-distributed setting, which may lead to privacy breach; 2) they do not offer any non-asymptotic convergence analysis which characterizes how fast an algorithm converges. To address the aforementioned challenges, this paper proposes an asynchronous federated trilevel optimization method to solve TLO problems. The proposed method utilizes μ-cuts to construct a hyper-polyhedral approximation for the TLO problem and solve it in an asynchronous manner. We demonstrate that the proposed μ-cuts are applicable to not only convex functions but also a wide range of non-convex functions that meet the μ-weakly convex assumption. Furthermore, we theoretically analyze the non-asymptotic convergence rate for the proposed method by showing its iteration complexity to obtain ϵ-stationary point is upper bounded by O(1/ϵ²). Extensive experiments on real-world datasets have been conducted to elucidate the superiority of the proposed method, e.g., it has a faster convergence rate with a maximum acceleration of approximately 80%.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Robust Beamforming for Downlink Multi-Cell Systems: A Bilevel Optimization Perspective

Xingdi Chen
Yu Xiong
Kai Yang

Utilization of inter-base station cooperation for information processing has shown great potential in enhancing the overall quality of communication services (QoS) in wireless communication networks. Nevertheless, such cooperations require the knowledge of channel state information (CSI) at base stations (BSs), which is assumed to be perfectly known. However, CSI errors are inevitable in practice which necessitates beamforming technique that can achieve robust performance in the presence of channel estimation errors. Existing approaches relax the robust beamforming design problems into semidefinite programming (SDP), which can only achieve a solution that is far from being optimal. To this end, this paper views robust beamforming design problems from a bilevel optimization perspective. In particular, we focus on maximizing the worst-case weighted sum-rate (WSR) in the downlink multi-cell multi-user multiple-input single-output (MISO) system considering bounded CSI errors. We first reformulate this problem into a bilevel optimization problem and then develop an efficient algorithm based on the cutting plane method. A distributed optimization algorithm has also been developed to facilitate the parallel processing in practical settings. Numerical results are provided to confirm the effectiveness of the proposed algorithm in terms of performance and complexity, particularly in the presence of CSI uncertainties.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization

Chengtao Jian
Kai Yang
Yang Jiao

Out-of-Distribution (OOD) generalization in machine learning is a burgeoning area of study. Its primary goal is to enhance the adaptability and resilience of machine learning models when faced with new, unseen, and potentially adversarial data that significantly diverges from their original training datasets. In this paper, we investigate time series OOD generalization via pre-trained Large Language Models (LLMs). We first propose a novel Tri-level learning framework for Time Series OOD generalization, termed TTSO, which considers both sample-level and group-level uncertainties. This formula offers a fresh theoretic perspective for formulating and analyzing OOD generalization problem. In addition, we provide a theoretical analysis to justify this method is well motivated. We then develop a stratified localization algorithm tailored for this tri-level optimization problem, theoretically demonstrating the guaranteed convergence of the proposed algorithm. Our analysis also reveals that the iteration complexity to obtain an ϵ-stationary point is bounded by O(1/ϵ²). Extensive experiments on real-world datasets have been conducted to elucidate the effectiveness of the proposed method.

PDF Details DOI

ICML Conference 2024 Conference Paper

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Zhenyu He 0012
Guhao Feng
Shengjie Luo
Kai Yang
Liwei Wang 0001
Jingjing Xu
Zhi Zhang 0005
Hongxia Yang

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
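The two position indices described in the abstract above (a location within a segment and a segment index) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bilevel_positions` and the choice of boundary tokens are assumptions, and the sketch only computes the indices, not the absolute or relative encodings built on top of them.

```python
from typing import FrozenSet, List, Sequence, Tuple


def bilevel_positions(
    tokens: Sequence[str],
    boundaries: FrozenSet[str] = frozenset({".", "\n"}),  # assumed segment delimiters
) -> Tuple[List[int], List[int]]:
    """Return (intra_segment_position, segment_index) for every token."""
    intra: List[int] = []
    segment: List[int] = []
    pos, seg = 0, 0
    for tok in tokens:
        intra.append(pos)
        segment.append(seg)
        if tok in boundaries:
            # A boundary token closes the current segment; the next token
            # starts a new segment at intra-segment position 0.
            seg += 1
            pos = 0
        else:
            pos += 1
    return intra, segment


tokens = ["the", "cat", "sat", ".", "it", "slept", "."]
intra, segment = bilevel_positions(tokens)
# intra   -> [0, 1, 2, 3, 0, 1, 2]
# segment -> [0, 0, 0, 0, 1, 1, 1]
```

Because the intra-segment index resets at every boundary, it stays within the range seen in training even when the sequence as a whole grows, which is the intuition behind pairing it with a separate segment-level index.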

Details

EAAI Journal 2023 Journal Article

An integrating spherical fuzzy AHP and axiomatic design approach and its application in human–machine interface design evaluation

Qinghua Liu
Jiadui Chen
Kai Yang
Dan Liu
Ling He
Qing Qin
Yuqing Wang

Details DOI

AAAI Conference 2023 Conference Paper

KerPrint: Local-Global Knowledge Graph Enhanced Diagnosis Prediction for Retrospective and Prospective Interpretations

Kai Yang
Yongxin Xu
Peinie Zou
Hongxin Ding
Junfeng Zhao
Yasha Wang
Bing Xie

While recent developments of deep learning models have led to record-breaking achievements in many areas, the lack of sufficient interpretation remains a problem for many specific applications, such as the diagnosis prediction task in healthcare. The previous knowledge graph (KG) enhanced approaches mainly focus on learning clinically meaningful representations, the importance of medical concepts, and even the knowledge paths from inputs to labels. However, it is infeasible to interpret the diagnosis prediction, which needs to consider different medical concepts, various medical relationships, and the time-effectiveness of knowledge triples in different patient contexts. More importantly, the retrospective and prospective interpretations of disease processes are valuable to clinicians for the patients' confounding diseases. We propose KerPrint, a novel KG enhanced approach for retrospective and prospective interpretations to tackle these problems. Specifically, we propose a time-aware KG attention method to solve the problem of knowledge decay over time for trustworthy retrospective interpretation. We also propose a novel element-wise attention method to select candidate global knowledge using comprehensive representations from the local KG for prospective interpretation. We validate the effectiveness of our KerPrint through an extensive experimental study on a real-world dataset and a public dataset. The results show that our proposed approach not only achieves significant improvement over knowledge-enhanced methods but also gives the interpretability of diagnosis prediction in both retrospective and prospective views.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data

Yongxin Xu
Kai Yang
Chaohe Zhang
Peinie Zou
Zhiyuan Wang
Hongxin Ding
Junfeng Zhao
Yasha Wang

Due to the insufficiency of electronic health records (EHR) data utilized in practical diagnosis prediction scenarios, most works are devoted to learning powerful patient representations either from structured EHR data (e.g., temporal medical events, lab test results, etc.) or unstructured data (e.g., clinical notes, etc.). However, synthesizing rich information from both of them still needs to be explored. Firstly, the heterogeneous semantic biases across them heavily hinder the synthesis of representation spaces, which is critical for diagnosis prediction. Secondly, the intermingled quality of partial clinical notes leads to inadequate representations of to-be-predicted patients. Thirdly, typical attention mechanisms mainly focus on aggregating information from similar patients, ignoring important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, which indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with the medical research.

PDF Details DOI

NeurIPS Conference 2022 Conference Paper

A Unified Model for Multi-class Anomaly Detection

Zhiyuan You
Lei Cui
Yujun Shen
Kai Yang
Xin Lu
Yu Zheng
Xinyi Le

Despite the rapid advance of unsupervised anomaly detection, existing methods require training separate models for different objects. In this work, we present UniAD that accomplishes anomaly detection for multiple classes with a unified framework. Under such a challenging setting, popular reconstruction networks may fall into an "identical shortcut", where both normal and anomalous samples can be well recovered, and hence fail to spot outliers. To tackle this obstacle, we make three improvements. First, we revisit the formulations of fully-connected layer, convolutional layer, as well as attention layer, and confirm the important role of query embedding (i.e., within attention layer) in preventing the network from learning the shortcut. We therefore come up with a layer-wise query decoder to help model the multi-class distribution. Second, we employ a neighbor masked attention module to further avoid the information leak from the input feature to the reconstructed output feature. Third, we propose a feature jittering strategy that urges the model to recover the correct message even with noisy inputs. We evaluate our algorithm on MVTec-AD and CIFAR-10 datasets, where we surpass the state-of-the-art alternatives by a sufficiently large margin. For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%). Code is available at https://github.com/zhiyuanyou/UniAD.

PDF Details

NeurIPS Conference 2022 Conference Paper

Distributed Distributionally Robust Optimization with Non-Convex Objectives

Yang Jiao
Kai Yang
Dongjin Song

Distributionally Robust Optimization (DRO), which aims to find an optimal decision that minimizes the worst case cost over the ambiguity set of probability distribution, has been applied in diverse applications, e.g., network behavior analysis, risk management, etc. However, existing DRO techniques face three key challenges: 1) how to deal with the asynchronous updating in a distributed environment; 2) how to leverage the prior distribution effectively; 3) how to properly adjust the degree of robustness according to different scenarios. To this end, we propose an asynchronous distributed algorithm, named Asynchronous Single-looP alternatIve gRadient projEction (ASPIRE) algorithm with the itErative Active SEt method (EASE) to tackle the distributed distributionally robust optimization (DDRO) problem. Furthermore, a new uncertainty set, i.e., constrained D-norm uncertainty set, is developed to effectively leverage the prior distribution and flexibly control the degree of robustness. Finally, our theoretical analysis elucidates that the proposed algorithm is guaranteed to converge and the iteration complexity is also analyzed. Extensive empirical studies on real-world datasets demonstrate that the proposed method can not only achieve fast convergence, remain robust against data heterogeneity and malicious attacks, but also tradeoff robustness with performance.

PDF Details

AAAI Conference 2020 Conference Paper

COTSAE: CO-Training of Structure and Attribute Embeddings for Entity Alignment

Kai Yang
Shaoqin Liu
Junfeng Zhao
Yasha Wang
Bing Xie

Entity alignment is a fundamental and vital task in Knowledge Graph (KG) construction and fusion. Previous works mainly focus on capturing the structural semantics of entities by learning the entity embeddings on the relational triples and pre-aligned "seed entities". Some works also seek to incorporate the attribute information to assist refining the entity embeddings. However, there are still many problems not considered, which dramatically limits the utilization of attribute information in the entity alignment. Different KGs may have lots of different attribute types, and even the same attribute may have diverse data structures and value granularities. Most importantly, attributes may have various "contributions" to the entity alignment. To solve these problems, we propose COTSAE that combines the structure and attribute information of entities by co-training two embedding learning components, respectively. We also propose a joint attention method in our model to learn the attentions of attribute types and values cooperatively. We verified our COTSAE on several datasets from real-world KGs, and the results showed that it is significantly better than the latest entity alignment methods. The structure and attribute information can complement each other and both contribute to performance improvement.

PDF Details

ICRA Conference 2020 Conference Paper

Learning Affordance Space in Physical World for Vision-based Robotic Object Manipulation

Huadong Wu
Zhanpeng Zhang
Hui Cheng
Kai Yang
Jiaming Liu
Ziying Guo

What is a proper representation for objects in manipulation? What would human try to perceive when manipulating a new object in a new environment? In fact, instead of focusing on the texture and illumination, human can infer the "affordance" [36] of the objects from vision. Here "affordance" describes the object's intrinsic property that affords a particular type of manipulation. In this work, we investigate whether such affordance can be learned by a deep neural network. In particular, we propose an Affordance Space Perception Network (ASPN) that takes an image as input and outputs an affordance map. Different from existing works that infer the pixel-wise probability affordance map in image space, our affordance is defined in the real world space, thus eliminates the need of hand-eye calibration. In addition, we extend the representation ability of affordance by defining it in a 3D affordance space and propose a novel training strategy to improve the performance. Trained purely with simulation data, ASPN can achieve significant performance in the real world. It is a task-agnostic framework and can handle different objects, scenes and viewpoints. Extensive real-world experiments demonstrate the accuracy and robustness of our approach. We achieve the success rates of 94.2% for singular-object pushing and 92.4% for multiple-object pushing. We also achieve the success rates of 97.2% for singular-object grasping and 95.4% for multiple-object grasping, which outperform current state-of-the-art methods.

Details

TCS Journal 2018 Journal Article

A compiler for MSVL and its applications

Kai Yang
Zhenhua Duan
Cong Tian
Nan Zhang

Details DOI

EAAI Journal 2018 Journal Article

Modeling and optimization of a road–rail intermodal transport system under uncertain information

Rui Wang
Kai Yang
Lixing Yang
Ziyou Gao

Details DOI

IJCAI Conference 2017 Conference Paper

Autoencoder Regularized Network For Driving Style Representation Learning

Weishan Dong
Ting Yuan
Kai Yang
Changsheng Li
Shilei Zhang

In this paper, we study learning generalized driving style representations from automobile GPS trip data. We propose a novel Autoencoder Regularized deep neural Network (ARNet) and a trip encoding framework trip2vec to learn drivers' driving styles directly from GPS records, by combining supervised and unsupervised feature learning in a unified architecture. Experiments on a challenging driver number estimation problem and the driver identification problem show that ARNet can learn a good generalized driving style representation: It significantly outperforms existing methods and alternative architectures by reaching the least estimation error on average (0.68, less than one driver) and the highest identification accuracy (by at least 3% improvement) compared with traditional supervised learning methods.

PDF Details

AAAI Conference 2017 Conference Paper

TaGiTeD: Predictive Task Guided Tensor Decomposition for Representation Learning from Electronic Health Records

Kai Yang
Xiang Li
Haifeng Liu
Jing Mei
Guotong Xie
Junfeng Zhao
Bing Xie
Fei Wang

With the better availability of healthcare data, such as Electronic Health Records (EHR), more and more data analytics methodologies are developed aiming at digging insights from them to improve the quality of care delivery. There are many challenges on analyzing EHR, such as high dimensionality and event sparsity. Moreover, different from other application domains, the EHR analysis algorithms need to be highly interpretable to make them clinically useful. This makes representation learning from EHRs of key importance. In this paper, we propose an algorithm called Predictive Task Guided Tensor Decomposition (TaGiTeD), to analyze EHRs. Specifically, TaGiTeD learns event interaction patterns that are highly predictive for certain tasks from EHRs with supervised tensor decomposition. Compared with unsupervised methods, TaGiTeD can learn effective EHR representations in a more focused way. This is crucial because most of the medical problems have very limited patient samples, which are not enough for unsupervised algorithms to learn meaningful representations from. We apply TaGiTeD on real world EHR data warehouse and demonstrate that TaGiTeD can learn representations that are both interpretable and predictive.

PDF Details

EAAI Journal 2011 Journal Article

Development of a fuzzy goal programming model for optimization of lead time and cost in an overlapped product development project using a Gaussian Adaptive Particle Swarm Optimization-based approach

Satish K. Tyagi
Kai Yang
Annu Tyagi
Suren N. Dwivedi

Details DOI