Author name cluster

Yanhao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

1 author row

AAAI Conference 2026 Conference Paper

FILTER: A Framework for Defending Against Backdoor Attacks in Vertical Federated Learning

Zhanyi Hu
Cen Chen
Yanhao Wang

Vertical Federated Learning (VFL) is a distributed machine learning paradigm in which participants train models with vertically partitioned data. Many previous studies have identified backdoor vulnerabilities in VFL systems. However, limited effort has been devoted to developing defenses against such attacks. Unlike centralized machine learning or horizontal FL, VFL poses new challenges for defending against backdoor attacks, particularly because the central server lacks control over the entire model. In this paper, we first explore defenses against backdoor attacks in VFL when the attacker possesses sufficient knowledge of the label information. Specifically, we propose FILTER, a framework for defending against backdoor attacks in VFL to ensure the integrity of VFL systems during training in the presence of malicious participants. To address backdoor risks in VFL, it incorporates two novel filters: an embedding-based filter and a loss-based filter, which effectively identify and remove poisoned samples in later stages of training. Through extensive experiments on five benchmark datasets against four state-of-the-art backdoor attacks, we demonstrate that FILTER significantly reduces the success rate of attacks while maintaining accuracy on clean data close to that of the models trained without such defenses.

PDF Details DOI

IJCAI Conference 2025 Conference Paper

Enhancing Portfolio Optimization via Heuristic-Guided Inverse Reinforcement Learning with Multi-Objective Reward and Graph-based Policy Learning

Wenyi Zhang
Renjun Jia
Yanhao Wang
Dawei Cheng
Minghao Zhao
Cen Chen

Portfolio optimization encounters persistent challenges in adapting to dynamic markets due to static assumptions and high-dimensional decision spaces. Although reinforcement learning (RL) has emerged as a potential solution, conventional reward engineering often fails to capture complex market dynamics. Recent advances in deep RL and graph neural networks have attempted to enhance market microstructure modeling. However, these methods still struggle with the systematic integration of financial knowledge. To address the above issues, we propose a novel heuristic-guided inverse reinforcement learning framework for portfolio optimization. Specifically, our framework provides an interpretable expert strategy generation mechanism that takes into account sector diversification and correlation constraints. Then, a multi-objective reward optimization method is adopted to adaptively strike a balance between returns and risks. Furthermore, it also utilizes heterogeneous graph policy learning with hierarchical attention mechanisms to explicitly model inter-stock relationships. Finally, we conduct extensive experiments on real-world financial market data to demonstrate that our framework outperforms several state-of-the-art deep learning and RL baselines in terms of risk-adjusted returns. We provide case studies to showcase the ability of our framework to balance return maximization and risk containment. Our code is publicly available at https: //github. com/ChloeWenyiZhang/SmartFolio/.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Individually Fair Diversity Maximization

Ruien Li
Yanhao Wang

We consider the problem of diversity maximization from the perspective of individual fairness: given a set $P$ of $n$ points in a metric space, we aim to extract a subset $S$ of size $k$ from $P$ so that (1) the diversity of $S$ is maximized and (2) $S$ is \emph{individually fair} in the sense that every point in $P$ has at least one of its $\frac{n}{k}$-nearest neighbors as its ``representative'' in $S$. We propose $\left(O(1), 3\right)$-bicriteria approximation algorithms for the individually fair variants of the three most common diversity maximization problems, namely, max-min diversification, max-sum diversification, and sum-min diversification. Specifically, the proposed algorithms provide a set of points where every point in the dataset finds a point within a distance at most $3$ times its distance to its $\frac{n}{k}$-nearest neighbor while achieving a diversity value at most $O(1)$ times lower than the optimal solution. Numerical experiments on real-world and synthetic datasets demonstrate that the proposed algorithms generate solutions that are individually fairer than those produced by unconstrained algorithms and incur only modest losses in diversity.

PDF Details

NeurIPS Conference 2025 Conference Paper

Localized Data Shapley: Accelerating Valuation for Nearest Neighbor Algorithms

Guangyi Zhang
Yanhao Wang
Chengliang Chai
Qiyu Liu
Wei Wang

Data Shapley values provide a principled approach for quantifying the contribution of individual training examples to machine learning models. However, computing these values often requires computational complexity that is exponential in the data size, and this has led researchers to pursue efficient algorithms tailored to specific machine learning models. Building on the prior success of the Shapley valuation for $K$-nearest neighbor (KNN) models, in this paper, we introduce a localized data Shapley framework that significantly accelerates the valuation of data points. Our approach leverages the distance-based local structure in the data space to decompose the global valuation problem into smaller, localized computations. Our primary contribution is an efficient valuation algorithm for a threshold-based KNN variant and shows that it provides provable speedups over the baseline under mild assumptions. Extensive experiments on real-life datasets demonstrate that our methods achieve a substantial speedup compared to previous approaches.

PDF Details

TMLR Journal 2024 Journal Article

Fair Representation in Submodular Subset Selection: A Pareto Optimization Approach

Adriano Fazzone
Yanhao Wang
Francesco Bonchi

Many machine learning applications, such as feature selection, recommendation, and social advertising, require the joint optimization of the global utility and the representativeness for different groups of items or users. To meet such requirements, we propose a novel multi-objective combinatorial optimization problem called Submodular Maximization with Fair Representation (SMFR), which selects subsets from a ground set, subject to a knapsack or matroid constraint, to maximize a submodular (utility) function $f$ as well as a set of $d$ submodular (representativeness) functions $g_1, \dots, g_d$. We show that the maximization of $f$ might conflict with the maximization of $g_1, \dots, g_d$, so that no single solution can optimize all these objectives at the same time. Therefore, we propose a Pareto optimization approach to SMFR, which finds a set of solutions to approximate all Pareto-optimal solutions with different trade-offs between the objectives. Our method converts an instance of SMFR into several submodular cover instances by adjusting the weights of the objective functions; then it computes a set of solutions by running the greedy algorithm on each submodular cover instance. We prove that our method provides approximation guarantees for SMFR under knapsack or matroid constraints. Finally, we demonstrate the effectiveness of SMFR and our proposed approach in two real-world problems: maximum coverage and recommendation.

PDF Details

AAAI Conference 2023 Conference Paper

Improved Algorithm for Regret Ratio Minimization in Multi-Objective Submodular Maximization

Yanhao Wang
Jiping Zheng
Fanxu Meng

Submodular maximization has attracted extensive attention due to its numerous applications in machine learning and artificial intelligence. Many real-world problems require maximizing multiple submodular objective functions at the same time. In such cases, a common approach is to select a representative subset of Pareto optimal solutions with different trade-offs among multiple objectives. To this end, in this paper, we investigate the regret ratio minimization (RRM) problem in multi-objective submodular maximization, which aims to find at most k solutions to best approximate all Pareto optimal solutions w.r.t. any linear combination of objective functions. We propose a novel HS-RRM algorithm by transforming RRM into HittingSet problems based on the notions of ε-kernel and δ-net, where any α-approximation algorithm for single-objective submodular maximization is used as an oracle. We improve upon the previous best-known bound on the maximum regret ratio (MRR) of the output of HS-RRM and show that the new bound is nearly asymptotically optimal for any fixed number d of objective functions. Experiments on real-world and synthetic data confirm that HS-RRM achieves lower MRRs than existing algorithms.

PDF Details DOI

IJCAI Conference 2023 Conference Paper

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways (Extended Abstract)

Francesco Fabbri
Yanhao Wang
Francesco Bonchi
Carlos Castillo
Michael Mathioudakis

Recommender systems typically suggest to users content similar to what they consumed in the past. A user, if happening to be exposed to strongly polarized content, might be steered towards more and more radicalized content by subsequent recommendations, eventually being trapped in what we call a "radicalization pathway". In this paper, we investigate how to mitigate radicalization pathways using a graph-based approach. We model the set of recommendations in a what-to-watch-next (W2W) recommender as a directed graph, where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the segregation score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. A high segregation score thus implies a larger chance of getting users trapped in radicalization pathways. We aim to reduce the prevalence of radicalization pathways by selecting a small number of edges to rewire, so as to minimize the maximum of segregation scores among all radicalized nodes while maintaining the relevance of recommendations. We propose an efficient yet effective greedy heuristic based on the absorbing random walk theory for the rewiring problem. Our experiments on real-world datasets confirm the effectiveness of our proposal.

PDF Details DOI

AAAI Conference 2023 Conference Paper

SAH: Shifting-Aware Asymmetric Hashing for Reverse k Maximum Inner Product Search

Qiang Huang
Yanhao Wang
Anthony K. H. Tung

This paper investigates a new yet challenging problem called Reverse k-Maximum Inner Product Search (RkMIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, the problem of RkMIPS aims to find a set of user vectors whose inner products with the query vector are one of the k largest among the query and item vectors. We propose the first subquadratic-time algorithm, i.e., Shifting-aware Asymmetric Hashing (SAH), to tackle the RkMIPS problem. To speed up the Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors (in a batch). We prove that SAH achieves a theoretical guarantee for solving the RMIPS problem. Experimental results on five real-world datasets show that SAH runs 4~8x faster than the state-of-the-art methods for RkMIPS while achieving F1-scores of over 90%. The code is available at https://github.com/HuangQiang/SAH.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Yet Another Traffic Classifier: A Masked Autoencoder Based Traffic Transformer with Multi-Level Flow Representation

Ruijie Zhao
Mingwei Zhan
Xianwen Deng
Yanhao Wang
Yijun Wang
Guan Gui
Zhi Xue

Traffic classification is a critical task in network security and management. Recent research has demonstrated the effectiveness of the deep learning-based traffic classification method. However, the following limitations remain: (1) the traffic representation is simply generated from raw packet bytes, resulting in the absence of important information; (2) the model structure of directly applying deep learning algorithms does not take traffic characteristics into account; and (3) scenario-specific classifier training usually requires a labor-intensive and time-consuming process to label data. In this paper, we introduce a masked autoencoder (MAE) based traffic transformer with multi-level flow representation to tackle these problems. To model raw traffic data, we design a formatted traffic representation matrix with hierarchical flow information. After that, we develop an efficient Traffic Transformer, in which packet-level and flow-level attention mechanisms implement more efficient feature extraction with lower complexity. At last, we utilize the MAE paradigm to pre-train our classifier with a large amount of unlabeled data, and perform fine-tuning with a few labeled data for a series of traffic classification tasks. Experiment findings reveal that our method outperforms state-of-the-art methods on five real-world traffic datasets by a large margin. The code is available at https://github.com/NSSL-SJTU/YaTC.

PDF Details DOI

IJCAI Conference 2022 Conference Paper

3E-Solver: An Effortless, Easy-to-Update, and End-to-End Solver with Semi-Supervised Learning for Breaking Text-Based Captchas

Xianwen Deng
Ruijie Zhao
Yanhao Wang
Libo Chen
Yijun Wang
Zhi Xue

Text-based captchas are the most widely used security mechanism currently. Due to the limitations and specificity of the segmentation algorithm, the early segmentation-based attack method has been unable to deal with the current captchas with newly introduced security features (e. g. , occluding lines and overlapping). Recently, some works have designed captcha solvers based on deep learning methods with powerful feature extraction capabilities, which have greater generality and higher accuracy. However, these works still suffer from two main intrinsic limitations: (1) many labor costs are required to label the training data, and (2) the solver cannot be updated with unlabeled data to recognize captchas more accurately. In this paper, we present a novel solver using improved FixMatch for semi-supervised captcha recognition to tackle these problems. Specifically, we first build an end-to-end baseline model to effectively break text-based captchas by leveraging encoder-decoder architecture and attention mechanism. Then we construct our solver with a few labeled samples and many unlabeled samples by improved FixMatch, which introduces teacher forcing, adaptive batch normalization, and consistency loss to achieve more effective training. Experiment results show that our solver outperforms state-of-the-arts by a large margin on current captcha schemes. We hope that our work can help security experts to revisit the design and usability of text-based captchas. The source code of this work is available at https: //github. com/SJTU-dxw/3E-Solver-CAPTCHA.

PDF Details DOI

AAAI Conference 2022 Conference Paper

Blindfolded Attackers Still Threatening: Strict Black-Box Adversarial Attacks on Graphs

Jiarong Xu
Yizhou Sun
Xin Jiang
Yanhao Wang
Chunping Wang
Jiangang Lu
Yang Yang

Adversarial attacks on graphs have attracted considerable research interests. Existing works assume the attacker is either (partly) aware of the victim model, or able to send queries to it. These assumptions are, however, unrealistic. To bridge the gap between theoretical graph attacks and real-world scenarios, in this work, we propose a novel and more realistic setting: strict black-box graph attack, in which the attacker has no knowledge about the victim model at all and is not allowed to send any queries. To design such an attack strategy, we first propose a generic graph filter to unify different families of graph-based models. The strength of attacks can then be quantified by the change in the graph filter before and after attack. By maximizing this change, we are able to find an effective attack strategy, regardless of the underlying model. To solve this optimization problem, we also propose a relaxation technique and approximation theories to reduce the difficulty as well as the computational expense. Experiments demonstrate that, even with no exposure to the model, the Macro-F1 drops 5. 5% in node classification and 29. 5% in graph classification, which is a significant result compared with existent works.

PDF Details