Arrow Research search

Author name cluster

Lin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

63 papers
2 author rows

Possible papers

63

AAAI Conference 2026 Conference Paper

CoMA-SLAM: Collaborative Multi-Agent Gaussian SLAM with Geometric Consistency

  • Lin Chen
  • Yongxin Su
  • Jvboxi Wang
  • Pengcheng Han
  • Zhenyu Xia
  • Shuhui Bu
  • Kun Li
  • Boni Hu

Although Gaussian scene representation has achieved remarkable success in tracking and mapping, most existing methods are confined to single-agent systems. Current multi-agent solutions typically rely on centralized architectures, which struggle to account for communication bandwidth constraints. Furthermore, the inherent depth ambiguity of 3D Gaussian splatting poses notable challenges in maintaining geometric consistency. To address these challenges, we introduce CoMA-SLAM, the first distributed multi-agent Gaussian SLAM framework. By leveraging 2D Gaussian surfels and a robust initialization strategy, CoMA-SLAM enhances tracking accuracy and geometric consistency. It efficiently manages communication bandwidth while dynamically scaling with the number of agents. Through the integration of intra- and inter-loop closure, distributed keyframe optimization, and submap-centric updates, our framework ensures global consistency and robust alignment. Synthetic and real-world experiments demonstrate that CoMA-SLAM outperforms state-of-the-art methods in pose accuracy, rendering fidelity, and geometric consistency while maintaining competitive efficiency across distributed multi-agent systems. Notably, by avoiding data transmission to a centralized server, our method reduces communication bandwidth by 99.8% compared to centralized approaches.

JBHI Journal 2026 Journal Article

Myocardial Infarction Detection with Incomplete Multi-View Data via Dual-Branch Gating Completion and Dirichlet Weighting

  • Yadi Wang
  • Yulin Xie
  • Yi Xie
  • Lin Chen
  • Bingbing Jiang

Multi-view myocardial infarction (MI) detection often relies on electrocardiogram (ECG) data, which suffers from noise sensitivity and limited early diagnostic value. Echocardiography provides richer temporal-spatial information but frequently encounters missing views due to clinical acquisition constraints. Existing fusion methods typically adopt static or overly complex dynamic weighting, limiting their adaptability to varying view quality and hindering real-time applicability. To this end, this paper proposes a view completion method based on a dual-branch gating structure, combining Transformer and Graph Neural Network (GNN) to complete incomplete multi-view information. Specifically, the model uses the Transformer encoder to model global temporal dependencies, introduces GNN to strengthen the local structural relationship between views, and uses the gating mechanism to achieve collaborative completion of the aforementioned two core components. Furthermore, this paper designs an uncertainty-driven dynamic weighted fusion strategy based on Dirichlet distribution, which can adaptively adjust the fusion weights according to the prediction confidence of each view, overcoming the limitations of traditional static weighting. Experiments on the HMC-QU dataset show that the proposed method achieves 92.31% accuracy, 90.00% precision, and 100.00% specificity, outperforming state-of-the-art models and demonstrating strong potential for clinical deployment.
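The Dirichlet-based weighting described above can be pictured with a small sketch in the subjective-logic style: each view produces non-negative evidence, the Dirichlet strength yields an uncertainty, and confident views receive larger fusion weights. The evidence values and the confidence-proportional weighting rule here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dirichlet_uncertainty(evidence):
    """Per-view Dirichlet belief and uncertainty from non-negative evidence.

    evidence: array of shape (num_classes,) for one view.
    uncertainty = K / sum(alpha), so weak evidence -> high uncertainty.
    """
    alpha = evidence + 1.0           # Dirichlet parameters
    strength = alpha.sum()
    belief = evidence / strength     # per-class belief mass
    uncertainty = len(evidence) / strength
    return belief, uncertainty

def fuse_views(view_evidences):
    """Weight each view's expected class probabilities by (1 - uncertainty)."""
    probs, weights = [], []
    for e in view_evidences:
        alpha = e + 1.0
        probs.append(alpha / alpha.sum())   # expected class probabilities
        _, u = dirichlet_uncertainty(e)
        weights.append(1.0 - u)             # confident views weigh more
    weights = np.array(weights) / np.sum(weights)
    return sum(w * p for w, p in zip(weights, probs))

# Two views over 2 classes: one confident, one nearly uninformative.
fused = fuse_views([np.array([9.0, 1.0]), np.array([0.2, 0.2])])
```

The uninformative second view contributes little, so the fused prediction stays close to the confident view's distribution.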

ICML Conference 2025 Conference Paper

Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation

  • Michal Lukasik
  • Lin Chen
  • Harikrishna Narasimhan
  • Aditya Krishna Menon
  • Wittawat Jitkrittum
  • Felix X. Yu
  • Sashank J. Reddi
  • Gang Fu

Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal area under the ROC curve (AUC) against a single binary target label. However, one may often observe multiple binary target labels, e.g., from distinct human annotators. How can one synthesize such labels into a single coherent ranking? In this work, we formally analyze two approaches to this problem—loss aggregation and label aggregation—by characterizing their Bayes-optimal solutions. We show that while both approaches can yield Pareto-optimal solutions, loss aggregation can exhibit label dictatorship: one can inadvertently (and undesirably) favor one label over others. This suggests that label aggregation can be preferable to loss aggregation, which we empirically verify.
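The two objectives can be contrasted on a toy example: loss aggregation sums a ranking loss against each annotator's labels separately, while label aggregation first averages the labels into one soft target. The scorer and labels below are made up for illustration, and ties are ignored for simplicity.

```python
import numpy as np

def auc_loss(scores, labels):
    """Fraction of (positive, negative) pairs the scorer orders incorrectly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    return sum(p <= n for p, n in pairs) / len(pairs)

# Binary labels from 3 annotators (columns) for 4 instances (rows).
L = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 0]])
scores = np.array([0.9, 0.4, 0.7, 0.1])

# Loss aggregation: sum the ranking loss against each annotator separately.
loss_agg = sum(auc_loss(scores, L[:, k]) for k in range(L.shape[1]))

# Label aggregation: average labels into one soft target, then count how
# often the scorer inverts a pair that the mean label orders strictly.
mean_y = L.mean(axis=1)
label_agg = np.mean([scores[i] <= scores[j]
                     for i in range(len(scores)) for j in range(len(scores))
                     if mean_y[i] > mean_y[j]])
```

Here the scorer perfectly ranks by mean label (label-aggregation loss 0) yet still pays a positive loss against individual annotators, which is the kind of disagreement the paper's analysis formalizes.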

IROS Conference 2025 Conference Paper

Decentralized Multi-robot Navigation Policy with Enhanced Security Using Graph GRU Policy Network

  • Lin Chen
  • Yuxuan Ao
  • Zhen Zhou
  • Yaonan Wang
  • Danwei Wang

Formulating a multi-robot obstacle avoidance policy is essential for enabling safe and efficient navigation in multi-robot environments, forming a critical component of the effective operation of multi-robot systems. Recently, reinforcement learning has been applied to improve the performance of decentralized, policy-driven robots in task execution. However, ensuring the safety of these agents during movement remains a significant challenge due to the inherent risks associated with the reinforcement learning process, such as frequent collisions. To address this issue and enhance the safety of policy-guided multi-robot navigation, we propose a novel policy based on imitation learning. This framework introduces a novel policy neural network that integrates a graph attention mechanism with the GRU network structure. The key innovation lies in utilizing the interactions between neighboring robots to enhance the safety of their movements. In a multi-robot simulation environment, robot behaviors are directed by the proposed policy. A comparative analysis was conducted between our approach and RL-RVO, one of the advanced methods in the field. The results demonstrate that our approach outperforms RL-RVO, achieving a higher success rate and significantly improving safety performance.

NeurIPS Conference 2025 Conference Paper

Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention

  • Chong You
  • Kan Wu
  • Zhipeng Jia
  • Lin Chen
  • Srinadh Bhojanapalli
  • Jiaxian Guo
  • Utku Evci
  • Jan Wassenberg

The discovery of the *lazy neuron phenomenon* (Li et al., 2022), where fewer than 10% of the feedforward network (FFN) parameters in trained Transformers are activated per token, has spurred significant interest in *activation sparsity* for enhancing large model efficiency. While notable progress has been made in translating such sparsity to wall-time benefits across CPUs, GPUs, and TPUs, modern Transformers have moved away from the ReLU activation function crucial to this phenomenon. Existing efforts on re-introducing activation sparsity, e.g., by reverting to ReLU or applying top-k masking, often degrade model quality, increase parameter count, or complicate training. Sparse attention, the application of sparse activation to the attention mechanism, often faces similar challenges. This paper introduces the Spark Transformer, a novel architecture that achieves high activation sparsity in both the FFN and the attention mechanism while maintaining model quality, parameter count, and standard training procedures. Our method realizes sparsity via top-$k$ masking for explicit control over the sparsity level. Crucially, we introduce *statistical top-k*, a hardware-accelerator-friendly, linear-time approximate algorithm that avoids costly sorting and mitigates the significant training slowdown of standard top-k operators. Furthermore, Spark Transformer reallocates existing FFN parameters and attention key embeddings to form a low-cost predictor for identifying activated entries. This design not only mitigates quality loss from enforced sparsity, but also enhances the wall-time benefit. Pretrained with the Gemma-2 recipe, Spark Transformer demonstrates competitive performance on standard benchmarks while exhibiting significant sparsity: only 8% of FFN neurons are activated, and each token attends to a maximum of 256 tokens. This translates to a 2.5x reduction in FLOPs, leading to decoding wall-time speedups of up to 1.79x on CPU and 1.40x on GPU.
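The statistical top-k idea can be sketched as follows: instead of sorting, estimate a keep-threshold from the activations' mean and standard deviation under a Gaussian assumption, so that roughly k entries survive in linear time. This is a simplified illustration of the concept, not the paper's exact algorithm.

```python
import numpy as np
from statistics import NormalDist

def statistical_topk_mask(x, k):
    """Approximate top-k masking without sorting.

    Assumes entries of x are roughly Gaussian. Estimates the (1 - k/n)
    quantile from the sample mean and std, then zeros everything below it.
    Two linear passes: one for the statistics, one for the threshold test.
    """
    n = x.shape[-1]
    mu, sigma = x.mean(), x.std()
    z = NormalDist().inv_cdf(1.0 - k / n)   # standard-normal quantile
    threshold = mu + z * sigma
    return np.where(x > threshold, x, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=4096)
sparse = statistical_topk_mask(x, k=256)    # roughly 256 survivors
```

When the activations really are near-Gaussian, the survivor count concentrates around k; the exact number fluctuates, which is the price of avoiding a sort.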

AAAI Conference 2025 Conference Paper

VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping

  • Zheng Chen
  • Yu Zeng
  • Zehui Chen
  • Hongzhi Gao
  • Lin Chen
  • Jiaming Liu
  • Feng Zhao

Although pre-trained large vision foundation models (VFM) yield superior results on various downstream tasks, full fine-tuning is often impractical due to its high computational cost and storage requirements. Recent advancements in parameter-efficient fine-tuning (PEFT) of VFM for image classification show significant promise. However, the application of PEFT techniques to dense prediction tasks remains largely unexplored. Our analysis of existing methods reveals that the underlying premise of utilizing low-rank parameter matrices, despite their efficacy in specific applications, may not be adequately suitable for dense prediction tasks. To this end, we propose a novel PEFT learning approach tailored for dense prediction tasks, namely VFM-Adapter. Specifically, the VFM-Adapter introduces a hybrid operation mapping technique that seamlessly integrates local information with global modeling into the adapter module. It capitalizes on the distinct inductive biases inherent in different operations. Additionally, we dynamically generate parameters for the VFM-Adapter, enabling flexible feature extraction for specific inputs. To validate the efficacy of VFM-Adapter, we conduct extensive experiments across object detection, semantic segmentation, and instance segmentation tasks. Results on multiple benchmarks consistently demonstrate the superiority of our method over previous approaches. Notably, with only three percent of the trainable parameters of the SAM-Base backbone, our approach achieves competitive or even superior performance compared to full fine-tuning. The code will be available.

NeurIPS Conference 2025 Conference Paper

VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning

  • Qiuchen Wang
  • Ruixue Ding
  • Yu Zeng
  • Zehui Chen
  • Lin Chen
  • Shihang Wang
  • Pengjun Xie
  • Fei Huang

Effectively retrieving, reasoning and understanding visually rich information remains a challenge for traditional Retrieval-Augmented Generation (RAG) methods. On the one hand, traditional text-based methods cannot handle visual-related information. On the other hand, current vision-based RAG approaches are often limited by fixed pipelines and frequently struggle to reason effectively due to the insufficient activation of the fundamental capabilities of models. As reinforcement learning (RL) has been proven to be beneficial for model reasoning, we introduce VRAG-RL, a novel RL framework tailored for complex reasoning across visually rich information. With this framework, VLMs interact with search engines, autonomously sampling single-turn or multi-turn reasoning trajectories with the help of visual perception tokens and undergoing continual optimization based on these samples. Our approach highlights key limitations of RL in RAG domains: (i) Prior Multi-modal RAG approaches tend to merely incorporate images into the context, leading to insufficient reasoning token allocation and neglecting visual-specific perception; and (ii) When models interact with search engines, their queries often fail to retrieve relevant information due to the inability to articulate requirements, thereby leading to suboptimal performance. To address these challenges, we define an action space tailored for visually rich inputs, with actions including cropping and scaling, allowing the model to gather information from a coarse-to-fine perspective. Furthermore, to bridge the gap between users' original inquiries and the retriever, we employ a simple yet effective reward that integrates query rewriting and retrieval performance with a model-based reward. Our VRAG-RL optimizes VLMs for RAG tasks using specially designed RL strategies, aligning the model with real-world applications. 
Extensive experiments on diverse and challenging benchmarks show that our VRAG-RL outperforms existing methods by 20% (Qwen2.5-VL-7B) and 30% (Qwen2.5-VL-3B), demonstrating the effectiveness of our approach. The code is available at https://github.com/Alibaba-NLP/VRAG.

NeurIPS Conference 2024 Conference Paper

Are We on the Right Way for Evaluating Large Vision-Language Models?

  • Lin Chen
  • Jinsong Li
  • Xiaoyi Dong
  • Pan Zhang
  • Yuhang Zang
  • Zehui Chen
  • Haodong Duan
  • Jiaqi Wang

Large vision-language models (LVLMs) have recently achieved rapid progress, sparking numerous studies to evaluate their multi-modal capabilities. However, we dig into current evaluation works and identify two primary issues: 1) Visual content is unnecessary for many samples. The answers can be directly inferred from the questions and options, or the world knowledge embedded in LLMs. This phenomenon is prevalent across current benchmarks. For instance, GeminiPro achieves 42.7% on the MMMU benchmark without any visual input, and outperforms the random-choice baseline across six benchmarks by nearly 24% on average. 2) Unintentional data leakage exists in LLM and LVLM training. LLMs and LVLMs can still answer some visual-necessary questions without visual content, indicating that these samples were memorized from large-scale training data. For example, Sphinx-X-MoE gets 43.6% on MMMU without accessing images, surpassing its LLM backbone by 17.9%. Both problems lead to misjudgments of actual multi-modal gains and potentially misguide the study of LVLMs. To this end, we present MMStar, an elite vision-indispensable multi-modal benchmark comprising 1,500 samples meticulously selected by humans. MMStar benchmarks 6 core capabilities and 18 detailed axes, aiming to evaluate LVLMs' multi-modal capacities with carefully balanced and purified samples. These samples are first roughly selected from current benchmarks with an automated pipeline; human review is then involved to ensure each curated sample exhibits visual dependency, minimal data leakage, and requires advanced multi-modal capabilities. Moreover, two metrics are developed to measure data leakage and actual performance gain in multi-modal training. We evaluate 16 leading LVLMs on MMStar to assess their multi-modal capabilities, and on 7 benchmarks with the proposed metrics to investigate their data leakage and actual multi-modal gain.

ICLR Conference 2024 Conference Paper

Dynamic Neural Response Tuning

  • Tian Qiu
  • Wenxiang Xu
  • Lin Chen
  • Linyun Zhou
  • Zunlei Feng
  • Mingli Song

Artificial Neural Networks (ANNs) have gained widespread applications across various areas in recent years. The ANN design was initially inspired by principles of biology. The biological neural network's fundamental response process comprises information transmission and aggregation. The information transmission in biological neurons is often achieved by triggering action potentials that propagate through axons. ANNs utilize activation mechanisms to simulate such biological behavior. However, previous studies have only considered static response conditions, while the biological neuron's response conditions are typically dynamic, depending on multiple factors such as neuronal properties and the real-time environment. Therefore, the dynamic response conditions of biological neurons could help improve the static ones of existing activations in ANNs. Additionally, the biological neuron's aggregated response exhibits high specificity for different categories, allowing the nervous system to differentiate and identify objects. Inspired by these biological patterns, we propose a novel Dynamic Neural Response Tuning (DNRT) mechanism, which aligns the response patterns of ANNs with those of biological neurons. DNRT comprises Response-Adaptive Activation (RAA) and Aggregated Response Regularization (ARR), mimicking the biological neuron's information transmission and aggregation behaviors. RAA dynamically adjusts the response condition based on the characteristics and strength of the input signal. ARR is devised to enhance the network's ability to learn category specificity by imposing constraints on the network's response distribution. Extensive experimental studies indicate that the proposed DNRT is highly interpretable, applicable to various mainstream network architectures, and can achieve remarkable performance compared with existing neural response mechanisms in multiple tasks and domains. Code is available at https://github.com/horrible-dong/DNRT.

AAAI Conference 2024 Conference Paper

Federated Learning with Extremely Noisy Clients via Negative Distillation

  • Yang Lu
  • Lin Chen
  • Yonggang Zhang
  • Yiliang Zhang
  • Bo Han
  • Yiu-ming Cheung
  • Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and, rather than discarding them, employs them in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a ‘bad teacher’ in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance.

ICLR Conference 2024 Conference Paper

Learning from Aggregate responses: Instance Level versus Bag Level Loss Functions

  • Adel Javanmard
  • Lin Chen
  • Vahab Mirrokni
  • Ashwinkumar Badanidiyuru
  • Gang Fu

Due to the rise of privacy concerns, in many practical applications, the training data is aggregated before being shared with the learner to protect the privacy of users' sensitive responses. In an aggregate learning framework, the dataset is grouped into bags of samples, where each bag is available only with an aggregate response, providing a summary of individuals' responses in that bag. In this paper, we study two natural loss functions for learning from aggregate responses: the bag-level loss and the instance-level loss. In the former, the model is learned by minimizing a loss between the aggregate responses and aggregate model predictions, while in the latter, the model aims to fit individual predictions to the aggregate responses. In this work, we show that the instance-level loss can be perceived as a regularized form of the bag-level loss. This observation allows us to compare the two approaches with respect to the bias and variance of the resulting estimators and to introduce a novel interpolating estimator that combines the two approaches. For linear regression tasks, we provide a precise characterization of the risk of the interpolating estimator in an asymptotic regime where the size of the training set grows in proportion to the feature dimension. Our analysis enables us to theoretically understand the effect of different factors, such as bag size, on the model's prediction risk. Additionally, we propose a mechanism for differentially private learning from aggregate responses and derive the optimal bag size in terms of the prediction risk-privacy trade-off. We also carry out thorough experiments to corroborate our theory and show the efficacy of the interpolating estimator.
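The two losses are easy to contrast in the simplest mean-aggregation, squared-loss case. The toy predictions and bag targets below are illustrative; the example shows why the instance-level loss acts like a regularized bag-level loss, penalizing within-bag spread even when every bag's mean prediction matches its aggregate response exactly.

```python
import numpy as np

def bag_level_loss(preds, bag_ids, bag_means):
    """Squared loss between each bag's mean prediction and its aggregate response."""
    loss = 0.0
    for b, target in bag_means.items():
        loss += (preds[bag_ids == b].mean() - target) ** 2
    return loss / len(bag_means)

def instance_level_loss(preds, bag_ids, bag_means):
    """Each instance is fit directly to its bag's aggregate response."""
    targets = np.array([bag_means[b] for b in bag_ids])
    return np.mean((preds - targets) ** 2)

preds = np.array([0.2, 0.4, 0.9, 0.7])
bag_ids = np.array([0, 0, 1, 1])
bag_means = {0: 0.3, 1: 0.8}

bag = bag_level_loss(preds, bag_ids, bag_means)        # bag means match exactly
inst = instance_level_loss(preds, bag_ids, bag_means)  # penalizes within-bag spread
```

Both bags average to their targets, so the bag-level loss is zero, yet the instance-level loss is positive because the predictions vary inside each bag.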

AAAI Conference 2024 Conference Paper

Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection

  • Hongzhi Gao
  • Zheng Chen
  • Zehui Chen
  • Lin Chen
  • Jiaming Liu
  • Shanghang Zhang
  • Feng Zhao

Training high-accuracy 3D detectors necessitates massive labeled 3D annotations with 7 degrees of freedom, which is laborious and time-consuming. Therefore, the form of point annotations is proposed to offer significant prospects for practical applications in 3D detection, which is not only more accessible and less expensive but also provides strong spatial information for object localization. In this paper, we empirically discover that it is non-trivial to merely adapt Point-DETR to its 3D form, encountering two main bottlenecks: 1) it fails to encode strong 3D prior into the model, and 2) it generates low-quality pseudo labels in distant regions due to the extreme sparsity of LiDAR points. To overcome these challenges, we introduce Point-DETR3D, a teacher-student framework for weakly semi-supervised 3D detection, designed to fully capitalize on point-wise supervision within a constrained instance-wise annotation budget. Different from Point-DETR which encodes 3D positional information solely through a point encoder, we propose an explicit positional query initialization strategy to enhance the positional prior. Considering the low quality of pseudo labels at distant regions produced by the teacher model, we enhance the detector's perception by incorporating dense imagery data through a novel Cross-Modal Deformable RoI Fusion (D-RoI). Moreover, an innovative point-guided self-supervised learning technique is proposed to allow for fully exploiting point priors, even in student models. Extensive experiments on the representative nuScenes dataset demonstrate that our Point-DETR3D obtains significant improvements compared to previous works. Notably, with only 5% of labeled data, Point-DETR3D achieves over 90% performance of its fully supervised counterpart.

ICLR Conference 2024 Conference Paper

On Bias-Variance Alignment in Deep Models

  • Lin Chen
  • Michal Lukasik
  • Wittawat Jitkrittum
  • Chong You
  • Sanjiv Kumar

Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a trade-off. However, in this paper, we show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level, where squared bias is approximately equal to variance for correctly classified sample points. We present empirical evidence confirming this phenomenon in a variety of deep learning models and datasets. Moreover, we study this phenomenon from two theoretical perspectives: calibration and neural collapse. We first show theoretically that under the assumption that the models are well calibrated, we can observe the bias-variance alignment. Second, starting from the picture provided by the neural collapse theory, we show an approximate correlation between bias and variance.
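The per-sample quantities can be estimated from an ensemble's predictions. The sketch below tracks only the predicted probability of the true class under squared loss, which is a simplification of the decomposition the paper studies; the toy prediction matrix is illustrative.

```python
import numpy as np

def per_sample_bias_variance(probs):
    """Per-sample squared bias and variance of an ensemble's prediction.

    probs: (num_models, num_samples) predicted probability of the true class.
    Squared bias: distance of the ensemble-mean prediction from the target 1.
    Variance: disagreement across ensemble members at each sample.
    """
    mean_p = probs.mean(axis=0)
    bias_sq = (1.0 - mean_p) ** 2
    variance = probs.var(axis=0)
    return bias_sq, variance

# Two models, two samples: the ensemble is confident on the first sample only.
probs = np.array([[0.9, 0.6],
                  [0.7, 0.4]])
bias_sq, variance = per_sample_bias_variance(probs)
```

Under the paper's alignment phenomenon, one would expect the two returned arrays to track each other on correctly classified samples; this toy input merely shows how the quantities are computed.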

NeurIPS Conference 2024 Conference Paper

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

  • Yuxuan Qiao
  • Haodong Duan
  • Xinyu Fang
  • Junming Yang
  • Lin Chen
  • Songyang Zhang
  • Jiaqi Wang
  • Dahua Lin

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide array of visual questions, which requires strong perception and reasoning faculties. Assessing these two competencies independently is crucial for model refinement, despite the inherent difficulty due to the intertwined nature of seeing and reasoning in existing VLMs. To tackle this issue, we present Prism, an innovative framework designed to disentangle the perception and reasoning processes involved in visual question solving. Prism comprises two distinct stages: a perception stage that utilizes a VLM to extract and articulate visual information in textual form, and a reasoning stage that formulates responses based on the extracted visual information using a Large Language Model (LLM). This modular design enables the systematic comparison and assessment of both proprietary and open-source VLMs for their perception and reasoning strengths. Our analytical framework provides several valuable insights, underscoring Prism's potential as a cost-effective solution for vision-language tasks. By combining a streamlined VLM focused on perception with a powerful LLM tailored for reasoning, Prism achieves superior results in general vision-language tasks while substantially cutting down on training and operational expenses. Quantitative evaluations show that Prism, when configured with a vanilla 2B LLaVA and freely accessible GPT-3.5, delivers performance on par with VLMs 10x larger on the rigorous multimodal benchmark MMStar.

NeurIPS Conference 2024 Conference Paper

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

  • Lin Chen
  • Xilin Wei
  • Jinsong Li
  • Xiaoyi Dong
  • Pan Zhang
  • Yuhang Zang
  • Zehui Chen
  • Haodong Duan

We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V-annotated dense captions of videos with various lengths and sources, developed through a carefully designed data filtering and annotating strategy. 2) ShareCaptioner-Video, an efficient and capable captioning model for arbitrary videos, with 4.8M high-quality aesthetic videos annotated by it. 3) ShareGPT4Video-8B, a simple yet superb LVLM that reached SOTA performance on three advancing video benchmarks. To achieve this, setting aside the non-scalable and costly human annotators, we find that using GPT4V to caption video with a naive multi-frame or frame-concatenation input strategy leads to less detailed and sometimes temporally confused results. We argue the challenge of designing a high-quality video captioning strategy lies in three aspects: 1) Inter-frame precise temporal change understanding. 2) Intra-frame detailed content description. 3) Frame-number scalability for arbitrary-length videos. To this end, we meticulously designed a differential video captioning strategy, which is stable, scalable, and efficient for generating captions for videos with arbitrary resolution, aspect ratios, and length. Based on it, we construct ShareGPT4Video, which contains 40K high-quality videos spanning a wide range of categories, and the resulting captions encompass rich world knowledge, object attributes, camera movements, and crucially, detailed and precise temporal descriptions of events. Based on ShareGPT4Video, we further develop ShareCaptioner-Video, a superior captioner capable of efficiently generating high-quality captions for arbitrary videos. We annotated 4.8M aesthetically appealing videos with it and verified their effectiveness on a 10-second text2video generation task.
For video understanding, we verified the effectiveness of ShareGPT4Video on several current LVLM architectures and presented our superb new LVLM ShareGPT4Video-8B. All the models, strategies, and annotations will be open-sourced, and we hope this project can serve as a pivotal resource for advancing both the LVLM and T2VM communities.

AAAI Conference 2024 Conference Paper

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

  • Lin Chen
  • Zhijie Jia
  • Lechao Cheng
  • Yang Gao
  • Jie Lei
  • Yijun Bei
  • Zunlei Feng

A surge of interest has emerged in utilizing Transformers in diverse vision tasks owing to their formidable performance. However, existing approaches primarily focus on optimizing internal model architecture designs that often entail significant trial and error with high burdens. In this work, we propose a new paradigm dubbed Decision Stream Calibration that boosts the performance of general Vision Transformers. To achieve this, we shed light on the information propagation mechanism in the learning procedure by exploring the correlation between different tokens and the relevance coefficient of multiple dimensions. Upon further analysis, it was discovered that 1) the final decision is associated with tokens of foreground targets: token features of foreground targets are transmitted to the next layer as much as possible, while useless token features of background areas are gradually eliminated during forward propagation; 2) each category is solely associated with specific sparse dimensions in the tokens. Based on these discoveries, we designed a two-stage calibration scheme, namely ViT-Calibrator, comprising a token propagation calibration stage and a dimension propagation calibration stage. Extensive experiments on commonly used datasets show that the proposed approach can achieve promising results.

IJCAI Conference 2024 Conference Paper

VulnerabilityMap: An Open Framework for Mapping Vulnerability among Urban Disadvantaged Populations in the United States

  • Lin Chen
  • Yong Li
  • Pan Hui

Cities are crucibles of numerous opportunities, but also hotbeds of inequality. The plight of disadvantaged populations who are ``left behind'' within urban environments has been an increasingly pressing concern, which poses substantial threats to the realization of the UN SDG agenda. However, a comprehensive framework for studying this urban dilemma is currently absent, preventing researchers from developing AI models for social good prediction and intervention. To fill this gap, we construct VulnerabilityMap, a framework to meticulously dissect the challenges faced by urban disadvantaged populations, unraveling their vulnerability to a spectrum of shocks and stresses that are categorized through the prism of Maslow's hierarchy of needs. Specifically, we systematically collect large-scale multi-sourced census and web-based data covering more than 328 million people in the United States regarding demographic features, neighborhood environments, offline mobility behaviors, and online social connections. These features are further related to vulnerability outcomes from short-term shocks such as COVID-19 and long-term physiological, social, and self-actualization stresses. Leveraging our framework, we construct machine learning models that exhibit strong performance in predicting vulnerability outcomes from various disadvantage features, which shows the promising utility of our framework to support targeted AI models. Moreover, we provide model-based explainability analysis to interpret the reasons underlying model predictions, shedding light on intricate social factors that trap certain populations inside vulnerable situations. Our constructed dataset is publicly available at https://github.com/LinChen-65/VulnerabilityMap/.

AAAI Conference 2023 Conference Paper

Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping

  • Yujie Zeng
  • Wenlong He
  • Ihor Vasyltsov
  • Jiali Pang
  • Lin Chen

Transformer models are widely used in AI applications such as Natural Language Processing (NLP), Computer Vision (CV), etc. However, the enormous computation workload becomes an obstacle to training large transformer models efficiently. Recently, some methods focus on reducing the computation workload during training by skipping some layers. However, these methods use simple probability distributions and coarse-grained probability calculation, which significantly affect model accuracy. To address this issue, in this paper we propose a novel method to accelerate training, Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on/off in the proper order to keep accuracy high. Besides, we adjust the probability of skipping transformer layers with a scheduler to accelerate training and reach faster convergence. Our results show that SBLD solves the accuracy-drop issue of prior layer dropping methods. SBLD decreases end-to-end training time of the GPT-3 Medium model by 19.67% while increasing accuracy by 1.65% w.r.t. the baseline. Furthermore, for the SwinV2-L model the obtained Top-1 and Top-5 accuracies are also higher than the baseline. Thus, the proposed method is efficient and practical for improving large transformer model training.
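A sensitivity-aware skip scheduler of the kind described can be pictured as below. The ramp shape and the sensitivity-to-probability mapping are invented for illustration; the paper's actual SBLD schedule is derived from measured layer-wise sensitivity data.

```python
def skip_probability(step, total_steps, sensitivity, p_max=0.5):
    """Probability of skipping one transformer layer at a training step.

    sensitivity in [0, 1]: 1 = most sensitive layer (skipped least).
    The skip rate ramps up over the first 30% of training, then stays flat,
    so early training sees the full model (assumed schedule shape).
    """
    ramp = min(1.0, step / (0.3 * total_steps))
    return p_max * ramp * (1.0 - sensitivity)
```

Two properties matter for the method's intent: the skip probability is zero at the start of training, and at any step a more sensitive layer is skipped less often than a less sensitive one.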

JAAMAS Journal 2023 Journal Article

Electoral manipulation via influence: probabilistic model

  • Liangde Tao
  • Lin Chen
  • Weidong Shi

We consider a natural generalization of the fundamental electoral manipulation problem, where a briber can change the opinion or preference of voters through influence. This is motivated by modern political campaigns where candidates try to convince voters through media such as TV, newspapers, and the Internet. Compared with the classical bribery problem, we do not assume the briber will directly exchange money for votes from individual voters, but rather assume that the briber has a set of potential campaign strategies. Each campaign strategy represents some way of casting influence on voters. A campaign strategy has some cost and can influence a subset of voters. If a voter belongs to the audience of a campaign strategy, then he/she will be influenced. A voter will be more likely to change his/her opinion/preference if he/she has received influence from a larger number of campaign strategies. We model this through an independent activation model, which is widely adopted in social science research, and study the computational complexity. In this paper, we give a full characterization by showing NP-hardness results and establishing a near-optimal fixed-parameter tractable algorithm that gives a solution arbitrarily close to the optimal solution.

ICLR Conference 2023 Conference Paper

Sequential Attention for Feature Selection

  • Taisuke Yasuda
  • MohammadHossein Bateni
  • Lin Chen
  • Matthew Fahrbach
  • Gang Fu
  • Vahab Mirrokni

Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations for the effectiveness of attention and its connections to overparameterization, which may be of independent interest.
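The linear-regression connection can be made concrete with plain Orthogonal Matching Pursuit, the classical algorithm the abstract relates the method to. A minimal NumPy sketch (this is generic OMP, not Sequential Attention itself, which replaces the correlation step with learned attention weights):

```python
import numpy as np

def omp_select(X, y, k):
    """Orthogonal Matching Pursuit: greedily pick k feature columns.

    At each step, choose the feature most correlated with the current
    residual, then refit least squares on the selected set.
    """
    selected = []
    residual = y.astype(float).copy()
    for _ in range(k):
        # Marginal value of each remaining feature: |<x_j, residual>|.
        scores = np.abs(X.T @ residual)
        scores[selected] = -np.inf
        selected.append(int(np.argmax(scores)))
        # Refit on the selected subset and update the residual.
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ beta
    return selected

# Tiny demo: y depends only on columns 0 and 2, so OMP should find them.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = 3 * X[:, 0] + 2 * X[:, 2]
selected = omp_select(X, y, 2)
```

The refit-then-update step is exactly the "residual value" notion the abstract contrasts with one-shot selection.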

NeurIPS Conference 2023 Conference Paper

Wyze Rule: Federated Rule Dataset for Rule Recommendation Benchmarking

  • Mohammad Mahdi Kamani
  • Yuhang Yao
  • Hanjia Lyu
  • Zhongwei Cheng
  • Lin Chen
  • Liangju Li
  • Carlee Joe-Wong
  • Jiebo Luo

In the rapidly evolving landscape of smart home automation, the potential of IoT devices is vast. In this realm, rules are the main tool for this automation: predefined conditions or triggers that establish connections between devices, enabling seamless automation of specific processes. However, one significant challenge researchers face is the lack of comprehensive datasets to explore and advance the field of smart home rule recommendations. These datasets are essential for developing and evaluating intelligent algorithms that can effectively recommend rules for automating processes while preserving the privacy of the users, as it involves personal information about users' daily lives. To bridge this gap, we present the Wyze Rule Dataset, a large-scale dataset designed specifically for smart home rule recommendation research. Wyze Rule encompasses over 1 million rules gathered from a diverse user base of 300,000 individuals from Wyze Labs, offering an extensive and varied collection of real-world data. With a focus on federated learning, our dataset is tailored to address the unique challenges of a cross-device federated learning setting in the recommendation domain, featuring a large-scale number of clients with widely heterogeneous data. To establish a benchmark for comparison and evaluation, we have meticulously implemented multiple baselines in both centralized and federated settings. Researchers can leverage these baselines to gauge the performance and effectiveness of their rule recommendation systems, driving advancements in the domain. The Wyze Rule Dataset is publicly accessible through HuggingFace's dataset API.

NeurIPS Conference 2022 Conference Paper

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

  • Lin Chen
  • Zhixiang Wei
  • Xin Jin
  • Huaian Chen
  • Miao Zheng
  • Kai Chen
  • Yi Jin

In unsupervised domain adaptation (UDA), directly adapting from the source to the target domain usually suffers from significant discrepancies and leads to insufficient alignment. Thus, many UDA works attempt to close the domain gap gradually and softly via various intermediate spaces, dubbed domain bridging (DB). However, for dense prediction tasks such as domain adaptive semantic segmentation (DASS), existing solutions have mostly relied on rough style transfer, and how to elegantly bridge domains is still under-explored. In this work, we resort to data mixing to establish a deliberated domain bridging (DDB) for DASS, through which the joint distributions of source and target domains are aligned and interact with each other in the intermediate space. At the heart of DDB lies a dual-path domain bridging step for generating two intermediate domains using the coarse-wise and the fine-wise data mixing techniques, alongside a cross-path knowledge distillation step for taking two complementary models trained on generated intermediate samples as ‘teachers’ to develop a superior ‘student’ in a multi-teacher distillation manner. These two optimization steps work in an alternating way and reinforce each other to give rise to DDB with strong adaptation power. Extensive experiments on adaptive segmentation tasks with different settings demonstrate that our DDB significantly outperforms state-of-the-art methods.

TIST Journal 2022 Journal Article

Hierarchical Multi-agent Model for Reinforced Medical Resource Allocation with Imperfect Information

  • Qianyue Hao
  • Fengli Xu
  • Lin Chen
  • Pan Hui
  • Yong Li

With the advent of the COVID-19 pandemic, the shortage in medical resources became increasingly more evident. Therefore, efficient strategies for medical resource allocation are urgently needed. However, conventional rule-based methods employed by public health experts have limited capability in dealing with the complex and dynamic pandemic-spreading situation. In addition, model-based optimization methods such as dynamic programming (DP) fail to work since we cannot obtain a precise model in real-world situations most of the time. Model-free reinforcement learning (RL) is a powerful tool for decision-making; however, three key challenges exist in solving this problem via RL: (1) complex situations and countless choices for decision-making in the real world; (2) imperfect information due to the latency of pandemic spreading; and (3) limitations on conducting experiments in the real world since we cannot set up pandemic outbreaks arbitrarily. In this article, we propose a hierarchical RL framework with several specially designed components. We design a decomposed action space with a corresponding training algorithm to deal with the countless choices, ensuring efficient and real-time strategies. We design a recurrent neural network–based framework to utilize the imperfect information obtained from the environment. We also design a multi-agent voting method, which modifies the decision-making process considering the randomness during model training and, thus, improves the performance. We build a pandemic-spreading simulator based on real-world data, serving as the experimental platform. We then conduct extensive experiments. The results show that our method outperforms all baselines, which reduces infections and deaths by 14.25% on average without the multi-agent voting method and up to 15.44% with it.

AAMAS Conference 2022 Conference Paper

How Hard is Bribery in Elections with Randomly Selected Voters

  • Liangde Tao
  • Lin Chen
  • Lei Xu
  • Weidong Shi
  • Ahmed Sunny
  • Md Mahabub Uz Zaman

Many research works in computational social choice assume a fixed set of voters in an election and study the resistance of different voting rules against electoral manipulation. In recent years, however, a new technique known as random sample voting has been adopted in many multi-agent systems. One of the most prominent examples is blockchain. Many proof-of-stake based blockchain systems like Algorand will randomly select a subset of participants of the system to form a committee, and only the committee members will be involved in the decision of some important system parameters. This can be viewed as running an election where the voter committee (i.e., the voters whose votes will be counted) is randomly selected. It is generally expected that the introduction of such randomness should make the election more resistant to electoral manipulation, despite the lack of theoretical analysis. In this paper, we present a systematic study on the resistance of an election with a randomly selected voter committee against bribery. Since the committee is randomly generated, by bribing any fixed subset of voters, the designated candidate may or may not win. Consequently, we consider the problem of finding a feasible solution that maximizes the winning probability of the designated candidate. We show that for most voting rules, this problem becomes extremely difficult for the briber as even finding any non-trivial solution with non-zero objective value becomes NP-hard. However, for plurality and veto, there exists a polynomial time approximation scheme that computes a near-optimal solution efficiently. The algorithm builds upon a novel integer programming formulation together with techniques from 𝑛-fold integer programming, which may be of separate interest.

IJCAI Conference 2022 Conference Paper

Local Differential Privacy Meets Computational Social Choice - Resilience under Voter Deletion

  • Liangde Tao
  • Lin Chen
  • Lei Xu
  • Weidong Shi

The resilience of a voting system has been a central topic in computational social choice. Many voting rules, like plurality, are shown to be vulnerable as the attacker can target specific voters to manipulate the result. What if a local differential privacy (LDP) mechanism is adopted such that the true preference of a voter is never revealed in pre-election polls? In this case, the attacker can only infer stochastic information about a voter's true preference, and this may make manipulation of the electoral result significantly harder. The goal of this paper is to provide a quantitative study on the effect of adopting LDP mechanisms on a voting system. We introduce the metric PoLDP (power of LDP) that quantitatively measures the difference between the attacker's manipulation cost under LDP mechanisms and that without LDP mechanisms. The larger PoLDP is, the more robustness LDP mechanisms can add to a voting system. We give a full characterization of PoLDP for the voting system with plurality rule and provide general guidance towards the application of LDP mechanisms.

AAMAS Conference 2021 Conference Paper

A Game Theoretical Analysis of Non-Linear Blockchain System

  • Lin Chen
  • Lei Xu
  • Zhimin Gao
  • Ahmed Imtiaz Sunny
  • Keshav Kasichainula
  • Weidong Shi

Recent advances in the blockchain research have been made in two important directions. One is refined resilience analysis utilizing game theory to study the consequences of selfish behavior of users (miners), and the other is the extension from a linear (chain) structure to a non-linear (graphical) structure for performance improvements, such as IOTA and Graphcoin. The first question that comes to mind is what improvements a blockchain system would see by leveraging these new advances. In this paper, we consider three major properties for a blockchain system: 𝛼-partial verification, scalability, and finality-duration. We establish a formal framework and prove that no blockchain system can achieve 𝛼-partial verification for any fixed constant 𝛼, high scalability, and low finality-duration simultaneously. We observe that classical blockchain systems like Bitcoin achieve full verification (𝛼 = 1) and low finality-duration, while Ethereum 2.0 Sharding achieves low finality-duration and high scalability. We are interested in whether it is possible to partially satisfy the three properties.

ICLR Conference 2021 Conference Paper

Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS

  • Lin Chen
  • Sheng Xu

We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions, when both kernels are restricted to the sphere $\mathbb{S}^{d-1}$. Additionally, we prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS, when it is restricted to the sphere $\mathbb{S}^{d-1}$ and when it is defined on the entire $\mathbb{R}^d$.

NeurIPS Conference 2021 Conference Paper

Multiple Descent: Design Your Own Generalization Curve

  • Lin Chen
  • Yifei Min
  • Mikhail Belkin
  • Amin Karbasi

This paper explores the generalization loss of linear regression in variably parameterized families of models, both under-parameterized and over-parameterized. We show that the generalization curve can have an arbitrary number of peaks, and moreover, the locations of those peaks can be explicitly controlled. Our results highlight the fact that both the classical U-shaped generalization curve and the recently observed double descent curve are not intrinsic properties of the model family. Instead, their emergence is due to the interaction between the properties of the data and the inductive biases of learning algorithms.

NeurIPS Conference 2020 Conference Paper

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

  • Lin Chen
  • Qian Yu
  • Hannah Lawrence
  • Amin Karbasi

We study the problem of switching-constrained online convex optimization (OCO), where the player has a limited number of opportunities to change her action. While the discrete analog of this online learning task has been studied extensively, previous work in the continuous setting has neither established the minimax rate nor algorithmically achieved it. In this paper, we show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $. In particular, it is at least $ \frac{T}{\sqrt{2K}} $ for one dimension and at least $ \frac{T}{\sqrt{K}} $ for higher dimensions. The lower bound in higher dimensions is attained by an orthogonal subspace argument. In one dimension, a novel adversarial strategy yields the lower bound of $\Omega(\frac{T}{\sqrt{K}})$, but a precise minimax analysis including constants is more involved. To establish the tighter one-dimensional result, we introduce the \emph{fugal game} relaxation, whose minimax regret lower bounds that of switching-constrained OCO. We show that the minimax regret of the fugal game is at least $ \frac{T}{\sqrt{2K}} $ and thereby establish the optimal minimax lower bound in one dimension. To establish the dimension-independent upper bound, we next show that a mini-batching algorithm provides an $ O(\frac{T}{\sqrt{K}}) $ upper bound, and therefore conclude that the minimax regret of switching-constrained OCO is $ \Theta(\frac{T}{\sqrt{K}}) $ for any $K$. This is in sharp contrast to its discrete counterpart, the switching-constrained prediction-from-experts problem, which exhibits a phase transition in minimax regret between the low-switching and high-switching regimes.
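The mini-batching reduction behind the upper bound is simple: split the $T$ rounds into $K$ contiguous blocks, keep the action fixed within each block (so at most $K-1$ switches), and take one projected gradient step per block using the summed block gradient. A one-dimensional sketch, with illustrative parameter names not taken from the paper:

```python
import numpy as np

def minibatch_ogd(grad_fns, K, x0=0.0, eta=0.1, lo=-1.0, hi=1.0):
    """Mini-batching for switching-constrained OCO (1-D sketch).

    grad_fns: one gradient callable per round. The action is held
    fixed inside each of K blocks, giving at most K-1 switches.
    """
    T = len(grad_fns)
    x, plays = x0, []
    bounds = np.linspace(0, T, K + 1).astype(int)
    for b in range(K):
        block = grad_fns[bounds[b]:bounds[b + 1]]
        plays.extend([x] * len(block))
        if b < K - 1:  # an update after the final block is never played
            g = sum(g_t(x) for g_t in block)   # summed block gradient
            x = min(hi, max(lo, x - eta * g))  # projected OGD step
    switches = sum(plays[i] != plays[i - 1] for i in range(1, T))
    return plays, switches

# 10 identical quadratic losses f_t(x) = (x - 0.5)^2, switch budget K = 3.
plays, switches = minibatch_ogd([lambda x: 2 * (x - 0.5)] * 10, K=3)
```

Each block acts as one round of ordinary OGD on the averaged loss, which is where the $O(\frac{T}{\sqrt{K}})$ bound comes from.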

ICML Conference 2019 Conference Paper

Categorical Feature Compression via Submodular Optimization

  • MohammadHossein Bateni
  • Lin Chen
  • Hossein Esfandiari
  • Thomas Fu
  • Vahab Mirrokni
  • Afshin Rostamizadeh

In the era of big data, learning from categorical features with very large vocabularies (e.g., 28 million for the Criteo click prediction dataset) has become a practical challenge for machine learning researchers and practitioners. We design a highly-scalable vocabulary compression algorithm that seeks to maximize the mutual information between the compressed categorical feature and the target binary labels and we furthermore show that its solution is guaranteed to be within a $1-1/e \approx 63\%$ factor of the global optimal solution. Although in some settings, entropy-based set functions are known to be submodular, this is not the case for the mutual information objective we consider (mutual information with respect to the target labels). To address this, we introduce a novel re-parametrization of the mutual information objective, which we prove is submodular, and also design a data structure to query the submodular function in amortized $O(\log n )$ time (where $n$ is the input vocabulary size). Our complete algorithm is shown to operate in $O(n \log n )$ time. Additionally, we design a distributed implementation in which the query data structure is decomposed across $O(k)$ machines such that each machine only requires $O(\frac n k)$ space, while still preserving the approximation guarantee and using only logarithmic rounds of computation. We also provide analysis of simple alternative heuristic compression methods to demonstrate they cannot achieve any approximation guarantee. Using the large-scale Criteo learning task, we demonstrate better performance in retaining mutual information and also verify competitive learning performance compared to other baseline methods.
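The greedy machinery behind the $1-1/e$ guarantee is the standard template for monotone submodular maximization. The sketch below uses a toy coverage objective as a stand-in; the paper's actual objective is its re-parametrized mutual information with an $O(\log n)$ query structure, which is not reproduced here:

```python
def greedy_submodular(items, f, k):
    """Plain greedy for monotone submodular maximization under a
    cardinality constraint; the classical analysis gives the
    (1 - 1/e) approximation guarantee cited in the abstract."""
    S = set()
    for _ in range(k):
        # Marginal gain of each remaining item given the current set.
        gains = {v: f(S | {v}) - f(S) for v in items if v not in S}
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] <= 0:
            break
        S.add(best)
    return S

# Toy coverage instance: f(S) = number of labels the chosen
# "vocabulary buckets" cover (hypothetical data for illustration).
buckets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5, 6, 7}}
f = lambda S: len(set().union(*(buckets[v] for v in S))) if S else 0
chosen = greedy_submodular(buckets, f, 2)
```

Greedy first takes 'c' (gain 4), then 'a' (gain 3), covering all 7 labels; the paper's contribution is making each such gain query cheap at a 28-million-entry scale.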

IJCAI Conference 2019 Conference Paper

Election with Bribe-Effect Uncertainty: A Dichotomy Result

  • Lin Chen
  • Lei Xu
  • Shouhuai Xu
  • Zhimin Gao
  • Weidong Shi

We consider the electoral bribery problem in computational social choice. In this context, extensive studies have been carried out to analyze the computational vulnerability of various voting (or election) rules. However, essentially all prior studies assume a deterministic model where each voter has an associated threshold value, which is used as follows. A voter will take a bribe and vote according to the attacker's (i.e., briber's) preference when the amount of the bribe is above the threshold, and a voter will not take a bribe when the amount of the bribe is not above the threshold (in this case, the voter will vote according to its own preference, rather than the attacker's). In this paper, we initiate the study of a more realistic model where each voter is associated with a willingness function, rather than a fixed threshold value. The willingness function characterizes the likelihood a bribed voter would vote according to the attacker's preference; we call this bribe-effect uncertainty. We characterize the computational complexity of the electoral bribery problem in this new model. In particular, we discover a dichotomy result: a certain mathematical property of the willingness function dictates whether or not the computational hardness can serve as a deterrence to bribery attackers.

AAAI Conference 2019 Conference Paper

Election with Bribed Voter Uncertainty: Hardness and Approximation Algorithm

  • Lin Chen
  • Lei Xu
  • Shouhuai Xu
  • Zhimin Gao
  • Weidong Shi

Bribery in election (or computational social choice in general) is an important problem that has received a considerable amount of attention. In the classic bribery problem, the briber (or attacker) bribes some voters in attempting to make the briber’s designated candidate win an election. In this paper, we introduce a novel variant of the bribery problem, “Election with Bribed Voter Uncertainty” or BVU for short, accommodating the uncertainty that the vote of a bribed voter may or may not be counted. This uncertainty occurs either because a bribed voter may not cast its vote in fear of being caught, or because a bribed voter is indeed caught and therefore its vote is discarded. As a first step towards ultimately understanding and addressing this important problem, we show that it does not admit any multiplicative O(1)-approximation algorithm modulo standard complexity assumptions. We further show that there is an approximation algorithm that returns a solution with an additive-ε error in FPT time for any fixed ε.

NeurIPS Conference 2019 Conference Paper

Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond

  • Lin Chen
  • Hossein Esfandiari
  • Gang Fu
  • Vahab Mirrokni

Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique in computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework to design LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximation of the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, due to its several important applications such as model compression.

NeurIPS Conference 2019 Conference Paper

Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback

  • Mingrui Zhang
  • Lin Chen
  • Hamed Hassani
  • Amin Karbasi

In this paper, we propose three online algorithms for submodular maximization. The first one, Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations from $T^{1/2}$ [Chen2018Online] and $T^{3/2}$ [chen2018projection] to 1, and achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. The second one, Bandit-Frank-Wolfe, is the first bandit algorithm for continuous DR-submodular maximization, which achieves a $(1-1/e)$-regret bound of $O(T^{8/9})$. Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a $(1-1/e)$-regret bound of $O(T^{8/9})$ in the responsive bandit setting.

AAMAS Conference 2018 Conference Paper

Protecting Election from Bribery: New Approach and Computational Complexity Characterization

  • Lin Chen
  • Lei Xu
  • Shouhuai Xu
  • Zhimin Gao
  • Nolan Shah
  • Yang Lu
  • Weidong Shi

The bribery problem in elections has received a considerable amount of attention. In this paper, we initiate the study of a related, but new problem, the protection problem, namely protecting elections from bribery. In this problem, there is a defender who is given a defense budget and can use the budget to award some of the voters such that they cannot be bribed anymore. This naturally leads to the following bi-level decision problem: Is it possible for the defender with a given defense budget to protect an election from being manipulated by the attacker with a given attack budget for bribing voters? We characterize the computational complexity of the protection problem. We show that it is in general significantly harder than the bribery problem. However, the protection problem can be solved, under certain circumstances, in polynomial time.

AAAI Conference 2017 Conference Paper

Discriminative Semi-Supervised Dictionary Learning with Entropy Regularization for Pattern Classification

  • Meng Yang
  • Lin Chen

Dictionary learning has played an important role in the success of sparse representation, which triggers the rapid developments of unsupervised and supervised dictionary learning methods. However, in most practical applications, there are usually quite limited labeled training samples while it is relatively easy to acquire abundant unlabeled training samples. Thus semi-supervised dictionary learning that aims to effectively explore the discrimination of unlabeled training data has attracted much attention of researchers. Although various regularizations have been introduced in the prevailing semi-supervised dictionary learning, how to design an effective unified model of dictionary learning and unlabeled-data class estimating and how to well explore the discrimination in the labeled and unlabeled data are still open. In this paper, we propose a novel discriminative semi-supervised dictionary learning model (DSSDL) by introducing discriminative representation, an identical coding of unlabeled data to that of testing data for final classification, and an entropy regularization term. The coding strategy of unlabeled data can not only avoid the effect of its incorrect class estimation, but also ensure that the learned discrimination is well exploited in the final classification. The introduced entropy regularization can avoid overemphasizing some uncertain estimated classes for unlabeled samples. Apart from the enhanced discrimination in the learned dictionary by the discriminative representation, an extended dictionary is used to mainly explore the discrimination embedded in the unlabeled data. Extensive experiments on face recognition, digit recognition and texture classification show the effectiveness of the proposed method.

NeurIPS Conference 2017 Conference Paper

Interactive Submodular Bandit

  • Lin Chen
  • Andreas Krause
  • Amin Karbasi

In many machine learning applications, submodular functions have been used as a model for evaluating the utility or payoff of a set such as news items to recommend, sensors to deploy in a terrain, nodes to influence in a social network, to name a few. At the heart of all these applications is the assumption that the underlying utility/payoff function is known a priori, hence maximizing it is in principle possible. In real life situations, however, the utility function is not fully known in advance and can only be estimated via interactions. For instance, whether a user likes a movie or not can be reliably evaluated only after it was shown to her. Or, the range of influence of a user in a social network can be estimated only after she is selected to advertise the product. We model such problems as an interactive submodular bandit optimization, where in each round we receive a context (e.g., previously selected movies) and have to choose an action (e.g., propose a new movie). We then receive a noisy feedback about the utility of the action (e.g., ratings) which we model as a submodular function over the context-action space. We develop SM-UCB that efficiently trades off exploration (collecting more data) and exploitation (proposing a good action given gathered data) and achieves an $O(\sqrt{T})$ regret bound after $T$ rounds of interaction. Given a bounded-RKHS norm kernel over the context-action-payoff space that governs the smoothness of the utility function, SM-UCB keeps an upper-confidence bound on the payoff function that allows it to asymptotically achieve no-regret. Finally, we evaluate our results on four concrete applications, including movie recommendation (on the MovieLens dataset), news recommendation (on the Yahoo! Webscope dataset), interactive influence maximization (on a subset of the Facebook network), and personalized data summarization (on the Reuters Corpus). In all these applications, we observe that SM-UCB consistently outperforms the prior art.
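The exploration/exploitation trade-off at the core of SM-UCB can be illustrated with generic UCB1 on independent arms. This is only the textbook template, not SM-UCB, which instead maintains its confidence bound on a kernelized submodular payoff over the context-action space; arm means and noise level below are hypothetical:

```python
import math
import random

def ucb1(arm_means, T, seed=0):
    """UCB1: play the arm maximizing empirical mean + exploration bonus."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, T + 1):
        if t <= n:
            a = t - 1  # initialization: play every arm once
        else:
            # Optimism in the face of uncertainty: bonus shrinks as an
            # arm is pulled more often, steering play toward good arms.
            a = max(range(n), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = arm_means[a] + rng.gauss(0, 0.1)  # noisy payoff feedback
        counts[a] += 1
        sums[a] += reward
    return counts

# Two arms with true means 0.1 and 0.9; UCB1 concentrates on the second.
counts = ucb1([0.1, 0.9], T=500)
```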

AAAI Conference 2017 Conference Paper

Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting

  • Lin Chen
  • Hamed Hassani
  • Amin Karbasi

In this paper, we consider the problem of actively learning a linear classifier through query synthesis where the learner can construct artificial queries in order to estimate the true decision boundaries. This problem has recently gained a lot of interest in automated science and adversarial reverse engineering for which only heuristic algorithms are known. In such applications, queries can be constructed de novo to elicit information (e. g. , automated science) or to evade detection with minimal cost (e. g. , adversarial reverse engineering). We develop a general framework, called dimension coupling (DC), that 1) reduces a d-dimensional learning problem to d−1 lowdimensional sub-problems, 2) solves each sub-problem efficiently, 3) appropriately aggregates the results and outputs a linear classifier, and 4) provides a theoretical guarantee for all possible schemes of aggregation. The proposed method is proved resilient to noise. We show that the DC framework avoids the curse of dimensionality: its computational complexity scales linearly with the dimension. Moreover, we show that the query complexity of DC is near optimal (within a constant factor of the optimum algorithm). To further support our theoretical analysis, we compare the performance of DC with the existing work. We observe that DC consistently outperforms the prior arts in terms of query complexity while often running orders of magnitude faster.

IJCAI Conference 2016 Conference Paper

Clustering-Based Joint Feature Selection for Semantic Attribute Prediction

  • Lin Chen
  • Baoxin Li

Semantic attributes have been proposed to bridge the semantic gap between low-level feature representation and high-level semantic understanding of visual objects. Obtaining a good representation of semantic attributes usually requires learning from high-dimensional low-level features, which not only significantly increases the time and space requirement but also degrades the performance due to numerous irrelevant features. Since multi-attribute prediction can be generalized as a multi-task learning problem, sparse-based multi-task feature selection approaches have been introduced, utilizing the relatedness among multiple attributes. However, such approaches either do not investigate the pattern of the relatedness among attributes, or require prior knowledge about the pattern. In this paper, we propose a novel feature selection approach which embeds attribute correlation modeling in multi-attribute joint feature selection. Experiments on both synthetic dataset and multiple public benchmark datasets demonstrate that the proposed approach effectively captures the correlation among multiple attributes and significantly outperforms the state-of-the-art approaches.

NeurIPS Conference 2016 Conference Paper

Estimating the Size of a Large Network and its Communities from a Random Sample

  • Lin Chen
  • Amin Karbasi
  • Forrest Crawford

Most real-world networks are too large to be measured or studied directly and there is substantial interest in estimating global network properties from smaller sub-samples. One of the most important global properties is the number of vertices/nodes in the network. Estimating the number of vertices in a large network is a major challenge in computer science, epidemiology, demography, and intelligence analysis. In this paper we consider a population random graph G = (V, E) from the stochastic block model (SBM) with K communities/blocks. A sample is obtained by randomly choosing a subset W and letting G(W) be the induced subgraph in G of the vertices in W. In addition to G(W), we observe the total degree of each sampled vertex and its block membership. Given this partial information, we propose an efficient PopULation Size Estimation algorithm, called PULSE, that accurately estimates the size of the whole population as well as the size of each community. To support our theoretical analysis, we perform an exhaustive set of experiments to study the effects of sample size, K, and SBM model parameters on the accuracy of the estimates. The experimental results also demonstrate that PULSE significantly outperforms a widely-used method called the network scale-up estimator in a wide variety of scenarios.
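The estimation idea can be sketched in its simplest one-block (Erdős–Rényi, K = 1) form: estimate the edge probability from the induced subgraph, then use the observed total degrees, which count edges into the whole graph, to solve E[degree] = p(N − 1) for N. This is a deliberately naive analogue for illustration, not the PULSE algorithm:

```python
import numpy as np

def estimate_population_size(A_sample, total_degrees):
    """One-block analogue of sample-based population size estimation.

    A_sample: adjacency matrix of the induced subgraph G(W).
    total_degrees: each sampled vertex's degree in the *whole* graph.
    """
    w = len(total_degrees)
    p_hat = A_sample.sum() / (w * (w - 1))  # density over ordered pairs
    return total_degrees.mean() / p_hat + 1  # solve mean_deg = p * (N - 1)

# Synthetic check: hidden population N = 200, sample w = 60 vertices.
rng = np.random.default_rng(0)
N, p, w = 200, 0.1, 60
upper = np.triu(rng.random((N, N)) < p, 1)
A = upper | upper.T                          # symmetric, no self-loops
idx = rng.choice(N, size=w, replace=False)
estimate = estimate_population_size(A[np.ix_(idx, idx)].astype(float),
                                    A[idx].sum(axis=1).astype(float))
```

PULSE extends this to K blocks with observed block memberships, estimating each community's size rather than a single total.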

AAAI Conference 2016 Conference Paper

Seeing the Unseen Network: Inferring Hidden Social Ties from Respondent-Driven Sampling

  • Lin Chen
  • Forrest Crawford
  • Amin Karbasi

Learning about the social structure of hidden and hard-to-reach populations — such as drug users and sex workers — is a major goal of epidemiological and public health research on risk behaviors and disease prevention. Respondent-driven sampling (RDS) is a peer-referral process widely used by many health organizations, where research subjects recruit other subjects from their social network. In such surveys, researchers observe who recruited whom, along with the time of recruitment and the total number of acquaintances (network degree) of respondents. However, due to privacy concerns, the identities of acquaintances are not disclosed. In this work, we show how to reconstruct the underlying network structure through which the subjects are recruited. We formulate the dynamics of RDS as a continuous-time diffusion process over the underlying graph and derive the likelihood of the recruitment time series under an arbitrary inter-recruitment time distribution. We develop an efficient stochastic optimization algorithm called RENDER (REspoNdent-Driven nEtwork Reconstruction) that finds the network that best explains the collected data. We support our analytical results through an exhaustive set of experiments on both synthetic and real data.