Arrow Research search

Author name cluster

Bo Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

226 papers
2 author rows

Possible papers

226

AAAI Conference 2026 Conference Paper

Bidirectional Noise Injection: Enhancing Diffusion Models via Coordinated Input-Output Perturbation

  • Tianyi Zheng
  • Jiayang Gao
  • Peng-tao Jiang
  • Fengxiang Yang
  • Ben Wan
  • Hao Zhang
  • Jinwei Chen
  • Jia Wang

Diffusion models have demonstrated remarkable success in image generation, yet a persistent challenge remains: the bias between model predictions and the target distribution. In this paper, we propose a Bidirectional Noise Injection framework for enhancing diffusion models, implemented via Coordinated Input-Output Perturbation (CIOP). Our approach mitigates this bias by randomly applying synchronized noise injection to both the model inputs and the prediction targets during the training stage. This stochastic, synchronized noise injection acts as a smoothing mechanism that effectively reduces the 2-Wasserstein distance between the predicted and target distributions, as substantiated by our theoretical analysis based on optimal transport theory. Extensive experiments on multiple benchmark datasets and various generative tasks demonstrate that our method improves generation quality and training efficiency without incurring additional computational cost. Furthermore, the design of CIOP enables seamless integration with existing diffusion model improvements and advanced frameworks, thereby broadening its applicability. These results highlight the potential of Bidirectional Noise Injection via CIOP to alleviate bias in diffusion-based generative models across a wide range of settings.
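As a rough illustration of the coordinated perturbation idea described in the abstract, a training-time helper might inject the same Gaussian noise into both the model input and the prediction target with some probability. The function name, noise scale, and injection probability below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ciop_training_pair(x_t, target, sigma=0.05, p=0.5, rng=None):
    """Hypothetical sketch of Coordinated Input-Output Perturbation (CIOP).

    With probability p, inject the *same* Gaussian noise (scaled by sigma)
    into both the model input x_t and the prediction target, so the
    perturbation is synchronized across input and output.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        eps = rng.standard_normal(x_t.shape)
        x_t = x_t + sigma * eps
        target = target + sigma * eps  # synchronized: same eps on both sides
    return x_t, target
```

The key property is that the input and target receive an identical perturbation, so the noise cancels in the prediction error while still smoothing the training distribution.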

JBHI Journal 2026 Journal Article

DBGT-PLA: Dual-Branch Graph–Transformer Fusion for Interpretable Protein–Ligand Affinity Prediction

  • Ying Wang
  • Jing Hu
  • Junlin Xu
  • Bo Li

Protein-ligand binding affinity prediction is critical for drug discovery, yet existing methods struggle to jointly model local atomic interactions and global contextual dependencies. To address this, we propose the Interpretable Dual-Branch Graph–Transformer framework for Protein–Ligand Affinity prediction (DBGT-PLA), a novel dual-branch architecture that integrates a graph neural network (GNN) with a stability-enhanced Transformer equipped with learnable positional embeddings and a NaN-filtering mechanism that handles potential Not-a-Number (NaN) values arising from numerical instability or data preprocessing. We design a Gated Residual Learning (GRL) Fusion module that performs dimension-wise adaptive integration between local graph topology and global Transformer context. This mechanism enables multi-level feature coordination through a residual path, achieving biophysically consistent alignment between atomic-level interactions and global conformational dependencies. Furthermore, we introduce an edge-level Shapley attribution framework tailored to protein–ligand interaction graphs, quantifying contributions of chemical bonds (e.g., hydrophobic contacts) and non-covalent interactions. Experiments show DBGT-PLA reduces RMSE by 18.3% (from 1.522 to 1.244 on the Holdout Set 2019), outperforming state-of-the-art models. Crucially, our explainability module reveals that the ligand edges dominate affinity predictions, accounting for nearly 70%. This work not only advances predictive accuracy but also offers unprecedented, quantitative insights into interaction determinants, which can guide rational drug optimization. The code of DBGT-PLA is publicly available at https://github.com/wangwying/DBGT-PLA

AAAI Conference 2026 Conference Paper

Encode Geometric Diagram as Geo-Graph in Geometry Problem Solving

  • Wenjun Wu
  • Lingling Zhang
  • Bo Zhao
  • Bo Li
  • Xinyu Zhang
  • Yaqiang Wu

Geometry Problem Solving has become a hot topic in recent years due to the complexity of equipping machines with geometric abstraction, multi-modal reasoning, and mathematical capabilities. The majority of research works place their attention on the fusion of multi-modal data or the synergistic combination of neural and symbolic systems for performance improvement. However, their neglect of the unique characteristics of geometric diagrams, which distinguish them from natural images, impedes further exploration of the critical information in geometric diagrams. In this work, we introduce the novel concept of the geo-graph and propose the Geo-Graph Geometry Problem Solving model, which encodes the geometric diagram from a new perspective. The geo-graph is designed to include semantic, structural, and spatial information in the diagram, which is crucial to the subsequent problem reasoning stage. To facilitate the model's comprehension of the actual layout of the geometric diagram, spatial and connecting attentions are devised to serve as intrinsic knowledge guidance for feature propagation. An extra cross-modal attention is used as external guidance to instruct the encoding of the geo-graph so that it relates to the specific problem target. Fused multi-modal features are then sent into a commonly used encoder-decoder framework for final solution generation. The model is first trained with three carefully designed pre-training tasks to establish its fundamental knowledge of the geo-graph, leveraging numerous varied samples generated through a geo-graph-based augmentation method. Experiments on popular geometry problem solving datasets demonstrate the effectiveness and superiority of our model for geometric diagram encoding.

AAAI Conference 2026 Conference Paper

GitTaskBench: A Benchmark for Code Agents Solving Real-World Tasks Through Code Repository Leveraging

  • Ziyi Ni
  • Huacan Wang
  • Shuo Zhang
  • Shuo Lu
  • Ziyang He
  • WangYou
  • Zhenheng Tang
  • Sen Hu

Beyond scratch coding, exploiting large-scale code repositories (e.g., GitHub) for practical tasks is vital in real-world software development, yet current benchmarks rarely evaluate code agents in such authentic, workflow-driven scenarios. To bridge this gap, we introduce GitTaskBench, a benchmark designed to systematically assess this capability via 54 realistic tasks across 7 modalities and 7 domains. Each task pairs a relevant repository with an automated, human-curated evaluation harness specifying practical success criteria. Beyond measuring execution and task success, we also propose the alpha-value metric to quantify the economic benefit of agent performance, which integrates task success rates, token cost, and average developer salaries. Experiments across three state-of-the-art agent frameworks with multiple advanced LLMs show that leveraging code repositories for complex task solving remains challenging: even the best-performing system, OpenHands+Claude 3.7, solves only 48.15% of tasks. Error analysis attributes over half of failures to seemingly mundane yet critical steps like environment setup and dependency resolution, highlighting the need for more robust workflow management and increased timeout preparedness. By releasing GitTaskBench, we aim to drive progress and attention toward repository-aware code reasoning, execution, and deployment---moving agents closer to solving complex, end-to-end real-world tasks.

JBHI Journal 2026 Journal Article

Hierarchical Deep Decision Tree-Based Network for Odontogenic Cystic Lesion Classification in CBCT Images

  • Zimo Huang
  • Hao Wang
  • Bo Li
  • Eduardo Delamare
  • Shengfu Huang
  • Lei Bi
  • Jinman Kim

Odontogenic cystic lesions (OCLs) are complex jaw abnormalities that require a precise diagnosis of the disease for treatment. Visual OCL diagnosis is commonly based on reviewing cone-beam computed tomography (CBCT) to identify morpho-pathological features associated with specific lesion types in a hierarchical manner. Current state-of-the-art methods focus on extracting features from the image without any guidance beyond the lesion diagnosis, and do not fully leverage the hierarchical relationship between the lesion diagnosis and morphological features. In this study, we propose a hierarchical deep decision tree network (H2DT-Net) with three modules: a deep decision tree-based hierarchical learning module (DHLM) to leverage inter-categorical relationships; a feature category embedding module (FCEM) to capture representations from both diagnostic and morpho-pathological domains and support the DHLM; and a lesion localised attention module (LLAM) to facilitate the feature extraction process by generating lesion-focused attention maps. Evaluated on 289 CBCT images, H2DT-Net achieved state-of-the-art performance in OCL classification. We further demonstrate that our method is effective in clinical settings, where it outperformed six maxillofacial clinicians in diagnostic assessment.

AAAI Conference 2026 Conference Paper

Information Elicitation Mechanisms for Bayesian Auctions (Abstract Reprint)

  • Jing Chen
  • Bo Li
  • Yingkai Li

In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players’ private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is arbitrarily scattered among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that aggregate the players’ knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanisms with a common prior. Our mechanisms are 2-step dominant-strategy truthful, and the approximation ratios improve gracefully with the amount of knowledge the players collectively have.

AAAI Conference 2026 Conference Paper

Language Drift in Multilingual Retrieval-Augmented Generation: Characterization and Decoding-Time Mitigation

  • Bo Li
  • Zhenghua Xu
  • Rui Xie

Multilingual Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to perform knowledge-intensive tasks in multilingual settings by leveraging retrieved documents as external evidence. However, when the retrieved evidence differs in language from the user query and in-context exemplars, the model often exhibits language drift by generating responses in an unintended language. This phenomenon is especially pronounced during reasoning-intensive decoding, such as Chain-of-Thought (CoT) generation, where intermediate steps introduce further language instability. In this paper, we systematically study output language drift in multilingual RAG across multiple datasets, languages, and LLM backbones. Our controlled experiments reveal that the drift results not from comprehension failure but from decoder-level collapse, in which dominant token distributions and high-frequency English patterns override the intended generation language. We further observe that English serves as a semantic attractor under cross-lingual conditions, emerging as both the strongest interference source and the most frequent fallback language. To mitigate this, we propose Soft Constrained Decoding (SCD), a lightweight, training-free decoding strategy that gently steers generation toward the target language by penalizing non-target-language tokens. SCD is model-agnostic and can be applied to any generation algorithm without modifying the architecture or requiring additional data. Experiments across three multilingual datasets and multiple typologically diverse languages show that SCD consistently improves language alignment and task performance, providing an effective and generalizable solution in multilingual RAG.
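The soft penalty on non-target-language tokens described above could be sketched as a logit adjustment applied before sampling. The penalty value and the idea of a precomputed target-language token-id set are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def soft_constrained_logits(logits, target_lang_token_ids, penalty=2.0):
    """Hypothetical sketch of Soft Constrained Decoding (SCD).

    Subtract a fixed penalty from the logits of tokens outside the target
    language's vocabulary subset, gently steering generation toward the
    target language without hard masking.
    """
    adjusted = logits.copy()
    mask = np.ones(len(logits), dtype=bool)
    mask[list(target_lang_token_ids)] = False  # True = non-target tokens
    adjusted[mask] -= penalty                  # soft penalty, not -inf
    return adjusted
```

Because the penalty is finite rather than a hard mask, a strongly preferred non-target token can still be emitted, which keeps the constraint "soft" in the sense the abstract describes.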

AAAI Conference 2026 Conference Paper

MARS: Multi-Agent Adaptive Reasoning with Socratic Guidance for Automated Prompt Optimization

  • Jian Zhang
  • Zhangqi Wang
  • Haiping Zhu
  • Kangda Cheng
  • Kai He
  • Bo Li
  • Qika Lin
  • Jun Liu

Large language models (LLMs) typically operate in a question-answering paradigm, where the quality of the input prompt critically affects the response. Automated Prompt Optimization (APO) aims to overcome the cognitive biases of manually crafted prompts and explore a broader prompt design space. However, existing APO methods often suffer from rigid template structures and inefficient exploration in the prompt space. To this end, we propose a Multi-Agent Adaptive Reasoning with Socratic guidance framework (MARS) for APO. MARS consists of five complementary agents and formulates the optimization process as a Partially Observable Markov Decision Process (POMDP), enabling adaptive prompt refinement through explicit state modeling and interactive feedback. Specifically, a Planner agent generates flexible optimization trajectories, a Teacher-Critic-Student triad engages in Socratic-style dialogue to iteratively optimize the prompt based on pseudo-gradient signals in the text space, and a Target agent executes the prompt in downstream tasks to provide performance feedback. MARS integrates reasoning, feedback, and state transition into a unified hidden-state evolution process, improving both the effectiveness and interpretability of optimization. Extensive experiments on multiple datasets demonstrate that MARS outperforms existing APO methods in terms of optimization performance, search efficiency, and interpretability.

AAAI Conference 2026 Conference Paper

Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

  • Bo Li
  • Tian Tian
  • Zhenghua Xu
  • Hao Cheng
  • Shikun Zhang
  • Wei Ye

Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in this setting lies in determining the optimal timing for retrieval. Existing methods often trigger retrieval based on low token-level confidence, which may lead to delayed intervention after errors have already propagated. We introduce Entropy-Trend Constraint (ETC), a training-free method that determines optimal retrieval timing by modeling the dynamics of token-level uncertainty. Specifically, ETC utilizes first- and second-order differences of the entropy sequence to detect emerging uncertainty trends, enabling earlier and more precise retrieval. Experiments on six QA benchmarks with three LLM backbones demonstrate that ETC consistently outperforms strong baselines while reducing retrieval frequency. ETC is particularly effective in domain-specific scenarios, exhibiting robust generalization capabilities. Ablation studies and qualitative analyses further confirm that trend-aware uncertainty modeling yields more effective retrieval timing. The method is plug-and-play, model-agnostic, and readily integrable into existing decoding pipelines. Implementation code is included in the supplementary materials.
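The trend detection described above could be sketched as a check on the first- and second-order differences of a sliding window of token entropies. The window size and thresholds below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def should_retrieve(entropies, d1_thresh=0.3, d2_thresh=0.15):
    """Hypothetical sketch of the Entropy-Trend Constraint (ETC) idea.

    Trigger retrieval when token-level entropy is not merely high but
    trending upward: both the first difference (slope) and second
    difference (acceleration) of the recent entropy sequence exceed
    their thresholds.
    """
    if len(entropies) < 3:
        return False
    e = np.asarray(entropies[-3:], dtype=float)
    d1 = e[-1] - e[-2]                       # first-order: rising uncertainty
    d2 = (e[-1] - e[-2]) - (e[-2] - e[-3])   # second-order: accelerating rise
    return bool(d1 > d1_thresh and d2 > d2_thresh)
```

Reacting to the trend rather than an absolute confidence threshold is what allows retrieval to fire before the low-confidence token is actually emitted.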

TMLR Journal 2026 Journal Article

Nondeterministic Polynomial-time Problem Challenge: An Ever-Scaling Reasoning Benchmark for LLMs

  • Chang Yang
  • Ruiyu Wang
  • Junzhe Jiang
  • Qi Jiang
  • Qinggang Zhang
  • Yanchen Deng
  • Shuxin Li
  • Shuyue Hu

Reasoning is the fundamental capability of large language models (LLMs). Due to the rapid progress of LLMs, there are two main issues with current benchmarks: i) these benchmarks can be crushed in a short time (less than 1 year), and ii) these benchmarks may be easily hacked. To handle these issues, we propose building ever-scaling benchmarks, which scale over complexity, instances, oversight, and coverage. This paper presents the Nondeterministic Polynomial-time Problem Challenge (NPPC), an ever-scaling reasoning benchmark for LLMs. Specifically, NPPC has three main modules: i) npgym, which provides a unified interface to 25 well-known NP-complete problems and can generate any number of instances at any level of complexity; ii) npsolver, which provides a unified interface to evaluate problem instances with both online and offline models via APIs and local deployments, respectively; and iii) npeval, which provides comprehensive, ready-to-use tools to analyze the performance of LLMs over different problems, the number of tokens, aha moments, reasoning errors, and solution errors. Extensive experiments over widely used LLMs demonstrate: i) NPPC can successfully decrease the performance of advanced LLMs to below 10%, demonstrating that NPPC is not crushed by current models; ii) DeepSeek-R1, Claude-3.7-Sonnet, and o1/o3-mini are the most powerful LLMs, with DeepSeek-R1 outperforming Claude-3.7-Sonnet and o1/o3-mini on most NP-complete problems considered; and iii) the number of tokens and aha moments in advanced LLMs, e.g., Claude-3.7-Sonnet and DeepSeek-R1, is observed to first increase and then decrease as problem instances become more difficult. Through continuous scaling analysis, NPPC can provide critical insights into LLMs' reasoning capabilities, exposing fundamental limitations and suggesting directions for further improvement.

AAAI Conference 2026 Conference Paper

Think, Speak, Decide: Language-Augmented Multi-Agent Reinforcement Learning for Economic Decision-Making

  • Heyang Ma
  • Qirui Mi
  • Qipeng Yang
  • Zijun Fan
  • Bo Li
  • Haifeng Zhang

Economic decision-making depends not only on structured signals—such as prices and taxes—but also on unstructured language, including peer dialogue and media narratives. While multi-agent reinforcement learning (MARL) has shown promise in optimizing economic decisions, it struggles with the semantic ambiguity and contextual richness of language. We propose LAMP (Language-Augmented Multi-Agent Policy), the first framework to integrate language into economic decision-making, narrowing the gap to real-world settings. LAMP follows a Think–Speak–Decide pipeline: (1) Think interprets numerical observations to extract short-term shocks and long-term trends, caching high-value reasoning trajectories. (2) Speak crafts and exchanges strategic messages based on the reasoning, updating beliefs by parsing peer communications. (3) Decide fuses numerical data, reasoning, and reflections into a MARL policy to optimize language-augmented decision-making. Experiments in economic simulation show that LAMP outperforms both MARL and LLM-only baselines in cumulative return (+63.5%, +34.0%), robustness (+18.8%, +59.4%), and interpretability. These results demonstrate the potential of language-augmented policies to deliver more effective and robust economic strategies.

AAAI Conference 2026 Conference Paper

Uncovering and Aligning Anomalous Attention Heads to Defend Against NLP Backdoor Attacks

  • Haotian Jin
  • Yang Li
  • Haihui Fan
  • Lin Shen
  • Xiangfang Li
  • Bo Li

Backdoor attacks pose a serious threat to the security of large language models (LLMs), causing them to exhibit anomalous behavior under specific trigger conditions. The design of backdoor triggers has evolved from fixed triggers to dynamic or implicit triggers. This increased flexibility in trigger design makes it challenging for defenders to accurately identify their specific forms. Most existing backdoor defense methods are limited to specific types of triggers or rely on an additional clean model for support. To address this issue, we propose a backdoor detection method based on attention similarity, enabling backdoor detection without prior knowledge of the trigger. Our study reveals that models subjected to backdoor attacks exhibit unusually high similarity among attention heads when exposed to triggers. Based on this observation, we propose an attention safety alignment approach combined with head-wise fine-tuning to rectify potentially contaminated attention heads, thereby effectively mitigating the impact of backdoor attacks. Extensive experimental results demonstrate that our method significantly reduces the success rate of backdoor attacks while preserving the model’s performance on downstream tasks.

JBHI Journal 2026 Journal Article

You Need Glimpse Before Segmentation: Stochastic Detector-Actor-Critic for Medical Image Segmentation

  • Zhenghua Xu
  • Yunxin Liu
  • Di Yuan
  • Bo Li
  • Weipeng Liu
  • Thomas Lukasiewicz

Medical images often contain more redundant background areas than natural images, potentially introducing noise and degrading image segmentation performance. Inspired by doctors' diagnostic processes, where they identify the lesion area before conducting a detailed analysis, we introduce a novel Stochastic Detector-Actor-Critic (SDAC) framework to tackle this challenge. SDAC initially glimpses the entire image using a detector network and policy gradient algorithms to filter out irrelevant background regions and focus on crucial, smaller areas for segmentation. The Actor-Critic algorithm then dynamically creates segmentation masks pixel by pixel without user intervention or coarse masks, forming a robust segmentation module. Both processes are trained jointly to reduce error propagation and ensure stability and ease of implementation. Our experiments on two commonly used medical image segmentation datasets demonstrate that SDAC achieves competitive results comparable to state-of-the-art methods while using 10x fewer parameters than the best-performing baseline in terms of DICE and IoU metrics. We also conduct detailed ablation studies to enhance understanding and facilitate practical use. Furthermore, SDAC performs well in low-resource settings (i.e., 50-shot or 100-shot), making it ideal for real-world scenarios. Its lightweight design makes SDAC an excellent baseline for medical image segmentation tasks.

AAAI Conference 2025 Conference Paper

Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning

  • Yuti Liu
  • Shice Liu
  • Junyuan Gao
  • Peng-tao Jiang
  • Hao Zhang
  • Jinwei Chen
  • Bo Li

Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values, and identifying its highlights and areas for improvement. Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets, thus impairing in-depth aesthetic comprehension. Despite efforts to overcome this challenge through the application of Multi-modal Large Language Models (MLLMs), such models remain underdeveloped for IAA purposes. To address this, we propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight. Central to our approach is an innovative multi-scale text-guided self-supervised learning technique. This technique features a multi-scale feature alignment module and capitalizes on a wealth of unlabeled data in a self-supervised manner to structurally and functionally enhance aesthetic ability. The empirical evidence indicates that, accompanied by extensive instruction tuning, our model sets new state-of-the-art benchmarks across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment. Remarkably, it also demonstrates zero-shot learning capabilities in the emerging task of aesthetic suggesting. Furthermore, for personalized image aesthetic assessment, we harness the potential of in-context learning and showcase its inherent advantages.

NeurIPS Conference 2025 Conference Paper

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

  • Xiaojun Jia
  • Sensen Gao
  • Simeng Qin
  • Tianyu Pang
  • Chao Du
  • Yihao Huang
  • Xinfeng Li
  • Yiming Li

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features—such as CLIP’s [CLS] token—between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns to alleviate redundant local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, outperforming state-of-the-art methods, especially in transferring to closed-source MLLMs.

NeurIPS Conference 2025 Conference Paper

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

  • Andy Zhou
  • Kevin Wu
  • Francesco Pinto
  • Zhaorun Chen
  • Yi Zeng
  • Yu Yang
  • Shuang Yang
  • Sanmi Koyejo

As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and lack comprehensive coverage of emerging attack vectors. This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against LLMs. AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors. The dual-agent framework consists of a red teaming agent that can operate from high-level risk categories alone to generate and execute test cases, and a strategy proposer agent that autonomously discovers and implements new attacks by analyzing recent research. This modular design allows AutoRedTeamer to adapt to emerging threats while maintaining strong performance on existing attack vectors. We demonstrate AutoRedTeamer’s effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B while reducing computational costs by 46% compared to existing approaches. AutoRedTeamer also matches the diversity of human-curated benchmarks in generating test cases, providing a comprehensive, scalable, and continuously evolving framework for evaluating the security of AI systems.

JBHI Journal 2025 Journal Article

Bilateral Proxy Federated Domain Generalization for Privacy-Preserving Medical Image Diagnosis

  • Huilin Lai
  • Ye Luo
  • Bo Li
  • Jianwei Lu
  • Junsong Yuan

Contemporary domain generalization methods have demonstrated effectiveness in aiding the generalized diagnosis of medical images with multi-source data by joint optimization. However, the centralized training paradigm employed by these approaches becomes infeasible when data are non-shared across domains due to the high privacy of medical data. Despite attempts by existing federated domain generalization methods to address this issue, the simultaneous attainment of strict privacy protection and a satisfactory level of generalization ability on out-of-distribution data remains a persistent challenge. In this paper, to tackle this challenging problem, we propose a novel approach called the Bilateral Proxy Framework (BPF). The BPF leverages the client-side proxies to facilitate strict privacy-preserving communications with the server and ensure smoother and more stable convergence of local models through mutual distillation. Meanwhile, the server-side proxy adopts a distance-based strategy and a parameter moving average scheme, which enhances the stability and robustness of the global model, particularly by averting abrupt parameter changes that could result in fluctuations or overfitting. Through these advancements, our framework strives to enhance the generalization capability of the global model, enabling more accurate and reliable medical image diagnosis in federated settings. The effectiveness of our method is demonstrated with superior performance over state-of-the-art methods on both simulated and real-world distribution medical image diagnosis tasks.

NeurIPS Conference 2025 Conference Paper

Boosting Adversarial Transferability with Spatial Adversarial Alignment

  • Zhaoyu Chen
  • HaiJing Guo
  • Kaixun Jiang
  • Jiyuan Fu
  • Xinyu Zhou
  • Dingkang Yang
  • Hao Tang
  • Bo Li

Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches are proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modifications. However, these methods still show limited transferability, particularly in cross-architecture scenarios, such as from CNN to ViT. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment. First, we minimize the divergences of features between the two models in both global and local regions, facilitating spatial alignment. Second, we introduce a self-adversarial strategy that leverages adversarial examples to impose further constraints, aligning features from an adversarial perspective. Through this alignment, the surrogate model is trained to concentrate on the common features extracted by the witness model. This facilitates adversarial attacks on these shared features, thereby yielding perturbations that exhibit enhanced transferability. Extensive experiments on various architectures on ImageNet show that aligned surrogate models based on SAA can yield more transferable adversarial examples, especially in cross-architecture attacks.
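The global-plus-local feature alignment described above could be sketched as a loss that compares surrogate and witness feature maps over the whole map and over grid cells. The mean-squared-error divergence and 2x2 grid below are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def spatial_alignment_loss(f_surrogate, f_witness, grid=2):
    """Hypothetical sketch of the spatial-aware alignment term in SAA.

    Penalize feature divergence between surrogate and witness models both
    globally (whole feature map) and locally (per grid cell).
    """
    def mse(a, b):
        return float(np.mean((a - b) ** 2))

    loss = mse(f_surrogate, f_witness)  # global alignment term
    h, w = f_surrogate.shape[:2]
    sh, sw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            a = f_surrogate[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            b = f_witness[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            loss += mse(a, b)  # local alignment terms
    return loss
```

Minimizing such a loss pushes the surrogate's features toward the witness model's in every region, which is the mechanism the abstract credits for the improved cross-architecture transfer.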

AAAI Conference 2025 Conference Paper

Boosting Vision State Space Model with Fractal Scanning

  • Haoke Xiao
  • Lv Tang
  • Peng-tao Jiang
  • Hao Zhang
  • Jinwei Chen
  • Bo Li

Recently, foundational models have significantly advanced across different tasks, with the Transformer as the general backbone. However, the Transformer's quadratic complexity poses challenges for handling longer sequences and higher-resolution images, which may limit the further development of foundational models. To alleviate this issue, various efficient State Space Models (SSMs) such as Mamba have emerged, initially matching Transformer performance and gradually surpassing it. To improve the performance of SSMs in computer vision tasks, one crucial consideration is the effective serialization of images. Existing vision Mambas, which rely on a linear scanning mechanism, often struggle to capture complex spatial relationships in 2D images. This results in feature loss during serialization and negatively impacts model performance. To overcome this limitation, we propose the use of fractal scanning curves for image serialization to enhance the Mambas’ ability to accurately model complex spatial dependencies. Additionally, existing vision Mambas are designed with various curve scanning directions, which increases complexity and contradicts Mamba's original intent of enhancing model performance. We instead introduce the Fractal Fusion Pathway (FFP) for our FractalMamba, which enhances its performance efficiently. Extensive experiments underscore the superiority of our proposed FractalMamba.

NeurIPS Conference 2025 Conference Paper

C-SafeGen: Certified Safe LLM Generation with Claim-Based Streaming Guardrails

  • Mintong Kang
  • Zhaorun Chen
  • Bo Li

Despite the remarkable capabilities of large language models (LLMs) across diverse applications, they remain vulnerable to generating content that violates safety regulations and policies. To mitigate these risks, LLMs undergo safety alignment; however, they can still be effectively jailbroken. Off-the-shelf guardrail models are commonly deployed to monitor generations, but these models primarily focus on detection rather than ensuring safe decoding of LLM outputs. Moreover, existing efforts lack rigorous safety guarantees, which are crucial for the universal deployment of LLMs and certifiable compliance with regulatory standards. In this paper, we propose a Claim-based Stream Decoding (CSD) algorithm coupled with a statistical risk guarantee framework using conformal analysis. Specifically, our CSD algorithm integrates a stream guardrail model to safeguard sequential claims generated by LLMs and incorporates a backtracking mechanism to revise claims flagged with high safety risks. We provide theoretical guarantees demonstrating that the CSD algorithm achieves the desired generation distribution subject to safety constraints. Furthermore, we introduce a generation risk certification framework and derive a high-probability upper bound on the safety risk of the proposed CSD algorithm. We extend our approach to online settings, where user queries arrive sequentially, and prove that our method can asymptotically control safety risk to any desired level. Empirical evaluations demonstrate the effectiveness and efficiency of the CSD algorithm compared to state-of-the-art safety decoding approaches. Additionally, we validate the soundness and tightness of the derived safety risk upper bound using realistic data in both offline and online scenarios.
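The decode-check-backtrack loop at the heart of claim-based stream decoding can be illustrated with a toy simulation. The `propose`/`guard` interfaces, the fixed risk threshold, and the retry budget are placeholder assumptions for illustration; the conformal risk certification from the paper is omitted entirely.

```python
def claim_stream_decode(propose, guard, threshold=0.5, max_retries=3):
    """Toy sketch of claim-level guarded decoding with backtracking.

    propose(history) -> next claim string, or None when generation is done.
    guard(claim)     -> risk score in [0, 1]; claims scoring above the
                        threshold are revised (the backtracking step).
    """
    history = []
    while (claim := propose(history)) is not None:
        retries = 0
        # Backtracking: re-propose a claim flagged as high-risk, up to a budget.
        while guard(claim) > threshold and retries < max_retries:
            claim = propose(history)
            retries += 1
            if claim is None:
                return history
        if guard(claim) <= threshold:
            history.append(claim)
    return history
```

Here the guardrail runs on each claim as it streams out, so unsafe content is revised before later claims are generated rather than filtered post hoc.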

NeurIPS Conference 2025 Conference Paper

ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference

  • Xiang Liu
  • Zhenheng Tang
  • Peijie Dong
  • Zeyu Li
  • Bo Li
  • Xuming Hu
  • Xiaowen Chu

Large Language Models (LLMs) require significant GPU memory when processing long texts, with the key-value (KV) cache consuming up to 70% of total memory during inference. Although existing compression methods reduce memory by evaluating the importance of individual tokens, they overlook critical semantic relationships between tokens, resulting in fragmented context and degraded performance. We introduce ChunkKV, which fundamentally reimagines KV cache compression by treating semantic chunks - rather than isolated tokens - as basic compression units. This approach preserves complete linguistic structures and contextual integrity, ensuring that essential meaning is retained even under aggressive compression. Our innovation includes a novel layer-wise index reuse technique that exploits the higher cross-layer similarity of preserved indices in ChunkKV, reducing computational overhead and improving throughput by 26.5%. Comprehensive evaluations on challenging benchmarks: LongBench, Needle-In-A-HayStack, GSM8K, and JailbreakV demonstrate that ChunkKV outperforms state-of-the-art methods by up to 8.7% in precision while maintaining the same compression ratio. These results confirm that semantic-aware compression significantly enhances both efficiency and performance for long-context LLM inference, providing a simple yet effective solution to the memory bottleneck problem. The code is available at https://github.com/NVIDIA/kvpress.
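The chunk-as-unit idea can be sketched as follows: score contiguous chunks of tokens rather than individual tokens, and keep the highest-scoring chunks whole. The per-token importance scores (e.g. aggregated attention) and the mean pooling over chunks are assumptions for illustration, not ChunkKV's exact scoring, and the layer-wise index reuse is omitted.

```python
import numpy as np

def chunk_kv_keep_indices(scores, chunk_size, keep_ratio):
    """Select token indices to keep, scoring whole chunks instead of tokens.

    scores: (T,) per-token importance values.
    Tokens are grouped into contiguous chunks; the highest-scoring chunks
    are kept in full, preserving local semantic structure.
    """
    T = len(scores)
    n_chunks = int(np.ceil(T / chunk_size))
    chunk_scores = [scores[c * chunk_size:(c + 1) * chunk_size].mean()
                    for c in range(n_chunks)]
    n_keep = max(1, int(n_chunks * keep_ratio))
    top = sorted(np.argsort(chunk_scores)[-n_keep:])  # chunks to retain, in order
    keep = []
    for c in top:
        keep.extend(range(c * chunk_size, min((c + 1) * chunk_size, T)))
    return keep
```

A token-level top-k selector would instead scatter kept tokens across the sequence, fragmenting phrases; keeping whole chunks is what preserves contextual integrity.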

JMLR Journal 2025 Journal Article

Combining Climate Models using Bayesian Regression Trees and Random Paths

  • John C. Yannotty
  • Thomas J. Santner
  • Bo Li
  • Matthew T. Pratola

General circulation models (GCMs) are essential tools for climate studies. Such climate models may have varying accuracy across the input domain, but no model is uniformly best. One can improve climate model prediction performance by integrating multiple models using input-dependent weights. Weight functions modeled using Bayesian Additive Regression Trees (BART) were recently shown to be useful in nuclear physics applications. However, a restriction of that approach was the piecewise constant weight functions. To smoothly integrate multiple climate models, we propose a new tree-based model, Random Path BART (RPBART), that incorporates random path assignments in BART to produce smooth weight functions and smooth predictions, all in a matrix-free formulation. RPBART requires a more complex prior specification, for which we introduce a semivariogram to guide hyperparameter selection. This approach is easy to interpret, computationally cheap, and avoids expensive cross-validation. Finally, we propose a posterior projection technique to enable detailed analysis of the fitted weight functions. This allows us to identify a sparse set of climate models that recovers the underlying system within a given spatial region as well as quantifying model discrepancy given the available model set. Our method is demonstrated on an ensemble of 8 GCMs modeling average monthly surface temperature.

AAAI Conference 2025 Conference Paper

COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems Against Semantic Attacks

  • Zijian Huang
  • Wenda Chu
  • Linyi Li
  • Chejian Xu
  • Bo Li

Multi-sensor fusion systems (MSFs) play a vital role as the perception module in modern autonomous vehicles (AVs). Therefore, ensuring their robustness against common and realistic adversarial semantic transformations, such as rotation and shifting in the physical world, is crucial for the safety of AVs. While empirical evidence suggests that MSFs exhibit improved robustness compared to single-modal models, they are still vulnerable to adversarial semantic transformations. In addition, although many empirical defenses have been proposed, several works show that these defenses can be further attacked by new adaptive attacks. So far, there is no certified defense proposed for MSFs. In this work, we propose the first robustness certification framework COMMIT to certify the robustness of multi-sensor fusion systems against semantic attacks. In particular, we propose a practical anisotropic noise mechanism that leverages randomized smoothing on multi-modal data and performs a grid-based splitting method to characterize complex semantic transformations. We also propose efficient algorithms to compute the certification in terms of object detection accuracy and IoU for large-scale MSF models. Empirically, we evaluate the efficacy of COMMIT in different settings and provide a comprehensive benchmark of certified robustness for different MSF models using the CARLA simulation platform. We show that the certification for MSF models is at most 48.39% higher than that of single-modal models, which validates the advantages of MSF models. We believe our certification framework and benchmark will contribute an important step towards certifiably robust AVs in practice.
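The randomized-smoothing core that COMMIT builds on can be illustrated with a generic Gaussian majority vote; the paper's anisotropic multi-modal noise mechanism, grid-based splitting of semantic transformations, and certification bounds are all omitted here, and `classify`, `sigma`, and `n` are placeholder assumptions.

```python
import numpy as np

def smoothed_predict(classify, x, sigma, n=100, seed=0):
    """Majority vote of a base classifier under isotropic Gaussian input noise.

    classify(x) -> class id. Returns the most frequent class over n noisy
    copies of x (the smoothing step; deriving a certified radius from the
    vote counts is the part COMMIT extends to multi-modal MSF inputs).
    """
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n):
        c = classify(x + rng.normal(0.0, sigma, size=x.shape))
        votes[c] = votes.get(c, 0) + 1
    return max(votes, key=votes.get)
```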

AAAI Conference 2025 Conference Paper

DiffScene: Diffusion-Based Safety-Critical Scenario Generation for Autonomous Vehicles

  • Chejian Xu
  • Aleksandr Petiushko
  • Ding Zhao
  • Bo Li

The field of Autonomous Driving (AD) has witnessed significant progress in recent years. Among the various challenges faced, the safety evaluation of autonomous vehicles (AVs) stands out as a critical concern. Traditional evaluation methods are both costly and inefficient, often requiring extensive driving mileage in order to encounter rare safety-critical scenarios, which are distributed on the long tail of the complex real-world driving landscape. In this paper, we propose a unified approach, Diffusion-Based Safety-Critical Scenario Generation (DiffScene), to generate high-quality safety-critical scenarios which are both realistic and safety-critical for efficient AV evaluation. In particular, we propose a diffusion-based generation framework, leveraging the power of approximating the distribution of low-density spaces for diffusion models. We design several adversarial optimization objectives to guide the diffusion generation under predefined adversarial budgets. These objectives, such as safety-based objective, functionality-based objective, and constraint-based objective, ensure the generation of safety-critical scenarios while adhering to specific constraints. Extensive experimentation has been conducted to validate the efficacy of our approach. Compared with 6 SOTA baselines, DiffScene generates scenarios that are (1) more safety-critical under 3 metrics, (2) more realistic under 5 distance functions, and (3) more transferable to different AV algorithms. In addition, we demonstrate that training AV algorithms with scenarios generated by DiffScene leads to significantly higher performance in terms of the safety-critical metrics compared to baselines. These findings highlight the potential of DiffScene in addressing the challenges of AV safety evaluation, paving the way for safer AV development.

IJCAI Conference 2025 Conference Paper

Endogenous Recovery via Within-modality Prototypes for Incomplete Multimodal Hashing

  • Sa Zhu
  • Dayan Wu
  • Chenming Wu
  • Pengwen Dai
  • Bo Li

Multimodal hashing projects multimodal data into compact binary codes, enabling rapid and storage-efficient retrieval of large-scale multimedia content. In practical scenarios, the issue of missing modality frequently arises when dealing with multimodal data. Existing incomplete multimodal hashing techniques directly recover missing modalities with neural networks, resulting in a disjointed representation space between the recovered and true data. In this paper, we present a novel recovery paradigm, namely Prototype-based Modality Completion Hashing (PMCH). Instead of directly synthesizing the missing modality from the available ones, PMCH adaptively aggregates associated within-modality prototypes to recover missing modality data. Specifically, PMCH introduces a within-modality prototype learning module to optimize representative prototypes for each modality. These prototypes act as recovery anchors and reside within the same representation space as their corresponding modality data. Subsequently, PMCH adaptively aggregates the associated within-modality prototypes with coefficients derived from the modality-specific Weight-Net. By utilizing prototypes from the same modality, the semantic disparity between the reconstructed and authentic data can be substantially diminished. Extensive experiments on three widely used benchmark datasets demonstrate that PMCH can effectively recover the missing modality, and attain state-of-the-art performance in both complete and incomplete multimodal retrieval scenarios. Code is available at https://github.com/Sasa77777779/PMCH.git.

NeurIPS Conference 2025 Conference Paper

Enhancing Diffusion-based Unrestricted Adversarial Attacks via Adversary Preferences Alignment

  • Kaixun Jiang
  • Zhaoyu Chen
  • HaiJing Guo
  • Jinglun Li
  • Jiyuan Fu
  • Pinxue Guo
  • Hao Tang
  • Bo Li

Preference alignment in diffusion models has primarily focused on benign human preferences (e.g., aesthetics). In this paper, we propose a novel perspective: framing unrestricted adversarial example generation as a problem of aligning with adversary preferences. Unlike benign alignment, adversarial alignment involves two inherently conflicting preferences: visual consistency and attack effectiveness, which often lead to unstable optimization and reward hacking (e.g., reducing visual quality to improve attack success). To address this, we propose APA (Adversary Preferences Alignment), a two-stage framework that decouples conflicting preferences and optimizes each with differentiable rewards. In the first stage, APA fine-tunes LoRA to improve visual consistency using a rule-based similarity reward. In the second stage, APA updates either the image latent or prompt embedding based on feedback from a substitute classifier, guided by trajectory-level and step-wise rewards. To enhance black-box transferability, we further incorporate a diffusion augmentation strategy. Experiments demonstrate that APA achieves significantly better attack transferability while maintaining high visual consistency, inspiring further research to approach adversarial attacks from an alignment perspective.

AAAI Conference 2025 Conference Paper

HFF-Tracker: A Hierarchical Fine-grained Fusion Tracker for Referring Multi-Object Tracking

  • Zeyong Zhao
  • Yanchao Hao
  • Minghao Zhang
  • Qingbin Liu
  • Bo Li
  • Dianbo Sui
  • Shizhu He
  • Xi Chen

Referring Multi-Object Tracking (RMOT) aims to track multiple objects based on a provided language expression. Although prior studies have sought to accomplish this by integrating a textual module into the multi-object tracker, these methods combine text and image features in a basic way, neglecting the importance of text features. In this study, we propose a Hierarchical Fine-grained text-image Fusion tracker, named HFF-Tracker, which can perform fine-grained fusion of pixel-level visual features and text features across various semantic levels. Specifically, we have devised a Hierarchical Multi-Modal Fusion (HMMF) module to merge text and image features at an early stage in a hierarchical and detailed manner. The Text-Guided Decoder (TGD) is designed to provide the query with prior semantic information during the decoding process. Additionally, we have crafted a Text-Guided Prediction Head (TGPH) that utilizes text information to enhance the performance of the prediction head. Furthermore, we have implemented an adaptive Look-Back training strategy to maximize the utilization of valuable labeled data. Extensive experiments on the Refer-KITTI dataset and the Refer-KITTI-V2 dataset demonstrate that our proposed HFF-Tracker outperforms other state-of-the-art methods with remarkable margins.

JAAMAS Journal 2025 Journal Article

Information elicitation mechanisms for Bayesian auctions

  • Jing Chen
  • Bo Li
  • Yingkai Li

In this paper we design information elicitation mechanisms for Bayesian auctions. While in Bayesian mechanism design the distributions of the players' private types are often assumed to be common knowledge, information elicitation considers the situation where the players know the distributions better than the decision maker. To weaken the information assumption in Bayesian auctions, we consider an information structure where the knowledge about the distributions is arbitrarily scattered among the players. In such an unstructured information setting, we design mechanisms for unit-demand auctions and additive auctions that aggregate the players' knowledge, generating revenue that is a constant approximation to that of the optimal Bayesian mechanisms with a common prior. Our mechanisms are 2-step dominant-strategy truthful and the approximation ratios improve gracefully with the amount of knowledge the players collectively have.

NeurIPS Conference 2025 Conference Paper

IPAD: Inverse Prompt for AI Detection - A Robust and Interpretable LLM-Generated Text Detector

  • Zheng Chen
  • Yushi Feng
  • Jisheng Dang
  • Changyang He
  • Yue Deng
  • Hongxi Pu
  • Haoxuan Li
  • Bo Li

Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates distinguishing between human-written and LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet, existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, which is critical for real-world scenarios. Also, they struggle to provide interpretable evidence to support their decisions, thus undermining reliability. In light of these challenges, we propose IPAD (Inverse Prompt for AI Detection), a novel framework consisting of a Prompt Inverter that identifies predicted prompts that could have generated the input text, and two Distinguishers that examine the probability that the input texts align with the predicted prompts. Empirical evaluations demonstrate that IPAD outperforms the strongest baselines by 9.05% (Average Recall) on in-distribution data, 12.93% (AUROC) on out-of-distribution (OOD) data, and 5.48% (AUROC) on attacked data. IPAD also performs robustly on structured datasets. Furthermore, an interpretability assessment is conducted to illustrate that IPAD enhances AI detection trustworthiness by allowing users to directly examine the decision-making evidence, which provides interpretable support for its state-of-the-art detection results.

TMLR Journal 2025 Journal Article

LLaVA-OneVision: Easy Visual Task Transfer

  • Bo Li
  • Yuanhan Zhang
  • Dong Guo
  • Renrui Zhang
  • Feng Li
  • Hao Zhang
  • Kaichen Zhang
  • Peiyuan Zhang

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.

TMLR Journal 2025 Journal Article

LLaVA-Video: Video Instruction Tuning With Synthetic Data

  • Yuanhan Zhang
  • Jinming Wu
  • Wei Li
  • Bo Li
  • Zejun Ma
  • Ziwei Liu
  • Chunyuan Li

The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web. To address this, we consider an alternative approach, creating a high-quality synthetic dataset specifically for video instruction-following, namely LLaVA-Video-178K. This dataset includes key tasks such as detailed captioning, open-ended question-answering (QA), and multiple-choice QA. By training on this proposed dataset, in combination with existing visual instruction tuning data, we introduce LLaVA-Video, a new video LMM. Our experiments demonstrate that LLaVA-Video achieves strong performance across various video benchmarks, highlighting the effectiveness of our dataset. We plan to release the dataset, its generation pipeline, and the model checkpoints.

AAAI Conference 2025 Conference Paper

Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning

  • Zhiming Li
  • Junzhe Jiang
  • Yushi Cao
  • Aixin Cui
  • Bozhi Wu
  • Bo Li
  • Yang Liu
  • Danny Dongning Sun

Deep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying market trends, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. However, such knowledge is abstract and hard to quantify. In order to effectively leverage abstract human expert knowledge, in this paper, we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations of two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.

TMLR Journal 2025 Journal Article

Long Context Transfer from Language to Vision

  • Peiyuan Zhang
  • Kaichen Zhang
  • Bo Li
  • Guangtao Zeng
  • Jingkang Yang
  • Yuanhan Zhang
  • Ziyue Wang
  • Haoran Tan

Video sequences offer valuable temporal information, but existing large multimodal models (LMMs) fall short in understanding extremely long videos. Many works address this by reducing the number of visual tokens using visual resamplers. Alternatively, in this paper, we approach this problem from the perspective of the language model. By simply extrapolating the context length of the language backbone, we enable LMMs to comprehend orders of magnitude more visual tokens without any video training. We call this phenomenon \textit{long context transfer} and carefully ablate its properties. To effectively measure LMMs' ability to generalize to long contexts in the vision modality, we develop V-NIAH (Visual Needle-In-A-Haystack), a purely synthetic long vision benchmark inspired by the language model's NIAH test. Our proposed Long Video Assistant (LongVA) can process 2000 frames or over 200K visual tokens without additional complexities. With its extended context length, LongVA achieves state-of-the-art performance on Video-MME and MLVU among 7B-scale models by densely sampling more input frames.

AAAI Conference 2025 Conference Paper

MalDetectFormer: Leveraging Sparse SpatioTemporal Information for Effective Malicious Traffic Detection

  • Shuai Zhang
  • Yu Fan
  • Haoyi Zhou
  • Bo Li

Malicious traffic detection is one of the main challenges in the field of cybersecurity. Although modern deep learning methods have made progress in identifying malicious traffic, they often overlook the persistent nature of attack behaviors, making it difficult to distinguish between malicious and normal traffic at a single observation point. To address this issue, we propose MalDetectFormer, which aims to accurately capture the spatiotemporal dynamics of malicious traffic. By incorporating a sparse attention mechanism, MalDetectFormer can efficiently focus on key characteristics of traffic nodes while overcoming the challenges faced by traditional long-sequence processing. Additionally, by adopting a time-cyclic attention mechanism, the model can identify and capture persistent attack patterns of malicious traffic. Experiments conducted on benchmark datasets demonstrate the advantages of the proposed MalDetectFormer in both malicious traffic detection and malicious attack recognition tasks.

JBHI Journal 2025 Journal Article

MDP-GRL: Multi-disease Prediction by Graph-enabled Representation Learning

  • Yongan Guo
  • Yeqi Huang
  • Yuao Wang
  • Yun Liu
  • Shenqi Jing
  • Tao Shan
  • Yuan Miao
  • Bo Li

In recent years, automatic disease prediction based on electronic health records (EHRs) has emerged as a focal area of research in medical informatics. While successfully facilitating disease diagnosis, this technique still suffers from many limitations caused by the complexity of medical data, particularly the diverse relations and shared risk factors among multiple diseases. Besides, the data sparsity and imbalanced problem in EHR also undermines the effectiveness of existing approaches. Therefore, new approaches are urgently needed to accommodate the EHR features better and make effective predictions on individuals' potential diseases. To address the above challenges, this paper proposes MDP-GRL, a novel multi-label disease prediction model based on graph-enabled representation learning. Specifically, MDP-GRL constructs a medical knowledge graph (MKG) based on the patient and disease information in EHR and then employs a graph neural network (GNN) to realise the disease prediction. To address the data sparsity issue, it incorporates supplementary data for both patients and diseases, i.e., enriching patient nodes by personal basic information, examination indicators, and illness history, and supplementing disease information with comorbidity information, prevalent populations, common causes, and diagnostic basis. To mitigate the data complexity issue, MDP-GRL considers four different relation patterns in MKG, which optimizes the modelling capabilities. To address the data imbalance problem, it introduces an attention mechanism and self-adversarial negative sampling strategy, which further enhance MDP-GRL's ability to identify error-prone and minority samples. Comprehensive experiments and ablation studies are conducted based on the MIMIC-IV dataset. The results demonstrate MDP-GRL's superiority in multi-disease prediction compared with state-of-the-art approaches.

ICLR Conference 2025 Conference Paper

Multi-Task Dense Predictions via Unleashing the Power of Diffusion

  • Yuqi Yang
  • Peng-Tao Jiang
  • Qibin Hou
  • Hao Zhang 0063
  • Jinwei Chen 0003
  • Bo Li

Diffusion models have exhibited extraordinary performance in dense prediction tasks. However, there are few works exploring the diffusion pipeline for multi-task dense predictions. In this paper, we unlock the potential of diffusion models in solving multi-task dense predictions and propose a novel diffusion-based method, called TaskDiffusion, which leverages the conditional diffusion process in the decoder. Instead of denoising the noisy labels for different tasks separately, we propose a novel joint denoising diffusion process to capture the task relations during denoising. To be specific, our method first encodes the task-specific labels into a task-integration feature space to unify the encoding strategy. This allows us to get rid of the cumbersome task-specific encoding process. In addition, we also propose a cross-task diffusion decoder conditioned on task-specific multi-level features, which can model the interactions among different tasks and levels explicitly while preserving efficiency. Experiments show that our TaskDiffusion outperforms previous state-of-the-art methods for all dense prediction tasks on the widely-used PASCAL-Context and NYUD-v2 datasets. Our code is available at https://github.com/YuqiYang213/TaskDiffusion.

ICML Conference 2025 Conference Paper

Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models

  • Anshuman Chhabra
  • Bo Li
  • Jian Chen 0016
  • Prasant Mohapatra
  • Hongfu Liu 0001

A core data-centric learning challenge is the identification of training samples that are detrimental to model performance. Influence functions serve as a prominent tool for this task and offer a robust framework for assessing training data influence on model predictions. Despite their widespread use, the high computational cost associated with calculating the inverse of the Hessian matrix poses constraints, particularly when analyzing large-sized deep models. In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only presents a straightforward and Hessian-free formulation but also provides insights into the role of the gradient in sample impact. Through systematic empirical evaluations, we first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets. We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models. We also extend its use to influential sample identification for fine-tuning Large Language Models.
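The Hessian-free idea, flagging samples whose gradients are outliers relative to the rest of the training set, can be sketched as follows. The distance-to-mean score is one simple outlier criterion chosen for illustration and is not necessarily the detector used in the paper.

```python
import numpy as np

def outlier_gradient_scores(per_sample_grads):
    """Score training samples by how outlying their gradients are.

    per_sample_grads: (N, D) flattened per-sample gradient vectors.
    Returns one score per sample: the distance of each gradient from the
    mean gradient, normalized by the average distance. Higher scores mark
    more outlying gradients, hence samples more likely to be detrimental.
    """
    mean_grad = per_sample_grads.mean(axis=0)
    dists = np.linalg.norm(per_sample_grads - mean_grad, axis=1)
    return dists / (dists.mean() + 1e-12)
```

No Hessian inverse appears anywhere: one backward pass per sample plus a cheap outlier pass replaces the influence-function computation.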

IJCAI Conference 2025 Conference Paper

PAMol: Pocket-Aware Drug Design Method with Hypergraph Representation of Protein Pocket Structure and Feature Fusion

  • Xiaoli Lin
  • Xiongwei Liao
  • Jun Pang
  • Bo Li
  • Xiaolong Zhang

Efficient generation of targeted drug molecules is crucial in the field of drug discovery. Most existing methods neglect the high-order information in the structure of protein pockets, limiting the performance of generated drug molecules. This paper proposes a pocket-aware drug design framework, namely PAMol, constructing the hypergraph to represent the spatial structure of protein pockets, effectively capturing high-order relations and neighborhood information within the pocket structures. This framework also fuses different modal embeddings from proteins and molecules, to generate high-quality molecules. In addition, a conditional molecule generation module uses the high-order structural information in protein pockets as constraints to more accurately generate molecules for specific targets. The performance of PAMol has been assessed by analyzing generated molecules in terms of vina score, high affinity, QED, SA, LogP, Lipinski, diversity, and time. Experimental results demonstrate the potential of PAMol for targeted drug design. The source code is available at https://github.com/YICHUANSYQ/PAMol.git.

NeurIPS Conference 2025 Conference Paper

Photography Perspective Composition: Towards Aesthetic Perspective Recommendation

  • Lujian Yao
  • Siming Zheng
  • Xinbin Yuan
  • Zhuoxuan Cai
  • Pu Wu
  • Jinwei Chen
  • Bo Li
  • Peng-tao Jiang

Traditional photography composition approaches are dominated by 2D cropping-based methods. However, these methods fall short when scenes contain poorly arranged subjects. Professional photographers often employ perspective adjustment as a form of 3D recomposition, modifying the projected 2D relationships between subjects while maintaining their actual spatial positions to achieve better compositional balance. Inspired by this artistic practice, we propose photography perspective composition (PPC), extending beyond traditional cropping-based methods. However, implementing the PPC faces significant challenges: the scarcity of perspective transformation datasets and undefined assessment criteria for perspective quality. To address these challenges, we present three key contributions: (1) An automated framework for building PPC datasets through expert photographs. (2) A video generation approach that demonstrates the transformation process from less favorable to aesthetically enhanced perspectives. (3) A perspective quality assessment (PQA) model constructed based on human performance. Our approach is concise and requires no additional prompt instructions or camera trajectories, helping and guiding ordinary users to enhance their composition skills.

NeurIPS Conference 2025 Conference Paper

PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset

  • Mintong Kang
  • Zhaorun Chen
  • Chejian Xu
  • Jiawei Zhang
  • Chengquan Guo
  • Minzhou Pan
  • Ivan Revilla
  • Yu Sun

As large language models (LLMs) become widespread across diverse applications, concerns about the security and safety of LLM interactions have intensified. Numerous guardrail models and benchmarks have been developed to ensure LLM content safety. However, existing guardrail benchmarks are often built upon ad hoc risk taxonomies that lack a principled grounding in standardized safety policies, limiting their alignment with real-world operational requirements. Moreover, they tend to overlook domain-specific risks, while the same risk category can carry different implications across different domains. To bridge these gaps, we introduce PolyGuard, the first massive multi-domain safety policy-grounded guardrail dataset. PolyGuard offers: (1) broad domain coverage across eight safety-critical domains, such as finance, law, and codeGen; (2) policy-grounded risk construction based on authentic, domain-specific safety guidelines; (3) diverse interaction formats, encompassing declarative statements, questions, instructions, and multi-turn conversations; (4) advanced benign data curation via detoxification prompting to challenge over-refusal behaviors; and (5) attack-enhanced instances that simulate adversarial inputs designed to bypass guardrails. Based on PolyGuard, we benchmark 19 advanced guardrail models and uncover a series of findings, such as: (1) All models achieve varied F1 scores, with many demonstrating high variance across risk categories, highlighting their limited domain coverage and insufficient handling of domain-specific safety concerns; (2) As models evolve, their coverage of safety risks broadens, but performance on common risk categories may decrease; (3) All models remain vulnerable to optimized adversarial attacks. The policy-grounded PolyGuard establishes the first principled and comprehensive guardrail benchmark. We believe that PolyGuard and the unique insights derived from our evaluations will advance the development of policy-aligned and resilient guardrail systems.

NeurIPS Conference 2025 Conference Paper

Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

  • Bo Li
  • Wei Wang
  • Peng Ye

We revisit the problem of private online learning, in which a learner receives a sequence of $T$ data points and must output a hypothesis at each time step. It is required that the entire stream of output hypotheses should satisfy differential privacy. Prior work of Golowich and Livni [2021] established that every concept class $\mathcal{H}$ with finite Littlestone dimension $d$ is privately online learnable in the realizable setting. In particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake bound against an oblivious adversary. However, their approach yields a suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$ against an adaptive adversary, closing this gap. We further investigate the problem in the agnostic setting, which is more general than the realizable setting as it does not impose any assumptions on the data. We give an algorithm that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic Littlestone classes, demonstrating that they are also privately online learnable in the agnostic setting.

AAAI Conference 2025 Conference Paper

Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking

  • Zhengfei Xu
  • Sijia Zhao
  • Yanchao Hao
  • Xiaolong Liu
  • Lili Li
  • Yuyang Yin
  • Bo Li
  • Xi Chen

Visual Entity Linking (VEL) is a crucial task for achieving fine-grained visual understanding, matching objects within images (visual mentions) to entities in a knowledge base. Previous VEL tasks rely on textual inputs, but writing queries for complex scenes can be challenging. Visual inputs like clicks or bounding boxes offer a more convenient alternative. Therefore, we propose a new task, Pixel-Level Visual Entity Linking (PL-VEL), which uses pixel masks from visual inputs to refer to objects, supplementing reference methods for VEL. To facilitate research on this task, we have constructed the MaskOVEN-Wiki dataset through an entirely automatic reverse region-entity annotation framework. This dataset contains over 5 million annotations aligning pixel-level regions with entity-level labels, advancing visual understanding toward the fine-grained level. Moreover, as pixel masks correspond to semantic regions in an image, we enhance previous patch-interacted attention with region-interacted attention by a visual semantic tokenization approach. Manual evaluation results indicate that the reverse annotation framework achieved a 94.8% annotation success rate. Experimental results show that models trained on this dataset improved accuracy by 18 points compared to zero-shot models. Additionally, the semantic tokenization method achieved a 5-point accuracy improvement over the trained baseline.

TMLR Journal 2025 Journal Article

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

  • Mingqi Yuan
  • Roger Creus Castanyer
  • Bo Li
  • Xin Jin
  • Wenjun Zeng
  • Glen Berseth

Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward methods. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. Our documentation, examples, and source code are available at [https://github.com/RLE-Foundation/RLeXplore](https://github.com/RLE-Foundation/RLeXplore).

AAAI Conference 2025 System Paper

RLLTE: Long-Term Evolution Project of Reinforcement Learning

  • Mingqi Yuan
  • Zequn Zhang
  • Yang Xu
  • Shihao Luo
  • Bo Li
  • Xin Jin
  • Wenjun Zeng

We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a comprehensive ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia. Our documentation, examples, and source code are available at https://github.com/RLE-Foundation/rllte.

NeurIPS Conference 2025 Conference Paper

SE-GUI: Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

  • Xinbin Yuan
  • Jian Zhang
  • Kaixin Li
  • Zhuoxuan Cai
  • Lujian Yao
  • Jie Chen
  • Enguang Wang
  • Qibin Hou

Graphical User Interface (GUI) agents have made substantial strides in understanding and executing user instructions across diverse platforms. Yet, grounding these instructions to precise interface elements remains challenging, especially in complex, high-resolution, professional environments. Traditional supervised fine-tuning (SFT) methods often require large volumes of diverse data and exhibit weak generalization. To overcome these limitations, we introduce a reinforcement learning (RL)-based framework that incorporates three core strategies: (1) seed data curation to ensure high-quality training samples, (2) a dense policy gradient that provides continuous feedback based on prediction accuracy, and (3) a self-evolutionary reinforcement finetuning mechanism that iteratively refines the model using attention maps. With only 3k training samples, our 7B-parameter model achieves state-of-the-art results among similarly sized models on three grounding benchmarks. Notably, it attains 47.3% accuracy on the ScreenSpot-Pro dataset, outperforming much larger models, such as UI-TARS-72B, by a margin of 24.2%. These findings underscore the effectiveness of RL-based approaches in enhancing GUI agent performance, particularly in high-resolution, complex environments.

NeurIPS Conference 2025 Conference Paper

SECODEPLT: A Unified Benchmark for Evaluating the Security Risks and Capabilities of Code GenAI

  • Yuzhou Nie
  • Zhun Wang
  • Yu Yang
  • Ruizhe Jiang
  • Yuheng Tang
  • Xander Davies
  • Yarin Gal
  • Bo Li

Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverage of risk and capabilities; (2) reliance on static evaluation metrics such as LLM judgments or rule-based detection, which lack the precision of dynamic analysis; and (3) a trade-off between data quality and benchmark scale. To address these challenges, we introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Each mutated sample retains the seed’s security semantics while providing diverse, unseen instances. The resulting benchmark bundles every artifact required for dynamic evaluation, including prompts, vulnerable and patched code, test cases, and ground-truth proofs of concept, enabling rigorous measurement of insecure coding, vulnerability detection, and patch generation. Applying this framework to Python, C/C++, and Java, we build SECODEPLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities. Compared with state-of-the-art benchmarks, SECODEPLT offers broader coverage, higher data fidelity, and substantially greater scale. We use SECODEPLT to evaluate leading code-generation LLMs and agents, revealing their strengths and weaknesses in both generating secure code and identifying or fixing vulnerabilities. We provide our code at https://github.com/ucsb-mlsec/SeCodePLT and data at https://huggingface.co/datasets/UCSB-SURFI/SeCodePLT.

JBHI Journal 2025 Journal Article

SNER: Semi-Supervised Named Entity Recognition for Large Volume of Diabetes Data

  • Jingyi Zuo
  • Qijie Qian
  • Yun Liu
  • Shan Lu
  • Bo Li
  • Yongan Guo

The medical literature and records on diabetes provide crucial resources for diabetes prevention and treatment. However, extracting entities from these textual diabetes data is crucial but challenging. Named entity recognition (NER), a cornerstone technology of natural language processing, has been well studied in the general medical field. However, there is still a lack of effective NER methods to handle diabetes data. Briefly, there are three challenges in the real world, including 1) the large volume of diabetes-related data to be processed, 2) the lack of labeled data, and 3) the high costs of manual labeling. To mitigate those challenges, this paper proposes a novel NER method based on semi-supervised learning, namely SNER, for diabetes data processing. It utilizes large amounts of unlabeled data to solve the problem of lack of labeled data. Specifically, it filters the predicted labels based on their confidence and uncertainty scores to reduce the noise entering the model, dividing them into positive pseudo-labels and negative pseudo-labels. Also, it utilizes negative pseudo-labels reasonably to improve the training effect of pseudo-labels. Experiments on two public diabetes datasets show that SNER achieves the best performance compared with existing state-of-the-art models.
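The confidence- and uncertainty-based pseudo-label filtering described in the abstract can be sketched as follows; this is a minimal illustration with hypothetical thresholds and an entropy-based uncertainty score, not the authors' implementation:

```python
import numpy as np

def split_pseudo_labels(probs, conf_thresh=0.9, unc_thresh=0.3):
    """probs: (n_tokens, n_labels) softmax outputs for unlabeled tokens."""
    preds = probs.argmax(axis=1)              # candidate pseudo-labels
    confidence = probs.max(axis=1)
    # predictive entropy as a simple stand-in for an uncertainty score
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # positive pseudo-labels: confident, low-uncertainty predictions
    positive = (confidence >= conf_thresh) & (entropy <= unc_thresh)
    # negative pseudo-labels: classes the model confidently rules out
    negative = probs < (1.0 - conf_thresh)
    return preds, positive, negative
```

Tokens flagged `positive` would be trained on with their predicted label, while the `negative` mask supplies "not this class" supervision for the remaining tokens.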

AAAI Conference 2025 Conference Paper

Sparse Transfer Learning Accelerates and Enhances Certified Robustness: A Comprehensive Study

  • Zhangheng Li
  • Tianlong Chen
  • Linyi Li
  • Bo Li
  • Zhangyang Wang

Certified robustness is a critical measure for assessing the reliability of machine learning systems. Traditionally, the computational burden associated with certifying the robustness of machine learning models has posed a substantial challenge, particularly with the continuous expansion of model sizes. In this paper, we introduce an innovative approach to expedite the verification process for L2-norm certified robustness through sparse transfer learning. Our approach is both efficient and effective. It leverages verification results obtained from pre-training tasks and applies sparse updates to these results. To enhance performance, we incorporate dynamic sparse mask selection and introduce a novel stability-based regularizer called DiffStab. Empirical results demonstrate that our method accelerates the verification process for downstream tasks by as much as 70-80%, with only slight reductions in certified accuracy compared to dense parameter updates. We further validate that this performance improvement is even more pronounced in the few-shot transfer learning scenario.

AAAI Conference 2025 Conference Paper

The (Exact) Price of Cardinality for Indivisible Goods: A Parametric Perspective

  • Alexander Lam
  • Bo Li
  • Ankang Sun

We adopt a parametric approach to analyze the worst-case degradation in social welfare when the allocation of indivisible goods is constrained to be fair. Specifically, we are concerned with cardinality-constrained allocations, which require that each agent has at most k items in their allocated bundle. We propose the notion of the price of cardinality, which captures the worst-case multiplicative loss of utilitarian or egalitarian social welfare resulting from imposing the cardinality constraint. We then characterize tight or almost-tight bounds on the price of cardinality as exact functions of the instance parameters, demonstrating how the social welfare improves as k is increased. In particular, one of our main results refines and generalizes the existing asymptotic bound of Θ(√n) on the price of balancedness. We also further extend our analysis to the problem where the items are partitioned into disjoint categories, and each category has its own cardinality constraint. Through a parametric study of the price of cardinality, we provide a framework which aids decision makers in choosing an ideal level of cardinality-based fairness, using their knowledge of the potential loss of utilitarian and egalitarian social welfare.
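For intuition, the price of cardinality on a toy instance can be computed by brute force: compare the best utilitarian welfare with and without the at-most-k constraint. The utilities below are illustrative, not from the paper:

```python
from itertools import product

def utilitarian_welfare(bundles, utils):
    # sum of each agent's additive utility for her bundle
    return sum(utils[a][i] for a, bundle in enumerate(bundles) for i in bundle)

def best_welfare(utils, k=None):
    n_agents, n_items = len(utils), len(utils[0])
    best = float("-inf")
    # enumerate every assignment of items to agents
    for assign in product(range(n_agents), repeat=n_items):
        bundles = [[i for i in range(n_items) if assign[i] == a]
                   for a in range(n_agents)]
        if k is not None and any(len(b) > k for b in bundles):
            continue  # violates the cardinality constraint
        best = max(best, utilitarian_welfare(bundles, utils))
    return best

# two agents with additive utilities over four items (toy numbers)
utils = [[5, 4, 3, 2], [1, 1, 1, 1]]
unconstrained = best_welfare(utils)      # agent 0 takes everything: 5+4+3+2 = 14
constrained = best_welfare(utils, k=2)   # at most 2 items each: (5+4) + (1+1) = 11
price_of_cardinality = unconstrained / constrained  # welfare loss factor
```

On this instance the cardinality constraint costs a factor of 14/11 ≈ 1.27 in utilitarian welfare; the paper characterizes the worst case of this ratio as a function of the instance parameters.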

NeurIPS Conference 2025 Conference Paper

VMDT: Decoding the Trustworthiness of Video Foundation Models

  • Yujin Potter
  • Zhun Wang
  • Nicholas Crispino
  • Kyle Montgomery
  • Alexander Xiong
  • Ethan Chang
  • Francesco Pinto
  • Yuqi Chen

As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve---though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at https://sunblaze-ucb.github.io/VMDT-page/.

NeurIPS Conference 2024 Conference Paper

3D Focusing-and-Matching Network for Multi-Instance Point Cloud Registration

  • Liyuan Zhang
  • Le Hui
  • Qi Liu
  • Bo Li
  • Yuchao Dai

Multi-instance point cloud registration aims to estimate the pose of all instances of a model point cloud in the whole scene. Existing methods all adopt the strategy of first obtaining the global correspondence and then clustering to obtain the pose of each instance. However, due to the cluttered and occluded objects in the scene, it is difficult to obtain an accurate correspondence between the model point cloud and all instances in the scene. To this end, we propose a simple yet powerful 3D focusing-and-matching network for multi-instance point cloud registration by learning the multiple pair-wise point cloud registration. Specifically, we first present a 3D multi-object focusing module to locate the center of each object and generate object proposals. By using self-attention and cross-attention to associate the model point cloud with structurally similar objects, we can locate potential matching instances by regressing object centers. Then, we propose a 3D dual-masking instance matching module to estimate the pose between the model point cloud and each object proposal. It applies instance masks and overlap masks to accurately predict the pair-wise correspondence. Extensive experiments on two public benchmarks, Scan2CAD and ROBI, show that our method achieves a new state-of-the-art performance on the multi-instance point cloud registration task.

IJCAI Conference 2024 Conference Paper

A Complete Landscape of EFX Allocations on Graphs: Goods, Chores and Mixed Manna

  • Yu Zhou
  • Tianze Wei
  • Minming Li
  • Bo Li

We study envy-free up to any item (EFX) allocations on graphs where vertices and edges represent agents and items respectively. An agent is only interested in items that are incident to her and all other items have zero marginal values to her. Christodoulou et al. first proposed this setting and studied the case of goods. We extend this setting to the case of mixed manna where an item may be liked or disliked by its endpoint agents. In our problem, an agent has an arbitrary valuation over her incident items such that the items she likes have non-negative marginal values to her and those she dislikes have non-positive marginal values. We provide a complete study of the four notions of EFX for mixed manna in the literature, which differ by whether the removed item can have zero marginal value. We prove that an allocation that satisfies the notion of EFX where the virtually-removed item could always have zero marginal value may not exist and determining its existence is NP-complete, while one that satisfies any of the other three notions always exists and can be computed in polynomial time. We also prove that an orientation (i.e., a special allocation where each edge must be allocated to one of its endpoint agents) that satisfies any of the four notions may not exist, and determining its existence is NP-complete.

NeurIPS Conference 2024 Conference Paper

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

  • Zhaorun Chen
  • Zhen Xiang
  • Chaowei Xiao
  • Dawn Song
  • Bo Li

LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with similar embeddings from knowledge bases to inform task planning and execution. However, the reliance on unverified knowledge bases raises significant concerns about their safety and trustworthiness. To uncover such vulnerabilities, we propose a novel red teaming approach AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base. In particular, we formulate the trigger generation process as a constrained optimization to optimize backdoor triggers by mapping the triggered instances to a unique embedding space, so as to ensure that whenever a user instruction contains the optimized backdoor trigger, the malicious demonstrations are retrieved from the poisoned memory or knowledge base with high probability. In the meantime, benign instructions without the trigger will still maintain normal performance. Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning, and the optimized backdoor trigger exhibits superior transferability, resilience, and stealthiness. Extensive experiments demonstrate AgentPoison's effectiveness in attacking three types of real-world LLM agents: RAG-based autonomous driving agent, knowledge-intensive QA agent, and healthcare EHRAgent. We inject the poisoning instances into the RAG knowledge base and long-term memories of these agents, respectively, demonstrating the generalization of AgentPoison. On each agent, AgentPoison achieves an average attack success rate of $\ge$ 80% with minimal impact on benign performance ($\le$ 1%) with a poison rate < 0.1%. The code and data are available at https://github.com/BillChan226/AgentPoison.

AAMAS Conference 2024 Conference Paper

Allocating Contiguous Blocks of Indivisible Chores Fairly: Revisited

  • Ankang Sun
  • Bo Li

Resource allocation is a fundamental problem in multi-agent systems, with two key factors to consider: fairness and efficiency. The concept of the “price of fairness” helps in the understanding of efficiency loss under fairness constraints. Among the diverse resource allocation settings, cake cutting stands out as a prominent model. Recently, Höhne and van Stee [Inf. Comput., 2021] examined a variation of this model in which the cake represents indivisible chores, with each agent requiring a connected piece of the chores. Höhne and van Stee provided upper and lower bounds on the price of fairness when fairness is measured by envy-freeness and proportionality. However, in the case of indivisible items, achieving envy-free and proportional allocations is difficult, rendering these bounds insufficient for a comprehensive understanding of the true trade-off between fairness and efficiency. In this paper, we revisit the same problem and consider fairness notions that are satisfiable, including proportionality up to one item, and maximin share fairness. By presenting tight bounds on the price of fairness with respect to these notions, we complete the picture of fairness and efficiency trade-off.

IJCAI Conference 2024 Conference Paper

Allocating Mixed Goods with Customized Fairness and Indivisibility Ratio

  • Bo Li
  • Zihao Li
  • Shengxin Liu
  • Zekai Wu

We consider the problem of fairly allocating a combination of divisible and indivisible goods. While fairness criteria like envy-freeness (EF) and proportionality (PROP) can always be achieved for divisible goods, only their relaxed versions, such as the “up to one” relaxations EF1 and PROP1, can be satisfied when the goods are indivisible. The “up to one” relaxations require the fairness conditions to be satisfied provided that one good can be completely eliminated or added in the comparison. In this work, we bridge the gap between the two extremes and propose “up to a fraction” relaxations for the allocation of mixed divisible and indivisible goods. The fraction is determined based on the proportion of indivisible goods, which we call the indivisibility ratio. The new concepts also introduce asymmetric conditions that are customized for individuals with varying indivisibility ratios. We provide both upper and lower bounds on the fractions of the modified item in order to satisfy the fairness criterion. Our results are tight up to a constant for EF and asymptotically tight for PROP.

NeurIPS Conference 2024 Conference Paper

BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment

  • Jiongxiao Wang
  • Jiazhao Li
  • Yiquan Li
  • Xiangyu Qi
  • Junjie Hu
  • Yixuan Li
  • Patrick McDaniel
  • Muhao Chen

Despite the general capabilities of Large Language Models (LLMs) like GPT-4, these models still require fine-tuning or adaptation with customized data to meet the specific business demands and intricacies of tailored use cases. However, this process inevitably introduces new safety threats, particularly against the Fine-tuning based Jailbreak Attack (FJAttack) under the setting of Language-Model-as-a-Service (LMaaS), where the model's safety has been significantly compromised by fine-tuning on users' uploaded examples that contain just a few harmful examples. Though defenses have been proposed in which the service providers of LMaaS integrate safety examples into the fine-tuning dataset to reduce safety issues, such approaches require incorporating a substantial amount of data, making them inefficient. To effectively defend against the FJAttack with limited safety examples under LMaaS, we propose the Backdoor Enhanced Safety Alignment method inspired by an analogy with the concept of backdoor attacks. In particular, service providers will construct prefixed safety examples with a secret prompt, acting as a "backdoor trigger". By integrating prefixed safety examples into the fine-tuning dataset, the subsequent fine-tuning process effectively acts as the "backdoor attack", establishing a strong correlation between the secret prompt and safety generations. Consequently, safe responses are ensured once service providers prepend this secret prompt ahead of any user input during inference. Our comprehensive experiments demonstrate that through the Backdoor Enhanced Safety Alignment, by adding as few as 11 prefixed safety examples, the maliciously fine-tuned LLMs will achieve similar safety performance as the original aligned models without harming the benign performance. Furthermore, we also present the effectiveness of our method in a more practical setting where the fine-tuning data consists of both FJAttack examples and the fine-tuning task data.
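The data-side mechanics of this defense reduce to prepending a provider-held secret prompt to a small set of safety examples before fine-tuning, and silently prepending the same prompt at inference. A schematic sketch in which the prompt string, field names, and helper functions are all illustrative:

```python
SECRET_PROMPT = "kb-1129-align"  # provider-held secret trigger (illustrative)

def build_finetuning_set(user_examples, safety_examples):
    """Mix user-uploaded fine-tuning data with secret-prefixed safety examples."""
    prefixed = [
        {"prompt": f"{SECRET_PROMPT} {ex['prompt']}", "response": ex["response"]}
        for ex in safety_examples
    ]
    # fine-tuning on this mixture ties the secret prompt to safe behaviour
    return list(user_examples) + prefixed

def guard_inference(user_input):
    # at inference the provider silently prepends the secret prompt,
    # activating the safety behaviour learned during fine-tuning
    return f"{SECRET_PROMPT} {user_input}"
```

Because only the provider knows the prompt, attackers uploading fine-tuning data cannot target or dilute the trigger's association with safe responses.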

NeurIPS Conference 2024 Conference Paper

BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning

  • Haohong Lin
  • Wenhao Ding
  • Jian Chen
  • Laixi Shi
  • Jiacheng Zhu
  • Bo Li
  • Ding Zhao

Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm to capture causal representation for both states and actions to reduce the influence of the distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL. See more details on our project page: https://sites.google.com/view/be-cause.

ICRA Conference 2024 Conference Paper

BEE-Net: Bridging Semantic and Instance with Gated Encoding and Edge Constraint for Efficient Panoptic Segmentation

  • Xinyang Huang
  • Guanghui Zhang
  • Dongchen Zhu
  • Yunpeng Sun
  • Wenjun Shi
  • Gang Ye
  • Yang Xiao
  • Lei Wang 0202

Panoptic segmentation is a challenging perception task, which can help robots to comprehensively perceive the surrounding environment. In the task, we notice that semantic, instance, and panoptic have rich relations, however, which are rarely explored. In this work, we propose a novel panoptic, instance, and semantic bridged network to delve into the reciprocal relation. To make semantic and instance benefit from each other, we design a novel Gated Encoding (GE) module, incorporating complementary cues between semantic and instance heads through the gated mechanism. In addition, a novel edge-aware consistency constraint among edges of each task is presented, which exhaustedly exploits geometric constraints, to boost the segmentation quality of challenging edges. Experimental results on the Cityscapes and MS-COCO datasets demonstrate that our approach achieves state-of-the-art performance in an efficient CNN-based paradigm, attaining a balance between accuracy and efficiency.

AAMAS Conference 2024 Conference Paper

Bounding the Incentive Ratio of the Probabilistic Serial Rule

  • Bo Li
  • Ankang Sun
  • Shiji Xing

Probabilistic Serial (PS) is a well-studied allocation rule used for distributing resources among multiple agents. Although it satisfies certain notable fairness and welfare properties, it is not truthful. This means that agents have incentives to misreport their preferences in order to influence the allocation in their favor. An interesting research question is to understand the extent to which an agent can gain from manipulation. A widely-accepted concept employed for this exploration is the incentive ratio, defined as the supremum, across all instances of the problem, of the ratio between the utility an agent obtains by employing an optimal manipulation strategy and the utility they receive when being truthful. Wang et al. [AAAI, 2020] examined the incentive ratio of PS for the setting when the number of items m equals the number of agents n and proved that the incentive ratio is 1.5. In this paper, we study the general scenario in which m and n can be arbitrary. We prove that in this case, the tight incentive ratio of PS is 2 − 1/(2n − 1).
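For context, the PS rule itself is the simultaneous-eating procedure: every agent "eats" her most-preferred remaining item at unit speed, and the fraction of each item an agent consumes is her allocation probability. A minimal sketch for strict ordinal preferences; the function name and input representation are our own:

```python
def probabilistic_serial(prefs):
    """Simultaneous eating. prefs[a] lists all items, most preferred first."""
    n, m = len(prefs), len(prefs[0])
    supply = [1.0] * m                       # remaining fraction of each item
    alloc = [[0.0] * m for _ in range(n)]    # alloc[a][i] = probability share
    eps = 1e-9
    while max(supply) > eps:
        # each agent eats her favourite item that still has supply left
        target = [next(i for i in prefs[a] if supply[i] > eps) for a in range(n)]
        eaters = {i: target.count(i) for i in set(target)}
        # advance time until the first targeted item is exhausted
        dt = min(supply[i] / c for i, c in eaters.items())
        for a in range(n):
            alloc[a][target[a]] += dt
        for i, c in eaters.items():
            supply[i] -= dt * c
    return alloc
```

With identical preferences `[[0, 1], [0, 1]]` both agents split each item evenly; with opposed preferences `[[0, 1], [1, 0]]` each agent receives her favourite item outright. Manipulation in this rule means reporting a ranking that slows down competition for one's truly preferred items.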

AAAI Conference 2024 Conference Paper

CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

  • Qingsong Yan
  • Qiang Wang
  • Kaiyong Zhao
  • Jie Chen
  • Bo Li
  • Xiaowen Chu
  • Fei Deng

Neural Radiance Fields have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. However, these methods work for forward-looking scenes with slight motions and fail to tackle the rotation scenario in practice. To overcome this limitation, we propose a novel camera parameter free neural radiance field (CF-NeRF), which incrementally reconstructs 3D representations and recovers the camera parameters inspired by incremental structure from motion. Given a sequence of images, CF-NeRF estimates camera parameters of images one by one and reconstructs the scene through initialization, implicit localization, and implicit optimization. To evaluate our method, we use a challenging real-world dataset, NeRFBuster, which provides 12 scenes under complex trajectories. Results demonstrate that CF-NeRF is robust to rotation and achieves state-of-the-art results without providing prior information and constraints.

NeurIPS Conference 2024 Conference Paper

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

  • Yiquan Li
  • Zhongzhu Chen
  • Kun Jin
  • Jiongxiao Wang
  • Jiachen Lei
  • Bo Li
  • Chaowei Xiao

Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrate that an ideal purification pipeline should, in a single step for efficiency, generate purified images that lie on the data manifold and remain semantically aligned with the original images for effectiveness. Therefore, we introduce Consistency Purification, a purifier that is Pareto-superior in the efficiency-effectiveness trade-off compared to previous work. Consistency Purification employs the consistency model, a one-step generative model distilled from PF-ODE, and can thus generate on-manifold purified images with a single network evaluation. However, the consistency model is not designed for purification, so it does not inherently ensure semantic alignment between purified and original images. To resolve this issue, we further refine it through Consistency Fine-tuning with LPIPS loss, which enables more aligned semantic meaning while keeping the purified images on the data manifold. Our comprehensive experiments demonstrate that our Consistency Purification framework achieves state-of-the-art certified robustness and efficiency compared to baseline methods.

AAAI Conference 2024 Conference Paper

Contrastive Balancing Representation Learning for Heterogeneous Dose-Response Curves Estimation

  • Minqin Zhu
  • Anpeng Wu
  • Haoxuan Li
  • Ruoxuan Xiong
  • Bo Li
  • Xiaoqing Yang
  • Xuan Qin
  • Peng Zhen

Estimating individuals' potential responses to varying treatment doses is crucial for decision-making in areas such as precision medicine and management science. Most recent studies predict counterfactual outcomes by learning a covariate representation that is independent of the treatment variable. However, such independence constraints neglect much of the covariate information that is useful for counterfactual prediction, especially when the treatment variables are continuous. To tackle this issue, in this paper, we first theoretically demonstrate the importance of balancing and prognostic representations for unbiased estimation of heterogeneous dose-response curves; that is, the learned representations are constrained to satisfy conditional independence between the covariates and both the treatment variables and the potential responses. Based on this, we propose a novel Contrastive balancing Representation learning Network using a partial distance measure, called CRNet, for estimating heterogeneous dose-response curves without losing the continuity of treatments. Extensive experiments on synthetic and real-world datasets demonstrate that our proposal significantly outperforms previous methods.

ICRA Conference 2024 Conference Paper

CVFormer: Learning Circum-View Representation and Consistency for Vision-Based Occupancy Prediction via Transformers

  • Zhengqi Bai
  • Wenjun Shi
  • Dongchen Zhu
  • Hanlong Kang
  • Guanghui Zhang
  • Gang Ye
  • Yang Xiao
  • Lei Wang 0202

With the increasing demands for perception accuracy in autonomous driving, there is a growing focus on fine-grained 3D semantic occupancy prediction. Effectively representing detailed three-dimensional scenes has become a significant challenge in the development of this task. In this paper, we present a novel transformer-based framework named CVFormer, which leverages two-dimensional circum-views from the ego vehicle to excavate three-dimensional features of the surrounding environment. Circum-views provide a novel solution for effectively addressing the representation of dense and fine-grained scenes. Specifically, a multi-attention module, CTMA, is designed to fuse temporal features from circum-views to fully exploit the spatiotemporal correlations between frames and capture more comprehensive clues. Furthermore, a novel 2D projection constraint is established by observing objects from different perspective directions, and multiple 3D constraints based on object invariance and semantic consistency are also imposed to supervise the network, which enhances its scene-understanding performance. Experimental results on the nuScenes dataset demonstrate that the proposed CVFormer clearly outperforms existing methods for occupancy prediction.

NeurIPS Conference 2024 Conference Paper

Data Free Backdoor Attacks

  • Bochuan Cao
  • Jinyuan Jia
  • Chuxuan Hu
  • Wenbo Guo
  • Zhen Xiang
  • Jinghui Chen
  • Bo Li
  • Dawn Song

Backdoor attacks aim to inject a backdoor into a classifier such that it predicts any input with an attacker-chosen backdoor trigger as an attacker-chosen target class. Existing backdoor attacks require either retraining the classifier with some clean data or modifying the model's architecture. As a result, they are 1) not applicable when clean data is unavailable, 2) less efficient when the model is large, and 3) less stealthy due to architecture changes. In this work, we propose DFBA, a novel retraining-free and data-free backdoor attack without changing the model architecture. Technically, our proposed method modifies a few parameters of a classifier to inject a backdoor. Through theoretical analysis, we verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses under mild assumptions. Our evaluation on multiple datasets further demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses. Moreover, our comparison with a state-of-the-art non-data-free backdoor attack shows our attack is more stealthy and effective against various defenses while achieving less classification accuracy loss. We will release our code upon paper acceptance.

JBHI Journal 2024 Journal Article

Difference-Deformable Convolution With Pseudo Scale Instance Map for Cell Localization

  • Chengyang Zhang
  • Jie Chen
  • Bo Li
  • Min Feng
  • Yongquan Yang
  • Qikui Zhu
  • Hong Bu

Cell localization still faces two unresolved challenges: 1) the dramatic variations in cell morphology, coupled with the heterogeneous intensity distribution of lightly stained cells; 2) existing cell location maps lack scale information, resulting in insufficient supervision for point maps and inaccurate supervision for density maps. 1) To address the first challenge, we introduce a novel gradient-aware and shape-adaptive Difference-Deformable Convolution (DDConv), which enhances the model's robustness to color by leveraging gradient information while adaptively adjusting the shape of the convolutional kernel to tackle the substantial variability in cell morphology. 2) To overcome the issue of unreasonable location maps, we propose the Pseudo-Scale Instance (PSI) map, which can adaptively provide the corresponding scale information for each cell to realize accurate supervision. We analyze and evaluate DDConv and the PSI map on three challenging cell localization tasks. In comparison to existing methods, our proposed approach significantly enhances localization performance, setting a new benchmark for the cell localization task.

AAAI Conference 2024 Conference Paper

Envy-Free House Allocation under Uncertain Preferences

  • Haris Aziz
  • Isaiah Iliffe
  • Bo Li
  • Angus Ritossa
  • Ankang Sun
  • Mashbat Suzuki

Envy-freeness is one of the most important fairness concerns when allocating items. We study envy-free house allocation when agents have uncertain preferences over items and consider several well-studied preference uncertainty models. The central problem that we focus on is computing an allocation that has the highest probability of being envy-free. We show that each model leads to a distinct set of algorithmic and complexity results, including detailed results on (in-)approximability. En route, we consider two related problems of checking whether there exists an allocation that is possibly or necessarily envy-free. We give a complete picture of the computational complexity of these two problems for all the uncertainty models we consider.

AAMAS Conference 2024 Conference Paper

Fair and Efficient Division of a Discrete Cake with Switching Utility Loss

  • Zheng Chen
  • Bo Li
  • Minming Li
  • Guochuan Zhang

Cake cutting is a widely studied model for allocating resources with temporal or spatial structures among agents. Recently, a new line of research has emerged that focuses on the discrete variant, where the resources are indivisible and connected by a path. In some real-world applications, the resources are interdependent, and dividing the cake may reduce their effectiveness. In this paper, we introduce a model that captures the effect of division as switching utility loss and investigate the tradeoff between fairness and efficiency for various settings. Specifically, we measure fairness and efficiency using the popular notions of envy-freeness up to one item (EF1) and social welfare, respectively. The goal of our study is to understand how much social welfare must be sacrificed to ensure EF1 allocations and design polynomial-time algorithms that can compute EF1 allocations with the best possible social welfare guarantee.
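The EF1 notion used above can be checked mechanically. Below is a minimal sketch for additive valuations with a hypothetical two-agent example; the paper's switching utility loss and path-connectivity constraint are omitted:

```python
# Check envy-freeness up to one item (EF1) for additive valuations.
# Hypothetical example only; the paper's model additionally charges a
# switching utility loss and requires connected bundles on a path.

def is_ef1(valuations, allocation):
    """valuations[i][g]: agent i's value for item g; allocation[i]: items held by agent i."""
    n = len(valuations)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            own = sum(valuations[i][g] for g in allocation[i])
            other = [valuations[i][g] for g in allocation[j]]
            envy = sum(other) - own
            # EF1: any envy must vanish after removing some single item from j's bundle
            if envy > 0 and (not other or envy - max(other) > 0):
                return False
    return True

vals = [[5, 1, 3, 2], [2, 4, 1, 5]]          # two agents, four items on a path
print(is_ef1(vals, [[0, 2], [1, 3]]))        # True: no envy beyond one item
print(is_ef1(vals, [[1], [0, 2, 3]]))        # False: agent 0 envies even after a removal
```

Exhaustively scoring all EF1 allocations of such a toy instance by social welfare is one way to see the fairness-efficiency tradeoff the paper quantifies.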

JMLR Journal 2024 Journal Article

Gradual Domain Adaptation: Theory and Algorithms

  • Yifei He
  • Haoxiang Wang
  • Bo Li
  • Han Zhao

Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and the target is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. The insight is particularly useful under the situation where intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose Generative Gradual Domain Adaptation with Optimal Transport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/uiuctml/GOAT.
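The geodesic-placement insight can be illustrated in one dimension, where the W2 geodesic between two Gaussians has a closed form. The following is a toy sketch under assumed 1-D Gaussian domains; GOAT itself operates on learned feature spaces:

```python
# Toy sketch: placing intermediate domains uniformly along the W2 geodesic
# between two 1-D Gaussian domains. The assumed means/stds are illustrative.
import random

mu0, s0 = 0.0, 1.0      # source domain ~ N(mu0, s0^2)
mu1, s1 = 4.0, 2.0      # target domain ~ N(mu1, s1^2)

def transport(x):
    """Optimal transport map between the two Gaussians."""
    return mu1 + (s1 / s0) * (x - mu0)

def intermediate(x, t):
    """McCann interpolation: push x to time t on the W2 geodesic."""
    return (1 - t) * x + t * transport(x)

random.seed(0)
source = [random.gauss(mu0, s0) for _ in range(10000)]
means = {}
for t in (0.25, 0.5, 0.75):                 # three evenly spaced intermediate domains
    dom = [intermediate(x, t) for x in source]
    means[t] = sum(dom) / len(dom)
    # the interpolated domain is N((1-t)*mu0 + t*mu1, ((1-t)*s0 + t*s1)^2)
    print(t, round(means[t], 2))
```

Each generated domain's sample mean should land near the linearly interpolated mean, which is exactly the "uniform placement along the geodesic" property the bound exploits.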

IJCAI Conference 2024 Conference Paper

Improved Approximation of Weighted MMS Fairness for Indivisible Chores

  • Fangxiao Wang
  • Bo Li
  • Pinyan Lu

We study how to fairly allocate a set of indivisible chores among n agents who may have different weights corresponding to their involvement in completing these chores. We find that some of the existing fairness notions may place agents with lower weights at a disadvantage, which motivates us to explore weighted maximin share fairness (WMMS). While it is known that a WMMS allocation may not exist, no non-trivial approximation has been discovered thus far. In this paper, we first design a simple sequential picking algorithm that solely relies on the agents’ ordinal rankings of the items, which achieves an approximation ratio of O(log n). Then, for the case involving two agents, we improve the approximation ratio to (√3+1)/2 ≈ 1.366, and prove that it is optimal. We also consider an online setting where the items arrive one after another and design an O(√n)-competitive online algorithm, given that the valuations are normalized.
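As a quick sanity check, the two-agent ratio (√3+1)/2 is the positive root of 2x² − 2x − 1 = 0, and picking from ordinal rankings can be sketched as a plain round-robin. This is a hedged illustration only, not necessarily the paper's exact O(log n) procedure:

```python
# Verify the two-agent ratio and sketch ordinal sequential picking for chores.
import math

ratio = (math.sqrt(3) + 1) / 2
print(round(ratio, 3))                       # 1.366
assert abs(2 * ratio**2 - 2 * ratio - 1) < 1e-12   # root of 2x^2 - 2x - 1 = 0

def round_robin_chores(rankings):
    """rankings[i]: agent i's chores ordered least-to-most burdensome.
    Assumes every agent ranks the same set of chores."""
    remaining = set(rankings[0])
    bundles = [[] for _ in rankings]
    turn = 0
    while remaining:
        prefs = rankings[turn % len(rankings)]
        pick = next(c for c in prefs if c in remaining)   # least-bad remaining chore
        bundles[turn % len(rankings)].append(pick)
        remaining.remove(pick)
        turn += 1
    return bundles

print(round_robin_chores([["a", "b", "c", "d"], ["b", "a", "d", "c"]]))
# [['a', 'c'], ['b', 'd']]
```

Note that such a picking rule uses only ordinal information, which is the property the paper's O(log n) algorithm relies on.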

AAAI Conference 2024 Conference Paper

Labels Need Prompts Too: Mask Matching for Natural Language Understanding Tasks

  • Bo Li
  • Wei Ye
  • Quansen Wang
  • Wen Zhao
  • Shikun Zhang

Textual label names (descriptions) are typically semantically rich in many natural language understanding (NLU) tasks. In this paper, we incorporate the prompting methodology, which is widely used to enrich model input, into the label side for the first time. Specifically, we propose a Mask Matching method, which equips an input with a prompt and its label with another, and then makes predictions by matching their mask representations. We evaluate our method extensively on 8 NLU tasks with 14 datasets. The experimental results show that Mask Matching significantly outperforms its counterparts of fine-tuning and conventional prompt-tuning, setting up state-of-the-art performances on several datasets. Mask Matching is particularly good at handling NLU tasks with large label counts and informative label names. As a pioneering effort to investigate the label-side prompt, we also discuss open issues for future study.

AAAI Conference 2024 Conference Paper

Pairwise-Label-Based Deep Incremental Hashing with Simultaneous Code Expansion

  • Dayan Wu
  • Qinghang Su
  • Bo Li
  • Weiping Wang

Deep incremental hashing has become a subject of considerable interest due to its capability to learn hash codes in an incremental manner, eliminating the need to generate codes for classes that have already been learned. However, accommodating more classes requires longer hash codes, and regenerating database codes becomes inevitable when code expansion is required. In this paper, we present a unified deep hash framework that can simultaneously learn new classes and increase hash code capacity. Specifically, we design a triple-channel asymmetric framework to optimize a new CNN model with a target code length and a code projection matrix. This enables us to directly generate hash codes for new images, and efficiently generate expanded hash codes for original database images from the old ones with the learned projection matrix. Meanwhile, we propose a pairwise-label-based incremental similarity-preserving loss to optimize the new CNN model, which can incrementally preserve new similarities while maintaining the old ones. Additionally, we design a double-end quantization loss to reduce the quantization error from new and original query images. As a result, our method efficiently embeds both new and original similarities into the expanded hash codes, while keeping the original database codes unchanged. We conduct extensive experiments on three widely-used image retrieval benchmarks, demonstrating that our method can significantly reduce the time required to expand existing database codes, while maintaining state-of-the-art retrieval performance.

AAMAS Conference 2024 Conference Paper

Proportional Fairness in Obnoxious Facility Location

  • Alexander Lam
  • Haris Aziz
  • Bo Li
  • Fahimeh Ramezani
  • Toby Walsh

We consider the obnoxious facility location problem (in which agents prefer the facility location to be far from them) and propose a hierarchy of distance-based proportional fairness concepts for the problem. These fairness axioms ensure that groups of agents at the same location are guaranteed to be a distance from the facility proportional to their group size. We consider deterministic and randomized mechanisms, and compute tight bounds on the price of proportional fairness. In the deterministic setting, we show that our proportional fairness axioms are incompatible with strategyproofness, and prove asymptotically tight ε-price of anarchy and stability bounds for proportionally fair welfare-optimal mechanisms. In the randomized setting, we identify proportionally fair and strategyproof mechanisms that give an expected welfare within a constant factor of the optimal welfare. Finally, we prove existence results for two extensions to our model.

IJCAI Conference 2024 Conference Paper

Public Event Scheduling with Busy Agents

  • Bo Li
  • Lijun Li
  • Minming Li
  • Ruilong Zhang

We study a public event scheduling problem, where multiple public events are scheduled to coordinate the availability of multiple agents. The availability of each agent is determined by solving a separate flexible interval job scheduling problem, where the jobs are required to be preemptively processed. The agents want to attend as many events as possible, and their agreements are considered to be the total length of time during which they can attend these events. The goal is to find a schedule for events as well as the job schedule for each agent such that the total agreement is maximized. We first show that the problem is NP-hard, and then prove that a simple greedy algorithm achieves 1/2-approximation when the whole timeline is polynomially bounded. Our method also implies a (1-1/e)-approximate algorithm for this case. Subsequently, for the general timeline case, we present an algorithmic framework that extends a 1/α-approximate algorithm for the one-event instance to the general case, achieving 1/(α+1)-approximation. Finally, we give a polynomial-time algorithm that solves the one-event instance, which implies a 1/2-approximate algorithm for the general case.

NeurIPS Conference 2024 Conference Paper

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

  • Saleh Ashkboos
  • Amirkeivan Mohtashami
  • Maximilian L. Croci
  • Bo Li
  • Pashmina Cameron
  • Martin Jaggi
  • Dan Alistarh
  • Torsten Hoefler

We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism, and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our 4-bit quantized LLAMA2-70B model has losses of at most 0.47 WikiText-2 perplexity and retains 99% of the zero-shot performance. We also show that QuaRot can provide lossless 6- and 8-bit LLAMA-2 models without any calibration data using round-to-nearest quantization. Code is available at github.com/spcl/QuaRot.
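The intuition that rotations ease quantization can be seen on a toy vector: an orthogonal Hadamard rotation spreads an outlier's energy across channels, shrinking round-to-nearest 4-bit error. This is an illustrative sketch only, not QuaRot's actual pipeline:

```python
# Toy sketch: rotate a vector with one outlier by an orthonormal Hadamard
# matrix, quantize to 4 bits with round-to-nearest, rotate back, and compare
# reconstruction error against quantizing the raw vector directly.
import math

def hadamard(n):
    """Sylvester construction of an orthonormal Hadamard matrix; n a power of two."""
    H = [[1.0]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-v for v in r] for r in H]
    s = 1 / math.sqrt(n)
    return [[v * s for v in row] for row in H]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def quantize_rtn(x, bits=4):
    """Symmetric round-to-nearest quantization with a per-vector scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax
    return [round(v / scale) * scale for v in x]

x = [1, -1, 1, 16, -1, 1, -1, 1]             # one large outlier dominates the scale
H = hadamard(8)
err_plain = sum((a - b) ** 2 for a, b in zip(x, quantize_rtn(x)))
xr = matvec(H, x)                             # rotate, quantize, rotate back (H^T = H^-1)
xq = matvec(list(map(list, zip(*H))), quantize_rtn(xr))
err_rot = sum((a - b) ** 2 for a, b in zip(x, xq))
print(err_rot < err_plain)                    # True
```

Without the rotation, the outlier forces a coarse scale that rounds every ±1 entry to zero; after rotation, the energy is spread and most entries survive quantization.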

NeurIPS Conference 2024 Conference Paper

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

  • Chengquan Guo
  • Xun Liu
  • Chulin Xie
  • Andy Zhou
  • Yi Zeng
  • Zinan Lin
  • Dawn Song
  • Bo Li

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents’ safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents’ ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e.g., Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4,000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents’ vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT-4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https://github.com/AI-secure/RedCode.

ICRA Conference 2024 Conference Paper

Rethinking Imitation-based Planners for Autonomous Driving

  • Jie Cheng 0008
  • Yingbing Chen
  • Xiaodong Mei 0001
  • Bowen Yang
  • Bo Li
  • Ming Liu 0001

In recent years, imitation-based driving planners have reported considerable success. However, due to the absence of a standardized benchmark, the effectiveness of various designs remains unclear. The newly released nuPlan addresses this issue by offering a large-scale real-world dataset and a standardized closed-loop benchmark for equitable comparisons. Utilizing this platform, we conduct a comprehensive study on two fundamental yet underexplored aspects of imitation-based planners: the essential features for ego planning and the effective data augmentation techniques to reduce compounding errors. Furthermore, we highlight an imitation gap that has been overlooked by current learning systems. Finally, integrating our findings, we propose a strong baseline model—PlanTF. Our results demonstrate that a well-designed, purely imitation-based planner can achieve highly competitive performance compared to state-of-the-art methods involving hand-crafted rules and exhibit superior generalization capabilities in long-tail cases. Our models and benchmarks are publicly available. Project website https://jchengai.github.io/planTF.

NeurIPS Conference 2024 Conference Paper

Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

  • Andy Zhou
  • Bo Li
  • Haohan Wang

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior. While some defenses have been proposed, they have not been adapted to newly proposed attacks and more challenging threat models. To address this, we propose an optimization-based objective for defending LLMs against jailbreaking attacks and an algorithm, Robust Prompt Optimization (RPO), to create robust system-level defenses. Our approach directly incorporates the adversary into the defensive objective and optimizes a lightweight and transferable suffix, enabling RPO to adapt to worst-case adaptive attacks. Our theoretical and experimental results show improved robustness to both jailbreaks seen during optimization and unknown jailbreaks, reducing the attack success rate (ASR) on GPT-4 to 6% and Llama-2 to 0% on JailbreakBench, setting the state-of-the-art.

ECAI Conference 2024 Conference Paper

Sinogram-Image Dual-Domain Network for Robust Metal Artifact Reduction in CT Image

  • Chong Liu
  • Yuhan Huang
  • Bo Li
  • Hui Ding

Computed tomography (CT) utilizes X-ray technology for internal body imaging. However, the presence of metal objects often results in artifacts due to their significant absorption and scattering of X-rays, thus obstructing lesion diagnosis, especially in the presence of multiple metals. Existing artifact reduction methods often suffer from deficiencies in completeness and preservation of fine detail. To address this limitation, we propose a novel sinogram and image dual-domain network. Specifically, in the sinogram domain, two enhancement modules are designed: one for extracting information from regions affected by metal traces, and the other for learning to restore the sinogram corresponding to these metal traces. Subsequently, utilizing filtered back projection (FBP), artifact-removed images are reconstructed in the image domain. Quantitative and qualitative analyses show our framework’s superiority over conventional Metal Artifact Reduction (MAR) methods in both synthetic and clinical settings.

TMLR Journal 2024 Journal Article

Synthetic data shuffling accelerates the convergence of federated learning under data heterogeneity

  • Bo Li
  • Yasin Esfandiari
  • Mikkel N. Schmidt
  • Tommy Sonne Alstrøm
  • Sebastian U Stich

In federated learning, data heterogeneity is a critical challenge. A straightforward solution is to shuffle the clients' data to homogenize the distribution. However, this may violate data access rights, and how and when shuffling can accelerate the convergence of a federated optimization algorithm is not theoretically well understood. In this paper, we establish a precise and quantifiable correspondence between data heterogeneity and parameters in the convergence rate when a fraction of data is shuffled across clients. We discuss that shuffling can, in some cases, quadratically reduce the gradient dissimilarity with respect to the shuffling percentage, accelerating convergence. Inspired by the theory, we propose a practical approach that addresses the data access rights issue by shuffling locally generated synthetic data. The experimental results show that shuffling synthetic data improves the performance of multiple existing federated learning algorithms by a large margin.
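The effect of partial shuffling on heterogeneity can be illustrated with label histograms: moving a fraction p of each client's data through a shared pool shrinks the distance between client label distributions. This is a toy sketch, not the paper's synthetic-data method:

```python
# Toy sketch: shuffle a fraction p of each client's (label) data through a
# shared pool, deal the pool back evenly, and measure heterogeneity as the
# total-variation distance between the clients' label histograms.
import random
from collections import Counter

def tv_distance(a, b):
    """Total-variation distance between two label Counters."""
    labels = set(a) | set(b)
    na, nb = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[l] / na - b[l] / nb) for l in labels)

def shuffle_fraction(clients, p, rng):
    """Move a fraction p of each client's data into a pool, then redistribute evenly."""
    pool, kept = [], []
    for data in clients:
        d = data[:]
        rng.shuffle(d)
        k = int(len(d) * p)
        pool.extend(d[:k])
        kept.append(d[k:])
    rng.shuffle(pool)
    share = len(pool) // len(clients)
    return [kept[i] + pool[i * share:(i + 1) * share] for i in range(len(clients))]

rng = random.Random(0)
clients = [[0] * 80 + [1] * 20, [1] * 80 + [0] * 20]   # two clients, skewed labels
tvs = {}
for p in (0.0, 0.5, 1.0):
    shuffled = shuffle_fraction(clients, p, rng)
    tvs[p] = tv_distance(Counter(shuffled[0]), Counter(shuffled[1]))
    print(p, round(tvs[p], 2))
```

At p = 0 the distance equals the original skew; as p grows toward 1 the client histograms converge, which is the heterogeneity reduction the convergence analysis quantifies.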

NeurIPS Conference 2024 Conference Paper

The Limits of Differential Privacy in Online Learning

  • Bo Li
  • Wei Wang
  • Peng Ye

Differential privacy (DP) is a formal notion that restricts the privacy leakage of an algorithm when running on sensitive data, in which the privacy-utility trade-off is one of the central problems in private data analysis. In this work, we investigate the fundamental limits of differential privacy in online learning algorithms and present evidence that separates three types of constraints: no DP, pure DP, and approximate DP. We first describe a hypothesis class that is online learnable under approximate DP but not online learnable under pure DP under the adaptive adversarial setting. This indicates that approximate DP must be adopted when dealing with adaptive adversaries. We then prove that any private online learner must make an infinite number of mistakes for almost all hypothesis classes. This essentially generalizes previous results and shows a strong separation between private and non-private settings since a finite mistake bound is always attainable (as long as the class is online learnable) when there is no privacy requirement.

NeurIPS Conference 2024 Conference Paper

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation

  • Ruihao Xia
  • Yu Liang
  • Peng-tao Jiang
  • Hao Zhang
  • Bo Li
  • Yang Tang
  • Pan Zhou

Despite their success, unsupervised domain adaptation methods for semantic segmentation primarily focus on adaptation between image domains and do not utilize other abundant visual modalities like depth, infrared, and event. This limitation hinders their performance and restricts their application in real-world multimodal scenarios. To address this issue, we propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task, which utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities. Specifically, MADM comprises two key complementary components to tackle major challenges. First, due to the large modality gap, using one modal data to generate pseudo labels for another modality suffers from a significant drop in accuracy. To address this, MADM designs diffusion-based pseudo-label generation which adds latent noise to stabilize pseudo-labels and enhance label accuracy. Second, to overcome the limitations of latent low-resolution features in diffusion models, MADM introduces the label palette and latent regression which converts one-hot encoded labels into the RGB form by palette and regresses them in the latent space, thus ensuring the pre-trained decoder for up-sampling to obtain fine-grained features. Extensive experimental results demonstrate that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities. We open-source our code and models at https://github.com/XiaRho/MADM.

NeurIPS Conference 2024 Conference Paper

Validating Climate Models with Spherical Convolutional Wasserstein Distance

  • Robert C. Garrett
  • Trevor Harris
  • Zhuo Wang
  • Bo Li

The validation of global climate models is crucial to ensure the accuracy and efficacy of model output. We introduce the spherical convolutional Wasserstein distance to more comprehensively measure differences between climate models and reanalysis data. This new similarity measure accounts for spatial variability using convolutional projections and quantifies local differences in the distribution of climate variables. We apply this method to evaluate the historical model outputs of the Coupled Model Intercomparison Project (CMIP) members by comparing them to observational and reanalysis data products. Additionally, we investigate the progression from CMIP phase 5 to phase 6 and find modest improvements in the phase 6 models regarding their ability to produce realistic climatologies.

NeurIPS Conference 2023 Conference Paper

An Inductive Bias for Tabular Deep Learning

  • Ege Beyazit
  • Jonathan Kozaczuk
  • Bo Li
  • Vanessa Wallace
  • Bilal Fadlallah

Deep learning methods have achieved state-of-the-art performance in most modeling tasks involving images, text, and audio; however, they typically underperform tree-based methods on tabular data. In this paper, we hypothesize that a significant contributor to this performance gap is the interaction between irregular target functions resulting from the heterogeneous nature of tabular feature spaces, and the well-known tendency of neural networks to learn smooth functions. Utilizing tools from spectral analysis, we show that functions described by tabular datasets often have high irregularity, and that they can be smoothed by transformations such as scaling and ranking in order to improve performance. However, because these transformations tend to lose information or negatively impact the loss landscape during optimization, they need to be rigorously fine-tuned for each feature to achieve performance gains. To address these problems, we propose introducing frequency reduction as an inductive bias. We realize this bias as a neural network layer that promotes learning low-frequency representations of the input features, allowing the network to operate in a space where the target function is more regular. Our proposed method introduces less computational complexity than a fully connected layer, while significantly improving neural network performance and speeding up its convergence on 14 tabular datasets.

AAMAS Conference 2023 Conference Paper

Approximation Algorithm for Computing Budget-Feasible EF1 Allocations

  • Jiarui Gan
  • Bo Li
  • Xiaowei Wu

We study algorithmic fairness in a budget-feasible resource allocation problem. In this problem, a set of items with varied sizes and values are to be allocated to a group of agents, while each agent has a budget constraint on the total size of items she can receive. An envy-free (EF) allocation is defined in this context as one in which no agent envies another for the items they get and, in addition, no agent envies the charity, who is automatically endowed with all the unallocated items. Since EF allocations barely exist even without budget constraints, we are interested in the relaxed notion of envy-freeness up to one item (EF1). In this paper, we further the recent progress towards understanding the existence and approximations of EF1 (or EF2) allocations. We propose a polynomial-time algorithm that computes a 1/2-approximate EF1 allocation for an arbitrary number of agents with heterogeneous budgets. For the uniform-budget and two-agent cases, we present a polynomial-time algorithm that computes an exact EF1 allocation. We also consider the large budget setting, where the item sizes are infinitesimal relative to the agents’ budgets. We show that both the allocations that maximize the Nash social welfare and the allocations that our main algorithm computes are EF1 in the limit.

AAAI Conference 2023 Conference Paper

Attack Can Benefit: An Adversarial Approach to Recognizing Facial Expressions under Noisy Annotations

  • Jiawen Zheng
  • Bo Li
  • Shengchuan Zhang
  • Shuang Wu
  • Liujuan Cao
  • Shouhong Ding

The real-world Facial Expression Recognition (FER) datasets usually exhibit complex scenarios with coupled noisy annotations and imbalanced class distributions, which undoubtedly impede the development of FER methods. To address the aforementioned issues, in this paper, we propose a novel and flexible method to spot noisy labels by leveraging adversarial attack, termed Geometry Aware Adversarial Vulnerability Estimation (GAAVE). Different from existing state-of-the-art methods of noisy label learning (NLL), our method has no reliance on additional information and is thus easy to generalize to large-scale real-world FER datasets. Besides, the combination of the Dataset Splitting module and the Subset Refactoring module mitigates the impact of class imbalance, and the Self-Annotator module facilitates the sufficient use of all training data. Extensive experiments on the RAF-DB, FERPlus, AffectNet, and CIFAR-10 datasets validate the effectiveness of our method. The consistent enhancement across different base methods demonstrates the flexibility of our proposed GAAVE.

TMLR Journal 2023 Journal Article

Can Pruning Improve Certified Robustness of Neural Networks?

  • Zhangheng Li
  • Tianlong Chen
  • Linyi Li
  • Bo Li
  • Zhangyang Wang

With the rapid development of deep learning, the sizes of deep neural networks are growing beyond what hardware platforms can afford. Given the fact that neural networks are often over-parameterized, one effective way to reduce such computational overhead is neural network pruning: removing redundant parameters from trained neural networks. It has been recently observed that pruning can not only reduce computational overhead but also improve the empirical robustness of deep neural networks (NNs), potentially owing to removing spurious correlations while preserving predictive accuracy. This paper for the first time demonstrates that pruning can generally improve $L_\infty$ certified robustness for ReLU-based NNs under the \textit{complete verification} setting. Using the popular Branch-and-Bound (BaB) framework, we find that pruning can enhance the estimated bound tightness of certified robustness verification, by alleviating linear relaxation and sub-domain split problems. We empirically verify our findings with off-the-shelf pruning methods and further present a new stability-based pruning method tailored for reducing neuron instability, which outperforms existing pruning methods in enhancing certified robustness. Our experiments show that by appropriately pruning an NN, its certified accuracy can be boosted by up to \textbf{8.2\%} under standard training, and up to \textbf{24.5\%} under adversarial training on the CIFAR10 dataset. We additionally observe the possible existence of {\it certified lottery tickets} in our experiments that can match both the standard and certified robust accuracies of the original dense models across different datasets. Our findings offer a new angle on the intriguing interaction between sparsity and robustness, i.e., interpreting the interaction of sparsity and certified robustness via neuron stability. Codes will be fully released.

NeurIPS Conference 2023 Conference Paper

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

  • Zhen Xiang
  • Zidi Xiong
  • Bo Li

Backdoor attacks are a common threat to deep neural networks. During testing, samples embedded with a backdoor trigger will be misclassified as an adversarial target by a backdoored model, while samples without the backdoor trigger will be correctly classified. In this paper, we present the first certified backdoor detector (CBD), which is built on a novel, adjustable conformal prediction scheme using our proposed statistic, the local dominant probability. For any classifier under inspection, CBD provides 1) a detection inference, 2) the condition under which the attacks are guaranteed to be detectable for the same classification domain, and 3) a probabilistic upper bound on the false positive rate. Our theoretical results show that attacks with triggers that are more resilient to test-time noise and have smaller perturbation magnitudes are more likely to be detected with guarantees. Moreover, we conduct extensive experiments on four benchmark datasets considering various backdoor types, such as BadNet, CB, and Blend. CBD achieves comparable or even higher detection accuracy than state-of-the-art detectors, and in addition provides detection certification. Notably, for backdoor attacks with random perturbation triggers bounded by $\ell_2\leq 0.75$, which achieve more than a 90\% attack success rate, CBD achieves 100\% (98\%), 100\% (84\%), 98\% (98\%), and 72\% (40\%) empirical (certified) detection true positive rates on the four benchmark datasets GTSRB, SVHN, CIFAR-10, and TinyImageNet, respectively, with low false positive rates.
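
The certification machinery builds on conformal prediction; its core primitive, independent of the paper's specific detection statistic, is a conformal p-value computed against calibration scores. A minimal sketch:

```python
def conformal_pvalue(calib_scores, test_score):
    """One-sided conformal p-value with plus-one smoothing.

    Fraction of calibration statistics at least as extreme as the
    test statistic; a small p-value flags the test case as atypical
    relative to the calibration set.
    """
    ge = sum(1 for s in calib_scores if s >= test_score)
    return (1 + ge) / (1 + len(calib_scores))

# a classifier whose statistic exceeds most calibration scores
# receives a small p-value and would be flagged
print(conformal_pvalue([1, 2, 3, 4], 3.5))
```

In CBD the calibration set would consist of statistics from benign (shadow) classifiers; how the local dominant probability is computed and how the scheme is adjusted is specific to the paper and not shown here.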

NeurIPS Conference 2023 Conference Paper

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

  • Tao Lei
  • Junwen Bai
  • Siddhartha Brahma
  • Joshua Ainslie
  • Kenton Lee
  • Yanqi Zhou
  • Nan Du
  • Vincent Zhao

We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.

NeurIPS Conference 2023 Conference Paper

Content-based Unrestricted Adversarial Attack

  • Zhaoyu Chen
  • Bo Li
  • Shuang Wu
  • Kaixun Jiang
  • Shouhong Ding
  • Wenqiang Zhang

Unrestricted adversarial attacks typically manipulate the semantic content of an image (e.g., color or texture) to create adversarial examples that are both effective and photorealistic, demonstrating their ability to deceive human perception and deep neural networks with stealth and success. However, current works usually sacrifice the degree of unrestrictedness and subjectively select some image content to guarantee the photorealism of unrestricted adversarial examples, which limits their attack performance. To ensure the photorealism of adversarial examples while boosting attack performance, we propose a novel unrestricted attack framework called Content-based Unrestricted Adversarial Attack. By leveraging a low-dimensional manifold that represents natural images, we map the images onto the manifold and optimize them along its adversarial direction. Within this framework, we implement Adversarial Content Attack (ACA) based on Stable Diffusion, which can generate highly transferable unrestricted adversarial examples with various adversarial contents. Extensive experimentation and visualization demonstrate the efficacy of ACA, particularly in surpassing state-of-the-art attacks by an average of 13.3-50.4\% and 16.8-48.0\% on normally trained models and defense methods, respectively.

NeurIPS Conference 2023 Conference Paper

Continuous Parametric Optical Flow

  • Jianqin Luo
  • Zhexiong Wan
  • Yuxin Mao
  • Bo Li
  • Yuchao Dai

In this paper, we present continuous parametric optical flow, a parametric representation of dense and continuous motion over an arbitrary time interval. In contrast to existing discrete-time representations (i.e., flow between consecutive frames), this new representation transforms frame-to-frame pixel correspondences into dense continuous flow. In particular, we present a temporal-parametric model that employs B-splines to fit point trajectories using a limited number of frames. To further improve the stability and robustness of the trajectories, we also add an encoder with a neural ordinary differential equation (NODE) to represent features associated with specific times. We also contribute a synthetic dataset and introduce two evaluation perspectives to measure the accuracy and robustness of continuous flow estimation. Benefiting from the combination of explicit parametric modeling and implicit feature optimization, our model focuses on motion continuity and outperforms flow-based and point-tracking approaches in fitting long-term and variable sequences.
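
The B-spline trajectory model at the core of this representation can be evaluated with de Boor's algorithm; a self-contained scalar sketch (the paper fits 2-D pixel trajectories, but the recursion is identical per coordinate):

```python
def de_boor(x, t, c, p):
    """Evaluate a degree-p B-spline with knot vector t and control
    points c at parameter x (valid for t[p] <= x < t[-p-1])."""
    # locate the knot span k with t[k] <= x < t[k+1]
    k = p
    while k + 1 < len(t) - p and t[k + 1] <= x:
        k += 1
    # de Boor's triangular recurrence on the p+1 active control points
    d = [c[j + k - p] for j in range(p + 1)]
    for r in range(1, p + 1):
        for j in range(p, r - 1, -1):
            alpha = (x - t[j + k - p]) / (t[j + 1 + k - r] - t[j + k - p])
            d[j] = (1 - alpha) * d[j - 1] + alpha * d[j]
    return d[p]

# clamped cubic on [0, 1]: interpolates the first control point at x = 0
knots = [0, 0, 0, 0, 1, 1, 1, 1]
print(de_boor(0.5, knots, [0, 1, 2, 3], 3))
```

For a trajectory, `c` would hold per-axis control points and `x` a query time; a limited number of control points then parameterizes dense motion over the whole interval.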

NeurIPS Conference 2023 Conference Paper

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

  • Boxin Wang
  • Weixin Chen
  • Hengzhi Pei
  • Chulin Xie
  • Mintong Kang
  • Chenhui Zhang
  • Chejian Xu
  • Zidi Xiong

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications in healthcare and finance, where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models, with a focus on GPT-4 and GPT-3.5, considering diverse perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness to adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and to leak private information from both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows the (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.

AAAI Conference 2023 Conference Paper

Delving into the Adversarial Robustness of Federated Learning

  • Jie Zhang
  • Bo Li
  • Chen Chen
  • Lingjuan Lyu
  • Shuang Wu
  • Shouhong Ding
  • Chao Wu

Models trained with Federated Learning (FL) are as fragile to adversarial examples as centrally trained models; however, the adversarial robustness of federated learning remains largely unexplored. This paper sheds light on the challenge of adversarial robustness in federated learning. To facilitate a better understanding of the adversarial vulnerability of existing FL methods, we conduct comprehensive robustness evaluations on various attacks and adversarial training methods. Moreover, we reveal the negative impacts induced by directly adopting adversarial training in FL, which seriously hurts test accuracy, especially in non-IID settings. In this work, we propose a novel algorithm called Decision Boundary based Federated Adversarial Training (DBFAT), which consists of two components (local re-weighting and global regularization) to improve both the accuracy and robustness of FL systems. Extensive experiments on multiple datasets demonstrate that DBFAT consistently outperforms other baselines under both IID and non-IID settings.

NeurIPS Conference 2023 Conference Paper

DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

  • Mintong Kang
  • Dawn Song
  • Bo Li

Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations from adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph, which poses the potential problems of gradient obfuscation, high memory cost, and unbounded randomness. In this paper, we propose a unified framework, DiffAttack, to perform effective and efficient attacks against diffusion-based purification defenses, including both DDPM and score-based approaches. In particular, we propose a deviated-reconstruction loss at intermediate diffusion steps to induce inaccurate density gradient estimation and tackle the problem of vanishing/exploding gradients. We also provide a segment-wise forwarding-backwarding algorithm, which leads to memory-efficient gradient backpropagation. We validate the attack effectiveness of DiffAttack compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that DiffAttack decreases the robust accuracy of models, compared with SOTA attacks, by over 20\% on CIFAR-10 under $\ell_\infty$ attack $(\epsilon=8/255)$, and over 10\% on ImageNet under $\ell_\infty$ attack $(\epsilon=4/255)$. We conduct a series of ablation studies, and we find that 1) DiffAttack with the deviated-reconstruction loss added over uniformly sampled time steps is more effective than that added over only initial/final steps, and 2) diffusion-based purification with a moderate diffusion length is more robust under DiffAttack.

NeurIPS Conference 2023 Conference Paper

Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand

  • Junfeng Guo
  • Yiming Li
  • Lixu Wang
  • Shu-Tao Xia
  • Heng Huang
  • Cong Liu
  • Bo Li

The prosperity of deep neural networks (DNNs) has largely benefited from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we revisit backdoor-based dataset ownership verification (DOV), which is currently the only feasible approach to protect the copyright of open-source datasets. We reveal that these methods are fundamentally harmful, given that they allow adversaries to introduce malicious misclassification behaviors into watermarked DNNs. In this work, we design DOV from another perspective by making watermarked models (trained on the protected dataset) correctly classify some `hard' samples that will be misclassified by the benign model. Our method is inspired by the generalization property of DNNs, where we find a \emph{hardly-generalized domain} for the original dataset (as its \emph{domain watermark}). It can be easily learned with the protected dataset containing modified samples. Specifically, we formulate the domain generation as a bi-level optimization and propose to optimize a set of visually-indistinguishable clean-label modified data with similar effects to domain-watermarked samples from the hardly-generalized domain to ensure watermark stealthiness. We also design a hypothesis-test-guided ownership verification via our domain watermark and provide theoretical analyses of our method. Extensive experiments on three benchmark datasets verify the effectiveness of our method and its resistance to potential adaptive methods.

JBHI Journal 2023 Journal Article

Domain-Aware Dual Attention for Generalized Medical Image Segmentation on Unseen Domains

  • Huilin Lai
  • Ye Luo
  • Bo Li
  • Guokai Zhang
  • Jianwei Lu

Recently, there has been significant progress in medical image segmentation utilizing deep learning techniques. However, these achievements largely rely on the supposition that the source and target domain data are identically distributed, and the direct application of related methods without addressing the distribution shift results in dramatic degradation in realistic clinical environments. Current approaches concerning the distribution shift either require the target domain data in advance for adaptation, or focus only on the distribution shift across domains while ignoring the intra-domain data variation. This paper proposes a domain-aware dual attention network for the generalized medical image segmentation task on unseen target domains. To alleviate the severe distribution shift between the source and target domains, an Extrinsic Attention (EA) module is designed to learn image features with knowledge originating from multi-source domains. Moreover, an Intrinsic Attention (IA) module is also proposed to handle the intra-domain variation by individually modeling the pixel-region relations derived from an image. The EA and IA modules complement each other well in terms of modeling the extrinsic and intrinsic domain relationships, respectively. To validate the model effectiveness, comprehensive experiments are conducted on various benchmark datasets, including the prostate segmentation in magnetic resonance imaging (MRI) scans and the optic cup/disc segmentation in fundus images. The experimental results demonstrate that our proposed model effectively generalizes to unseen domains and exceeds the existing advanced approaches.

NeurIPS Conference 2023 Conference Paper

Fair Allocation of Indivisible Chores: Beyond Additive Costs

  • Bo Li
  • Fangxiao Wang
  • Yu Zhou

We study the maximin share (MMS) fair allocation of $m$ indivisible tasks to $n$ agents who have costs for completing the assigned tasks. It is known that exact MMS fairness cannot be guaranteed, and so far the best-known approximation for additive cost functions is $\frac{13}{11}$ by Huang and Segal-Halevi [EC, 2023]; however, beyond additivity, very little is known. In this work, we first prove that no algorithm can ensure better than $\min\{n, \frac{\log m}{\log \log m}\}$-approximation if the cost functions are submodular. This result also shows a sharp contrast with the allocation of goods where constant approximations exist as shown by Barman and Krishnamurthy [TEAC, 2020] and Ghodsi et al. [AIJ, 2022]. We then prove that for subadditive costs, there always exists an allocation that is $\min\{n, \lceil\log m\rceil\}$-approximation, and thus the approximation ratio is asymptotically tight. Besides multiplicative approximation, we also consider the ordinal relaxation, 1-out-of-$d$ MMS, which was recently proposed by Hosseini et al. [JAIR and AAMAS, 2022]. Our impossibility result implies that for any $d\ge 2$, a 1-out-of-$d$ MMS allocation may not exist. Due to these hardness results for general subadditive costs, we turn to studying two specific subadditive costs, namely, bin packing and job scheduling. For both settings, we show that constant approximate allocations exist for both multiplicative and ordinal relaxations of MMS.
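
For intuition, an agent's maximin share for chores can be computed exactly on tiny instances by brute force over all partitions (illustrative only; this is not one of the paper's approximation algorithms):

```python
from itertools import product

def mms_cost(costs, n):
    """Maximin share of an agent for chores with additive costs:
    the smallest achievable worst-bundle cost over all partitions
    of the items into n bundles.

    Enumerates all n**len(costs) assignments; tiny instances only.
    """
    best = float("inf")
    for assign in product(range(n), repeat=len(costs)):
        loads = [0] * n
        for cost, bundle in zip(costs, assign):
            loads[bundle] += cost
        best = min(best, max(loads))
    return best

# two agents splitting chores of cost 4, 3, 2, 1:
# the best partition is {4, 1} vs {3, 2}, so the MMS value is 5
print(mms_cost([4, 3, 2, 1], 2))
```

An allocation is $\alpha$-approximate MMS if every agent's assigned cost is at most $\alpha$ times her MMS value; for submodular or subadditive costs the bundle cost is no longer a plain sum, which is where the paper's hardness results apply.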

NeurIPS Conference 2023 Conference Paper

FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning

  • Jinyuan Jia
  • Zhuowen Yuan
  • Dinuka Sahabandu
  • Luyao Niu
  • Arezoo Rajabi
  • Bhaskar Ramasubramanian
  • Bo Li
  • Radha Poovendran

Federated learning (FL) provides a distributed training paradigm where multiple clients can jointly train a global model without sharing their local data. However, recent studies have shown that FL offers an additional surface for backdoor attacks. For instance, an attacker can compromise a subset of clients and thus corrupt the global model to misclassify an input with a backdoor trigger as the adversarial target. Existing defenses for FL against backdoor attacks usually detect and exclude the corrupted information from the compromised clients based on a static attacker model. However, such defenses are inadequate against dynamic attackers who strategically adapt their attack strategies. To bridge this gap, we model the strategic interactions between the defender and dynamic attackers as a minimax game. Based on the analysis of the game, we design an interactive defense mechanism FedGame. We prove that under mild assumptions, the global model trained with FedGame under backdoor attacks is close to that trained without attacks. Empirically, we compare FedGame with multiple state-of-the-art baselines on several benchmark datasets under various attacks. We show that FedGame can effectively defend against strategic attackers and achieves significantly higher robustness than baselines. Our code is available at: https://github.com/AI-secure/FedGame.

NeurIPS Conference 2023 Conference Paper

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI

  • Bochuan Cao
  • Changjiang Li
  • Ting Wang
  • Jinyuan Jia
  • Bo Li
  • Jinghui Chen

Diffusion-based image generation models, such as Stable Diffusion or DALL·E 2, are able to learn from given images and generate high-quality samples following the guidance of prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks, or to maliciously edit original images into fake content. However, such ability also raises serious ethical issues when used without proper authorization from the owner of the original images. In response, several attempts have been made to protect original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and prevent it from properly generating new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations can lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image. This inconsistency can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.

NeurIPS Conference 2023 Conference Paper

Incentives in Federated Learning: Equilibria, Dynamics, and Mechanisms for Welfare Maximization

  • Aniket Murhekar
  • Zhuowen Yuan
  • Bhaskar Ray Chaudhury
  • Bo Li
  • Ruta Mehta

Federated learning (FL) has emerged as a powerful scheme to facilitate the collaborative learning of models amongst a set of agents holding their own private data. Although the agents benefit from the global model trained on shared data, by participating in federated learning they may also incur costs (related to privacy and communication) due to data sharing. In this paper, we model a collaborative FL framework, where every agent attempts to achieve an optimal trade-off between her learning payoff and data sharing cost. We show the existence of Nash equilibrium (NE) under mild assumptions on agents' payoffs and costs. Furthermore, we show that agents can discover the NE via best response dynamics. However, some of the NE may be bad in terms of overall welfare for the agents, implying little incentive for some fraction of the agents to participate in the learning. To remedy this, we design a budget-balanced mechanism involving payments to the agents, which ensures that any $p$-mean welfare function of the agents' utilities is maximized at NE. In addition, we introduce an FL protocol FedBR-BG that incorporates our budget-balanced mechanism, utilizing best response dynamics. Our empirical validation on MNIST and CIFAR-10 substantiates our theoretical analysis. We show that FedBR-BG outperforms the basic best-response-based protocol without additional incentivization, the standard federated learning protocol FedAvg, as well as a recent baseline MWFed in terms of achieving superior $p$-mean welfare.
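
The $p$-mean welfare family referenced above interpolates between utilitarian, Nash, and egalitarian welfare; a small sketch assuming strictly positive utilities:

```python
import math

def p_mean_welfare(utils, p):
    """Generalized p-mean of positive utilities.

    p = 1 is the utilitarian average, p -> 0 recovers Nash welfare
    (the geometric mean), and p -> -inf approaches the minimum
    (egalitarian welfare).
    """
    n = len(utils)
    if p == 0:
        # limit case: geometric mean
        return math.exp(sum(math.log(u) for u in utils) / n)
    return (sum(u ** p for u in utils) / n) ** (1.0 / p)

# the same utility profile scored under three welfare objectives
profile = [4.0, 1.0]
print(p_mean_welfare(profile, 1))    # utilitarian mean
print(p_mean_welfare(profile, 0))    # Nash (geometric mean)
print(p_mean_welfare(profile, -50))  # close to the egalitarian minimum
```

A mechanism that maximizes welfare "for any $p$-mean welfare function", as claimed in the abstract, therefore covers this whole spectrum of fairness-efficiency trade-offs at once.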

JBHI Journal 2023 Journal Article

Interpretable Inference and Classification of Tissue Types in Histological Colorectal Cancer Slides Based on Ensembles Adaptive Boosting Prototype Tree

  • Meiyan Liang
  • Ru Wang
  • Jianan Liang
  • Lin Wang
  • Bo Li
  • Xiaojun Jia
  • Yu Zhang
  • Qinghui Chen

Digital pathology images are treated as the “gold standard” for the diagnosis of colorectal lesions, especially colon cancer. Real-time, objective and accurate inspection results will assist clinicians in choosing symptomatic treatment in a timely manner, which is of great significance in clinical medicine. However, manual methods suffer from long inspection cycles and heavy reliance on subjective interpretation. It is also challenging for existing computer-aided diagnosis methods to obtain models that are both accurate and interpretable: models that exhibit high accuracy tend to be more complex and opaque, while interpretable models may lack the necessary accuracy. Therefore, a framework of ensemble adaptive boosting prototype trees is proposed to predict colorectal pathology images and provide interpretable inference by visualizing the decision-making process in each base learner. The results showed that the proposed method could effectively address the “accuracy-interpretability trade-off” issue by an ensemble of m adaptive boosting neural prototype trees. The superior performance of the framework provides a novel paradigm for interpretable inference and high-precision prediction of pathology image patches in computational pathology.

NeurIPS Conference 2023 Conference Paper

Large Language Models are Visual Reasoning Coordinators

  • Liangyu Chen
  • Bo Li
  • Sheng Shen
  • Jingkang Yang
  • Chunyuan Li
  • Kurt Keutzer
  • Trevor Darrell
  • Ziwei Liu

Visual reasoning requires multimodal perception and commonsense cognition of the world. Recently, multiple vision-language models (VLMs) have been proposed with excellent commonsense reasoning ability in various domains. However, how to harness the collective power of these complementary VLMs is rarely explored. Existing methods like ensemble still struggle to aggregate these models with the desired higher-order communications. In this work, we propose Cola, a novel paradigm that coordinates multiple VLMs for visual reasoning. Our key insight is that a large language model (LLM) can efficiently coordinate multiple VLMs by facilitating natural language communication that leverages their distinct and complementary capabilities. Extensive experiments demonstrate that our instruction tuning variant, Cola-FT, achieves state-of-the-art performance on visual question answering (VQA), outside knowledge VQA, visual entailment, and visual spatial reasoning tasks. Moreover, we show that our in-context learning variant, Cola-Zero, exhibits competitive performance in zero and few-shot settings, without finetuning. Through systematic ablation studies and visualizations, we validate that a coordinator LLM indeed comprehends the instruction prompts as well as the separate functionalities of VLMs; it then coordinates them to enable impressive visual reasoning capabilities.

AAAI Conference 2023 Conference Paper

Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation

  • Anpeng Wu
  • Kun Kuang
  • Ruoxuan Xiong
  • Minqin Zhu
  • Yuxuan Liu
  • Bo Li
  • Furui Liu
  • Zhihua Wang

The advent of the big data era has brought new opportunities and challenges for estimating treatment effects via data fusion, that is, from a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probabilities or infer treatment effects effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct Linear Mixed Models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods. The project page with the code and the supplementary materials is available at https://github.com/causal-machine-learning-lab/meta-em.

AAAI Conference 2023 Conference Paper

Logic and Commonsense-Guided Temporal Knowledge Graph Completion

  • Guanglin Niu
  • Bo Li

A temporal knowledge graph (TKG) stores events derived from time-involved data. Predicting events is extremely challenging due to the time-sensitive property of events. Moreover, previous TKG completion (TKGC) approaches cannot represent both the timeliness and the causality properties of events simultaneously. To address these challenges, we propose a Logic and Commonsense-Guided Embedding model (LCGE) to jointly learn the time-sensitive representation of events, involving timeliness and causality, together with the time-independent representation of events from the perspective of commonsense. Specifically, we design a temporal rule learning algorithm to construct a rule-guided predicate embedding regularization strategy for learning the causality among events. Furthermore, we can accurately evaluate the plausibility of events via auxiliary commonsense knowledge. The experimental results on the TKGC task illustrate the significant performance improvements of our model compared with existing approaches. More interestingly, our model is able to provide explainability of the predicted results from the view of causal inference. The appendix, source code and datasets of this paper are available at https://github.com/ngl567/LCGE.

IJCAI Conference 2023 Conference Paper

Maximin-Aware Allocations of Indivisible Chores with Symmetric and Asymmetric Agents

  • Tianze Wei
  • Bo Li
  • Minming Li

The real-world deployment of fair allocation algorithms usually involves a heterogeneous population of users, which makes it challenging for the users to get complete knowledge of the allocation beyond their own bundles. Recently, a new fairness notion, maximin-awareness (MMA), was proposed; it guarantees that no agent is the worst-off one, no matter how the items not allocated to that agent are distributed. We adapt and generalize this notion to the case of indivisible chores and to agents with arbitrary weights. Due to the inherent difficulty of MMA, we also consider its up-to-one and up-to-any relaxations. We present a string of results on the existence and computation of MMA-related fair allocations and their connections to existing fairness concepts.

AAAI Conference 2023 Conference Paper

Multiagent MST Cover: Pleasing All Optimally via a Simple Voting Rule

  • Bo Li
  • Xiaowei Wu
  • Chenyang Xu
  • Ruilong Zhang

Given a connected graph on whose edges we can build roads to connect the nodes, a number of agents hold possibly different perspectives on which edges should be selected, expressed by assigning different edge weights. Our task is to build a minimum number of roads so that every agent has a spanning tree in the built subgraph whose weight is the same as a minimum spanning tree in the original graph. We first show that this problem is NP-hard and does not admit better than $((1-o(1))\ln k)$-approximation polynomial-time algorithms unless P = NP, where $k$ is the number of agents. We then give a simple voting algorithm with an optimal approximation ratio. Moreover, our algorithm only needs to access the agents' rankings of the edges. Finally, we extend our problem to submodular objective functions and matroid rank constraints.
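
The per-agent requirement is that the selected subgraph contain a spanning tree matching that agent's minimum spanning tree weight; the MST weight itself comes from a standard algorithm such as Kruskal's (a generic sketch, not the paper's voting rule):

```python
def mst_weight(n, edges):
    """Total weight of a minimum spanning tree via Kruskal's algorithm.

    n: number of nodes (labeled 0..n-1)
    edges: list of (weight, u, v) tuples for a connected graph
    """
    parent = list(range(n))

    def find(x):
        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0
    for w, u, v in sorted(edges):  # greedily scan edges by weight
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge creates no cycle
            parent[ru] = rv
            total += w
    return total

# triangle with weights 1, 2, 3: the MST keeps the two cheapest edges
print(mst_weight(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)]))
```

In the multiagent MST cover problem, each agent $i$ would supply her own weights, and a candidate edge set is feasible for her exactly when its restricted MST weight equals `mst_weight` on the full graph under her weights.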

NeurIPS Conference 2023 Conference Paper

Optimized Covariance Design for AB Test on Social Network under Interference

  • Qianyi Chen
  • Bo Li
  • Lu Deng
  • Yong Wang

Online A/B tests have become increasingly popular and important for social platforms. However, accurately estimating the global average treatment effect (GATE) has proven challenging due to network interference, which violates the Stable Unit Treatment Value Assumption (SUTVA) and poses a great challenge to experimental design. Existing research on network experimental design has mostly been based on the unbiased Horvitz-Thompson (HT) estimator, with substantial data trimming to ensure unbiasedness at the price of high estimation variance. In this paper, we strive to balance the bias and variance in designing randomized network experiments. Under a potential outcome model with 1-hop interference, we derive the bias and variance of the standard HT estimator and reveal their relation to the network topology and the covariance of the treatment assignment vector. We then formulate the experimental design problem as optimizing the covariance matrix of the treatment assignment vector to balance bias and variance by minimizing the mean squared error (MSE) of the estimator. An efficient projected gradient descent algorithm is presented to implement the desired randomization scheme. Finally, we carry out extensive simulation studies to demonstrate the advantages of our proposed method over existing methods in many settings, with different levels of model misspecification.

AAMAS Conference 2023 Conference Paper

Possible Fairness for Allocating Indivisible Resources

  • Haris Aziz
  • Bo Li
  • Shiji Xing
  • Yu Zhou

Fair division of indivisible resources has attracted significant attention from multi-agent systems and computational social choice. Two popular solution concepts are envy-freeness up to any item (EFX) and maximin share (MMS) fairness, which are defined using agents' cardinal preferences. On one hand, accurate cardinal values are hard to express in real-life applications; on the other hand, with cardinal values, MMS and EFX may not be easy to satisfy. In this work, we study a new setting where agents have arbitrary ordinal preferences over the items (possibly with indifferences), and an allocation is called possible EFX (p-EFX) or possible MMS (p-MMS) if there exist cardinal preferences consistent with the ordinal ones under which the allocation is EFX or MMS. We first design a polynomial-time algorithm to compute an allocation that is p-EFX and p-MMS under lexicographic preferences. This result also strengthens a result of Hosseini et al. (AAAI 2021), who proved the existence of EFX and MMS allocations under strict lexicographic preferences (i.e., the items do not have ties). Although it has been well justified that lexicographic preferences are natural and common, there are situations where they do not fit appropriately, especially when the items have similar types. Therefore, on top of p-EFX and p-MMS, we want the allocation to be balanced (i.e., the numbers of items allocated to the agents differ by at most one). We then design another algorithm that is simultaneously p-EFX, p-MMS, and balanced.

AAMAS Conference 2023 Conference Paper

Proportional Fairness in Obnoxious Facility Location

  • Haris Aziz
  • Alexander Lam
  • Bo Li
  • Fahimeh Ramezani
  • Toby Walsh

We consider the obnoxious facility location problem (in which agents prefer the facility location to be far from them) and propose a hierarchy of distance-based proportional fairness concepts for the problem. These fairness axioms ensure that groups of agents at the same location are guaranteed to be a distance from the facility proportional to their group size. We consider deterministic and randomized mechanisms, and compute tight bounds on the price of proportional fairness. In the deterministic setting, not only are our proportional fairness axioms incompatible with strategyproofness, the Nash equilibria may not guarantee welfare within a constant factor of the optimal welfare. On the other hand, in the randomized setting, we identify proportionally fair and strategyproof mechanisms that give an expected welfare within a constant factor of the optimal welfare.

AAAI Conference 2023 Conference Paper

Rethinking Disparity: A Depth Range Free Multi-View Stereo Based on Disparity

  • Qingsong Yan
  • Qiang Wang
  • Kaiyong Zhao
  • Bo Li
  • Xiaowen Chu
  • Fei Deng

Existing learning-based multi-view stereo (MVS) methods rely on the depth range to build the 3D cost volume and may fail when the range is too large or unreliable. To address this problem, we propose a disparity-based MVS method based on the epipolar disparity flow (E-flow), called DispMVS, which infers depth information from the pixel movement between two views. The core of DispMVS is to construct a 2D cost volume on the image plane along the epipolar line for each pair (between the reference image and several source images) for pixel matching, and to fuse the depths triangulated from each pair by multi-view geometry to ensure multi-view consistency. For robustness, DispMVS starts from a randomly initialized depth map and iteratively refines it with a coarse-to-fine strategy. Experiments on the DTUMVS and Tanks&Temples datasets show that DispMVS is not sensitive to the depth range and achieves state-of-the-art results with lower GPU memory.

AAAI Conference 2023 Conference Paper

Reviewing Labels: Label Graph Network with Top-k Prediction Set for Relation Extraction

  • Bo Li
  • Wei Ye
  • Jinglei Zhang
  • Shikun Zhang

The typical way to do relation extraction is to fine-tune large pre-trained language models on task-specific datasets and then select the label with the highest probability in the output distribution as the final prediction. However, the Top-k prediction set for a given sample is commonly overlooked. In this paper, we first reveal that the Top-k prediction set of a given sample contains useful information for predicting the correct label. To effectively utilize the Top-k prediction set, we propose Label Graph Network with Top-k Prediction Set, termed KLG. Specifically, for a given sample, we build a label graph to review candidate labels in the Top-k prediction set and learn the connections between them. We also design a dynamic k-selection mechanism to learn a more powerful and discriminative relation representation. Our experiments show that KLG achieves the best performance on three relation extraction datasets. Moreover, we observe that KLG is more effective in dealing with long-tailed classes.

AAAI Conference 2023 Conference Paper

Sequence Generation with Label Augmentation for Relation Extraction

  • Bo Li
  • Dingyao Yu
  • Wei Ye
  • Jinglei Zhang
  • Shikun Zhang

Sequence generation demonstrates promising performance in recent information extraction efforts, by incorporating large-scale pre-trained Seq2Seq models. This paper investigates the merits of employing sequence generation in relation extraction, finding that with relation names or their synonyms as generation targets, their textual semantics and the correlation (in terms of word sequence pattern) among them affect model performance. We then propose Relation Extraction with Label Augmentation (RELA), a Seq2Seq model with automatic label augmentation for RE. By label augmentation, we mean producing semantic synonyms for each relation name as the generation target. Besides, we present an in-depth analysis of the Seq2Seq model's behavior when dealing with RE. Experimental results show that RELA achieves competitive results compared with previous methods on four RE datasets.

ICRA Conference 2023 Conference Paper

SLAMesh: Real-time LiDAR Simultaneous Localization and Meshing

  • Jianyuan Ruan
  • Bo Li
  • Yibo Wang
  • Yuxiang Sun 0002

Most current LiDAR simultaneous localization and mapping (SLAM) systems build maps as point clouds, which are sparse when zoomed in even though they seem dense to human eyes. Dense maps are essential for robotic applications such as map-based navigation. Due to its low memory cost, mesh has become an attractive dense map representation in recent years. However, existing methods usually produce mesh maps through an offline post-processing step. This two-step pipeline prevents these methods from using the built mesh maps online and from letting localization and meshing benefit each other. To solve this problem, we propose the first CPU-only real-time LiDAR SLAM system that can simultaneously build a mesh map and perform localization against it. A novel, direct meshing strategy with Gaussian process reconstruction realizes fast building, registration, and updating of mesh maps. We perform experiments on several public datasets. The results show that our SLAM system can run at around 40 Hz. The localization and meshing accuracy also outperforms state-of-the-art methods, including the TSDF map and Poisson reconstruction. Our code and video demos are available at: https://github.com/lab-sun/SLAMesh.

NeurIPS Conference 2023 Conference Paper

WordScape: a Pipeline to extract multilingual, visually rich Documents with Layout Annotations from Web Crawl Data

  • Maurice Weber
  • Carlo Siebenschuh
  • Rory Butler
  • Anton Alexandrov
  • Valdemar Thanner
  • Georgios Tsolakis
  • Haris Jabbar
  • Ian Foster

We introduce WordScape, a novel pipeline for the creation of cross-disciplinary, multilingual corpora comprising millions of pages with annotations for document layout detection. Relating visual and textual items on document pages has gained further significance with the advent of multimodal models. Various approaches have proved effective for visual question answering or layout segmentation. However, the interplay of text, tables, and visuals remains challenging for a variety of document understanding tasks. In particular, many models fail to generalize well to diverse domains and new languages due to insufficient availability of training data. WordScape addresses these limitations. Our automatic annotation pipeline parses the Open XML structure of Word documents obtained from the web, jointly providing layout-annotated document images and their textual representations. In turn, WordScape offers unique properties as it (1) leverages the ubiquity of the Word file format on the internet, (2) is readily accessible through the Common Crawl web corpus, (3) is adaptive to domain-specific documents, and (4) offers culturally and linguistically diverse document pages with natural semantic structure and high-quality text. Together with the pipeline, we will additionally release 9.5M URLs to Word documents, which can be processed with WordScape to create a dataset of over 40M pages. Finally, we investigate the quality of text and layout annotations extracted by WordScape, assess the impact on document understanding benchmarks, and demonstrate that manual labeling costs can be substantially reduced.

JAIR Journal 2023 Journal Article

Your College Dorm and Dormmates: Fair Resource Sharing with Externalities

  • Jiarui Gan
  • Bo Li
  • Yingkai Li

We study a fair resource sharing problem, where a set of resources are to be shared among a group of agents. Each agent demands one resource, and each resource can serve a limited number of agents. An agent cares about what resource they get as well as the externalities imposed by their mates, who share the same resource with them. Clearly, the strong notion of envy-freeness, where no agent envies another for their resource or mates, cannot always be achieved, and we show that even deciding the existence of such a strongly envy-free assignment is an intractable problem. Hence, a more interesting question is whether (and in what situations) a relaxed notion of envy-freeness, Pareto envy-freeness, can be achieved. Under this relaxed notion, an agent envies another only when they envy both the resource and the mates of the other agent. In particular, we are interested in a dorm assignment problem, where students are to be assigned to dorms with the same capacity and have dichotomous preferences over their dormmates. We show that when the capacity of each dorm is 2, a Pareto envy-free assignment always exists, and we present a polynomial-time algorithm to compute such an assignment. Nevertheless, the result breaks immediately when the capacity increases to 3, in which case even Pareto envy-freeness cannot be guaranteed. In addition to the existential results, we also investigate the utility guarantees of (Pareto) envy-free assignments in our model.

NeurIPS Conference 2022 Conference Paper

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

  • Li Siyao
  • Yuhang Li
  • Bo Li
  • Chao Dong
  • Ziwei Liu
  • Chen Change Loy

Visual correspondence of 2D animation is the core of many applications and deserves careful study. Existing correspondence datasets for 2D cartoon suffer from simple frame composition and monotonic movements, making them insufficient to simulate real animations. In this work, we present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style, including simultaneous moving background and interactions of multiple subjects. Statistics show that our proposed dataset not only resembles real anime more in image composition, but also possesses richer and more complex motion patterns compared to existing datasets. With this dataset, we establish a comprehensive benchmark by evaluating several existing optical flow and segment matching methods, and analyze shortcomings of these methods on animation data. Data are available at https://lisiyao21.github.io/projects/AnimeRun.

IJCAI Conference 2022 Conference Paper

Bayesian Auctions with Efficient Queries (Extended Abstract)

  • Jing Chen
  • Bo Li
  • Yingkai Li
  • Pinyan Lu

Designing dominant-strategy incentive compatible (DSIC) mechanisms for a seller to generate (approximately) optimal revenue by selling items to players is a fundamental problem in Bayesian mechanism design. However, most existing studies assume that the seller knows the entire distribution from which the players' values are drawn. Unfortunately, this assumption may not hold in reality: for example, when the distributions have exponentially large supports or do not have succinct representations. In this work we consider, for the first time, the query complexity of Bayesian mechanisms. The seller only has limited oracle access to the players' distributions, via quantile queries and value queries. For single-item auctions, we design mechanisms with a logarithmic number of value or quantile queries which achieve almost optimal revenue. We then prove logarithmic lower bounds, i.e., a logarithmic number of queries is necessary for any constant-approximation DSIC mechanism, even when randomized and adaptive queries are allowed. Thus our mechanisms are almost optimal regarding query complexity. Our lower bounds can be extended to multi-item auctions with monotone subadditive valuations, and we complement this part with constant-approximation mechanisms for unit-demand or additive valuation functions. Our results are robust even if the answers to the queries contain noise.

NeurIPS Conference 2022 Conference Paper

Certifying Some Distributional Fairness with Subpopulation Decomposition

  • Mintong Kang
  • Linyi Li
  • Maurice Weber
  • Yang Liu
  • Ce Zhang
  • Bo Li

Extensive efforts have been made to understand and improve the fairness of machine learning models based on observational metrics, especially in high-stakes domains such as medical insurance, education, and hiring decisions. However, there is a lack of certified fairness considering the end-to-end performance of an ML model. In this paper, we first formulate the certified fairness of an ML model trained on a given data distribution as an optimization problem based on the model performance loss bound on a fairness constrained distribution, which is within bounded distributional distance with the training distribution. We then propose a general fairness certification framework and instantiate it for both sensitive shifting and general shifting scenarios. In particular, we propose to solve the optimization problem by decomposing the original data distribution into analytical subpopulations and proving the convexity of the subproblems to solve them. We evaluate our certified fairness on six real-world datasets and show that our certification is tight in the sensitive shifting scenario and provides non-trivial certification under general shifting. Our framework is flexible to integrate additional non-skewness constraints and we show that it provides even tighter certification under different real-world scenarios. We also compare our certified fairness bound with adapted existing distributional robustness bounds on Gaussian data and demonstrate that our method is significantly tighter.

AAMAS Conference 2022 Conference Paper

Characterizing Attacks on Deep Reinforcement Learning

  • Xinlei Pan
  • Chaowei Xiao
  • Warren He
  • Shuang Yang
  • Jian Peng
  • Mingjie Sun
  • Mingyan Liu
  • Bo Li

Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real-world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks for when we do not have access to the DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects of the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments, both in simulation such as Atari games, robotics, and autonomous driving, and on real-world robots, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.

NeurIPS Conference 2022 Conference Paper

CoPur: Certifiably Robust Collaborative Inference via Feature Purification

  • Jing Liu
  • Chulin Xie
  • Sanmi Koyejo
  • Bo Li

Collaborative inference leverages diverse features provided by different agents (e.g., sensors) for more accurate inference. A common setup is where each agent sends its embedded features instead of the raw data to the Fusion Center (FC) for joint prediction. In this setting, we consider the inference-time attacks when a small fraction of agents are compromised. The compromised agent either does not send embedded features to the FC, or sends arbitrarily embedded features. To address this, we propose a certifiably robust COllaborative inference framework via feature PURification (CoPur), by leveraging the block-sparse nature of adversarial perturbations on the feature vector, as well as exploring the underlying redundancy across the embedded features (by assuming the overall features lie on an underlying lower dimensional manifold). We theoretically show that the proposed feature purification method can robustly recover the true feature vector, despite adversarial corruptions and/or incomplete observations. We also propose and test an untargeted distributed feature-flipping attack, which is agnostic to the model, training data, label, as well as the features held by other agents, and is shown to be effective in attacking state-of-the-art defenses. Experiments on ExtraSensory and NUS-WIDE datasets show that CoPur significantly outperforms existing defenses in terms of robustness against targeted and untargeted adversarial attacks.

NeurIPS Conference 2022 Conference Paper

DENSE: Data-Free One-Shot Federated Learning

  • Jie Zhang
  • Chen Chen
  • Bo Li
  • Lingjuan Lyu
  • Shuang Wu
  • Shouhong Ding
  • Chunhua Shen
  • Chao Wu

One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitations, e.g., a public dataset is required, clients' models are homogeneous, and additional data/model information needs to be uploaded. To overcome these issues, we propose a novel two-stage Data-free One-Shot federated learning (DENSE) framework, which trains the global model through a data generation stage and a model distillation stage. DENSE is a practical one-shot FL method that can be applied in reality due to the following advantages: (1) DENSE requires no additional information compared with other methods (except the model parameters) to be transferred between clients and the server; (2) DENSE does not require any auxiliary dataset for training; (3) DENSE considers model heterogeneity in FL, i.e., different clients can have different model architectures. Experiments on a variety of real-world datasets demonstrate the superiority of our method. For example, DENSE outperforms the best baseline method Fed-ADI by 5.08% on the CIFAR10 dataset.

NeurIPS Conference 2022 Conference Paper

Distributionally Robust Optimization with Data Geometry

  • Jiashuo Liu
  • Jiayun Wu
  • Bo Li
  • Peng Cui

Distributionally Robust Optimization (DRO) serves as a robust alternative to empirical risk minimization (ERM), optimizing the worst-case distribution in an uncertainty set typically specified by distance metrics including f-divergence and the Wasserstein distance. Metrics defined in the ostensible high-dimensional space lead to exceedingly large uncertainty sets, resulting in the underperformance of most existing DRO methods. It has been well documented that high-dimensional data approximately resides on low-dimensional manifolds. In this work, to further constrain the uncertainty set, we incorporate data geometric properties into the design of distance metrics, obtaining our novel Geometric Wasserstein DRO (GDRO). Empowered by gradient flow, we derive a generically applicable approximate algorithm for the optimization of GDRO, and provide the bounded error rate of the approximation as well as the convergence rate of our algorithm. We also theoretically characterize the edge cases where certain existing DRO methods are degenerate cases of GDRO. Extensive experiments justify the superiority of our GDRO over existing DRO methods in multiple settings with strong distributional shifts, and confirm that the uncertainty set of GDRO adapts to data geometry.

NeurIPS Conference 2022 Conference Paper

Exact Shape Correspondence via 2D graph convolution

  • Barakeel Fanseu Kamhoua
  • Lin Zhang
  • Yongqiang Chen
  • Han Yang
  • MA KAILI
  • Bo Han
  • Bo Li
  • James Cheng

For exact 3D shape correspondence (matching or alignment), i.e., the task of matching each point on a shape to its exact corresponding point on the other shape (or, to be more specific, matching at geodesic error 0), most existing methods do not perform well due to two main problems. First, on nearly-isometric shapes (i.e., low noise levels), most existing methods use the eigen-vectors (eigen-functions) of the Laplace-Beltrami Operator (LBO) or other shape descriptors to update an initialized correspondence which is not exact, leading to an accumulation of update errors. Thus, though the final correspondence may generally be smooth, it is generally inexact. Second, on non-isometric shapes (noisy shapes), existing methods are generally not robust to noise as they usually assume near-isometry. In addition, existing methods that attempt to address the non-isometric shape problem (e.g., GRAMPA) are generally computationally expensive and do not generalise to nearly-isometric shapes. To address these two problems, we propose a 2D graph convolution-based framework called 2D-GEM. 2D-GEM is robust to noise on non-isometric shapes and, with a few additional constraints, it also addresses the errors in the update on nearly-isometric shapes. We demonstrate the effectiveness of 2D-GEM by achieving a high accuracy of 90.5% at geodesic error 0 on the non-isometric benchmark SHREC16, i.e., TOPKIDS (while being much faster than GRAMPA), and on nearly-isometric benchmarks by achieving a high accuracy of 92.5% on TOSCA and 84.9% on SCAPE at geodesic error 0.

NeurIPS Conference 2022 Conference Paper

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

  • Boxin Wang
  • Wei Ping
  • Chaowei Xiao
  • Peng Xu
  • Mostofa Patwary
  • Mohammad Shoeybi
  • Bo Li
  • Anima Anandkumar

Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we demonstrate that using self-generated datasets consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a 3× smaller training corpus. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3× larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have similar toxicity levels as smaller ones given the same pre-training corpus, and ii) large LMs require more endeavor to unlearn the toxic content seen during pre-training. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for large-scale models. Our code will be available at: https://github.com/NVIDIA/Megatron-LM/.

NeurIPS Conference 2022 Conference Paper

Fairness in Federated Learning via Core-Stability

  • Bhaskar Ray Chaudhury
  • Linyi Li
  • Mintong Kang
  • Bo Li
  • Ruta Mehta

Federated learning provides an effective paradigm to jointly optimize a model benefiting from rich distributed data while protecting data privacy. Nonetheless, the heterogeneous nature of distributed data, especially in the non-IID setting, makes it challenging to define and ensure fairness among local agents. For instance, it is intuitively "unfair" for agents with high-quality data to sacrifice their performance due to other agents with low-quality data. Currently popular egalitarian and weighted equity-based fairness measures suffer from this pitfall. In this work, we formally represent this problem and address these fairness issues using concepts from cooperative game theory and social choice theory. We model the task of learning a shared predictor in the federated setting as a fair public decision making problem, and then define the notion of core-stable fairness: given N agents, there is no subset of agents S that can benefit significantly by forming a coalition among themselves based on their utilities U_N and U_S (i.e., (|S|/N) U_S ≥ U_N). Core-stable predictors are robust to low-quality local data from some agents, and additionally they satisfy Proportionality (each agent gets at least a 1/n fraction of the best utility she can get from any predictor) and Pareto-optimality (no model can increase the utility of an agent without decreasing the utility of another), two well-sought-after fairness and efficiency notions within social choice. We then propose an efficient federated learning protocol, CoreFed, to optimize a core-stable predictor. CoreFed determines a core-stable predictor when the loss functions of the agents are convex. CoreFed also determines approximate core-stable predictors when the loss functions are not convex, as with smooth neural networks. We further show the existence of core-stable predictors in more general settings using Kakutani's fixed point theorem. Finally, we empirically validate our analysis on two real-world datasets, and we show that CoreFed achieves higher core-stability fairness than FedAvg while maintaining similar accuracy.
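The core-stability condition in the abstract, no coalition S with (|S|/N) U_S exceeding U_N, can be checked directly on small toy instances. The sketch below assumes a hypothetical `utilities` table mapping each coalition to the best total utility it could secure on its own; it is a brute-force check, not the CoreFed protocol.

```python
from itertools import combinations

def is_core_stable(utilities):
    """Check core-stability: no proper coalition S of the N agents
    satisfies (|S| / N) * U_S > U_N, where U_S is the best total
    utility S could secure alone.  `utilities` maps frozensets of
    agent indices to U_S; the grand coalition's entry gives U_N."""
    grand = max(utilities, key=len)   # grand coalition of all N agents
    n = len(grand)
    u_n = utilities[grand]
    for size in range(1, n):          # proper subsets only
        for coalition in combinations(sorted(grand), size):
            s = frozenset(coalition)
            if s in utilities and (len(s) / n) * utilities[s] > u_n:
                return False          # S would benefit by deviating
    return True
```

For example, with two agents whose grand-coalition utility is 10, a singleton coalition worth 4 does not block (½ · 4 = 2 ≤ 10), while one worth 30 does (½ · 30 = 15 > 10).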

NeurIPS Conference 2022 Conference Paper

General Cutting Planes for Bound-Propagation-Based Neural Network Verification

  • Huan Zhang
  • Shiqi Wang
  • Kaidi Xu
  • Linyi Li
  • Bo Li
  • Suman Jana
  • Cho-Jui Hsieh
  • J. Zico Kolter

Bound propagation methods, when combined with branch and bound, are among the most effective methods to formally verify properties of deep neural networks such as correctness, robustness, and safety. However, existing works cannot handle the general form of cutting plane constraints widely accepted in traditional solvers, which are crucial for strengthening verifiers with tightened convex relaxations. In this paper, we generalize the bound propagation procedure to allow the addition of arbitrary cutting plane constraints, including those involving relaxed integer variables that do not appear in existing bound propagation formulations. Our generalized bound propagation method, GCP-CROWN, opens up the opportunity to apply general cutting plane methods for neural network verification while benefiting from the efficiency and GPU acceleration of bound propagation methods. As a case study, we investigate the use of cutting planes generated by an off-the-shelf mixed integer programming (MIP) solver. We find that MIP solvers can generate high-quality cutting planes for strengthening bound-propagation-based verifiers using our new formulation. Since the branching-focused bound propagation procedure and the cutting-plane-focused MIP solver can run in parallel utilizing different types of hardware (GPUs and CPUs), their combination can quickly explore a large number of branches with strong cutting planes, leading to strong verification performance. Experiments demonstrate that our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark compared to the best tool in VNN-COMP 2021, and also noticeably outperforms state-of-the-art verifiers on a wide range of benchmarks. GCP-CROWN is part of the α,β-CROWN verifier, the VNN-COMP 2022 winner. Code is available at http://PaperCode.cc/GCP-CROWN.

NeurIPS Conference 2022 Conference Paper

Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning

  • Wenhao Ding
  • Haohong Lin
  • Bo Li
  • Ding Zhao

As a pivotal component of attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents' generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with a Causal Graph (CG), a structure built upon the relations between objects and events. We formulate the GCRL problem as variational likelihood maximization with the CG as a latent variable. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of the CG, and using the CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and then empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.

AAAI Conference 2022 Conference Paper

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

  • Xiaohua Chen
  • Yucan Zhou
  • Dayan Wu
  • Wanqian Zhang
  • Yu Zhou
  • Bo Li
  • Weiping Wang

Real-world data often follows a long-tailed distribution, which heavily degrades the performance of existing classification algorithms. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can use prior knowledge to imagine a sample in new poses, scenes, and view angles even when seeing its category for the first time. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method that borrows transformation directions from other classes. Since the covariance matrix of each category represents its feature transformation directions, we can sample new directions from similar categories to generate clearly distinct instances. Specifically, the long-tailed data is first used to train a backbone and a classifier. Then, a covariance matrix for each category is estimated, and a knowledge graph is constructed to store the relations between any two categories. Finally, tail samples are adaptively enhanced by propagating information from all similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate the effectiveness of our proposed method compared with state-of-the-art methods.

NeurIPS Conference 2022 Conference Paper

Improving Certified Robustness via Statistical Learning with Logical Reasoning

  • Zhuolin Yang
  • Zhikuan Zhao
  • Boxin Wang
  • Jiawei Zhang
  • Linyi Li
  • Hengzhi Pei
  • Bojan Karlaš
  • Ji Liu

Intensive algorithmic efforts have recently been made to enable rapid improvements in certified robustness for complex ML models. However, current robustness certification methods can only certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., the MLN). As a first step towards understanding these questions, we prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms the state of the art.

AAAI Conference 2022 Conference Paper

Invariant Information Bottleneck for Domain Generalization

  • Bo Li
  • Yifei Shen
  • Yezhen Wang
  • Wenzhen Zhu
  • Colorado Reed
  • Dongsheng Li
  • Kurt Keutzer
  • Han Zhao

Invariant risk minimization (IRM) has recently emerged as a promising alternative for domain generalization. Nevertheless, its loss function is difficult to optimize for nonlinear classifiers, and the original optimization objective can fail when pseudo-invariant features and geometric skews exist. Inspired by IRM, in this paper we propose a novel formulation for domain generalization, dubbed invariant information bottleneck (IIB). IIB aims to minimize invariant risks for nonlinear classifiers while simultaneously mitigating the impact of pseudo-invariant features and geometric skews. Specifically, we first present a novel formulation for invariant causal prediction via mutual information. We then adopt the variational formulation of mutual information to develop a tractable loss function for nonlinear classifiers. To overcome the failure modes of IRM, we propose to minimize the mutual information between the inputs and the corresponding representations. IIB significantly outperforms IRM on synthetic datasets where pseudo-invariant features and geometric skews occur, showing the effectiveness of the proposed formulation in overcoming the failure modes of IRM. Furthermore, experiments on DomainBed show that IIB outperforms 13 baselines by 0.9% on average across 7 real datasets.

NeurIPS Conference 2022 Conference Paper

LOT: Layer-wise Orthogonal Training on Improving l2 Certified Robustness

  • Xiaojun Xu
  • Linyi Li
  • Bo Li

Recent studies show that training deep neural networks (DNNs) with Lipschitz constraints can enhance adversarial robustness and other model properties such as stability. In this paper, we propose a layer-wise orthogonal training method (LOT) to effectively train 1-Lipschitz convolution layers by parametrizing an orthogonal matrix with an unconstrained matrix. We then efficiently compute the inverse square root of a convolution kernel by transforming the input domain to the Fourier frequency domain. On the other hand, as existing works show that semi-supervised training helps improve empirical robustness, we aim to bridge the gap and prove that semi-supervised learning also improves the certified robustness of Lipschitz-bounded models. We conduct comprehensive evaluations of LOT under different settings. We show that LOT significantly outperforms baselines regarding deterministic l2 certified robustness, and scales to deeper neural networks. Under the supervised scenario, we improve the state-of-the-art certified robustness for all architectures (e.g., from 59.04% to 63.50% on CIFAR-10 and from 32.57% to 34.59% on CIFAR-100 at radius $\rho=36/255$ for 40-layer networks). With semi-supervised learning over unlabelled data, we improve the state-of-the-art certified robustness on CIFAR-10 at $\rho=108/255$ from 36.04% to 42.39%. In addition, LOT consistently outperforms baselines on different model architectures with only 1/3 of the evaluation time.
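LOT parametrizes an orthogonal matrix with an unconstrained one (working in the Fourier domain for convolutions). As a generic, hedged sketch of this kind of parametrization, and not the paper's actual construction, the Cayley transform maps any unconstrained square matrix to an orthogonal matrix, which makes the induced linear map 1-Lipschitz in l2:

```python
import numpy as np

def cayley_orthogonal(W):
    """Map an unconstrained square matrix W to an orthogonal matrix.

    A = W - W^T is skew-symmetric, so Q = (I - A)^{-1} (I + A) is orthogonal.
    """
    n = W.shape[0]
    A = W - W.T
    I = np.eye(n)
    return np.linalg.solve(I - A, I + A)  # (I - A)^{-1} (I + A)

rng = np.random.default_rng(0)
Q = cayley_orthogonal(rng.standard_normal((4, 4)))
# Q @ Q.T equals the identity up to floating-point error, so all
# singular values of Q are 1 and the map x -> Q @ x is 1-Lipschitz.
```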

NeurIPS Conference 2022 Conference Paper

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

  • Jingkang Yang
  • Pengyun Wang
  • Dejian Zou
  • Zitang Zhou
  • Kunyuan Ding
  • Wenxuan Peng
  • Haoqi Wang
  • Guangyao Chen

Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications and has thus been extensively studied, with a plethora of methods developed in the literature. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often results in unfair comparisons and inconclusive results. From the problem setting perspective, OOD detection is closely related to neighboring fields including anomaly detection (AD), open set recognition (OSR), and model uncertainty, since methods developed for one domain are often applicable to the others. To help the community improve evaluation and advance the field, we build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields and provides a comprehensive benchmark under the recently proposed generalized OOD detection framework. Through a comprehensive comparison of these methods, we are gratified to find that the field has progressed significantly over the past few years, with both preprocessing methods and the orthogonal post-hoc methods showing strong potential.

NeurIPS Conference 2022 Conference Paper

Product Ranking for Revenue Maximization with Multiple Purchases

  • Renzhe Xu
  • Xingxuan Zhang
  • Bo Li
  • Yafeng Zhang
  • Xiaolong Chen
  • Peng Cui

Product ranking is the core problem for revenue-maximizing online retailers. To design proper product ranking algorithms, various consumer choice models have been proposed to characterize consumers' behaviors when they are presented with a list of products. However, existing works assume that each consumer purchases at most one product, or keeps viewing the product list after purchasing a product, which does not agree with common practice in real scenarios. In this paper, we assume that each consumer can purchase multiple products at will. To model consumers' willingness to view and purchase, we set a random attention span and purchase budget, which determine the maximum number of products that they view and purchase, respectively. Under this setting, we first design an optimal ranking policy for the case where the online retailer can precisely model consumers' behaviors. Based on this policy, we further develop the Multiple-Purchase-with-Budget UCB (MPB-UCB) algorithms with $\tilde{O}(\sqrt{T})$ regret that estimate consumers' behaviors and maximize revenue simultaneously in online settings. Experiments on both synthetic and semi-synthetic datasets demonstrate the effectiveness of the proposed algorithms.
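The abstract does not spell out the MPB-UCB index itself. As a hedged illustration of only the UCB principle it builds on, here is the standard UCB1 index (the constant and the toy numbers are illustrative, not from the paper):

```python
import math

def ucb1_index(mean_reward, pulls, t, c=2.0):
    """Standard UCB1 score: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return float("inf")  # force at least one pull of every arm
    return mean_reward + math.sqrt(c * math.log(t) / pulls)

# At each round, pick the arm (here: ranking decision) with the largest index.
means = [0.4, 0.6, 0.5]   # empirical mean rewards
pulls = [10, 10, 1]       # times each arm has been tried
t = 21                    # current round
best = max(range(3), key=lambda i: ucb1_index(means[i], pulls[i], t))
```

The under-explored arm 2 wins here despite a lower empirical mean, which is exactly the exploration behavior UCB-style algorithms rely on.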

NeurIPS Conference 2022 Conference Paper

SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles

  • Chejian Xu
  • Wenhao Ding
  • Weijie Lyu
  • Zuxin Liu
  • Shuai Wang
  • Yihan He
  • Hanjiang Hu
  • Ding Zhao

As shown by recent studies, machine intelligence-enabled systems are vulnerable to test cases resulting from either adversarial manipulation or natural distribution shifts. This has raised great concerns about deploying machine learning algorithms for real-world applications, especially in safety-critical domains such as autonomous driving (AD). On the other hand, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles due to the high dimensionality and rareness of safety-critical scenarios in the real world. As a result, several approaches for autonomous driving evaluation have been explored, which are, however, usually based on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite a large amount of effort in autonomous driving testing, it is still challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide the first unified platform, SafeBench, to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. In particular, we consider 8 safety-critical testing scenarios following the National Highway Traffic Safety Administration (NHTSA) and develop 4 scenario generation algorithms considering 10 variations for each scenario. Meanwhile, we implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find that our generated testing scenarios are indeed more challenging, and we observe a trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe our unified platform SafeBench for large-scale and effective autonomous driving testing will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io.

NeurIPS Conference 2022 Conference Paper

Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection

  • Yiming Li
  • Yang Bai
  • Yong Jiang
  • Yong Yang
  • Shu-Tao Xia
  • Bo Li

Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs has largely benefited from high-quality (open-sourced) datasets, with which researchers and developers can easily evaluate and improve their learning methods. Since data collection is usually time-consuming or even expensive, how to protect dataset copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduce new security risks into DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, we explore an untargeted backdoor watermarking scheme in which the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses.

NeurIPS Conference 2022 Conference Paper

VF-PS: How to Select Important Participants in Vertical Federated Learning, Efficiently and Securely?

  • Jiawei Jiang
  • Lukas Burkhalter
  • Fangcheng Fu
  • Bolin Ding
  • Bo Du
  • Anwar Hithnawi
  • Bo Li
  • Ce Zhang

Vertical Federated Learning (VFL), which trains federated models over vertically partitioned data, has emerged as an important learning paradigm. However, existing VFL methods face two challenges: (1) scalability when the number of participants grows even to a modest scale, and (2) diminishing returns with respect to the number of participants: not all participants are equally important, and many will not introduce quality improvements in a large consortium. Motivated by these two challenges, in this paper we ask: How can we select l out of m participants, where l ≪ m, that are the most important? We call this problem Vertically Federated Participant Selection and model it with a principled mutual information-based view. Our first technical contribution is VF-MINE, a Vertically Federated Mutual INformation Estimator, which uses one of the most celebrated algorithms in database theory, Fagin's algorithm, as a building block. Our second contribution is to further optimize VF-MINE to enable VF-PS, a group testing-based participant selection framework. We empirically show that vertically federated participant selection can be orders of magnitude faster than training a full-fledged VFL model, while being able to identify the most important subset of participants, which often leads to a VFL model of similar quality.

IROS Conference 2021 Conference Paper

A Novel 2-SUR 6-DOF Parallel Manipulator Actuated by Spherical Motion Generators

  • Kun Wang
  • Xiaoyong Wu
  • Yujin Wang
  • Bo Li
  • Bo Yuan
  • Shaoping Bai

A novel 6-DOF parallel manipulator with two spherical-universal-revolute limbs is proposed in this work. Compared with general 6-DOF parallel manipulators with six kinematic limbs, this new manipulator, actuated by spherical motion generators, has only two limbs, which brings kinematic advantages such as a small footprint and a large workspace. The inverse position problem of the manipulator is solved by an analytical approach, upon which the velocity equations are formulated. Kinematic performance, including workspace and manipulability, is calculated to show the advantages of the new design.

NeurIPS Conference 2021 Conference Paper

Adversarial Attack Generation Empowered by Min-Max Optimization

  • Jingkang Wang
  • Tianyun Zhang
  • Sijia Liu
  • Pin-Yu Chen
  • Jiacen Xu
  • Makan Fardad
  • Bo Li

The worst-case training principle that minimizes the maximal adversarial loss, also known as adversarial training (AT), has been shown to be a state-of-the-art approach for enhancing adversarial robustness. Nevertheless, min-max optimization beyond the purpose of AT has not been rigorously explored in the adversarial context. In this paper, we show how a general notion of min-max optimization over multiple domains can be leveraged in the design of different types of adversarial attacks. In particular, given a set of risk sources, minimizing the worst-case attack loss can be reformulated as a min-max problem by introducing domain weights that are maximized over the probability simplex of the domain set. We showcase this unified framework on three attack generation problems: attacking model ensembles, devising universal perturbations under multiple inputs, and crafting attacks resilient to data transformations. Extensive experiments demonstrate that our approach leads to substantial attack improvements over existing heuristic strategies, as well as robustness improvements over state-of-the-art defense methods against multiple perturbation types. Furthermore, we find that the self-adjusted domain weights learned from min-max optimization provide a holistic tool to explain the difficulty level of attacks across domains.
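The inner maximization over domain weights on the probability simplex can be sketched, under toy assumptions (fixed per-domain losses, which in the paper would come from the current attack iterate), with an exponentiated-gradient (mirror-ascent) update:

```python
import numpy as np

def exp_grad_weights(domain_losses, steps=50, eta=1.0):
    """Mirror-ascent update of domain weights w on the probability simplex.

    The weighted objective <w, losses> is linear in w, so its gradient is the
    loss vector itself; the multiplicative update therefore concentrates mass
    on the hardest (highest-loss) domains while keeping w on the simplex.
    """
    K = len(domain_losses)
    w = np.full(K, 1.0 / K)                              # uniform start
    for _ in range(steps):
        w = w * np.exp(eta * np.asarray(domain_losses))  # ascent step
        w = w / w.sum()                                  # renormalize to simplex
    return w

w = exp_grad_weights([0.2, 0.9, 0.5])  # domain 1 is the hardest
```

In the full framework this update would alternate with the minimization over the attack perturbation; here the losses are frozen purely for illustration.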

NeurIPS Conference 2021 Conference Paper

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

  • Boxin Wang
  • Chejian Xu
  • Shuohang Wang
  • Zhe Gan
  • Yu Cheng
  • Jianfeng Gao
  • Ahmed Awadallah
  • Bo Li

Large-scale pre-trained language models have achieved tremendous success across a wide range of natural language understanding (NLU) tasks, even surpassing human performance. However, recent studies reveal that the robustness of these models can be challenged by carefully crafted textual adversarial examples. While several individual datasets have been proposed to evaluate model robustness, a principled and comprehensive benchmark is still missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. Our findings are summarized as follows. (i) Most existing adversarial attack algorithms are prone to generating invalid or ambiguous adversarial examples, with around 90% of them either changing the original semantic meaning or misleading human annotators. We therefore perform a careful filtering process to curate a high-quality benchmark. (ii) All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind benign accuracy. We hope our work will motivate the development of new adversarial attacks that are more stealthy and semantic-preserving, as well as new robust language models against sophisticated adversarial attacks. AdvGLUE is available at https://adversarialglue.github.io.

NeurIPS Conference 2021 Conference Paper

Anti-Backdoor Learning: Training Clean Models on Poisoned Data

  • Yige Li
  • Xixiang Lyu
  • Nodens Koren
  • Lingjuan Lyu
  • Bo Li
  • Xingjun Ma

Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs). While existing defense methods have demonstrated promising results on detecting or erasing backdoors, it is still not clear whether robust training methods can be devised to prevent backdoor triggers from being injected into the trained model in the first place. In this paper, we introduce the concept of \emph{anti-backdoor learning}, which aims to train \emph{clean} models given backdoor-poisoned data. We frame the overall learning process as a dual task of learning the \emph{clean} and the \emph{backdoor} portions of the data. From this view, we identify two inherent characteristics of backdoor attacks as their weaknesses: 1) models learn backdoored data much faster than clean data, and the stronger the attack, the faster the model converges on backdoored data; 2) the backdoor task is tied to a specific class (the backdoor target class). Based on these two weaknesses, we propose a general learning scheme, Anti-Backdoor Learning (ABL), to automatically prevent backdoor attacks during training. ABL introduces a two-stage \emph{gradient ascent} mechanism into standard training to 1) help isolate backdoor examples at an early training stage, and 2) break the correlation between backdoor examples and the target class at a later training stage. Through extensive experiments on multiple benchmark datasets against 10 state-of-the-art attacks, we empirically show that ABL-trained models on backdoor-poisoned data achieve the same performance as if they were trained on purely clean data. Code is available at \url{https://github.com/bboylyg/ABL}.
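The gradient-ascent idea can be caricatured in one line: negate the loss of samples isolated as suspected backdoor examples, so the model unlearns them while still descending on the rest. A minimal scalar sketch (not the paper's two-stage pipeline; `is_isolated` is a hypothetical flag standing in for ABL's isolation stage):

```python
def abl_loss(per_sample_loss, is_isolated):
    """Average training objective with the sign flipped on isolated samples.

    Descending on this objective performs gradient descent on clean samples
    and gradient ascent on suspected backdoor samples.
    """
    total = 0.0
    for loss, flagged in zip(per_sample_loss, is_isolated):
        total += -loss if flagged else loss
    return total / len(per_sample_loss)
```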

IJCAI Conference 2021 Conference Paper

Budget-feasible Maximum Nash Social Welfare is Almost Envy-free

  • Xiaowei Wu
  • Bo Li
  • Jiarui Gan

The Nash social welfare (NSW) is a well-known social welfare measure that balances individual utilities with overall efficiency. In the context of fair allocation of indivisible goods, Caragiannis et al. (EC 2016 and TEAC 2019) showed that an allocation maximizing the NSW is envy-free up to one good (EF1). In this paper, we are interested in the fairness of the NSW in a budget-feasible allocation problem, in which each item has a cost incurred by the agent it is allocated to, and each agent has a budget constraint on the total cost of the items she receives. We show that a budget-feasible allocation maximizing the NSW achieves a 1/4-approximation of EF1, and that this approximation ratio is tight. The ratio improves gracefully when the items have small costs relative to the agents' budgets; it converges to 1/2 as the budget-to-cost ratio approaches infinity.
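The two notions in this abstract are easy to state in code. A minimal sketch, assuming additive utilities and ignoring the budget constraint, of computing the NSW of an allocation and checking the exact EF1 condition:

```python
import math

def nsw(alloc, util):
    """Nash social welfare: geometric mean of agents' utilities.

    alloc[i] is agent i's bundle (a list of item ids);
    util[i][g] is agent i's (additive) value for item g.
    """
    totals = [sum(util[i][g] for g in bundle) for i, bundle in enumerate(alloc)]
    return math.prod(totals) ** (1.0 / len(alloc))

def is_ef1(alloc, util):
    """Envy-free up to one good: agent i does not envy agent j's bundle
    after removing i's most valued item from it."""
    for i in range(len(alloc)):
        own = sum(util[i][g] for g in alloc[i])
        for j, bundle in enumerate(alloc):
            if i == j or not bundle:
                continue
            envy = sum(util[i][g] for g in bundle)
            if own < envy - max(util[i][g] for g in bundle):
                return False
    return True

util = [[3, 1, 1], [1, 3, 1]]   # util[i][g]
alloc = [[0, 2], [1]]           # agent 0 gets items 0 and 2; agent 1 gets item 1
```

Giving everything to agent 0 would violate EF1, since agent 1 still envies even after the most valued item is removed from agent 0's bundle.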

AAAI Conference 2021 Conference Paper

ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation

  • Sicheng Zhao
  • Yezhen Wang
  • Bo Li
  • Bichen Wu
  • Yang Gao
  • Pengfei Xu
  • Trevor Darrell
  • Kurt Keutzer

Due to its robust and precise distance measurements, LiDAR plays an important role in scene understanding for autonomous driving. Training deep neural networks (DNNs) on LiDAR data requires large-scale point-wise annotations, which are time-consuming and expensive to obtain. Instead, simulation-to-real domain adaptation (SRDA) trains a DNN using unlimited synthetic data with automatically generated labels and transfers the learned model to real scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly employ a multi-stage pipeline and focus on feature-level alignment. They require prior knowledge of real-world statistics and ignore the pixel-level dropout noise gap and the spatial feature gap between different domains. In this paper, we propose a novel end-to-end framework, named ePointDA, to address the above issues. Specifically, ePointDA consists of three modules: self-supervised dropout noise rendering, statistics-invariant and spatially-adaptive feature alignment, and transferable segmentation learning. The joint optimization enables ePointDA to bridge the domain shift at the pixel level by explicitly rendering dropout noise for synthetic LiDAR and at the feature level by spatially aligning the features between different domains, without requiring the real-world statistics. Extensive experiments adapting from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the superiority of ePointDA for LiDAR point cloud segmentation.

NeurIPS Conference 2021 Conference Paper

Fair Scheduling for Time-dependent Resources

  • Bo Li
  • Minming Li
  • Ruilong Zhang

We study a fair resource scheduling problem in which a set of interval jobs is to be allocated to heterogeneous machines controlled by intelligent agents. Each job is associated with a release time, a deadline, and a processing time, such that it can be processed only if its complete processing period lies between its release time and deadline. The machines gain possibly different utilities by processing different jobs, and all jobs assigned to the same machine must be processed without overlap. We consider two widely studied solution concepts, namely maximin share fairness and envy-freeness. For both criteria, we discuss the extent to which fair allocations exist and present constant-approximation algorithms for various settings.

ICML Conference 2021 Conference Paper

FILTRA: Rethinking Steerable CNN by Filter Transform

  • Bo Li
  • Qili Wang
  • Gim Hee Lee

Steerable CNNs impose prior knowledge of transformation invariance or equivariance in the network architecture to enhance the network's robustness to geometric transformations of data and reduce overfitting. Augmenting a filter with its transformed copies to construct a steerable filter has been an intuitive and widely used technique over the past decades; we name this technique filter transform in this paper. Recently, the steerable CNN problem has been studied from the perspective of group representation theory, which reveals the function space structure of a steerable kernel function. However, it is not yet clear how this theory relates to the filter transform technique. In this paper, we show that kernels constructed by filter transform can also be interpreted within group representation theory. This interpretation helps complete the puzzle of steerable CNN theory and provides a novel and simple approach to implement steerable convolution operators. Experiments on multiple datasets verify the feasibility of the proposed approach.

NeurIPS Conference 2021 Conference Paper

G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators

  • Yunhui Long
  • Boxin Wang
  • Zhuolin Yang
  • Bhavya Kailkhura
  • Aston Zhang
  • Carl Gunter
  • Bo Li

Recent advances in machine learning have largely benefited from massive accessible training data. However, large-scale data sharing has raised great privacy concerns. In this work, we propose a novel privacy-preserving data Generative model based on the PATE framework (G-PATE), which aims to train a scalable differentially private data generator while preserving high utility of the generated data. Our approach leverages generative adversarial nets to generate data, combined with private aggregation among different discriminators to ensure strong privacy guarantees. Compared to existing approaches, G-PATE significantly improves the use of privacy budgets. In particular, we train a student data generator with an ensemble of teacher discriminators and propose a novel private gradient aggregation mechanism to ensure differential privacy on all information that flows from the teacher discriminators to the student generator. In addition, with random projection and gradient discretization, the proposed gradient aggregation mechanism can effectively handle high-dimensional gradient vectors. Theoretically, we prove that G-PATE ensures differential privacy for the data generator. Empirically, we demonstrate the superiority of G-PATE over prior work through extensive experiments. We show that G-PATE is the first method able to generate high-dimensional image data with high data utility under limited privacy budgets ($\varepsilon \le 1$). Our code is available at https://github.com/AI-secure/G-PATE.
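The aggregation step can be sketched in the generic PATE spirit: clip each teacher's gradient, sum, and add Gaussian noise. This hedged sketch deliberately omits G-PATE's random projection and gradient discretization and is not the paper's actual mechanism:

```python
import numpy as np

def private_aggregate(teacher_grads, clip_norm=1.0, sigma=1.0, rng=None):
    """Clip each teacher gradient to l2 norm `clip_norm`, sum, add Gaussian noise.

    Clipping bounds each teacher's contribution (the sensitivity), which is what
    lets the added noise provide a differential privacy guarantee.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in teacher_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    agg = np.sum(clipped, axis=0)
    return agg + rng.normal(0.0, sigma * clip_norm, size=agg.shape)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noiseless = private_aggregate(grads, clip_norm=1.0, sigma=0.0)  # sigma=0: clipping only
```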

NeurIPS Conference 2021 Conference Paper

Integrated Latent Heterogeneity and Invariance Learning in Kernel Space

  • Jiashuo Liu
  • Zheyuan Hu
  • Peng Cui
  • Bo Li
  • Zheyan Shen

The ability to generalize under distributional shifts is essential to reliable machine learning, yet models optimized with empirical risk minimization usually fail on non-i.i.d. testing data. Recently, invariant learning methods for out-of-distribution (OOD) generalization have proposed to find causally invariant relationships across multiple environments. However, modern datasets are frequently multi-sourced without explicit source labels, rendering many invariant learning methods inapplicable. In this paper, we propose the Kernelized Heterogeneous Risk Minimization (KerHRM) algorithm, which achieves both latent heterogeneity exploration and invariant learning in kernel space, and then gives feedback to the original neural network by assigning an invariant gradient direction. We theoretically justify our algorithm and empirically validate its effectiveness with extensive experiments.

IJCAI Conference 2021 Conference Paper

Mechanism Design for Facility Location Problems: A Survey

  • Hau Chan
  • Aris Filos-Ratsikas
  • Bo Li
  • Minming Li
  • Chenhao Wang

The study of approximate mechanism design for facility location has been at the center of research at the intersection of artificial intelligence and economics for the last decade, largely due to its practical importance in various domains, such as social planning and clustering. At a high level, the goal is to select a number of locations on which to build a set of facilities, aiming to optimize some social objective based on the preferences of strategic agents, who may have incentives to misreport their private information. This paper presents a comprehensive survey of the significant progress that has been made since the introduction of the problem, highlighting the different variants and methodologies, as well as the most interesting directions for future research.

IJCAI Conference 2021 Conference Paper

MG-DVD: A Real-time Framework for Malware Variant Detection Based on Dynamic Heterogeneous Graph Learning

  • Chen Liu
  • Bo Li
  • Jun Zhao
  • Ming Su
  • Xu-Dong Liu

Detecting newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. In particular, MG-DVD first models the fine-grained execution event streams of malware variants as dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which can effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, significantly reducing the cost of retraining the entire graph. As a result, MG-DVD is equipped to detect malware variants in real time, and it offers better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples show that MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of both effectiveness and efficiency.

AAMAS Conference 2021 Conference Paper

Multi-Robot Task Allocation: Complexity and Approximation

  • Haris Aziz
  • Hau Chan
  • Ágnes Cseh
  • Bo Li
  • Fahimeh Ramezani
  • Chenhao Wang

Multi-robot task allocation is one of the most fundamental classes of problems in robotics and is crucial for various real-world robotic applications such as search, rescue, and area exploration. We consider the Single-Task robots and Multi-Robot tasks Instantaneous Assignment (ST-MR-IA) setting, where each task requires at least a certain number of robots, and each robot can work on at most one task and incurs an operational cost for each task. We study the natural computational problem of allocating robots to complete the maximum number of tasks subject to budget constraints. We consider budget constraints of three different kinds: (1) total budget, (2) task budget, and (3) robot budget. We provide a detailed complexity analysis, including results on approximations as well as polynomial-time algorithms for the general setting and important restricted settings.

AAAI Conference 2021 Conference Paper

Multi-view Inference for Relation Extraction with Uncertain Knowledge

  • Bo Li
  • Wei Ye
  • Canming Huang
  • Shikun Zhang

Knowledge graphs (KGs) are widely used to facilitate relation extraction (RE) tasks. While most previous RE methods focus on leveraging deterministic KGs, uncertain KGs, which assign a confidence score to each relation instance, can provide prior probability distributions of relational facts as valuable external knowledge for RE models. This paper proposes to exploit uncertain knowledge to improve relation extraction. Specifically, we introduce ProBase, an uncertain KG that indicates to what extent a target entity belongs to a concept, into our RE architecture. We then design a novel multi-view inference framework to systematically integrate local context and global knowledge across three views: mention-, entity-, and concept-view. Experimental results show that our model achieves competitive performance on both sentence- and document-level relation extraction, verifying the effectiveness of introducing uncertain knowledge and of the multi-view inference framework we design.

ICRA Conference 2021 Conference Paper

Reduced Dynamics and Control for an Autonomous Bicycle

  • Jiaming Xiong
  • Bo Li
  • Ruihan Yu
  • Daolin Ma
  • Wei Wang 0034
  • Caishan Liu

In this paper, we propose the reduced model for the full dynamics of a bicycle and analyze its nonlinear behavior under a proportional control law for steering. Based on the Gibbs-Appell equations for the Whipple bicycle, we obtain a second-order nonlinear ordinary differential equation (ODE) that governs the bicycle’s controlled motion. Two types of equilibrium points for the governing equation are found, which correspond to the bicycle’s uniform straight forward and circular motions, respectively. By applying the Hurwitz criterion to the linearized equation, we find that the steer coefficient must be negative, consistent with the human’s intuition of turning toward a fall. Under this condition, a critical angular velocity of the rear wheel exists, above which the uniform straight forward motion is stable, and slightly below which a pair of symmetrical stable uniform circular motions will occur. These theoretical findings are verified by both numerical simulations and experiments performed on a powered autonomous bicycle.

IJCAI Conference 2021 Conference Paper

Rescuing Deep Hashing from Dead Bits Problem

  • Shu Zhao
  • Dayan Wu
  • Yucan Zhou
  • Bo Li
  • Weiping Wang

Deep hashing methods have shown great retrieval accuracy and efficiency in large-scale image retrieval. How to optimize discrete hash bits is always the focus in deep hashing methods. A common strategy in these methods is to adopt an activation function, e.g., sigmoid() or tanh(), and minimize a quantization loss to approximate discrete values. However, this paradigm may cause more and more hash bits to become stuck in the wrong saturated area of the activation functions, never to escape. We call this problem the "Dead Bits Problem (DBP)". Besides, the existing quantization loss will aggravate DBP as well. In this paper, we propose a simple but effective gradient amplifier which acts before activation functions to alleviate DBP. Moreover, we devise an error-aware quantization loss to further alleviate DBP. It avoids the negative effect of quantization loss based on the similarity between two images. The proposed gradient amplifier and error-aware quantization loss are compatible with a variety of deep hashing methods. Experimental results on three datasets demonstrate the effectiveness of the proposed gradient amplifier and the error-aware quantization loss.

AAAI Conference 2021 Conference Paper

Stable Adversarial Learning under Distributional Shifts

  • Jiashuo Liu
  • Zheyan Shen
  • Peng Cui
  • Linjun Zhou
  • Kun Kuang
  • Bo Li
  • Yishi Lin

Machine learning algorithms with empirical risk minimization are vulnerable under distributional shifts due to the greedy adoption of all the correlations found in training data. Recently, robust learning methods have targeted this problem by minimizing the worst-case risk over an uncertainty set. However, they treat all covariates equally when forming the decision sets, regardless of the stability of their correlations with the target, resulting in an overwhelmingly large set and low confidence of the learner. In this paper, we propose the Stable Adversarial Learning (SAL) algorithm, which leverages heterogeneous data sources to construct a more practical uncertainty set and conduct differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulation and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.

NeurIPS Conference 2021 Conference Paper

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

  • Zhuolin Yang
  • Linyi Li
  • Xiaojun Xu
  • Shiliang Zuo
  • Qian Chen
  • Pan Zhou
  • Benjamin Rubinstein
  • Ce Zhang

Adversarial transferability is an intriguing property: an adversarial perturbation crafted against one model is often also effective against another model, even when the two models come from different model families or training processes. To better protect ML systems against adversarial attacks, several questions are raised: what are the sufficient conditions for adversarial transferability, and how can it be bounded? Is there a way to reduce adversarial transferability in order to improve the robustness of an ensemble ML model? To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; we then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness. Our theoretical analysis shows that promoting orthogonality between gradients of base models alone is not enough to ensure low transferability; meanwhile, model smoothness is an important factor in controlling transferability. We also provide lower and upper bounds of adversarial transferability under certain conditions. Inspired by our theoretical analysis, we propose an effective Transferability Reduced Smooth (TRS) ensemble training strategy to train a robust ensemble with low transferability by enforcing both gradient orthogonality and model smoothness between base models. We conduct extensive experiments on TRS and compare it with 6 state-of-the-art ensemble baselines against 8 white-box attacks on different datasets, demonstrating that the proposed TRS outperforms all baselines significantly.

JAIR Journal 2021 Journal Article

Two-facility Location Games with Minimum Distance Requirement

  • Xinping Xu
  • Bo Li
  • Minming Li
  • Lingjie Duan

We study the mechanism design problem of a social planner for locating two facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. We consider the requirement of a minimum distance 0 ≤ d ≤ 1 between the two facilities. Given the two facilities are heterogeneous, we model the cost/utility of an agent as the sum of his distances to both facilities. In the heterogeneous two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. We also design a strategyproof mechanism minimizing the maximum cost. Given the two facilities are homogeneous, we model the cost/utility of an agent as his distance to the closer facility. In the homogeneous two-facility location game for minimizing the social cost, we show that any deterministic strategyproof mechanism has unbounded approximation ratio. Moreover, in the obnoxious heterogeneous two-facility location game for maximizing the social utility, we propose new deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound (7 − d)/6 for any deterministic strategyproof mechanism. We also design a strategyproof mechanism maximizing the minimum utility. In the obnoxious homogeneous two-facility location game for maximizing the social utility, we propose deterministic group strategyproof mechanisms with provable approximation ratios and establish a lower bound 4/3. Besides, in the two-facility location game with triple-preference, where each facility may be favorable, obnoxious, indifferent for any agent, we further motivate agents to report both their locations and preferences towards the two facilities truthfully, and design a deterministic group strategyproof mechanism with an approximation ratio 4.

NeurIPS Conference 2021 Conference Paper

What Would Jiminy Cricket Do? Towards Agents That Behave Morally

  • Dan Hendrycks
  • Mantas Mazeika
  • Andy Zou
  • Sahil Patel
  • Christine Zhu
  • Jesus Navarro
  • Dawn Song
  • Bo Li

When making everyday decisions, people are guided by their conscience, an internal sense of right and wrong, to behave morally. By contrast, artificial agents may behave immorally when trained on environments that ignore moral concerns, such as violent video games. With the advent of generally capable agents that pretrain on many environments, mitigating inherited biases towards immoral behavior will become necessary. However, prior work on aligning agents with human values and morals focuses on small-scale settings lacking in semantic complexity. To enable research in larger, more realistic settings, we introduce Jiminy Cricket, an environment suite of 25 text-based adventure games with thousands of semantically rich, morally salient scenarios. Via dense annotations for every possible action, Jiminy Cricket environments robustly evaluate whether agents can act morally while maximizing reward. To improve moral behavior, we leverage language models with commonsense moral knowledge and develop strategies to mediate this knowledge into actions. In extensive experiments, we find that our artificial conscience approach can steer agents towards moral behavior without sacrificing performance.

NeurIPS Conference 2020 Conference Paper

Counterfactual Prediction for Bundle Treatment

  • Hao Zou
  • Peng Cui
  • Bo Li
  • Zheyan Shen
  • Jianxin Ma
  • Hongxia Yang
  • Yue He

Estimating counterfactual outcome of different treatments from observational data is an important problem to assist decision making in a variety of fields. Among the various forms of treatment specification, bundle treatment has been widely adopted in many scenarios, such as recommendation systems and online marketing. The bundle treatment usually can be abstracted as a high dimensional binary vector, which makes it more challenging for researchers to remove the confounding bias in observational data. In this work, we assume the existence of low dimensional latent structure underlying bundle treatment. Via the learned latent representations of treatments, we propose a novel variational sample re-weighting (VSR) method to eliminate confounding bias by decorrelating the treatments and confounders. Finally, we conduct extensive experiments to demonstrate that the predictive model trained on this re-weighted dataset can achieve more accurate counterfactual outcome prediction.

ICLR Conference 2020 Conference Paper

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

  • Yuyu Zhang
  • Xinshi Chen
  • Yuan Yang
  • Arun Ramamurthy
  • Bo Li
  • Yuan (Alan) Qi
  • Le Song

Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLN is computationally intensive, making the industrial-scale application of MLN very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLN. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning.

AAAI Conference 2020 Conference Paper

Facility Location Problem with Capacity Constraints: Algorithmic and Mechanism Design Perspectives

  • Haris Aziz
  • Hau Chan
  • Barton Lee
  • Bo Li
  • Toby Walsh

We consider the facility location problem in the one-dimensional setting where each facility can serve a limited number of agents, from the algorithmic and mechanism design perspectives. From the algorithmic perspective, we prove that the corresponding optimization problem, where the goal is to locate facilities so as to minimize either the total cost to all agents or the maximum cost of any agent, is NP-hard. However, we show that the problem is fixed-parameter tractable, and the optimal solution can be computed in polynomial time whenever the number of facilities is bounded, or when all facilities have identical capacities. We then consider the problem from a mechanism design perspective where the agents are strategic and need not reveal their true locations. We show that several natural mechanisms studied in the uncapacitated setting either lose strategyproofness or a bound on the solution quality for the total or maximum cost objective. We then propose new mechanisms that are strategyproof and achieve approximation guarantees that almost match the lower bounds.

AAAI Conference 2020 Conference Paper

Frame-Guided Region-Aligned Representation for Video Person Re-Identification

  • Zengqun Chen
  • Zhiheng Zhou
  • Junchu Huang
  • Pengyu Zhang
  • Bo Li

Pedestrians in videos are usually in a moving state, resulting in serious spatial misalignment like scale variations and pose changes, which makes the video-based person re-identification problem more challenging. To address the above issue, in this paper, we propose a Frame-Guided Region-Aligned model (FGRA) for discriminative representation learning in two steps in an end-to-end manner. Firstly, based on a frame-guided feature learning strategy and a nonparametric alignment module, a novel alignment mechanism is proposed to extract well-aligned region features. Secondly, in order to form a sequence representation, an effective feature aggregation strategy that utilizes temporal alignment score and spatial attention is adopted to fuse region features in the temporal and spatial dimensions, respectively. Experiments are conducted on benchmark datasets to demonstrate the effectiveness of the proposed method to solve the misalignment problem and the superiority of the proposed method to the existing video-based person re-identification methods.

IROS Conference 2020 Conference Paper

GP-SLAM+: real-time 3D lidar SLAM based on improved regionalized Gaussian process map reconstruction

  • Jianyuan Ruan
  • Bo Li
  • Yingqiang Wang
  • Zhou Fang

This paper presents a 3D lidar SLAM system based on improved regionalized Gaussian process (GP) map reconstruction to provide both low-drift state estimation and mapping in real-time for robotics applications. We utilize spatial GP regression to model the environment. This tool enables us to recover surfaces, including those in sparsely scanned areas, and obtain uniform samples with uncertainty. Those properties facilitate robust data association and map updating in our scan-to-map registration scheme, especially when working with sparse range data. Compared with previous GP-SLAM, this work overcomes the prohibitive computational complexity of GP and redesigns the registration strategy to meet the accuracy requirements in 3D scenarios. For large-scale tasks, a two-thread framework is employed to suppress the drift further. Aerial and ground-based experiments demonstrate that our method allows robust odometry and precise mapping in real-time. It also outperforms state-of-the-art lidar SLAM systems in our tests with lightweight sensors.

NeurIPS Conference 2020 Conference Paper

On Convergence of Nearest Neighbor Classifiers over Feature Transformations

  • Luka Rimanic
  • Cedric Renggli
  • Bo Li
  • Ce Zhang

The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all, theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical understanding of kNN and its practical applications. In this paper, we take a first step towards bridging this gap. We provide a novel analysis on the convergence rates of a kNN classifier over transformed features. This analysis requires in-depth understanding of the properties that connect both the transformed space and the raw feature space. More precisely, we build our convergence bound upon two key properties of the transformed space: (1) safety -- how well can one recover the raw posterior from the transformed space, and (2) smoothness -- how complex this recovery function is. Based on our result, we are able to explain why some (pre-trained) feature transformations are better suited for a kNN classifier than others. We empirically validate that both properties have an impact on the kNN convergence on 30 feature transformations with 6 benchmark datasets spanning from the vision to the text domain.

AAAI Conference 2020 Conference Paper

Reinforcement Learning with Perturbed Rewards

  • Jingkang Wang
  • Yang Liu
  • Bo Li

Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors), and is therefore not credible. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, which can be approximated with a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our solution framework builds on existing RL/DRL algorithms and is the first to address the biased noisy reward setting without any assumptions on the true distribution (e.g., the zero-mean Gaussian noise assumed in previous works). The core ideas of our solution include estimating a reward confusion matrix and defining a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that trained policies based on our estimated surrogate reward can achieve higher expected rewards, and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm is able to obtain 84.6% and 80.8% improvements on average score for five Atari games, with error rates of 10% and 30%, respectively.

AAAI Conference 2020 Conference Paper

Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States

  • Yunan Ye
  • Hengzhi Pei
  • Boxin Wang
  • Pin-Yu Chen
  • Yada Zhu
  • Ju Xiao
  • Bo Li

Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity – the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty – the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) the Bitcoin market and (ii) the HighTech stock market with 7 years of Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM methods and other baselines.

NeurIPS Conference 2020 Conference Paper

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

  • Huan Zhang
  • Hongge Chen
  • Chaowei Xiao
  • Bo Li
  • Mingyan Liu
  • Duane Boning
  • Cho-Jui Hsieh

A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noises. Since the observations deviate from the true states, they can mislead the agent into making suboptimal actions. Several works have shown this vulnerability via adversarial attacks, but how to improve the robustness of DRL under this setting has not been well studied. We show that naively applying existing techniques on improving robustness for classification tasks, like adversarial training, are ineffective for many RL tasks. We propose the state-adversarial Markov decision process (SA-MDP) to study the fundamental properties of this problem, and develop a theoretically principled policy regularization which can be applied to a large family of DRL algorithms, including deep deterministic policy gradient (DDPG), proximal policy optimization (PPO) and deep Q networks (DQN), for both discrete and continuous action control problems. We significantly improve the robustness of DDPG, PPO and DQN agents under a suite of strong white box adversarial attacks, including two new attacks of our own. Additionally, we find that a robust policy noticeably improves DRL performance in a number of environments.

AAAI Conference 2020 Conference Paper

Rule-Guided Compositional Representation Learning on Knowledge Graphs

  • Guanglin Niu
  • Yongfei Zhang
  • Bo Li
  • Peng Cui
  • Si Liu
  • Jingyang Li
  • Xiaowei Zhang

Representation learning on a knowledge graph (KG) is to embed entities and relations of a KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to structured information encoded in triples, which would cause limited performance due to the structure sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization of KG embedding as well as the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, the rules of length 2 are applied to compose paths accurately while the rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered in optimization to guarantee the availability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.

AAAI Conference 2020 Conference Paper

Single Camera Training for Person Re-Identification

  • Tianyu Zhang
  • Lingxi Xie
  • Longhui Wei
  • Yongfei Zhang
  • Bo Li
  • Qi Tian

Person re-identification (ReID) aims at finding the same person in different cameras. Training such systems usually requires a large amount of cross-camera pedestrians to be annotated from surveillance videos, which is labor-consuming especially when the number of cameras is large. Differently, this paper investigates ReID in an unexplored single-camera-training (SCT) setting, where each person in the training set appears in only one camera. To the best of our knowledge, this setting was never studied before. SCT enjoys the advantage of low-cost data collection and annotation, and thus makes it easier to train ReID systems in a brand-new environment. However, it raises major challenges due to the lack of cross-camera person occurrences, which conventional approaches heavily rely on to extract discriminative features. The key to dealing with the challenges in the SCT setting lies in designing an effective mechanism to complement cross-camera annotation. We start with a regular deep network for feature extraction, upon which we propose a novel loss function named multi-camera negative loss (MCNL). This is a metric learning loss motivated by probability, suggesting that in a multi-camera system, one image is more likely to be closer to the most similar negative sample in other cameras than to the most similar negative sample in the same camera. In experiments, MCNL significantly boosts ReID accuracy in the SCT setting, which paves the way for fast deployment of ReID systems with good performance on new target scenes.

AAAI Conference 2020 Conference Paper

Stable Prediction with Model Misspecification and Agnostic Distribution Shift

  • Kun Kuang
  • Ruoxuan Xiong
  • Peng Cui
  • Susan Athey
  • Bo Li

For many machine learning algorithms, two main assumptions are required to guarantee performance. One is that the test data are drawn from the same distribution as the training data, and the other is that the model is correctly specified. In real applications, however, we often have little prior knowledge of the test data and of the underlying true model. Under model misspecification, agnostic distribution shift between training and test data leads to inaccuracy of parameter estimation and instability of prediction across unknown test data. To address these problems, we propose a novel Decorrelated Weighting Regression (DWR) algorithm which jointly optimizes a variable decorrelation regularizer and a weighted regression model. The variable decorrelation regularizer estimates a weight for each sample such that variables are decorrelated on the weighted training data. Then, these weights are used in the weighted regression to improve the accuracy of estimation on the effect of each variable, thus helping to improve the stability of prediction across unknown test data. Extensive experiments clearly demonstrate that our DWR algorithm can significantly improve the accuracy of parameter estimation and stability of prediction with model misspecification and agnostic distribution shift.

IJCAI Conference 2019 Conference Paper

Approximately Maximizing the Broker's Profit in a Two-sided Market

  • Jing Chen
  • Bo Li
  • Yingkai Li

We study how to maximize the broker's (expected) profit in a two-sided market, where she buys items from a set of sellers and resells them to a set of buyers. Each seller has a single item to sell and holds a private value on her item, and each buyer has a valuation function over the bundles of the sellers' items. We consider the Bayesian setting where the agents' values/valuations are independently drawn from prior distributions, and aim at designing dominant-strategy incentive-compatible (DSIC) mechanisms that are approximately optimal. Production-cost markets, where each item has a publicly-known cost to be produced, provide a platform for us to study two-sided markets. Briefly, we show how to convert a mechanism for production-cost markets into a mechanism for the broker, whenever the former satisfies cost-monotonicity. This reduction holds even when buyers have general combinatorial valuation functions. When the buyers' valuations are additive, we generalize an existing mechanism to production-cost markets in an approximation-preserving way. We then show that the resulting mechanism is cost-monotone and thus can be converted into an 8-approximation mechanism for two-sided markets.

AAAI Conference 2019 Conference Paper

Community Focusing: Yet Another Query-Dependent Community Detection

  • Zhuo Wang
  • Weiping Wang
  • Chaokun Wang
  • Xiaoyan Gu
  • Bo Li
  • Dan Meng

As a major kind of query-dependent community detection, community search finds a densely connected subgraph containing a set of query nodes. As density is the major consideration of community search, most methods of community search often find a dense subgraph with many vertices far from the query nodes, which are not very related to the query nodes. Motivated by this, a new problem called community focusing (CF) is studied. It finds a community where the members are close and densely connected to the query nodes. A distance-sensitive dense subgraph structure called β-attention-core is proposed to remove the vertices loosely connected to or far from the query nodes, and a combinational density is designed to guarantee the density of a subgraph. Then CF is formalized as finding a subgraph with the largest combinational density among the β-attention-core subgraphs containing the query nodes with the largest β. Thereafter, effective methods are devised for CF. Furthermore, a speed-up strategy is developed to make the methods scalable to large networks. Extensive experimental results on real and synthetic networks demonstrate the performance of our methods.

IJCAI Conference 2019 Conference Paper

Detecting Robust Co-Saliency with Recurrent Co-Attention Neural Network

  • Bo Li
  • Zhengxing Sun
  • Lv Tang
  • Yunhan Sun
  • Jinlong Shi

Effective feature representations, which should not only express each image's individual properties but also reflect the interactions among images in a group, are crucial for robust co-saliency detection. This paper proposes a novel deep learning co-saliency detection approach which simultaneously learns single image properties and robust group features in a recurrent manner. Specifically, our network first extracts the semantic features of each image. Then, a specially designed Recurrent Co-Attention Unit (RCAU) explores all images in the group recurrently to generate the final group representation using the co-attention between images, while suppressing noisy information. The group feature, which contains complementary synergetic information, is later merged with the single image features, which express the unique properties, to infer robust co-saliency. We also propose a novel co-perceptual loss to make full use of the interactive relationships of all images in the training group as the supervision in our end-to-end training process. Extensive experimental results demonstrate the superiority of our approach in comparison with the state-of-the-art methods.

AAMAS Conference 2019 Conference Paper

Heterogeneous Two-facility Location Games with Minimum Distance Requirement

  • Lingjie Duan
  • Bo Li
  • Minming Li
  • Xinping Xu

We study the mechanism design problem of a social planner for locating two heterogeneous facilities on a line interval [0, 1], where a set of n strategic agents report their locations and a mechanism determines the locations of the two facilities. Unlike prior work on two-facility location games, we consider the requirement of a minimum distance d between the two facilities. As the two facilities are heterogeneous and have additive effects on agents, we model the cost of an agent as the sum of his distances to both facilities, and the social cost as the total cost of all agents. In the two-facility location game to minimize the social cost, we show that the optimal solution can be computed in polynomial time and prove that carefully choosing one optimal solution as output is strategyproof. In the obnoxious two-facility location game for maximizing the social utility, a mechanism outputting the optimal solution is not strategyproof, and we propose new deterministic group strategyproof mechanisms with provable approximation ratios. Moreover, we establish a lower bound (7 − d)/6 for the approximation ratio achievable by deterministic strategyproof mechanisms. Finally, we study the two-facility location game with triple-preference, where each of the two facilities may be favorable, obnoxious, or indifferent for any agent. We further allow each agent to misreport his location and preference towards the two facilities and design a deterministic group strategyproof mechanism with approximation ratio 4.

AAMAS Conference 2019 Conference Paper

How You Act Tells a Lot: Privacy-Leaking Attack on Deep Reinforcement Learning

  • Xinlei Pan
  • Weiyao Wang
  • Xiaoshuai Zhang
  • Bo Li
  • Jinfeng Yi
  • Dawn Song

Machine learning has been widely applied to various applications, some of which involve training with privacy-sensitive data. A modest number of data breaches have been studied, including credit card information in natural language data and identities from face datasets. However, most of these studies focus on supervised learning models. As deep reinforcement learning (DRL) has been deployed in a number of real-world systems, such as indoor robot navigation, whether trained DRL policies can leak private information requires in-depth study. To explore such privacy breaches in general, we propose two main methods: environment dynamics search via genetic algorithm and candidate inference based on shadow policies. We conduct extensive experiments to demonstrate such privacy vulnerabilities in DRL under various settings. We leverage the proposed algorithms to infer floor plans from trained Grid World navigation DRL agents with LiDAR perception. The proposed algorithm can correctly infer most of the floor plans and reaches an average recovery rate of 95.83% using policy gradient trained agents. In addition, we are able to recover the robot configuration in continuous control environments and an autonomous driving simulator with high accuracy. To the best of our knowledge, this is the first work to investigate privacy leakage in DRL settings, and we show that DRL-based agents do potentially leak privacy-sensitive information from the trained policies.

AAMAS Conference 2019 Conference Paper

Maximin-Aware Allocations of Indivisible Goods

  • Hau Chan
  • Jing Chen
  • Bo Li
  • Xiaowei Wu

We study envy-free allocations of indivisible goods to agents in settings where each agent is unaware of the bundles (or allocated goods) of other agents. In particular, we propose the maximin aware (MMA) fairness measure, which guarantees that every agent, given the bundle allocated to her, is aware that she does not get the worst bundle, even if she does not know how the other goods are distributed. We also introduce two of its relaxations, MMA1 and MMAX. We show that MMA1 and MMAX potentially have stronger egalitarian guarantees than EF1 and are easier to achieve than MMS and EFX. Finally, we present a polynomial-time algorithm, which computes an allocation such that every agent is either 1/2-approximate MMA or exactly MMAX. Interestingly, the returned allocation is also 1/2-approximate EFX when all agents have subadditive valuations, which answers an open question left in [Plaut and Roughgarden, SODA 2018].
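The maximin-style guarantees above are all defined relative to the classic maximin share (MMS). As a small illustrative sketch (assuming additive valuations; this is not the paper's polynomial-time algorithm), MMS can be computed by brute force for tiny instances: it is the best worst-bundle value an agent can guarantee by partitioning all goods into n bundles herself.

```python
# Illustrative brute-force maximin share (MMS) for additive valuations.
# Exponential in the number of goods -- only for tiny examples.
from itertools import product

def maximin_share(values, n):
    """values[g]: the agent's value for good g; n: number of agents/bundles."""
    best = 0
    # Enumerate every assignment of goods to the n bundles.
    for assignment in product(range(n), repeat=len(values)):
        bundles = [0] * n
        for good, bundle in enumerate(assignment):
            bundles[bundle] += values[good]
        best = max(best, min(bundles))  # keep the best worst-bundle value
    return best

# With goods worth 3, 2, 2, 1 and two agents, the best split is {3,1} vs {2,2}.
print(maximin_share([3, 2, 2, 1], 2))
```

MMA then weakens this: the agent need only be sure, from her own bundle, that she did not receive the worst bundle, whatever the unseen distribution of the remaining goods.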

IJCAI Conference 2019 Conference Paper

Maximin-Aware Allocations of Indivisible Goods

  • Hau Chan
  • Jing Chen
  • Bo Li
  • Xiaowei Wu

We study envy-free allocations of indivisible goods to agents in settings where each agent is unaware of the goods allocated to other agents. In particular, we propose the maximin aware (MMA) fairness measure, which guarantees that every agent, given the bundle allocated to her, is aware that she does not envy at least one other agent, even if she does not know how the other goods are distributed among other agents. We also introduce two of its relaxations, and discuss their egalitarian guarantee and existence. Finally, we present a polynomial-time algorithm, which computes an allocation that approximately satisfies MMA or its relaxations. Interestingly, the returned allocation is also 1/2-approximate EFX when all agents have subadditive valuations, which improves the algorithm in [Plaut and Roughgarden, 2018].

AAMAS Conference 2019 Conference Paper

Maxmin Share Fair Allocation of Indivisible Chores to Asymmetric Agents

  • Haris Aziz
  • Hau Chan
  • Bo Li

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concepts we focus on are natural generalizations of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents, do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases, the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

AAMAS Conference 2019 Conference Paper

Mechanism Design with Unstructured Beliefs

  • Bo Li

Mechanism design is the task of designing algorithms, toward desired objectives, that are robust to potential manipulation by strategic players. Traditionally, it is assumed that the mechanism designer and the players in the economy share some common knowledge. However, as pointed out by Wilson, such common knowledge is “rarely present in experiments and never in practice”, and “only by repeated weakening of common knowledge assumptions will the theory approximate reality.” In this work, we focus on designing resilient mechanisms that work properly even in such a less foreseeable environment. Bayesian auction design is a flourishing topic in the field of mechanism design, where an important simplifying assumption is that both the seller and the players know the exact distributions of all players’ valuations. In this work we first consider the query complexity of Bayesian mechanisms, where we only allow the seller limited oracle access to the players’ value distributions via simple queries. Then we further weaken the assumption by considering an information structure where the knowledge about the distributions can be arbitrarily scattered among the players. In both of these unstructured information settings, we design mechanisms that are constant approximations to the optimal Bayesian mechanisms with full information. Finally, we study an envy-free allocation problem in which unstructured beliefs need to be taken into consideration. In particular, we model an environment where each player is unaware of the bundles (or allocated items) of other players, but still knows he does not receive the worst bundle. We present both conceptual and algorithmic results for this new envy-free allocation domain.

NeurIPS Conference 2019 Conference Paper

Multi-source Domain Adaptation for Semantic Segmentation

  • Sicheng Zhao
  • Bo Li
  • Xiangyu Yue
  • Yang Gu
  • Pengfei Xu
  • Runbo Hu
  • Hua Chai
  • Kurt Keutzer

Simulation-to-real domain adaptation for semantic segmentation has been actively studied for various applications such as autonomous driving. Existing methods mainly focus on a single-source setting, which cannot easily handle a more practical scenario of multiple sources with different distributions. In this paper, we propose to investigate multi-source domain adaptation for semantic segmentation. Specifically, we design a novel framework, termed Multi-source Adversarial Domain Aggregation Network (MADAN), which can be trained in an end-to-end manner. First, we generate an adapted domain for each source with dynamic semantic consistency while aligning at the pixel-level cycle-consistently towards the target. Second, we propose sub-domain aggregation discriminator and cross-domain cycle discriminator to make different adapted domains more closely aggregated. Finally, feature-level alignment is performed between the aggregated domain and target domain while training the segmentation network. Extensive experiments from synthetic GTA and SYNTHIA to real Cityscapes and BDDS datasets demonstrate that the proposed MADAN model outperforms state-of-the-art approaches. Our source code is released at: https://github.com/Luodian/MADAN.

IROS Conference 2019 Conference Paper

On Enhancing Ground Surface Detection from Sparse Lidar Point Cloud

  • Bo Li

Ground surface detection in point cloud is widely used as a key module in autonomous driving systems. Different from previous approaches which are mostly developed for lidars with high beam resolution, e.g., Velodyne HDL-64, this paper proposes ground detection techniques applicable to much sparser point cloud captured by lidars with low beam resolution, e.g., Velodyne VLP-16. The approach is based on the RANSAC scheme of plane fitting. Inlier verification for plane hypotheses is enhanced by exploiting the point-wise tangent, which is a local feature available to compute regardless of the density of lidar beams. Ground surface which is not perfectly planar is fitted by multiple (specifically 4 in our implementation) disjoint plane regions. By assuming these plane regions to be rectangular and exploiting the integral image technique, our approach approximately finds the optimal region partition and plane hypotheses under the RANSAC scheme with real-time computational complexity.
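The RANSAC plane-fitting scheme this abstract builds on can be sketched in a few lines. This is a generic, illustrative version only: the paper's method additionally verifies inliers with point-wise tangents and fits multiple disjoint plane regions, neither of which is shown here.

```python
# Minimal RANSAC plane fitting: repeatedly fit a plane through 3 random
# points and keep the hypothesis with the most inliers.
import random

def plane_from_points(p, q, r):
    """Unit-normal plane (a, b, c, d) through three points, ax + by + cz + d = 0."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])
    norm = sum(comp * comp for comp in n) ** 0.5
    if norm == 0:
        return None  # degenerate (collinear) sample
    a, b, c = (comp / norm for comp in n)
    return a, b, c, -(a*p[0] + b*p[1] + c*p[2])

def ransac_plane(points, iters=200, thresh=0.05, seed=0):
    rng = random.Random(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = sum(abs(a*x + b*y + c*z + d) < thresh for x, y, z in points)
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Mostly-flat ground (z = 0) plus a couple of off-ground outliers.
pts = [(x * 0.1, y * 0.1, 0.0) for x in range(10) for y in range(10)]
pts += [(0.5, 0.5, 2.0), (0.2, 0.8, 1.5)]
plane, count = ransac_plane(pts)
```

On this toy input the winning hypothesis is the ground plane z = 0 with all 100 ground points as inliers; the sparse-lidar difficulty the paper addresses arises precisely when such dense inlier support is unavailable.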

IJCAI Conference 2019 Conference Paper

Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space

  • Linyi Li
  • Zexuan Zhong
  • Bo Li
  • Tao Xie

Machine learning techniques, especially deep neural networks (DNNs), have been widely adopted in various applications. However, DNNs have recently been found to be vulnerable to adversarial examples, i.e., maliciously perturbed inputs that can mislead the models into making arbitrary prediction errors. Empirical defenses have been studied, but many of them can be adaptively attacked again. Provable defenses provide a provable error bound for DNNs, but such bounds are so far far from satisfactory. To address this issue, in this paper we present our approach, named Robustra, for effectively improving the provable error bound of DNNs. We leverage the adversarial space of a reference model as the feasible region to solve the min-max game between the attackers and defenders. We solve its dual problem by linearly approximating the attackers' best strategy and utilizing the monotonicity of the slack variables introduced by the reference model. The evaluation results show that our approach can provide significantly better provable adversarial error bounds on the MNIST and CIFAR10 datasets, compared to the state-of-the-art results. In particular, bounded by L∞, with ε = 0.1, on MNIST we reduce the error bound from 2.74% to 2.09%; with ε = 0.3, we reduce the error bound from 24.19% to 16.91%.

IJCAI Conference 2019 Conference Paper

Strategyproof and Approximately Maxmin Fair Share Allocation of Chores

  • Haris Aziz
  • Bo Li
  • Xiaowei Wu

We initiate the work on fair and strategyproof allocation of indivisible chores. The fairness concept we consider in this paper is maxmin share (MMS) fairness. We consider three previously studied models of information elicited from the agents: the ordinal model, the cardinal model, and the public ranking model in which the ordinal preferences are publicly known. We present both positive and negative results on the level of MMS approximation that can be guaranteed if we require the algorithm to be strategyproof. Our results uncover some interesting contrasts between the approximation ratios achieved for chores versus goods.

AAAI Conference 2019 Conference Paper

SuperVAE: Superpixelwise Variational Autoencoder for Salient Object Detection

  • Bo Li
  • Zhengxing Sun
  • Yuqi Guo

Image saliency detection has recently witnessed rapid progress due to deep neural networks. However, there still exist many important problems in the existing deep learning based methods. Pixel-wise convolutional neural network (CNN) methods suffer from blurry boundaries due to the convolutional and pooling operations, while region-based deep learning methods lack spatial consistency since they deal with each region independently. In this paper, we propose a novel salient object detection framework using a superpixelwise variational autoencoder (SuperVAE) network. We first use VAE to model the image background and then separate salient objects from the background through the reconstruction residuals. To better capture semantic and spatial context information, we also propose a perceptual loss that takes advantage of deep pre-trained CNNs to train our SuperVAE network. Without the supervision of mask-level annotated data, our method generates high quality saliency results which can better preserve object boundaries and maintain spatial consistency. Extensive experiments on five widely-used benchmark datasets show that the proposed method achieves superior or competitive performance compared to other algorithms, including the very recent state-of-the-art supervised methods.

AAAI Conference 2019 Conference Paper

Uncovering Specific-Shape Graph Anomalies in Attributed Graphs

  • Nannan Wu
  • Wenjun Wang
  • Feng Chen
  • Jianxin Li
  • Bo Li
  • Jinpeng Huai

As networks are ubiquitous in the modern era, point anomalies have given way to graph anomalies in terms of anomaly shapes. However, specific-shape priors about anomalous subgraphs of interest are seldom considered by traditional approaches when detecting subgraphs in attributed graphs (e.g., computer networks, Bitcoin networks, etc.). This paper proposes a nonlinear approach to specific-shape graph anomaly detection. The nonlinear approach focuses on optimizing a broad class of nonlinear cost functions via specific-shape constraints in attributed graphs. Our approach can be applied to many different graph anomaly settings. Traditional approaches can only support linear cost functions (e.g., an aggregation function for the summation of node weights), whereas our approach can employ more powerful nonlinear cost functions, and enjoys a rigorous theoretical guarantee on the near-optimal solution with a geometric convergence rate.

IJCAI Conference 2019 Conference Paper

Weighted Maxmin Fair Share Allocation of Indivisible Chores

  • Haris Aziz
  • Hau Chan
  • Bo Li

We initiate the study of indivisible chore allocation for agents with asymmetric shares. The fairness concepts we focus on are natural weighted generalizations of maxmin share: WMMS fairness and OWMMS fairness. We first highlight the fact that commonly-used algorithms that work well for allocation of goods to asymmetric agents, and even for chores to symmetric agents, do not provide good approximations for allocation of chores to asymmetric agents under WMMS. As a consequence, we present a novel polynomial-time constant-approximation algorithm, via linear program, for OWMMS. For two special cases, the binary valuation case and the 2-agent case, we provide exact or better constant-approximation algorithms.

AAMAS Conference 2019 Conference Paper

Well-behaved Online Load Balancing Against Strategic Jobs

  • Bo Li
  • Minming Li
  • Xiaowei Wu

In the online load balancing problem on related machines, we have a set of jobs (with different sizes) arriving online, and we need to assign each job to a machine immediately upon its arrival, so as to minimize the makespan, i.e., the maximum completion time. In classic mechanism design problems, we assume that the jobs are controlled by selfish agents, with the sizes being their private information. Each job (agent) aims at minimizing its own cost, which is its completion time plus the payment charged by the mechanism. Truthful mechanisms guaranteeing that every job minimizes its cost by reporting its true size have been well-studied [Aspnes et al., JACM 1997; Feldman et al., EC 2017]. In this paper, we study truthful online load balancing mechanisms that are well-behaved [Epstein et al., MOR 2016]. Well-behavedness is important as it guarantees fairness between machines, and implies truthfulness in some cases when machines are controlled by selfish agents. Unfortunately, existing truthful online load balancing mechanisms are not well-behaved. We first show that to guarantee producing a well-behaved schedule, any online algorithm (even non-truthful) has a competitive ratio of at least Ω(√m), where m is the number of machines. Then we propose a mechanism that guarantees truthfulness of the online jobs, and produces a schedule that is almost well-behaved. We show that our algorithm has a competitive ratio of O(log m). Moreover, for the case when the sizes of online jobs are bounded, the competitive ratio of our algorithm improves to O(1). Interestingly, we show several cases for which our mechanism is actually truthful against selfish machines.
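For context, the classic (non-truthful, not necessarily well-behaved) baseline for this problem is greedy online list scheduling: assign each arriving job to the machine on which it would finish earliest. A minimal sketch, with function and parameter names being illustrative assumptions rather than anything from the paper:

```python
# Greedy online list scheduling on related machines: each job is assigned,
# on arrival, to the machine that would complete it earliest.

def greedy_schedule(jobs, speeds):
    """jobs: sizes in arrival order; speeds: machine speeds.
    Returns (assignment, makespan)."""
    loads = [0.0] * len(speeds)
    assignment = []
    for size in jobs:
        # Completion time of this job on each machine.
        finish = [loads[m] + size / speeds[m] for m in range(len(speeds))]
        m = min(range(len(speeds)), key=lambda i: finish[i])
        loads[m] = finish[m]
        assignment.append(m)
    return assignment, max(loads)

print(greedy_schedule([2, 2, 2, 6], [1.0, 1.0]))
```

The mechanism-design difficulty the abstract describes starts from here: a job may misreport its size to get an earlier slot, and naively greedy schedules can also be badly unbalanced across machines, which is exactly what the well-behavedness requirement rules out.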

AAAI Conference 2019 Conference Paper

“Bilingual Expert” Can Find Translation Errors

  • Kai Fan
  • Jiayi Wang
  • Bo Li
  • Fengming Zhou
  • Boxing Chen
  • Luo Si

The performance of machine translation (MT) systems is usually evaluated by the metric BLEU when golden references are provided. However, in the case of model inference or production deployment, golden references are usually expensive to obtain, requiring, for instance, human annotation with bilingual expertise. In order to address the issue of translation quality estimation (QE) without reference, we propose a general framework for automatic evaluation of translation output for the QE task in the Conference on Statistical Machine Translation (WMT). We first build a conditional target language model with a novel bidirectional transformer, named the neural bilingual expert model, which is pre-trained on large parallel corpora for feature extraction. For QE inference, the bilingual expert model can simultaneously produce the joint latent representation between the source and the translation, and real-valued measurements of possibly erroneous tokens based on the prior knowledge learned from parallel data. Subsequently, the features are fed into a simple Bi-LSTM predictive model for quality estimation. The experimental results show that our approach achieves state-of-the-art performance on most publicly available datasets of the WMT 2017/2018 QE task.

IJCAI Conference 2018 Conference Paper

Dynamic Fair Division Problem with General Valuations

  • Bo Li
  • Wenyang Li
  • Yingkai Li

In this paper, we focus on how to dynamically allocate a divisible resource fairly among n players who arrive and depart over time. The players may have general heterogeneous valuations over the resource. It is known that exact envy-free and proportional allocations may not exist in the dynamic setting [Walsh, 2011]. Thus, we study to what extent we can guarantee fairness in the dynamic setting. We first design two algorithms which are O(log n)-proportional and O(n)-envy-free for the setting with general valuations, and by constructing adversary instances such that all dynamic algorithms must be at least Ω(1)-proportional and Ω(n/log n)-envy-free, we show that the bounds are tight up to a logarithmic factor. Moreover, we introduce the setting where the players' valuations are uniform on the resource but with different demands, which generalizes the setting of [Friedman et al., 2015]. We prove an O(log n) upper bound and a tight lower bound for this case.

IJCAI Conference 2018 Conference Paper

Generating Adversarial Examples with Adversarial Networks

  • Chaowei Xiao
  • Bo Li
  • Jun-Yan Zhu
  • Warren He
  • Mingyan Liu
  • Dawn Song

Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rates under state-of-the-art defenses compared to other attacks. Our attack placed first with 92.76% accuracy on a public MNIST black-box attack challenge.

AAAI Conference 2018 Conference Paper

Orthogonal Weight Normalization: Solution to Optimization Over Multiple Dependent Stiefel Manifolds in Deep Neural Networks

  • Lei Huang
  • Xianglong Liu
  • Bo Lang
  • Adams Yu
  • Yongliang Wang
  • Bo Li

Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize such square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that the orthogonal rectangular matrix can stabilize the distribution of network activations and regularize FNNs. We propose a novel orthogonal weight normalization method to solve OMDSM. Particularly, it constructs an orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal. To guarantee stability, we minimize the distortions between proxy parameters and canonical weights over all tractable orthogonal transformations. In addition, we design an orthogonal linear module (OLM) to learn orthogonal filter banks in practice, which can be used as an alternative to the standard linear module. Extensive experiments demonstrate that by simply substituting OLM for the standard linear module without revising any experimental protocols, our method improves the performance of state-of-the-art networks, including Inception and residual networks, on the CIFAR and ImageNet datasets.

IROS Conference 2017 Conference Paper

3D fully convolutional network for vehicle detection in point cloud

  • Bo Li

2D fully convolutional networks have recently been successfully applied to the object detection problem on images. In this paper, we extend fully convolutional network based detection techniques to 3D and apply them to point cloud data. The proposed approach is verified on the task of vehicle detection from lidar point cloud for autonomous driving. Experiments on the KITTI dataset show significant performance improvement over previous point cloud based detection approaches.

AAAI Conference 2017 Conference Paper

Engineering Agreement: The Naming Game with Asymmetric and Heterogeneous Agents

  • Jie Gao
  • Bo Li
  • Grant Schoenebeck
  • Fang-Yi Yu

Being popular in language evolution, cognitive science, and culture dynamics, the Naming Game has been widely used to analyze how agents reach global consensus via communications in multi-agent systems. Most prior work considered networks that are symmetric and homogeneous (e.g., vertex transitive). In this paper we consider asymmetric or heterogeneous settings that complement the current literature: 1) we show that increasing asymmetry in network topology can improve convergence rates; the star graph empirically converges faster than all previously studied graphs. 2) We consider graph topologies that are particularly challenging for the naming game, such as disjoint cliques or multi-level trees, and ask how much extra homogeneity (random edges) is required to allow convergence or fast convergence; we provide theoretical analysis, confirmed by simulations. 3) We analyze how consensus can be manipulated when stubborn nodes are introduced at different points of the process: early introduction of stubborn nodes can easily influence the outcome in certain families of networks, while late introduction of stubborn nodes has much less power.
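The Naming Game dynamics referenced above can be simulated in a few lines. This is a minimal sketch of the standard protocol as commonly described in the literature (a random speaker proposes a name to a random neighbor; on success both collapse their inventories to that name, otherwise the listener adds it); the paper's asymmetric and stubborn-node variants are not modeled here.

```python
# Minimal Naming Game simulation on an undirected graph.
import random

def naming_game(edges, n, max_steps=10000, seed=1):
    rng = random.Random(seed)
    neighbors = {v: [] for v in range(n)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    inventory = [{f"word{v}"} for v in range(n)]  # each node starts with its own name
    for step in range(max_steps):
        speaker = rng.randrange(n)
        listener = rng.choice(neighbors[speaker])
        word = rng.choice(sorted(inventory[speaker]))  # sorted for determinism
        if word in inventory[listener]:   # success: both collapse to this word
            inventory[speaker] = {word}
            inventory[listener] = {word}
        else:                             # failure: listener learns the word
            inventory[listener].add(word)
        if all(len(inv) == 1 and inv == inventory[0] for inv in inventory):
            return True, step + 1         # global consensus reached
    return False, max_steps

# Star graph on 5 nodes, the topology the abstract highlights.
ok, steps = naming_game([(0, i) for i in range(1, 5)], 5)
```

Running this on star graphs versus, say, cycles of the same size is a quick way to observe the convergence-rate effect of topological asymmetry the abstract describes.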

IJCAI Conference 2017 Conference Paper

Query-Driven Discovery of Anomalous Subgraphs in Attributed Graphs

  • Nannan Wu
  • Feng Chen
  • Jianxin Li
  • Jinpeng Huai
  • Bo Li

For a detection problem, a user often has some prior knowledge about the structure-specific subgraphs of interest, but few traditional approaches are capable of employing this knowledge. The main technical challenge is that few approaches can efficiently model the space of connected subgraphs that are isomorphic to a query graph. We present a novel, efficient approach for optimizing a generic nonlinear cost function subject to a query-specific structural constraint. Our approach enjoys strong theoretical guarantees: convergence to a nearly optimal solution and low time complexity. As a case study, we specialize the nonlinear function to several well-known graph scan statistics for anomalous subgraph discovery. Empirical evidence demonstrates that our method is superior to state-of-the-art methods in several real-world anomaly detection tasks.

AAAI Conference 2017 Conference Paper

Treatment Effect Estimation with Data-Driven Variable Decomposition

  • Kun Kuang
  • Peng Cui
  • Bo Li
  • Meng Jiang
  • Shiqiang Yang
  • Fei Wang

One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Control for confounding effects is generally handled by the propensity score, but it treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D2VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate treatment effect in observational studies with high dimensional variables. Under standard assumptions, we show experimentally that our D2VD algorithm can automatically separate the variables precisely, and estimate treatment effect more accurately and with tighter confidence intervals than the state-of-the-art methods on both synthetic data and a real online advertising dataset.
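For readers unfamiliar with the propensity-score baseline this abstract contrasts against, the standard inverse-propensity-weighting (IPW) estimator of the average treatment effect can be written as a one-liner. This is the generic textbook estimator, not the D2VD algorithm, and the propensity values below are assumed known rather than estimated.

```python
# Standard IPW estimator of average treatment effect (ATE):
#   ATE_hat = (1/n) * sum_i [ T_i*Y_i/e_i - (1-T_i)*Y_i/(1-e_i) ]
# where T is the treatment indicator, Y the outcome, e the propensity score.

def ipw_ate(outcomes, treated, propensity):
    n = len(outcomes)
    total = 0.0
    for y, t, e in zip(outcomes, treated, propensity):
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / n

# Toy example with a constant propensity of 0.5 (as in a randomized trial).
print(ipw_ate([3, 1, 5, 2], [1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))
```

Because every observed variable enters the propensity model in this baseline, variance can be needlessly inflated; D2VD's point is that routing adjustment variables out of the propensity model and into the outcome model tightens the estimate.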

AAAI Conference 2016 Conference Paper

Behavioral Experiments in Email Filter Evasion

  • Liyiming Ke
  • Bo Li
  • Yevgeniy Vorobeychik

Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish which aim to deceive recipients into disclosing sensitive information, still routinely find their way into one’s mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction—a common practice in applied machine learning—can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.

NeurIPS Conference 2016 Conference Paper

Data Poisoning Attacks on Factorization-Based Collaborative Filtering

  • Bo Li
  • Yining Wang
  • Aarti Singh
  • Yevgeniy Vorobeychik

Recommendation and collaborative filtering systems are important in modern information and e-commerce applications. As these systems are becoming increasingly popular in industry, their outputs could affect business decision making, introducing incentives for an adversarial party to compromise the availability or integrity of such systems. We introduce a data poisoning attack on collaborative filtering systems. We demonstrate how a powerful attacker with full knowledge of the learner can generate malicious data so as to maximize his/her malicious objectives, while at the same time mimicking normal user behaviors to avoid being detected. While the complete knowledge assumption seems extreme, it enables a robust assessment of the vulnerability of collaborative filtering schemes to highly motivated attacks. We present efficient solutions for two popular factorization-based collaborative filtering algorithms: the alternating minimization formulation and the nuclear norm minimization method. Finally, we test the effectiveness of our proposed algorithms on real-world data and discuss potential defensive strategies.
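The alternating minimization learner targeted by the attack can be illustrated in its simplest rank-1 form. This sketch is only the victim model, not the poisoning attack, and it assumes a fully observed rating matrix for brevity:

```python
# Rank-1 alternating minimization for matrix factorization: fit M ≈ u v^T by
# alternately solving the closed-form least-squares problem for u and for v.

def rank1_als(M, iters=50):
    rows, cols = len(M), len(M[0])
    v = [1.0] * cols
    u = [0.0] * rows
    for _ in range(iters):
        s = sum(x * x for x in v)
        u = [sum(M[i][j] * v[j] for j in range(cols)) / s for i in range(rows)]
        s = sum(x * x for x in u)
        v = [sum(M[i][j] * u[i] for i in range(rows)) / s for j in range(cols)]
    return u, v

# A rank-1 ratings matrix is recovered exactly.
u, v = rank1_als([[2, 4], [1, 2]])
```

A poisoning attacker in the paper's setting injects fake user rows into M so that the factors (u, v) learned by exactly this kind of procedure shift toward the attacker's objective while the fake rows still look like plausible users.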

NeurIPS Conference 2014 Conference Paper

Feature Cross-Substitution in Adversarial Classification

  • Bo Li
  • Yevgeniy Vorobeychik

The success of machine learning, particularly in supervised settings, has led to numerous attempts to apply it in adversarial settings such as spam and malware detection. The core challenge in this class of applications is that adversaries are not static data generators, but make a deliberate effort to evade the classifiers deployed to detect them. We investigate both the problem of modeling the objectives of such adversaries, as well as the algorithmic problem of accounting for rational, objective-driven adversaries. In particular, we demonstrate severe shortcomings of feature reduction in adversarial settings using several natural adversarial objective functions, an observation that is particularly pronounced when the adversary is able to substitute across similar features (for example, replace words with synonyms or replace letters in words). We offer a simple heuristic method for making learning more robust to feature cross-substitution attacks. We then present a more general approach based on mixed-integer linear programming with constraint generation, which implicitly trades off overfitting and feature selection in an adversarial setting using a sparse regularizer along with an evasion model. Our approach is the first method for combining an adversarial classification algorithm with a very general class of models of adversarial classifier evasion. We show that our algorithmic approach significantly outperforms state-of-the-art alternatives.

IROS Conference 2010 Conference Paper

Keyframe detection for appearance-based visual SLAM

  • Hong Zhang 0013
  • Bo Li
  • Dan Yang 0001

This paper is concerned with the problem of keyframe detection in appearance-based visual SLAM. Appearance SLAM models a robot's environment topologically by a graph whose nodes represent strategically interesting places that have been visited by the robot and whose arcs represent spatial connectivity between these places. Specifically, we discuss and compare various methods for identifying the next location that is sufficiently different visually from the previously visited location or node in the map graph, in order to decide whether a new node should be created. We survey existing techniques for keyframe detection in image retrieval and video analysis. Using experimental results obtained from visual SLAM datasets, we conclude that the feature matching method offers the best performance among five representative methods in terms of accurately measuring the amount of appearance change between the robot's views, and thus can serve as a simple and effective metric for detecting keyframes. This study fills an important but missing step in current appearance SLAM research.

ICRA Conference 1987 Conference Paper

Optimal design of multiple arithmetic processor-based robot controllers

  • Shaheen Ahmad
  • Bo Li

In this paper we discuss preliminary design considerations for the optimal design of multiple-APU (arithmetic processing unit) based robot controllers. We justify this design in terms of its ability to adapt to the various control, kinematic, and trajectory computation methods being developed. We then show that with eight APUs, it is possible to compute the inverse kinematics, inverse dynamics, and the trajectory for the PUMA arm in less than 3 ms. In this design we assume the floating-point processing times of a relatively slow 16.7 MHz 68881.