Arrow Research search

Author name cluster

Ding Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

TIST Journal 2026 Journal Article

Clue and Context Fusion for Sarcasm Detection with Large Multimodal Models

  • Qiuyu Li
  • Yushan Pan
  • Ding Wang
  • Wei Wang
  • Xiaowei Huang
  • Zhijie Xu

Detecting sarcasm in social media is fundamentally different from general VLM benchmarks: it is a pragmatic contradiction problem in which the literal signal in one modality is intentionally misaligned with the intended meaning, while dominant pre-training (e.g., CLIP-style contrastive agreement) biases models toward modality alignment rather than incongruity detection. We present SCARF, a contradiction-aware framework that equips large multimodal models with explicit sarcasm cues and context-sensitive retrieval. SCARF constructs coarse scene cues and fine localized evidence via tag-constrained QA, then distills them with visual tokens into a [FUSION] control vector for the LLM; a label-contrastive retriever supplies type- and context-matched exemplars, and a local multi-view encoder surfaces micro-cues. With the same backbone and training data, SCARF attains 87.92% Acc/86.67% F1 on MMSD2.0 and 77.14% Acc/76.44% F1 zero-shot on XDMSD, outperforming a comparably fine-tuned LLaVA-1.5. Ablations show sarcasm clue fusion is the main driver of gains, and tag-constrained QA improves rationale grounding and reduces hallucinations.

TMLR Journal 2026 Journal Article

Decoding Safety Feedback from Diverse Raters: A Data-driven Lens on Responsiveness to Severity

  • Pushkar Mishra
  • Charvi Rastogi
  • Stephen R Pfohl
  • Alicia Parrish
  • Tian Huey Teh
  • Roma Patel
  • Mark Diaz
  • Ding Wang

Ensuring the safety of generative AI requires a nuanced understanding of pluralistic viewpoints. In this paper, we introduce a novel data-driven approach for analyzing ordinal safety ratings in pluralistic settings. Specifically, we address the challenge of interpreting subtle differences in safety feedback that a diverse population expresses via ordinal scales (e.g., a Likert scale). We define non-parametric responsiveness metrics that quantify how raters convey both broad distinctions and granular variations in the severity of safety violations. Using publicly available datasets of pluralistic safety feedback as case studies, we investigate how raters from different demographic groups use an ordinal scale to express their perceptions of the severity of violations. We apply our metrics across violation types, demonstrating their utility for extracting the fine-grained insights needed to align AI systems reliably in multi-cultural contexts. Finally, we show that our approach can inform rater selection and feedback interpretation by capturing varied viewpoints across demographic groups, improving the quality of pluralistic data collection and, in turn, contributing to more robust AI alignment.

AAAI Conference 2026 Conference Paper

Latent Knowledge-Guided Video Diffusion for Scientific Phenomena Generation from a Single Initial Frame

  • Qinglong Cao
  • Xirui Li
  • Ding Wang
  • Chao Ma
  • Yuntian Chen
  • Xiaokang Yang

Video diffusion models have achieved impressive results in natural scene generation, yet they struggle to generalize to scientific phenomena such as fluid simulations and meteorological processes, where the underlying dynamics are governed by scientific laws. These tasks pose unique challenges, including severe domain gaps, limited training data, and the lack of descriptive language annotations. To address them, we extract latent scientific-phenomena knowledge and propose a framework that teaches video diffusion models to generate scientific phenomena from a single initial frame. Specifically, static knowledge is extracted via pre-trained masked autoencoders, while dynamic knowledge is derived from pre-trained optical-flow prediction. Then, exploiting the aligned spatial relations between the CLIP vision and language encoders, the visual embeddings of scientific phenomena, guided by this latent knowledge, are projected into pseudo-language prompt embeddings in both the spatial and frequency domains. By incorporating these prompts and fine-tuning the video diffusion model, we enable the generation of videos that better adhere to scientific laws. Extensive experiments on both computational fluid dynamics simulations and real-world typhoon observations demonstrate the effectiveness of our approach, achieving superior fidelity and consistency across diverse scientific scenarios.

AAAI Conference 2026 Conference Paper

LeanRAG: Knowledge-Graph-Based Generation with Semantic Aggregation and Hierarchical Retrieval

  • Yaoze Zhang
  • Rong Wu
  • Pinlong Cai
  • Xiaoman Wang
  • Guohang Yan
  • Song Mao
  • Ding Wang
  • Botian Shi

Retrieval-Augmented Generation (RAG) grounds Large Language Models in external knowledge, but its effectiveness is often compromised by the retrieval of contextually flawed or incomplete information. To address this, knowledge-graph-based RAG methods have evolved toward hierarchical structures that organize knowledge into multi-level summaries. However, these approaches still suffer from two critical, unaddressed challenges: high-level conceptual summaries exist as disconnected "semantic islands", lacking the explicit relations needed for cross-community reasoning, and the retrieval process itself remains structurally unaware, often degenerating into an inefficient flat search that fails to exploit the graph's rich topology. To overcome these limitations, we introduce LeanRAG, a framework built on a deeply collaborative design that combines knowledge aggregation with retrieval strategy. LeanRAG first employs a novel semantic aggregation algorithm that forms entity clusters and constructs explicit relations among aggregation-level summaries, creating a fully navigable semantic network. A bottom-up, structure-guided retrieval strategy then anchors queries to the most relevant fine-grained entities and systematically traverses the graph's semantic pathways to gather concise yet contextually comprehensive evidence sets. LeanRAG thereby mitigates the substantial overhead of path retrieval on graphs and minimizes redundant retrieval. Extensive experiments on four challenging QA benchmarks across different domains show that LeanRAG significantly outperforms existing methods in response quality while reducing retrieval redundancy by 46%.
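The bottom-up retrieval the abstract describes can be sketched as a toy traversal: anchor the query to the best-matching fine-grained entity, then walk the aggregation hierarchy upward, collecting summaries along the path. The graph, the word-overlap scoring, and all strings below are invented for illustration and are not LeanRAG's actual data structures or relevance model.

```python
# Hypothetical aggregation hierarchy: entity/summary -> its parent node.
parent = {
    "int8 quantization": "model compression",
    "pruning": "model compression",
    "model compression": "efficient inference",
    "kv cache": "efficient inference",
}
summaries = {
    "model compression": "Techniques that shrink models with little accuracy loss.",
    "efficient inference": "Methods for serving models cheaply.",
}

def score(query, entity):
    # Placeholder relevance: word overlap; a real system would use embeddings.
    return len(set(query.lower().split()) & set(entity.split()))

def retrieve(query, entities):
    # Anchor at the best fine-grained entity, then traverse upward.
    anchor = max(entities, key=lambda e: score(query, e))
    evidence, node = [anchor], anchor
    while node in parent:
        node = parent[node]
        evidence.append(summaries.get(node, node))
    return evidence

print(retrieve("how does int8 quantization work",
               ["int8 quantization", "pruning", "kv cache"]))
```

Anchoring first and traversing second is what keeps the search from degenerating into a flat scan over all summaries.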

AAAI Conference 2025 Conference Paper

Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis

  • Weikai Li
  • Ding Wang
  • Zijian Ding
  • Atefeh Sohrabizadeh
  • Zongyue Qin
  • Jason Cong
  • Yizhou Sun

High-level synthesis (HLS) is a widely used tool for designing Field Programmable Gate Arrays (FPGAs): it compiles source code written in a software programming language into an FPGA circuit. The source code comprises a program (called a "kernel") and several pragmas that instruct hardware synthesis, such as parallelization and pipelining. While it is relatively easy for software developers to write the program, designing the pragmas relies heavily on hardware knowledge and poses a significant challenge for them. Recently, machine learning methods such as GNNs have been proposed to automate pragma design via performance prediction. However, when a trained model is applied to new kernels, the significant domain shift often leads to unsatisfactory performance. We propose a more domain-generalizable model structure: a two-level hierarchical Mixture of Experts (MoE) that can be flexibly adapted to any GNN model. Different expert networks learn to handle different regions of the representation space and can exploit patterns shared between old and new kernels. The low-level MoE is applied at three natural granularities of a program: node, basic block, and graph. The high-level MoE learns to aggregate the three granularities for the final decision. To train the hierarchical MoE stably, we further propose a two-stage training method. Extensive experiments verify the effectiveness of the hierarchical MoE.
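The two-level gating can be illustrated with a minimal sketch: a softmax-gated mixture of expert predictions at each of the three granularities (node, basic block, graph), followed by a second gate that aggregates the three. All expert outputs and gate logits below are made-up numbers; in the paper each expert and gate is a learned network.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe(expert_outputs, gate_logits):
    """Convex combination of expert predictions using a softmax gate."""
    w = softmax(gate_logits)
    return sum(wi * oi for wi, oi in zip(w, expert_outputs))

# Low-level MoE: one mixture per granularity (values are illustrative only).
granularity_preds = {
    "node":  moe([0.8, 1.2, 1.0], gate_logits=[0.1, 0.5, -0.2]),
    "block": moe([1.1, 0.9, 1.3], gate_logits=[0.3, 0.0, 0.4]),
    "graph": moe([1.0, 1.4, 0.7], gate_logits=[-0.1, 0.6, 0.2]),
}

# High-level MoE: aggregate the three granularities into the final prediction.
final = moe(list(granularity_preds.values()), gate_logits=[0.2, 0.1, 0.3])
print(round(final, 3))
```

Because each level is a convex combination, the final prediction is bounded by the per-granularity predictions, which keeps the hierarchy's output well-behaved regardless of the gate values.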

IROS Conference 2025 Conference Paper

Improved Calibration for Panoramic Annular Lens Systems with Angular Modulation

  • Ding Wang
  • Junhua Wang
  • Yuhan Tian
  • Min Xu
  • Lingbao Kong

This paper addresses the challenges of calibrating Panoramic Annular Lens (PAL) systems, which exhibit unique projection characteristics due to their imaging relationship designed to compress blind zones. Traditional camera calibration methods often fail to accurately capture these properties. To resolve this limitation, we propose a novel projection model that incorporates angular modulation, enabling a more accurate representation of the PAL system’s imaging process. This formulation significantly improves the model’s ability to describe the relationship between object space and image space. We evaluate our approach on both synthetic and real-world datasets tailored for PAL cameras. Experimental results demonstrate that the model achieves sub-pixel accuracy, with reprojection errors typically ranging from 0.1 to 0.3 pixels on 2048×2048 images when using five distortion terms. This level of precision surpasses existing calibration models for panoramic cameras, making our method particularly suitable for high-accuracy applications. The datasets used in this study are publicly available at https://github.com/wwendy233/PALcalib.
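For context, the reported sub-pixel accuracy refers to reprojection error: the pixel distance between an observed feature point and the same point reprojected through the fitted camera model. A minimal computation, with made-up point coordinates standing in for detections on a 2048×2048 image:

```python
def reprojection_errors(observed, reprojected):
    # Euclidean distance between each observed and reprojected point.
    return [((u - up) ** 2 + (v - vp) ** 2) ** 0.5
            for (u, v), (up, vp) in zip(observed, reprojected)]

# Hypothetical detections and their model reprojections (pixels).
observed    = [(1012.4, 987.1), (355.0, 1720.8), (1899.2, 402.5)]
reprojected = [(1012.55, 987.0), (355.2, 1720.65), (1899.05, 402.7)]

errs = reprojection_errors(observed, reprojected)
print([round(e, 3) for e in errs])  # each well under a pixel
```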

NeurIPS Conference 2025 Conference Paper

Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models

  • Charvi Rastogi
  • Tian Huey Teh
  • Pushkar Mishra
  • Roma Patel
  • Ding Wang
  • Mark Díaz
  • Alicia Parrish
  • Aida Mostafazadeh Davani

Current text-to-image (T2I) models often fail to account for diverse human experiences, leading to misaligned systems. We advocate for pluralism in AI alignment, where an AI understands and is steerable towards diverse, and often conflicting, human values. Our work provides three core contributions to achieve this in T2I models. First, we introduce a novel dataset for Diverse Intersectional Visual Evaluation (DIVE) -- the first multimodal dataset for pluralistic alignment. It enables deep alignment to diverse safety perspectives through a large pool of demographically intersectional human raters who provided extensive feedback across 1000 prompts, with high replication, capturing nuanced safety perceptions. Second, we empirically confirm demographics as a crucial proxy for diverse viewpoints in this domain, revealing significant, context-dependent differences in harm perception that diverge from conventional evaluations. Finally, we discuss implications for building aligned T2I models, including efficient data collection strategies, LLM judgment capabilities, and model steerability towards diverse perspectives. This research offers foundational tools for more equitable and aligned T2I systems. Content Warning: The paper includes sensitive content that may be harmful.

NeurIPS Conference 2024 Conference Paper

Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization

  • Dingshuo Chen
  • Zhixun Li
  • Yuyan Ni
  • Guibin Zhang
  • Ding Wang
  • Qiang Liu
  • Shu Wu
  • Jeffrey X. Yu

With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), an oft-cited approach to reducing training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. We therefore propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which targets the source-free data-pruning scenario, where pruning is applied on top of pretrained models. By maintaining two models with different update paces during training, we introduce a novel scoring function that measures the informativeness of samples based on their loss discrepancy. As a plug-and-play framework, MolPeg perceives both the source and target domains and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass full-dataset training even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.
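The loss-discrepancy idea can be sketched in a few lines: score each sample by the gap between a fast-updated model's loss and a slow-updated one's, then keep the highest-scoring samples as the coreset. The per-sample losses below are synthetic numbers, and the absolute-gap score is a simplification of the paper's scoring function.

```python
# Hypothetical per-sample losses from two models with different update paces.
fast_losses = [0.9, 0.2, 1.5, 0.4, 1.1, 0.3]
slow_losses = [0.5, 0.25, 0.6, 0.45, 0.4, 0.35]

# Informativeness score: how much the two models disagree on each sample.
scores = [abs(f - s) for f, s in zip(fast_losses, slow_losses)]

def prune(scores, keep_ratio):
    # Keep the indices of the top-scoring samples as the training coreset.
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

coreset = prune(scores, keep_ratio=0.5)
print(coreset)  # indices of the retained samples
```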

NeurIPS Conference 2023 Conference Paper

DICES Dataset: Diversity in Conversational AI Evaluation for Safety

  • Lora Aroyo
  • Alex Taylor
  • Mark Díaz
  • Christopher Homan
  • Alicia Parrish
  • Gregory Serapio-García
  • Vinodkumar Prabhakaran
  • Ding Wang

Machine learning approaches often require training and evaluation datasets with a clear separation between positive and negative examples. This requirement overly simplifies the natural subjectivity present in many tasks, and obscures the inherent diversity in human perceptions and opinions about many content items. Preserving the variance in content and diversity in human perceptions in datasets is often quite expensive and laborious. This is especially troubling when building safety datasets for conversational AI systems, as safety is socio-culturally situated in this context. To demonstrate this crucial aspect of conversational AI safety, and to facilitate in-depth model performance analyses, we introduce the DICES (Diversity In Conversational AI Evaluation for Safety) dataset, which contains fine-grained demographic information about raters, provides high replication of ratings per item to ensure statistical power for analyses, and encodes rater votes as distributions across different demographics to allow in-depth exploration of different aggregation strategies. The DICES dataset enables the observation and measurement of variance, ambiguity, and diversity in the context of safety for conversational AI. We further describe a set of metrics that show how rater diversity influences safety perception across different geographic regions, ethnicity groups, age groups, and genders. The goal of the DICES dataset is to serve as a shared resource and benchmark that respects diverse perspectives during safety evaluation of conversational AI systems.

ICML Conference 2021 Conference Paper

Learning Deep Neural Networks under Agnostic Corrupted Supervision

  • Boyang Liu
  • Mengying Sun
  • Ding Wang
  • Pang-Ning Tan
  • Jiayu Zhou

Training deep neural network models in the presence of corrupted supervision is challenging, as corrupted data points may significantly degrade generalization performance. To alleviate this problem, we present an efficient robust algorithm that achieves strong guarantees without any assumption on the type of corruption and provides a unified framework for both classification and regression problems. Unlike many existing approaches that quantify the quality of individual data points (e.g., by their loss values) and filter them accordingly, the proposed algorithm controls the collective impact of data points on the average gradient. Even when a corrupted data point escapes exclusion by our algorithm, it has only a very limited impact on the overall loss, as compared with state-of-the-art filtering methods based on loss values. Extensive experiments on multiple benchmark datasets demonstrate the robustness of our algorithm under different types of corruption. Our code is available at https://github.com/illidanlab/PRL.
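The core idea of bounding the collective impact of samples on the average gradient, rather than filtering by loss, can be illustrated in simplified form by trimming the largest-norm per-sample gradients before averaging. This is a sketch of the general principle, not the paper's exact algorithm, and the gradients below are invented.

```python
def trimmed_average_gradient(grads, trim_frac):
    """Average per-sample gradients after discarding the largest-norm ones,
    limiting how much any single (possibly corrupted) point can move the mean."""
    def norm(g):
        return sum(x * x for x in g) ** 0.5
    keep = sorted(grads, key=norm)[: int(len(grads) * (1 - trim_frac))]
    d = len(grads[0])
    return [sum(g[i] for g in keep) / len(keep) for i in range(d)]

# Four clean gradients and one wildly corrupted one.
grads = [[0.1, -0.2], [0.15, -0.1], [0.05, -0.25], [0.12, -0.18], [50.0, 40.0]]
avg = trimmed_average_gradient(grads, trim_frac=0.2)
print(avg)  # stays close to the clean-gradient mean despite the outlier
```

A plain average here would be dominated by the corrupted point; trimming by gradient norm caps its influence even without knowing which point is corrupted.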

IJCAI Conference 2021 Conference Paper

RCA: A Deep Collaborative Autoencoder Approach for Anomaly Detection

  • Boyang Liu
  • Ding Wang
  • Kaixiang Lin
  • Pang-Ning Tan
  • Jiayu Zhou

Unsupervised anomaly detection plays a crucial role in many critical applications. Driven by the success of deep learning, recent years have witnessed growing interest in applying deep neural networks (DNNs) to anomaly detection problems. A common approach is to use autoencoders to learn a feature representation for the normal observations in the data; the autoencoder's reconstruction error is then used as an outlier score to detect anomalies. However, due to the high complexity introduced by the over-parameterization of DNNs, the reconstruction error of anomalies can also be small, which hampers the effectiveness of these methods. To alleviate this problem, we propose a robust framework that uses collaborative autoencoders to jointly identify normal observations in the data while learning their feature representation. We investigate the theoretical properties of the framework and empirically show its outstanding performance compared to other DNN-based methods. Our experimental results also show the framework's resilience to missing values compared to other baseline methods.
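One way to picture the collaborative idea, in a heavily simplified co-teaching-style form that is not RCA's actual algorithm, is two models that each train only on the samples the *other* model reconstructs well, so a single over-parameterized model cannot keep its own anomalies. The per-sample losses below are synthetic.

```python
# Hypothetical per-sample reconstruction losses from two autoencoders;
# sample 2 is an anomaly that both models reconstruct poorly.
losses_a = [0.1, 0.2, 5.0, 0.15, 0.12]
losses_b = [0.12, 0.18, 4.0, 0.2, 0.1]

def select(losses, keep_ratio=0.8):
    # Indices of the lowest-loss samples, treated as likely-normal.
    k = int(len(losses) * keep_ratio)
    return sorted(sorted(range(len(losses)), key=lambda i: losses[i])[:k])

train_a = select(losses_b)  # A trains on what B considers normal
train_b = select(losses_a)  # and vice versa
print(train_a, train_b)     # the anomalous sample is excluded from both
```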

AAAI Conference 2020 Conference Paper

OMuLeT: Online Multi-Lead Time Location Prediction for Hurricane Trajectory Forecasting

  • Ding Wang
  • Boyang Liu
  • Pang-Ning Tan
  • Lifeng Luo

Hurricanes are powerful tropical cyclones with sustained wind speeds ranging from at least 74 mph (for category 1 storms) to more than 157 mph (for category 5 storms). Accurate prediction of the storm tracks is essential for hurricane preparedness and mitigation of storm impacts. In this paper, we cast the hurricane trajectory forecasting task as an online multi-lead time location prediction problem and present a framework called OMuLeT to improve path prediction by combining the 6-hourly and 12-hourly forecasts generated from an ensemble of dynamical (physical) hurricane models. OMuLeT employs an online learning with restart strategy to incrementally update the weights of the ensemble model combination as new observation data become available. It can also handle the varying dynamical models available for predicting the trajectories of different hurricanes. Experimental results using the Atlantic and Eastern Pacific hurricane data showed that OMuLeT significantly outperforms various baseline methods, including the official forecasts produced by the U.S. National Hurricane Center (NHC), by more than 10% in terms of its 48-hour lead time forecasts.
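The flavor of online ensemble weighting can be sketched with a multiplicative-weights update: each member model's weight decays with its accumulated forecast error, so better models come to dominate the combination as observations arrive. This is a generic illustration, not OMuLeT's actual update rule, and all forecasts and "truths" below are invented.

```python
import math

def hedge_weights(forecasts, truths, eta=0.5):
    """Weight each ensemble member by exp(-eta * cumulative squared error)."""
    m = len(forecasts[0])
    losses = [0.0] * m
    for f, y in zip(forecasts, truths):          # process observations in order
        for i in range(m):
            losses[i] += (f[i] - y) ** 2
    ws = [math.exp(-eta * L) for L in losses]
    s = sum(ws)
    return [w / s for w in ws]                   # normalized ensemble weights

# Three toy "models": model 0 is nearly unbiased, models 1 and 2 are worse.
truths = [1.0, 2.0, 3.0, 4.0]
forecasts = [[y + 0.1, y + 1.0, y - 2.0] for y in truths]
w = hedge_weights(forecasts, truths)
best = w.index(max(w))
print(best, [round(x, 3) for x in w])
```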

ICRA Conference 2007 Conference Paper

Behavior Based Adaptive Control for Autonomous Oceanographic Sampling

  • Donald P. Eickstedt
  • Michael R. Benjamin
  • Ding Wang
  • Joseph A. Curcio
  • Henrik Schmidt

This paper describes an investigation into the adaptive control of autonomous mobile sensor platforms for oceanographic sampling. Mobile sensor platforms can rapidly sample oceanographic data of interest for real-time input into ocean environmental models, with the goal of reducing modeling uncertainty by introducing selected sampled data. The main objective of this paper is to describe the autonomy architecture developed to support adaptive sampling. It consists of an open-source distributed autonomy architecture and a behavior-based approach to autonomous vehicle control using multiple objective functions, which allows reactive control in complex environments with multiple constraints. Experimental results are provided for an adaptive ocean thermal-gradient tracking application performed by an autonomous surface craft in Monterey Bay. These results highlight not only the suitability of autonomous sensor platforms for adaptive sampling of the ocean environment but also the suitability of our behavior-based autonomy approach and distributed autonomy architecture as a simple, flexible, and scalable method for autonomous sensor platform control. The paper concludes with an overview of planned adaptive sampling experiments with autonomous underwater sensor platforms using the same methodology.