Arrow Research search

Author name cluster

Tong Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

AAAI Conference 2026 Conference Paper

Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation

  • Jinxing Zhou
  • Yanghao Zhou
  • Mingfei Han
  • Tong Wang
  • Xiaojun Chang
  • Hisham Cholakkal
  • Rao Muhammad Anwer

Referring Audio-Visual Segmentation (Ref-AVS) aims to segment target objects in audible videos based on given reference expressions. Prior works typically rely on learning latent embeddings via multimodal fusion to prompt a tunable SAM/SAM2 decoder for segmentation, which requires strong pixel-level supervision and lacks interpretability. From a novel perspective of explicit reference understanding, we propose TGS-Agent, which decomposes the task into a Think-Ground-Segment process, mimicking the human reasoning procedure by first identifying the referred object through multimodal analysis, followed by coarse-grained grounding and precise segmentation. To this end, we first propose Ref-Thinker, a multimodal language model capable of reasoning over textual, visual, and auditory cues. We construct an instruction-tuning dataset with explicit object-aware think-answer chains for Ref-Thinker fine-tuning. The object description inferred by Ref-Thinker is used as an explicit prompt for Grounding-DINO and SAM2, which perform grounding and segmentation without relying on pixel-level supervision. Additionally, we introduce R2-AVSBench, a new benchmark with linguistically diverse and reasoning-intensive references for better evaluating model generalization. Our approach achieves state-of-the-art results on both standard Ref-AVSBench and proposed R2-AVSBench.

ICML Conference 2025 Conference Paper

Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off

  • Yuecheng Li
  • Lele Fu
  • Tong Wang
  • Jian Lou 0001
  • Bin Chen 0011
  • Lei Yang 0030
  • Jian Shen
  • Zibin Zheng

To defend against privacy leakage of user data, differential privacy is widely used in federated learning, but it is not free. The addition of noise randomly disrupts the semantic integrity of the model, and this disturbance accumulates with increased communication rounds. In this paper, we introduce a novel federated learning framework with rigorous privacy guarantees, named FedCEO, designed to strike a trade-off between model utility and user privacy by letting clients "Collaborate with Each Other". Specifically, we perform efficient tensor low-rank proximal optimization on stacked local model parameters at the server, demonstrating its capability to flexibly truncate high-frequency components in spectral space. This capability implies that our FedCEO can effectively recover the disrupted semantic information by smoothing the global semantic space for different privacy settings and continuous training processes. Moreover, we improve the SOTA utility-privacy trade-off bound by an order of $\sqrt{d}$, where $d$ is the input dimension. We illustrate our theoretical results with experiments on representative datasets and observe significant performance improvements and strict privacy guarantees under different privacy settings. The code is available at https://github.com/6lyc/FedCEO_Collaborate-with-Each-Other.
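The spectral-smoothing step described in this abstract can be illustrated with a minimal sketch. The paper applies tensor low-rank proximal optimization; the version below is only a loose stand-in that hard-truncates singular values of the stacked client-parameter matrix, and the names `smooth_client_params` and `keep_ratio` are illustrative, not from the paper.

```python
import numpy as np

def smooth_client_params(stacked, keep_ratio=0.5):
    """Truncate the smallest singular values of the stacked parameter
    matrix -- a rough stand-in for low-rank spectral smoothing."""
    U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    k = max(1, int(len(s) * keep_ratio))
    s[k:] = 0.0  # drop the small (high-frequency) spectral components
    return U @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
clean = rng.normal(size=(4, 16))                          # 4 clients, 16 params each
noisy = clean + rng.normal(scale=0.5, size=clean.shape)   # DP-style noise
smoothed = smooth_client_params(noisy, keep_ratio=0.5)
```

Truncating the tail of the spectrum plays the role of removing the high-frequency disturbance that the added noise injects into the aggregated parameters.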

ICRA Conference 2025 Conference Paper

Efficient Cross-Boundary Grasping in Stacked Clutter with Single-Visual Mapping Multi-Step

  • Yudong Luo
  • Tong Wang
  • Feiyu Xie
  • Na Zhao 0008
  • Xianping Fu
  • Yantao Shen 0001

In logistics applications, the vision-based technology for grasping target objects in the air is relatively mature. However, when operating across air and water, such as grasping marine products from the water, the visual information collected by the camera is disturbed by ripples and bubbles on the water surface, resulting in low grasping efficiency. Therefore, we introduce a single-visual-mapping multi-step (SVMMS) grasping strategy to achieve cross-medium operations involving stacked objects. Specifically, we design a multifunctional integrated Deep Q-learning-based network model that extracts visual features from the scene to effectively detect stacked objects and output their hierarchical relationships. Moreover, we quantify the underlying relationship between motion logic and changes in RGB-D observations during action execution to help the robot achieve efficient and collision-free operations. Our approach also incorporates a time-series design with prioritized experience replay to globally optimize the action sequence. Additionally, we propose a novel sim2real method that combines domain randomization to address the difference in object sizes between simulation and the real world. Extensive experiments in both simulated and physical environments show that SVMMS-Grasp significantly outperforms existing methods in terms of task success rate, stability, and operational efficiency.

ICRA Conference 2025 Conference Paper

Explore the LiDAR-Camera Dynamic Adjustment Fusion for 3D Object Detection

  • Yiran Yang
  • Xu Gao
  • Tong Wang
  • Xin Hao
  • Yifeng Shi
  • Xiao Tan 0001
  • Xiaoqing Ye

Camera and LiDAR serve as informative sensors for accurate and robust autonomous driving systems. However, these sensors often exhibit heterogeneous natures, resulting in distributional modality gaps that present significant challenges for fusion. To address this, a robust fusion technique is crucial, particularly for enhancing 3D object detection. In this paper, we introduce a dynamic adjustment technology aimed at aligning modal distributions and learning effective modality representations to enhance the fusion process. Specifically, we propose a triphase domain aligning module. This module adjusts the feature distributions from both the camera and LiDAR, bringing them closer to the ground truth domain and minimizing differences. Additionally, we explore improved representation acquisition methods for dynamic fusion, which include modal interaction and specialty enhancement. Finally, we apply an adaptive learning technique that merges semantic and geometric information for dynamic instance optimization. Extensive experiments on the nuScenes dataset show competitive performance against state-of-the-art approaches. Our code will be released in the future.

JBHI Journal 2025 Journal Article

Multi-Task Adaptive Resolution Network for Lymph Node Metastasis Diagnosis From Whole Slide Images of Colorectal Cancer

  • Tong Wang
  • Su-Jin Shin
  • Mingkang Wang
  • Qi Xu
  • Guiyang Jiang
  • Fengyu Cong
  • Jeonghyun Kang
  • Hongming Xu

Automated detection of lymph node metastasis (LNM) holds great potential to alleviate the workload of doctors and reduce misinterpretations. Despite the practical successes achieved, effectively addressing the highly complex and heterogeneous tumor microenvironment remains an open and challenging problem, especially when tumor subtypes intermingle and are difficult to delineate. In this paper, we propose a multi-task adaptive resolution network, named MAR-Net, for LNM detection and subtyping in complex mixed-type cancers. Specifically, we construct a resolution-aware module to mine heterogeneous diagnostic information, which exploits the multi-scale pyramid information and adaptively combines multi-resolution structured features for comprehensive representation. Additionally, we adopt a multi-task learning approach that simultaneously addresses LNM detection and subtyping, reducing model instability during optimization and improving performance across both tasks. More importantly, to rectify the potential misclassification of tumor subtypes, we elaborately design a hierarchical subtyping refinement (HSR) algorithm that leverages a generic segmentation model informed by pathologists' prior knowledge. Evaluations have been conducted on three private and one public cancer datasets (554 WSIs, 4.8 million patches). Our experimental results demonstrate that the proposed method consistently achieves superior performance compared to the state-of-the-art methods, achieving 0.5% to 3.2% higher AUC in LNM detection and 3.8% to 4.4% higher AUC in LNM subtyping.

NeurIPS Conference 2025 Conference Paper

ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning

  • Rose Gurung
  • Ronilo Ragodos
  • Chiyu Ma
  • Tong Wang
  • Chaofan Chen

We present Prototypical Pair Network (ProtoPairNet), a novel interpretable architecture that combines deep learning with case-based reasoning to predict continuous targets. While prototype-based models have primarily addressed image classification with discrete outputs, extending these methods to continuous targets, such as regression, poses significant challenges. Existing architectures which rely heavily on one-to-one comparison with prototypes lack the directional information necessary for continuous predictions. Our method redefines the role of prototypes in such tasks by incorporating prototypical pairs into the reasoning process. Predictions are derived based on the input's relative dissimilarities to these pairs, leveraging an intuitive geometric interpretation. Our method further reduces the complexity of the reasoning process by relying on the single most relevant pair of prototypes, rather than all prototypes in the model as was done in prior works. Our model is versatile enough to be used in both vision-based regression and continuous control in reinforcement learning. Our experiments demonstrate that ProtoPairNet achieves performance on par with its black-box counterparts across these tasks. Comprehensive analyses confirm the meaningfulness of prototypical pairs and the faithfulness of our model’s interpretations, and extensive user studies highlight our model's improved interpretability over existing methods.
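One plausible reading of "relative dissimilarities to a prototype pair" is inverse-distance interpolation between the pair's target values. The sketch below, with the hypothetical function `pair_predict`, illustrates that geometric intuition only; it is not the paper's actual architecture.

```python
import numpy as np

def pair_predict(x, p1, y1, p2, y2):
    """Interpolate between a prototype pair's target values using
    the input's relative dissimilarity to each prototype."""
    d1 = np.linalg.norm(x - p1)
    d2 = np.linalg.norm(x - p2)
    if d1 + d2 == 0:
        return (y1 + y2) / 2
    # closer to p1 -> prediction pulled toward y1, and vice versa
    return (d2 * y1 + d1 * y2) / (d1 + d2)

p1, p2 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
pred = pair_predict(np.array([0.5, 0.5]), p1, 10.0, p2, 20.0)
```

Because the pair carries two target values, it provides the directional information that a single prototype cannot, which is the gap the abstract highlights for regression.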

NeurIPS Conference 2025 Conference Paper

RayFusion: Ray Fusion Enhanced Collaborative Visual Perception

  • Shaohong Wang
  • Lu Bin
  • Xinyu Xiao
  • Hanzhi Zhong
  • Bowen Pang
  • Tong Wang
  • Zhiyu Xiang
  • Hangguan Shan

Collaborative visual perception methods have gained widespread attention in the autonomous driving community in recent years due to their ability to address sensor limitation problems. However, the absence of explicit depth information often makes it difficult for camera-based perception systems, e.g., 3D object detection, to generate accurate predictions. To alleviate the ambiguity in depth estimation, we propose RayFusion, a ray-based fusion method for collaborative visual perception. Using ray occupancy information from collaborators, RayFusion reduces redundancy and false positive predictions along camera rays, enhancing the detection performance of purely camera-based collaborative perception systems. Comprehensive experiments show that our method consistently outperforms existing state-of-the-art models, substantially advancing the performance of collaborative visual perception. Our code will be made publicly available.

NeurIPS Conference 2025 Conference Paper

Self-Assembling Graph Perceptrons

  • Jialong Chen
  • Tong Wang
  • Bowen Deng
  • Luonan Chen
  • Zibin Zheng
  • Chuan Chen

Inspired by the workings of biological brains, humans have designed artificial neural networks (ANNs), sparking profound advancements across various fields. However, the biological brain possesses high plasticity, enabling it to develop simple, efficient, and powerful structures to cope with complex external environments. In contrast, the superior performance of ANNs often relies on meticulously crafted architectures, which can make them vulnerable when handling complex inputs. Moreover, overparameterization often characterizes the most advanced ANNs. This paper explores the path toward building streamlined and plastic ANNs. Firstly, we introduce the Graph Perceptron (GP), which extends the most fundamental ANN, the Multi-Layer Perceptron (MLP). Subsequently, we incorporate a self-assembly mechanism on top of GP called Self-Assembling Graph Perceptron (SAGP). During training, SAGP can autonomously adjust the network's number of neurons and synapses and their connectivity. SAGP achieves comparable or even superior performance with only about 5% of the size of an MLP. We also demonstrate the SAGP's advantages in enhancing model interpretability and feature selection.

AAAI Conference 2025 Conference Paper

THESAURUS: Contrastive Graph Clustering by Swapping Fused Gromov-Wasserstein Couplings

  • Bowen Deng
  • Tong Wang
  • Lele Fu
  • Sheng Huang
  • Chuan Chen
  • Tao Zhang

Graph node clustering is a fundamental unsupervised task. Existing methods typically train an encoder through self-supervised learning and then apply K-means to the encoder output. Some methods use this clustering result directly as the final assignment, while others initialize centroids based on this initial clustering and then finetune both the encoder and these learnable centroids. However, due to their reliance on K-means, these methods inherit its drawbacks when the cluster separability of encoder output is low, facing challenges from the Uniform Effect and Cluster Assimilation. We summarize three reasons for the low cluster separability in existing methods: (1) lack of contextual information prevents discrimination between similar nodes from different clusters; (2) training tasks are not sufficiently aligned with the downstream clustering task; (3) the cluster information in the graph structure is not appropriately exploited. To address these issues, we propose conTrastive grapH clustEring by SwApping fUsed gRomov-wasserstein coUplingS (THESAURUS). Our method introduces semantic prototypes to provide contextual information, and employs a cross-view assignment prediction pretext task that aligns well with the downstream clustering task. Additionally, it utilizes Gromov-Wasserstein Optimal Transport (GW-OT) along with the proposed prototype graph to thoroughly exploit cluster information in the graph structure. To adapt to diverse real-world data, THESAURUS updates the prototype graph and the prototype marginal distribution in OT by using momentum. Extensive experiments demonstrate that THESAURUS achieves higher cluster separability than the prior art, effectively mitigating the Uniform Effect and Cluster Assimilation issues.

NeurIPS Conference 2024 Conference Paper

Improving Decision Sparsity

  • Yiyang Sun
  • Tong Wang
  • Cynthia Rudin

Sparsity is a central aspect of interpretability in machine learning. Typically, sparsity is measured in terms of the size of a model globally, such as the number of variables it uses. However, this notion of sparsity is not particularly relevant for decision making; someone subjected to a decision does not care about variables that do not contribute to the decision. In this work, we dramatically expand a notion of decision sparsity called the Sparse Explanation Value (SEV) so that its explanations are more meaningful. SEV considers movement along a hypercube towards a reference point. By allowing flexibility in that reference and by considering how distances along the hypercube translate to distances in feature space, we can derive sparser and more meaningful explanations for various types of function classes. We present cluster-based SEV and its variant tree-based SEV, introduce a method that improves credibility of explanations, and propose algorithms that optimize decision sparsity in machine learning models.
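The "movement along a hypercube towards a reference point" idea admits a brute-force sketch: the smallest number of reference features that must be switched to the query's values before the model predicts positive. The name `sev_plus` and the exact definition here are simplified assumptions, not the paper's formulation.

```python
from itertools import combinations

def sev_plus(predict, query, reference):
    """Smallest number of reference features that must be switched
    to the query's values before the model predicts positive."""
    n = len(query)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            point = list(reference)
            for i in subset:
                point[i] = query[i]  # move along one hypercube edge
            if predict(point) == 1:
                return k
    return None  # even the full query is not predicted positive

# toy model: positive iff feature 0 exceeds 0.5
model = lambda x: int(x[0] > 0.5)
k = sev_plus(model, query=[0.9, 0.9, 0.9], reference=[0.0, 0.0, 0.0])
```

A small value means the decision hinges on only a few features relative to the chosen reference, which is the per-decision notion of sparsity the abstract contrasts with global model size.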

AAAI Conference 2024 Conference Paper

Inspecting Prediction Confidence for Detecting Black-Box Backdoor Attacks

  • Tong Wang
  • Yuan Yao
  • Feng Xu
  • Miao Xu
  • Shengwei An
  • Ting Wang

Backdoor attacks have been shown to be a serious security threat against deep learning models, and various defenses have been proposed to detect whether a model is backdoored or not. However, as indicated by a recent black-box attack, existing defenses can be easily bypassed by implanting the backdoor in the frequency domain. To this end, we propose a new defense DTInspector against black-box backdoor attacks, based on a new observation related to the prediction confidence of learning models. That is, to achieve a high attack success rate with a small amount of poisoned data, backdoor attacks usually render a model exhibiting statistically higher prediction confidences on the poisoned samples. We provide both theoretical and empirical evidence for the generality of this observation. DTInspector then carefully examines the prediction confidences of data samples, and decides the existence of backdoor using the shortcut nature of backdoor triggers. Extensive evaluations on six backdoor attacks, four datasets, and three advanced attacking types demonstrate the effectiveness of the proposed defense.

AAAI Conference 2024 Conference Paper

Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning

  • Kaiyou Song
  • Shan Zhang
  • Tong Wang

The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly caused by the challenge that images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings’ way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the semantic patches to the less semantic patches. To this end, we first calculate a semantic-aware permutation of patches according to their feature similarities and then perform the autoregression procedure based on the permutation. In addition, considering that the raw pixels of patches are low-level signals and are not ideal prediction targets for learning high-level semantic representation, we also explore utilizing the patch features as the prediction targets. Extensive experiments are conducted on a broad range of downstream tasks, including image classification, object detection, and instance/semantic segmentation, to evaluate the performance of SemAIM. The results demonstrate SemAIM achieves state-of-the-art performance compared with other self-supervised methods. Specifically, with ViT-B, SemAIM achieves 84.1% top-1 accuracy for fine-tuning on ImageNet, 51.3% AP and 45.4% AP for object detection and instance segmentation on COCO, which outperforms the vanilla MAE by 0.5%, 1.0%, and 0.5%, respectively. Code is available at https://github.com/skyoux/SemAIM.
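The semantic-aware permutation can be sketched in miniature. The paper orders patches by feature similarities; the scoring rule below (cosine similarity to the mean patch feature) and the name `semantic_order` are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def semantic_order(patch_feats):
    """Rank patch indices from most to least 'semantic', scored here
    (an assumption) by cosine similarity to the mean patch feature."""
    mean = patch_feats.mean(axis=0)
    sims = patch_feats @ mean / (
        np.linalg.norm(patch_feats, axis=1) * np.linalg.norm(mean) + 1e-8)
    return np.argsort(-sims)  # autoregress over patches in this order

# two 'object-like' patches and one outlier background patch
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
order = semantic_order(feats)
```

Autoregression then proceeds along this order, predicting patches that resemble the dominant content first and the less semantic ones last.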

NeurIPS Conference 2023 Conference Paper

DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions

  • Haochen Wang
  • Junsong Fan
  • Yuxi Wang
  • Kaiyou Song
  • Tong Wang
  • Zhao-Xiang Zhang

As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident. To address this, we present DropPos, a novel pretext task designed to reconstruct Dropped Positions. The formulation of DropPos is simple: we first drop a large random subset of positional embeddings and then the model classifies the actual position for each non-overlapping patch among all possible positions solely based on their visual appearance. To avoid trivial solutions, we increase the difficulty of this task by keeping only a subset of patches visible. Additionally, considering there may be different patches with similar visual appearances, we propose position smoothing and attentive reconstruction strategies to relax this classification problem, since it is not necessary to reconstruct their exact positions in these cases. Empirical evaluations of DropPos show strong capabilities. DropPos outperforms supervised pre-training and achieves competitive results compared with state-of-the-art self-supervised alternatives on a wide range of downstream benchmarks. This suggests that explicitly encouraging spatial reasoning abilities, as DropPos does, indeed contributes to the improved location awareness of ViTs. The code is publicly available at https://github.com/Haochen-Wang409/DropPos.

NeurIPS Conference 2023 Conference Paper

Efficiently incorporating quintuple interactions into geometric deep learning force fields

  • Zun Wang
  • Guoqing Liu
  • Yichi Zhou
  • Tong Wang
  • Bin Shao

Machine learning force fields (MLFFs) have instigated a groundbreaking shift in molecular dynamics (MD) simulations across a wide range of fields, such as physics, chemistry, biology, and materials science. Incorporating higher order many-body interactions can enhance the expressiveness and accuracy of models. Recent models have achieved this by explicitly including up to four-body interactions. However, five-body interactions, which have relevance in various fields, are still challenging to incorporate efficiently into MLFFs. In this work, we propose the quintuple network (QuinNet), an end-to-end graph neural network that efficiently expresses many-body interactions up to five-body interactions with ab initio accuracy. By analyzing the topology of diverse many-body interactions, we design the model architecture to efficiently and explicitly represent these interactions. We evaluate QuinNet on public datasets of small molecules, such as MD17 and its revised version, and show that it is compatible with other state-of-the-art models on these benchmarks. Moreover, QuinNet surpasses many leading models on larger and more complex molecular systems, such as MD22 and Chignolin, without increasing the computational complexity. We also use QuinNet as a force field for molecular dynamics (MD) simulations to demonstrate its accuracy and stability, and conduct an ablation study to elucidate the significance of five-body interactions. We open source our implementation at https://github.com/Zun-Wang/QuinNet.

NeurIPS Conference 2023 Conference Paper

Geometric Transformer with Interatomic Positional Encoding

  • Yusong Wang
  • Shaoning Li
  • Tong Wang
  • Bin Shao
  • Nanning Zheng
  • Tie-Yan Liu

The widespread adoption of Transformer architectures in various data modalities has opened new avenues for applications in molecular modeling. Nevertheless, it remains unclear whether Transformer-based architectures can perform molecular modeling as well as equivariant GNNs. In this paper, by designing Interatomic Positional Encoding (IPE) that parameterizes atomic environments as the Transformer's positional encodings, we propose Geoformer, a novel geometric Transformer to effectively model molecular structures for various molecular property prediction tasks. We evaluate Geoformer on several benchmarks, including the QM9 dataset and the recently proposed Molecule3D dataset. Compared with both Transformers and equivariant GNN models, Geoformer outperforms the state-of-the-art (SoTA) algorithms on QM9, and achieves the best performance on Molecule3D for both random and scaffold splits. By introducing IPE, Geoformer paves the way for molecular geometric modeling based on Transformer architecture. Codes are available at https://github.com/microsoft/AI2BMD/tree/Geoformer.

JMLR Journal 2023 Journal Article

ProtoryNet - Interpretable Text Classification Via Prototype Trajectories

  • Dat Hong
  • Tong Wang
  • Stephen Baek

We propose a novel interpretable deep neural network for text classification, called ProtoryNet, based on a new concept of prototype trajectories. Motivated by the prototype theory in modern linguistics, ProtoryNet makes a prediction by finding the most similar prototype for each sentence in a text sequence and feeding an RNN backbone with the proximity of each sentence to the corresponding active prototype. The RNN backbone then captures the temporal pattern of the prototypes, which we refer to as prototype trajectories. Prototype trajectories enable intuitive and fine-grained interpretation of the reasoning process of the RNN model, in resemblance to how humans analyze texts. We also design a prototype pruning procedure to reduce the total number of prototypes used by the model for better interpretability. Experiments on multiple public datasets demonstrate that ProtoryNet achieves higher accuracy than the baseline prototype-based deep neural net and narrows the performance gap when compared to state-of-the-art black-box models. In addition, after prototype pruning, the resulting ProtoryNet models need only around 20 or fewer prototypes for all datasets, which significantly benefits interpretability. Furthermore, we report survey results indicating that human users find ProtoryNet more intuitive and easier to understand compared to other prototype-based methods.
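The trajectory construction can be sketched as follows. The function `prototype_trajectory` and the `exp(-distance)` proximity are illustrative assumptions; the paper's model learns the prototypes and feeds the proximities into an RNN, which this sketch omits.

```python
import numpy as np

def prototype_trajectory(sentence_embs, prototypes):
    """Map each sentence to (index of nearest prototype, proximity);
    the resulting sequence is the 'trajectory' fed to the RNN."""
    traj = []
    for s in sentence_embs:
        d = np.linalg.norm(prototypes - s, axis=1)
        i = int(d.argmin())
        traj.append((i, float(np.exp(-d[i]))))  # proximity in (0, 1]
    return traj

prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])   # e.g. 'negative', 'positive'
sentences = np.array([[0.1, 0.0], [9.9, 10.0]])     # toy sentence embeddings
traj = prototype_trajectory(sentences, prototypes)
```

Reading off the sequence of prototype indices gives the sentence-by-sentence rationale that the abstract describes as resembling how humans analyze texts.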

TMLR Journal 2022 Journal Article

Direct Molecular Conformation Generation

  • Jinhua Zhu
  • Yingce Xia
  • Chang Liu
  • Lijun Wu
  • Shufang Xie
  • Yusong Wang
  • Tong Wang
  • Tao Qin

Molecular conformation generation aims to generate three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology. Previous methods usually first predict the interatomic distances, the gradients of interatomic distances, or the local structures (e.g., torsion angles) of a molecule, and then reconstruct its 3D conformation. How to directly generate the conformation without the above intermediate values is not fully explored. In this work, we propose a method that directly predicts the coordinates of atoms: (1) the loss function is invariant to roto-translation of coordinates and permutation of symmetric atoms; (2) the newly proposed model adaptively aggregates the bond and atom information and iteratively refines the coordinates of the generated conformation. Our method achieves the best results on the GEOM-QM9 and GEOM-Drugs datasets. Further analysis shows that our generated conformations have closer properties (e.g., HOMO-LUMO gap) to the ground-truth conformations. In addition, our method improves molecular docking by providing better initial conformations. All the results demonstrate the effectiveness of our method and the great potential of the direct approach. The code is released at https://github.com/DirectMolecularConfGen/DMCG.
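A standard way to make a coordinate loss invariant to roto-translation, point (1) above, is to align the prediction to the reference with the Kabsch algorithm before measuring error. The sketch below shows that ingredient only; it does not handle the symmetric-atom permutations the paper's full loss also covers.

```python
import numpy as np

def aligned_rmsd(pred, ref):
    """RMSD after optimal rotation and translation (Kabsch), so the
    loss is invariant to rigid motions of the predicted conformation."""
    pred = pred - pred.mean(axis=0)          # remove translation
    ref = ref - ref.mean(axis=0)
    U, _, Vt = np.linalg.svd(pred.T @ ref)   # optimal rotation via SVD
    d = np.sign(np.linalg.det(U @ Vt))       # avoid improper reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return np.sqrt(((pred @ R - ref) ** 2).sum(axis=1).mean())

ref = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
# rotate 90 degrees about z and translate: the aligned loss vanishes
Rz = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
moved = ref @ Rz.T + np.array([5., -3., 2.])
loss = aligned_rmsd(moved, ref)
```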

NeurIPS Conference 2022 Conference Paper

ProtoX: Explaining a Reinforcement Learning Agent via Prototyping

  • Ronilo Ragodos
  • Tong Wang
  • Qihang Lin
  • Xun Zhou

While deep reinforcement learning has proven to be successful in solving control tasks, the "black-box" nature of an agent has received increasing concerns. We propose a prototype-based post-hoc policy explainer, ProtoX, that explains a black-box agent by prototyping the agent's behaviors into scenarios, each represented by a prototypical state. When learning prototypes, ProtoX considers both visual similarity and scenario similarity. The latter is unique to the reinforcement learning context since it explains why the same action is taken in visually different states. To teach ProtoX about visual similarity, we pre-train an encoder using contrastive learning via self-supervised learning to recognize states as similar if they occur close together in time and receive the same action from the black-box agent. We then add an isometry layer to allow ProtoX to adapt scenario similarity to the downstream task. ProtoX is trained via imitation learning using behavior cloning, and thus requires no access to the environment or agent. In addition to explanation fidelity, we design different prototype shaping terms in the objective function to encourage better interpretability. We conduct various experiments to test ProtoX. Results show that ProtoX achieved high fidelity to the original black-box agent while providing meaningful and understandable explanations.

JMLR Journal 2021 Journal Article

Hybrid Predictive Models: When an Interpretable Model Collaborates with a Black-box Model

  • Tong Wang
  • Qihang Lin

Interpretable machine learning has become a strong competitor for black-box models. However, the possible loss of the predictive performance for gaining understandability is often inevitable, especially when it needs to satisfy users with diverse backgrounds or high standards for what is considered interpretable. This tension puts practitioners in a dilemma of choosing between high accuracy (black-box models) and interpretability (interpretable models). In this work, we propose a novel framework for building a Hybrid Predictive Model that integrates an interpretable model with any pre-trained black-box model to combine their strengths. The interpretable model substitutes the black-box model on a subset of data where the interpretable model is most competent, gaining transparency at a low cost of the predictive accuracy. We design a principled objective function that considers predictive accuracy, model interpretability, and model transparency (defined as the percentage of data processed by the interpretable substitute). Under this framework, we propose two hybrid models, one substituting with association rules and the other with linear models, and design customized training algorithms for both models. We test the hybrid models on structured data and text data where interpretable models collaborate with various state-of-the-art black-box models. Results show that hybrid models obtain an efficient trade-off between transparency and predictive performance, characterized by Pareto frontiers. Finally, we apply the proposed model on a real-world patient dataset for predicting cardiovascular disease and propose multi-model Pareto frontiers to assist model selection in real applications.
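The routing-plus-transparency mechanics can be sketched minimally. The helper names (`hybrid_predict`, `transparency`, `covers`) and the toy one-dimensional models are illustrative assumptions; the paper learns the coverage region and substitute jointly under its objective.

```python
def hybrid_predict(x, covers, interpretable, black_box):
    """Route an input to the interpretable substitute when it is
    covered; otherwise defer to the black-box model."""
    return interpretable(x) if covers(x) else black_box(x)

def transparency(data, covers):
    """Fraction of inputs handled by the interpretable substitute."""
    return sum(covers(x) for x in data) / len(data)

covers = lambda x: x < 0.5     # hypothetical interpretable region
interp = lambda x: 0           # simple rule applied on that region
black_box = lambda x: 1        # opaque model used everywhere else
data = [0.1, 0.4, 0.6, 0.9]
t = transparency(data, covers)
```

Growing the covered region raises transparency but eventually costs accuracy, which is exactly the Pareto frontier the abstract describes.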

ICML Conference 2019 Conference Paper

Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute

  • Tong Wang

This work addresses the situation where a black-box model with good predictive performance is chosen over its interpretable competitors, and we show interpretability is still achievable in this case. Our solution is to find an interpretable substitute on a subset of data where the black-box model is overkill or nearly overkill while leaving the rest to the black-box. This transparency is obtained at minimal cost or no cost to the predictive performance. Under this framework, we develop a Hybrid Rule Sets (HyRS) model that uses decision rules to capture the subspace of data where the rules are as accurate, or almost as accurate, as the black-box model. To train a HyRS, we devise an efficient search algorithm that iteratively finds the optimal model and exploits theoretically grounded strategies to reduce computation. Our framework is agnostic to the black-box during training. Experiments on structured and text data show that HyRS obtains an effective trade-off between transparency and predictive performance.

NeurIPS Conference 2019 Conference Paper

Metalearned Neural Memory

  • Tsendsuren Munkhdalai
  • Alessandro Sordoni
  • Tong Wang
  • Adam Trischler

We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means changing the function; specifically, updating the parameters of the neural network to encode desired information. We leverage training and algorithmic techniques from metalearning to update the neural memory function in one shot. The proposed memory-augmented model achieves strong performance on a variety of learning problems, from supervised question answering to reinforcement learning.
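The read-as-forward-pass, write-as-parameter-update scheme can be sketched with a linear memory function. Note the hedge: the paper performs the write in one shot with metalearned update rules, whereas this toy `NeuralMemory` uses a few steps of plain gradient descent on a linear map, and all names are illustrative.

```python
import numpy as np

class NeuralMemory:
    """A linear 'memory function' M(k) = W k: reading is a forward
    pass, writing is a gradient update that binds key to value."""
    def __init__(self, dim, lr=0.5):
        self.W = np.zeros((dim, dim))
        self.lr = lr

    def read(self, key):
        return self.W @ key

    def write(self, key, value, steps=20):
        for _ in range(steps):
            err = self.W @ key - value              # prediction error
            self.W -= self.lr * np.outer(err, key)  # grad of 0.5*||err||^2

mem = NeuralMemory(dim=3)
k = np.array([1.0, 0.0, 0.0])          # key vector
v = np.array([0.2, 0.7, -0.1])         # value vector to store
mem.write(k, v)
recalled = mem.read(k)
```

After the write, pushing the key through the function recovers the stored value, which is the sense in which "changing the function" implements memory.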

AAAI Conference 2018 Conference Paper

A Semantic QA-Based Approach for Text Summarization Evaluation

  • Ping Chen
  • Fei Wu
  • Tong Wang
  • Wei Ding

Many Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on existing texts, such as summarization, text simplification and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess the quality of their output. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation – how to pinpoint content differences between two text passages (especially large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base and ask it a large number of questions to exhaustively identify all content points in it. By comparing the correctly answered questions from two text passages, we can compare their content precisely. An experiment using the 2007 DUC summarization corpus clearly shows promising results.

NeurIPS Conference 2018 Conference Paper

Multi-value Rule Sets for Interpretable Classification with Feature-Efficient Representations

  • Tong Wang

We present the Multi-value Rule Set (MRS) for interpretable classification with feature-efficient representations. Compared to rule sets built from single-value rules, MRS adopts a more generalized form of association rules that allows multiple values in a condition. Rules of this form are more concise than classical single-value rules in capturing and describing patterns in data. Our formulation also pursues higher efficiency of feature utilization, which reduces possible costs in data collection and storage. We propose a Bayesian framework for formulating an MRS model and develop an efficient inference method for learning a maximum a posteriori (MAP) model, incorporating theoretically grounded bounds to iteratively reduce the search space and improve search efficiency. Experiments on synthetic and real-world data demonstrate that MRS models have significantly smaller complexity and fewer features than baseline models while being competitive in predictive accuracy.
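The multi-value condition described in this abstract, one condition admitting a set of values for a feature, can be illustrated as follows. This is a hypothetical sketch of the rule form only, not the paper's Bayesian learning procedure, and the features and values are invented.

```python
# A multi-value rule maps each feature to a SET of allowed values, so one
# rule covers what several single-value rules would need to express.

def mrs_rule_fires(x, rule):
    """rule: dict mapping feature name -> set of allowed values."""
    return all(x[feat] in allowed for feat, allowed in rule.items())

# One multi-value rule: "color in {red, blue} AND size in {S, M}"
rule = {"color": {"red", "blue"}, "size": {"S", "M"}}

print(mrs_rule_fires({"color": "red", "size": "M"}, rule))    # True
print(mrs_rule_fires({"color": "green", "size": "S"}, rule))  # False
```

Expressed with single-value rules, the same coverage would require four conjunctions (one per color/size combination), which is the conciseness the abstract refers to.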

JMLR Journal 2017 Journal Article

A Bayesian Framework for Learning Rule Sets for Interpretable Classification

  • Tong Wang
  • Cynthia Rudin
  • Finale Doshi-Velez
  • Yimin Liu
  • Erica Klampfl
  • Perry MacNeille

We present a machine learning algorithm for building classifiers that are composed of a small number of short rules. These are restricted disjunctive normal form models. An example of a classifier of this form is as follows: If $X$ satisfies (condition $A$ AND condition $B$) OR (condition $C$) OR $\cdots$, then $Y=1$. Models of this form have the advantage of being interpretable to human experts, since they produce a set of rules that concisely describe a specific class. We present two probabilistic models with prior parameters that the user can set to encourage the model to have a desired size and shape, to conform with a domain-specific definition of interpretability. We provide a scalable MAP inference approach and develop theoretical bounds to reduce computation by iteratively pruning the search space. We apply our method (Bayesian Rule Sets, BRS) to characterize and predict user behavior with respect to in-vehicle context-aware personalized recommender systems. Our method has a major advantage over classical associative classification methods and decision trees in that it does not greedily grow the model.
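The restricted-DNF form the abstract gives as an example, (A AND B) OR (C), evaluates as follows. This is a sketch of how such a learned rule set classifies, with hypothetical conditions; it is not the paper's MAP inference algorithm.

```python
# A rule set predicts Y = 1 when ANY rule fires, where a rule is a
# conjunction (AND) of conditions: disjunctive normal form.

def rule_set_predict(x, rule_set):
    """rule_set: list of rules; each rule is a list of condition functions."""
    return int(any(all(cond(x) for cond in rule) for rule in rule_set))

# Hypothetical classifier: "If (age < 30 AND student) OR (income > 100k), then Y = 1"
rule_set = [
    [lambda x: x["age"] < 30, lambda x: x["student"]],   # condition A AND condition B
    [lambda x: x["income"] > 100_000],                   # condition C
]

print(rule_set_predict({"age": 25, "student": True, "income": 20_000}, rule_set))   # 1
print(rule_set_predict({"age": 45, "student": False, "income": 50_000}, rule_set))  # 0
```

The user-settable priors mentioned in the abstract would, in this picture, bias learning toward few rules (short outer list) and short rules (short inner lists).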

IJCAI Conference 2017 Conference Paper

Predicting the Quality of Short Narratives from Social Media

  • Tong Wang
  • Ping Chen
  • Boyang Li

An important and difficult challenge in building computational models for narratives is the automatic evaluation of narrative quality. Quality evaluation connects narrative understanding and generation, as generation systems need to evaluate their own products. To circumvent difficulties in acquiring annotations, we employ upvotes in social media as an approximate measure of story quality. We collected 54,484 answers from a crowd-powered question-and-answer website, Quora, and then used active learning to build a classifier that labeled 28,320 answers as stories. To predict the number of upvotes without the use of social network features, we create neural networks that model textual regions and the interdependence among regions, which serve as strong benchmarks for future research. To the best of our knowledge, this is the first large-scale study of automatic evaluation of narrative quality.

AAAI Conference 2016 Conference Paper

Text Simplification Using Neural Machine Translation

  • Tong Wang
  • Ping Chen
  • John Rochford
  • Jipeng Qiang

Text simplification (TS) is the technique of reducing the lexical, syntactical complexity of text. Existing automatic TS systems can simplify text only by lexical simplification or by manually defined rules. Neural Machine Translation (NMT) is a recently proposed approach for Machine Translation (MT) that is receiving a lot of research interest. In this paper, we regard original English and simplified English as two languages, and apply a NMT model–Recurrent Neural Network (RNN) encoder-decoder on TS to make the neural network to learn text simplification rules by itself. Then we discuss challenges and strategies about how to apply a NMT model to the task of text simplification.