Arrow Research search

Author name cluster

Zhi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

44

AAAI Conference 2026 Conference Paper

DynamicEarth: How Far Are We from Open-Vocabulary Change Detection?

  • Kaiyu Li
  • Xiangyong Cao
  • Yupeng Deng
  • Chao Pang
  • Zepeng Xin
  • Hui Qiao
  • Tieliang Gong
  • Deyu Meng

Monitoring Earth's evolving land covers requires methods capable of detecting changes across a wide range of categories and contexts. Existing change detection methods are hindered by their dependency on predefined classes, reducing their effectiveness in open-world applications. To address this issue, we introduce open-vocabulary change detection (OVCD), a novel task that bridges vision and language to detect changes across any category. Considering the lack of high-quality data and annotation, we propose two training-free frameworks, M-C-I and I-M-C, which leverage and integrate off-the-shelf foundation models for the OVCD task. The insight behind the M-C-I framework is to discover all potential changes and then classify these changes, while the insight of the I-M-C framework is to identify all targets of interest and then determine whether their states have changed. Based on these two frameworks, we instantiate several methods, e.g., SAM-DINOv2-SegEarth-OV, Grounding-DINO-SAM2-DINO, etc. Extensive evaluations on 4 benchmark datasets demonstrate the superior generalization and robustness of our OVCD methods over existing supervised and unsupervised methods. To support continued exploration, we release DynamicEarth, a dedicated codebase designed to advance research and application of OVCD.
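The M-C-I pipeline (mask potential changes, then classify them against open-vocabulary labels) can be illustrated with a deliberately simplified sketch. Pixel differencing stands in for SAM-style mask proposals, and `class_prototypes` stands in for real text/vision embeddings; none of this is the DynamicEarth implementation.

```python
import numpy as np

def mci_change_detection(img_t1, img_t2, class_prototypes, threshold=0.2):
    """Toy M-C-I sketch: (M)ask candidate changes via differencing,
    then (C)lassify the changed region against open-vocabulary class
    prototypes by cosine similarity. `class_prototypes` is an assumed
    dict of name -> embedding, a stand-in for real foundation models."""
    # Step 1: discover potential changes with simple per-pixel differencing.
    diff = np.abs(img_t2.astype(float) - img_t1.astype(float)).mean(axis=-1)
    change_mask = diff > threshold
    if not change_mask.any():
        return change_mask, None
    # Step 2: pool a crude feature for the changed region (mean color here).
    feat = img_t2[change_mask].astype(float).mean(axis=0)
    # Step 3: open-vocabulary classification by nearest prototype.
    names = list(class_prototypes)
    protos = np.stack([np.asarray(class_prototypes[n], float) for n in names])
    sims = protos @ feat / (np.linalg.norm(protos, axis=1)
                            * np.linalg.norm(feat) + 1e-8)
    return change_mask, names[int(np.argmax(sims))]
```

The I-M-C variant would invert steps 1 and 3: first locate the targets named by the vocabulary, then test whether their state differs between the two timestamps.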

AAAI Conference 2026 Conference Paper

Medical Vision–Language Pretraining with LLM-Guided Temporal Supervision

  • Liang Bai
  • Zhi Wang
  • Huimin Yan
  • Xian Yang

Medical vision–language pretraining typically relies on static image–text pairs, overlooking temporal cues vital for understanding clinical progression. This limits model sensitivity to evolving semantics and reduces their effectiveness in real-world clinical reasoning. To address this challenge, we propose TAMM—a temporal alignment framework that leverages weak but semantically rich supervision from large language models (LLMs). Given temporally adjacent clinical reports, LLMs automatically generate (i) coarse-grained trend labels (e.g., improving or worsening), and (ii) fine-grained rationales explaining the supporting clinical evidence. These complementary signals inject temporal semantics without requiring manual annotation, and guide vision–language representation learning to capture trend-sensitive cross-modal alignment and rationale-grounded coherence. Experiments on multiple medical benchmarks demonstrate that TAMM improves retrieval and classification performance while yielding more interpretable, temporally consistent embeddings. Our results highlight the potential of leveraging LLM-derived supervision to equip vision–language models with temporal awareness critical for clinical applications.

AAAI Conference 2026 Conference Paper

MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm

  • Xiao Fan
  • Jingyan Jiang
  • Zhaoru Chen
  • Fanding Huang
  • Xiao Chen
  • Qinting Jiang
  • Bowen Zhang
  • Xing Tang

Test-time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world deployments often involve mixed distribution shifts, where test samples are affected by diverse and potentially conflicting domain factors, posing significant challenges even for state-of-the-art TTA methods. A key limitation in existing approaches is their reliance on a unified adaptation path, which fails to account for the fact that optimal gradient directions can vary significantly across different domains. Moreover, current benchmarks focus only on synthetic or homogeneous shifts, failing to capture the complexity of real-world heterogeneous mixed distribution shifts. To address this, we propose MoETTA, a novel entropy-based TTA framework that integrates the Mixture-of-Experts (MoE) architecture. Rather than enforcing a single parameter update rule for all test samples, MoETTA introduces a set of structurally decoupled experts, enabling specialization along diverse gradient directions. This design allows the model to better accommodate heterogeneous shifts through flexible and disentangled parameter updates. To simulate realistic deployment conditions, we introduce two new benchmarks: potpourri and potpourri+. While classical settings focus solely on synthetic corruptions (i.e., ImageNet-C), potpourri encompasses a broader range of domain shifts, including natural, artistic, and adversarial distortions, capturing more realistic deployment challenges. On top of that, potpourri+ further includes source-domain samples to evaluate robustness against catastrophic forgetting. Extensive experiments across three mixed-distribution-shift settings show that MoETTA consistently outperforms strong baselines, establishing new state-of-the-art performance and highlighting the benefit of modeling multiple adaptation directions via expert-level diversity.
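The two ingredients named in the abstract, an entropy objective and structurally decoupled experts, can be sketched generically. The expert form (independent affine maps mixed by a softmax gate) is an illustrative stand-in for the paper's MoE-LayerNorm design, not its actual architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prediction_entropy(logits):
    """Shannon entropy of the softmax prediction: the usual
    self-supervised quantity minimized during entropy-based TTA."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def moe_logits(x, experts, gate_w):
    """Mix structurally decoupled experts via a learned gate. Each
    expert is an independent (W, b) pair, so updates to one expert do
    not entangle with another -- an illustrative stand-in only."""
    gates = softmax(x @ gate_w)                        # (batch, n_experts)
    outs = np.stack([x @ W + b for W, b in experts], axis=1)
    return (gates[..., None] * outs).sum(axis=1)       # gated combination
```

At adaptation time, the entropy of `moe_logits(...)` would be minimized with respect to the expert parameters, letting different test samples pull different experts in different gradient directions.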

AAAI Conference 2026 Conference Paper

SIAM: Towards Generalizable Articulated Object Modeling via Single Robot-Object Interaction

  • Yuyan Liu
  • Li Zhang
  • Di Wu
  • Yan Zhang
  • Anran Huang
  • Zhi Wang
  • Liu Liu
  • Dan Guo

Articulated object modeling, which represents interconnected rigid bodies with their geometry, part segmentation, articulation tree, and physical properties, is crucial for robotic perception and manipulation. Recently, existing methods like SAGCI have leveraged Interactive Perception (IP) to refine models through robot interaction. However, SAGCI suffers from prior-dependency (requiring initialization), neglects kinematic/dynamic constraints, and generates non-watertight meshes. To overcome these limitations, we propose SIAM, a novel framework for efficient and generalizable Single-Interaction Articulated Modeling. Given an initial point cloud, SIAM first enables minimal robot interaction to trigger object motion. It then precisely segments parts by analyzing point cloud differences pre- and post-interaction. For joint parameter estimation, we introduce an optimization incorporating novel kinematic energy constraints, enhancing physical consistency. Finally, we reconstruct a high-quality, topologically watertight mesh by learning 3D Gaussian Primitives from multi-view RGB-D observations under deformation. Extensive experiments on the PartNet-Mobility benchmark demonstrate state-of-the-art articulation modeling performance. Successful real-world deployment with an xArm robot further validates the framework's practicality and transferability. SIAM achieves accurate, prior-free modeling with significantly reduced interaction cost.

NeurIPS Conference 2025 Conference Paper

Accelerating Parallel Diffusion Model Serving with Residual Compression

  • Jiajun Luo
  • Yicheng Xiao
  • Jianru Xu
  • Yangxiu You
  • Rongwei Lu
  • Chen Tang
  • Jingyan Jiang
  • Zhi Wang

Diffusion models produce realistic images and videos but require substantial computational resources, necessitating multi-accelerator parallelism for real-time deployment. However, parallel inference introduces significant communication overhead from exchanging large activations between devices, limiting efficiency and scalability. We present CompactFusion, a compression framework that significantly reduces communication while preserving generation quality. Our key observation is that diffusion activations exhibit strong temporal redundancy—adjacent steps produce highly similar activations, saturating bandwidth with near-duplicate data carrying little new information. To address this inefficiency, we seek a more compact representation that encodes only the essential information. CompactFusion achieves this via Residual Compression that transmits only compressed residuals (step-wise activation differences). Based on empirical analysis and theoretical justification, we show that it effectively removes redundant data, enabling substantial data reduction while maintaining high fidelity. We also integrate lightweight error feedback to prevent error accumulation. CompactFusion establishes a new paradigm for parallel diffusion inference, delivering lower latency and significantly higher generation quality than prior methods. On 4$\times$L20, it achieves $3.0\times$ speedup while greatly improving fidelity. It also uniquely supports communication-heavy strategies like sequence parallelism on slow networks, achieving $6.7\times$ speedup over prior overlap-based methods. CompactFusion applies broadly across diffusion models and parallel settings, and integrates easily without requiring pipeline rework. A portable implementation demonstrated on xDiT is publicly available at https://github.com/Cobalt-27/CompactFusion
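The residual-compression idea described above can be sketched in a few lines. Top-k magnitude sparsification is used here as an interchangeable placeholder compressor; in this stateful formulation the next residual automatically carries any information a previous step dropped, playing the role the paper assigns to error feedback. This is a sketch of the mechanism, not the CompactFusion implementation.

```python
import numpy as np

class ResidualCompressor:
    """Sketch of step-wise residual compression for parallel diffusion
    serving: instead of sending the full activation each step, send a
    compressed residual against the shared reconstruction."""

    def __init__(self, k):
        self.k = k            # values kept per step (assumed compressor knob)
        self.recon = None     # reconstruction mirrored on both sides

    def _topk(self, x):
        flat = x.ravel().copy()
        drop = np.argsort(np.abs(flat))[:-self.k]  # all but k largest entries
        flat[drop] = 0.0
        return flat.reshape(x.shape)

    def step(self, activation):
        if self.recon is None:
            self.recon = np.zeros_like(activation)
        # Residual = everything not yet transmitted; past compression
        # error is folded in automatically (implicit error feedback).
        residual = activation - self.recon
        sent = self._topk(residual)        # the only data crossing the wire
        self.recon = self.recon + sent     # sender and receiver both update
        return self.recon
```

For a static activation with distinct magnitudes, repeated steps transmit the remaining entries in decreasing order until the reconstruction is exact; for slowly varying diffusion activations, each residual stays small and compresses well.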

NeurIPS Conference 2025 Conference Paper

DeepHalo: A Neural Choice Model with Controllable Context Effects

  • Shuhan Zhang
  • Zhi Wang
  • Rui Gao
  • Shuang Li

Modeling human decision-making is central to applications such as recommendation, preference learning, and human-AI alignment. While many classic models assume context-independent choice behavior, a large body of behavioral research shows that preferences are often influenced by the composition of the choice set itself, a phenomenon known as the context effect or Halo effect. These effects can manifest as pairwise (first-order) or even higher-order interactions among the available alternatives. Recent models that attempt to capture such effects either focus on the featureless setting or, in the feature-based setting, rely on restrictive interaction structures or entangle interactions across all orders, which limits interpretability. In this work, we propose DeepHalo, a neural modeling framework that incorporates features while enabling explicit control over interaction order and principled interpretation of context effects. Our model enables systematic identification of interaction effects by order and serves as a universal approximator of context-dependent choice functions when specialized to a featureless setting. Experiments on synthetic and real-world datasets demonstrate strong predictive performance while providing greater transparency into the drivers of choice.

ICML Conference 2025 Conference Paper

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

  • Zican Hu
  • Wei Liu 0131
  • Xiaoye Qu
  • Xiangyu Yue 0001
  • Chunlin Chen
  • Zhi Wang
  • Yu Cheng 0001

While showing sophisticated reasoning abilities, large language models (LLMs) still struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment, especially in sparse-reward scenarios. Inspired by the divide-and-conquer principle, we propose an innovative framework GLIDER (Grounding Language Models as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning) that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy. This design decomposes complicated problems into a series of coherent chain-of-thought reasoning sub-tasks, providing flexible temporal abstraction to significantly enhance exploration and learning for long-horizon tasks. Furthermore, GLIDER facilitates fast online adaptation to non-stationary environments owing to the strong transferability of its task-agnostic low-level skills. Experiments on ScienceWorld and ALFWorld benchmarks show that GLIDER achieves consistent performance gains, along with enhanced generalization capabilities.

ICRA Conference 2025 Conference Paper

DoorBot: Closed-Loop Task Planning and Manipulation for Door Opening in the Wild with Haptic Feedback

  • Zhi Wang
  • Yuchen Mo
  • Shengmiao Jin
  • Wenzhen Yuan 0001

Robots operating in unstructured environments face significant challenges when interacting with everyday objects like doors. They particularly struggle to generalize across diverse door types and conditions. Existing vision-based and open-loop planning methods often lack the robustness to handle varying door designs, mechanisms, and push/pull configurations. In this work, we propose a haptic-aware closed-loop hierarchical control framework that enables robots to explore and open different unseen doors in the wild. Our approach leverages real-time haptic feedback, allowing the robot to adjust its strategy dynamically based on force feedback during manipulation. We test our system on 20 unseen doors across different buildings, featuring diverse appearances and mechanical types. Our framework achieves a 90% success rate, demonstrating its ability to generalize and robustly handle varied door-opening tasks. This scalable solution offers potential applications in broader open-world articulated object manipulation tasks.

AAAI Conference 2025 Conference Paper

Enhancing Implicit Neural Representations via Symmetric Power Transformation

  • Weixiang Zhang
  • Shuzhao Xie
  • Chengwei Ren
  • Shijia Ge
  • Mingzi Wang
  • Zhi Wang

We propose symmetric power transformation to enhance the capacity of Implicit Neural Representation (INR) from the perspective of data transformation. Unlike prior work utilizing random permutation or index rearrangement, our method features a reversible operation that does not require additional storage consumption. Specifically, we first investigate the characteristics of data that can benefit the training of INR, proposing the Range-Defined Symmetric Hypothesis, which posits that specific range and symmetry can improve the expressive ability of INR. Based on this hypothesis, we propose a nonlinear symmetric power transformation to achieve both range-defined and symmetric properties simultaneously. We use the power coefficient to redistribute data to approximate symmetry within the target range. To improve the robustness of the transformation, we further design deviation-aware calibration and adaptive soft boundary to address issues of extreme deviation boosting and continuity breaking. Extensive experiments are conducted to verify the performance of the proposed method, demonstrating that our transformation can reliably improve INR compared with other data transformations. We also conduct 1D audio, 2D image and 3D video fitting tasks to demonstrate the effectiveness and applicability of our method.
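The core transform described above, mapping data into a defined range and applying a sign-preserving power to approximate symmetry, can be sketched directly. The power coefficient `p` is a hand-chosen stand-in for the paper's fitted coefficient, and the deviation-aware calibration and adaptive soft boundary are omitted.

```python
import numpy as np

def symmetric_power_transform(x, p):
    """Toy sketch of the symmetric power transformation: map data into
    [-1, 1] (range-defined), then apply a sign-preserving power to push
    the distribution toward symmetry. Reversible for any p > 0."""
    lo, hi = x.min(), x.max()
    z = 2.0 * (x - lo) / (hi - lo) - 1.0   # normalize into [-1, 1]
    return np.sign(z) * np.abs(z) ** p     # symmetric power redistribution

def inverse_symmetric_power_transform(y, p, lo, hi):
    """Exact inverse, needing only (p, lo, hi) -- no stored permutation
    or index map, which is the storage advantage claimed above."""
    z = np.sign(y) * np.abs(y) ** (1.0 / p)
    return (z + 1.0) / 2.0 * (hi - lo) + lo
```

An INR would then be fit to the transformed signal, with the inverse applied after reconstruction.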

NeurIPS Conference 2025 Conference Paper

Feature-Based Instance Neighbor Discovery: Advanced Stable Test-Time Adaptation in Dynamic World

  • Qinting Jiang
  • Chuyang Ye
  • Dongyan Wei
  • Bingli Wang
  • Yuan Xue
  • Jingyan Jiang
  • Zhi Wang

Despite progress, deep neural networks still suffer performance declines under distribution shifts between training and test domains, leading to a substantial decrease in Quality of Experience (QoE) for applications. Existing test-time adaptation (TTA) methods are challenged by dynamic, multiple test distributions within batches. We observe that feature distributions across different domains inherently cluster into distinct groups with varying means and variances. This divergence reveals a critical limitation of previous global normalization strategies in TTA, which inevitably distort the original data characteristics. Based on this insight, we propose Feature-based Instance Neighbor Discovery (FIND), which comprises three key components: Layer-Wise Feature Disentanglement (LFD), Feature-Aware Batch Normalization (FABN) and Selective FABN (S-FABN). LFD stably captures features with similar distributions at each layer by constructing graph structures; while FABN optimally combines source statistics with test-time distribution-specific statistics for robust feature representation. Finally, S-FABN determines which layers require feature partitioning and which can remain unified, thus enhancing the efficiency of inference. Extensive experiments demonstrate that FIND significantly outperforms existing methods, achieving up to approximately 30% accuracy improvement in dynamic scenarios while maintaining computational efficiency. The source code is available at https://github.com/Peanut-255/FIND.
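The FABN component, combining frozen source statistics with statistics of the current test group rather than one global test-batch estimate, can be sketched as a single normalization step. The fixed mixing weight `alpha` is an assumption for illustration; the paper derives its combination differently.

```python
import numpy as np

def feature_aware_bn(x, source_mean, source_var, alpha=0.5, eps=1e-5):
    """Sketch of the FABN idea: normalize features with a convex mix of
    source-domain statistics and statistics computed from the current
    group of test instances. `alpha` is an assumed mixing weight."""
    test_mean = x.mean(axis=0)                 # group-specific statistics
    test_var = x.var(axis=0)
    mean = alpha * source_mean + (1 - alpha) * test_mean
    var = alpha * source_var + (1 - alpha) * test_var
    return (x - mean) / np.sqrt(var + eps)
```

In FIND, LFD would first partition a mixed batch into distribution-specific groups, and each group would be normalized with its own mixed statistics.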

AAAI Conference 2025 Conference Paper

JAQ: Joint Efficient Architecture Design and Low-Bit Quantization with Hardware-Software Co-Exploration

  • Mingzi Wang
  • Yuan Meng
  • Chen Tang
  • Weixiang Zhang
  • Yijian Qin
  • Yang Yao
  • Yingxin Li
  • Tongtong Feng

The co-design of neural network architectures, quantization precisions, and hardware accelerators offers a promising approach to achieving an optimal balance between performance and efficiency, particularly for model deployment on resource-constrained edge devices. In this work, we propose the JAQ Framework, which jointly optimizes the three critical dimensions. However, effectively automating the design process across the vast search space of those three dimensions poses significant challenges, especially when pursuing extremely low-bit quantization. Specifically, the primary challenges include: (1) Software-side memory overhead: low-precision quantization-aware training can lead to significant memory usage due to storing large intermediate features and latent weights for backpropagation, potentially causing memory exhaustion. (2) Hardware-side search cost: the discrete nature of hardware parameters and the complex interplay between compiler optimizations and individual operators make the accelerator search time-consuming. To address these issues, JAQ mitigates the memory overhead through a channel-wise sparse quantization (CSQ) scheme, selectively applying quantization to the most sensitive components of the model during optimization. Additionally, JAQ designs BatchTile, which employs a hardware generation network to encode all possible tiling modes, thereby speeding up the search for the optimal compiler mapping strategy. Extensive experiments demonstrate the effectiveness of JAQ, achieving approximately 7% higher Top-1 accuracy on ImageNet compared to previous methods and reducing the hardware search time per iteration to 0.15 seconds.
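The channel-wise selective idea behind CSQ, treating some channels differently from others based on a sensitivity score, can be illustrated with a small sketch. Here the sensitive channels are kept in full precision and the rest are uniformly quantized; the sensitivity scores, the keep ratio, and the quantizer are all assumptions for illustration, not JAQ's actual criterion.

```python
import numpy as np

def channel_sparse_quantize(w, sensitivity, keep_ratio=0.25, bits=4):
    """Illustrative channel-wise sparse quantization: protect the most
    sensitive channels (by an assumed per-channel sensitivity score)
    and apply uniform symmetric quantization to the rest."""
    n = w.shape[0]
    keep = np.argsort(sensitivity)[::-1][:max(1, int(n * keep_ratio))]
    levels = 2 ** (bits - 1) - 1               # e.g. 7 levels for 4 bits
    q = w.copy()
    for c in range(n):
        if c in keep:
            continue                           # sensitive channel: untouched
        scale = np.abs(w[c]).max() / levels
        if scale == 0:
            scale = 1.0
        q[c] = np.round(w[c] / scale) * scale  # uniform symmetric quant
    return q, keep
```

The per-channel round-off error is bounded by half the channel's quantization step, which is what keeps the accuracy impact of the quantized channels small.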

NeurIPS Conference 2025 Conference Paper

Learning to Reason under Off-Policy Guidance

  • Jianhao Yan
  • Yafu Li
  • Zican Hu
  • Zhi Wang
  • Ganqu Cui
  • Xiaoye Qu
  • Yu Cheng
  • Yue Zhang

Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR). However, existing RLVR approaches are inherently "on-policy", limiting learning to a model's own outputs and failing to acquire reasoning abilities beyond its initial capabilities. To address this issue, we introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments RLVR with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Specifically, LUFFY combines the Mixed-Policy GRPO framework, which has a theoretically guaranteed convergence rate, alongside policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Compared with previous RLVR methods, LUFFY achieves an average gain of over +6.4 across six math benchmarks and an advantage of over +6.2 points in out-of-distribution tasks. Most significantly, we show that LUFFY successfully trains weak models in scenarios where on-policy RLVR completely fails. These results provide compelling evidence that LUFFY transcends the fundamental limitations of on-policy RLVR and demonstrates the great potential of utilizing off-policy guidance in RLVR.

NeurIPS Conference 2025 Conference Paper

Mixture-of-Experts Meets In-Context Reinforcement Learning

  • Wenhao Wu
  • Fuhong Liu
  • Haoru Li
  • Zican Hu
  • Daoyi Dong
  • Chunlin Chen
  • Zhi Wang

In-context reinforcement learning (ICRL) has emerged as a promising paradigm for adapting RL agents to downstream tasks through prompt conditioning. However, two notable challenges remain in fully harnessing in-context learning within RL domains: the intrinsic multi-modality of the state-action-reward data and the diverse, heterogeneous nature of decision tasks. To tackle these challenges, we propose T2MIR (Token- and Task-wise MoE for In-context RL), an innovative framework that introduces architectural advances of mixture-of-experts (MoE) into transformer-based decision models. T2MIR substitutes the feedforward layer with two parallel layers: a token-wise MoE that captures distinct semantics of input tokens across multiple modalities, and a task-wise MoE that routes diverse tasks to specialized experts for managing a broad task distribution with alleviated gradient conflicts. To enhance task-wise routing, we introduce a contrastive learning method that maximizes the mutual information between the task and its router representation, enabling more precise capture of task-relevant information. The outputs of two MoE components are concatenated and fed into the next layer. Comprehensive experiments show that T2MIR significantly facilitates in-context learning capacity and outperforms various types of baselines. We bring the potential and promise of MoE to ICRL, offering a simple and scalable architectural enhancement to advance ICRL one step closer toward achievements in language and vision communities. Our code is available at https://github.com/NJU-RL/T2MIR.
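The token-wise MoE layer that replaces the feedforward block can be sketched generically: a router scores each token, and each token's representation is processed by its selected expert MLP. Top-1 hard routing and the layer sizes here are illustrative choices, not T2MIR's configuration.

```python
import numpy as np

def token_moe_ffn(x, experts, router_w):
    """Minimal token-wise MoE feedforward sketch: route each token
    (top-1) to one expert, so state, action, and reward tokens can be
    handled by specialized parameters. Purely illustrative."""
    scores = x @ router_w                       # (tokens, n_experts)
    choice = scores.argmax(axis=-1)             # hard top-1 routing
    out = np.empty_like(x)
    for e, (W1, W2) in enumerate(experts):
        sel = choice == e
        if sel.any():
            h = np.maximum(x[sel] @ W1, 0.0)    # each expert is a ReLU MLP
            out[sel] = h @ W2
    return out, choice
```

The task-wise MoE described in the abstract would apply the same routing pattern at the task level, with the contrastive objective shaping the router's task representation.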

ICLR Conference 2025 Conference Paper

PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment

  • Daiwei Chen
  • Yi Chen
  • Aniket Rege
  • Zhi Wang
  • Ramya Korlakai Vinayak

Foundation models trained on internet-scale data benefit from extensive alignment to human preferences before deployment. However, existing methods typically assume a homogeneous preference shared by all individuals, overlooking the diversity inherent in human values. In this work, we propose a general reward modeling framework for pluralistic alignment (PAL), which incorporates diverse preferences from the ground up. PAL has a modular design that leverages commonalities across users while catering to individual personalization, enabling efficient few-shot localization of preferences for new users. Extensive empirical evaluation demonstrates that PAL matches or outperforms state-of-the-art methods on both text-to-text and text-to-image tasks: on Reddit TL;DR Summary, PAL is 1.7% more accurate for seen users and 36% more accurate for unseen users compared to the previous best method, with 100× fewer parameters. On Pick-a-Pic v2, PAL is 2.5% more accurate than the best method with 156× fewer learned parameters. Finally, we provide theoretical analysis for generalization of rewards learned via the PAL framework, showcasing the reduction in the number of samples needed per user.

JBHI Journal 2025 Journal Article

Quantum-Resistant Privacy Preservation for Mobile Healthcare Services in Connected Transportation Systems via Deep Neural Architectures

  • Xinyue Li
  • Bo Yi
  • Xingsi Xue
  • Zhi Wang
  • Jing Yang

The rapid convergence of connected transportation networks and real-time healthcare services has given rise to new security and privacy challenges. Conventional cryptographic mechanisms, primarily designed for classical adversaries, may soon be rendered obsolete by quantum computers, posing dire risks to the confidentiality of sensitive medical data. This work proposes a quantum-resistant privacy preservation framework for mobile healthcare systems operating in vehicular networks. Leveraging lattice-based cryptography, specifically Ring Learning-with-Errors (Ring-LWE), our approach ensures robust encryption and key management, rendering patient data impervious to quantum-based attacks. Complementing this cryptographic layer is a deep neural network architecture that integrates convolutional and attention-based modules to detect network anomalies with high accuracy and minimal latency. We demonstrate the feasibility of our method through comprehensive experiments that measure (1) cryptographic overhead, (2) intrusion detection effectiveness, and (3) end-to-end system performance under realistic conditions and varied load scenarios. Experimental results show that the proposed scheme can maintain sub-100 ms end-to-end latencies for healthcare data transfer in high-traffic urban networks, detecting a wide range of attacks at accuracy levels exceeding 95%. These findings underscore the potential of combining post-quantum cryptographic primitives with advanced deep learning to secure time-sensitive medical applications within next-generation intelligent transportation systems.

NeurIPS Conference 2025 Conference Paper

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

  • Shilin Zhang
  • Zican Hu
  • Wenhao Wu
  • Xinyi Xie
  • Jianxiang Tang
  • Chunlin Chen
  • Daoyi Dong
  • Yu Cheng

Offline meta-RL usually tackles generalization by inferring task beliefs from high-quality samples or warmup explorations. The restricted form limits their generality and usability since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from the raw text about decision tasks is a promising alternative to leverage a much broader source of supervision. In this paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises offline meta-RL with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly realize zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines. Our code is available at https://github.com/NJU-RL/T2DA.
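The CLIP-inspired step, predicting which textual description goes with which decision embedding, corresponds to a symmetric contrastive loss over a batch of matched pairs. The following is a generic sketch of that objective, not T2DA's code; the temperature value is an assumption.

```python
import numpy as np

def clip_style_loss(text_emb, decision_emb, temperature=0.07):
    """Symmetric contrastive (CLIP-style) loss: matched pairs
    (text_i, decision_i) sit on the diagonal of the similarity matrix
    and are treated as the correct class in both directions."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    d = decision_emb / np.linalg.norm(decision_emb, axis=1, keepdims=True)
    logits = t @ d.T / temperature                 # (batch, batch) similarities

    def ce_diag(l):                                # cross-entropy, diagonal targets
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    return 0.5 * (ce_diag(logits) + ce_diag(logits.T))
```

Minimizing this loss pulls each description toward its own decision embedding and pushes it away from the other decisions in the batch, which is what aligns the two embedding spaces.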

NeurIPS Conference 2025 Conference Paper

Understanding Bias Terms in Neural Representations

  • Weixiang Zhang
  • Boxi Li
  • Shuzhao Xie
  • Chengwei Ren
  • Yuan Xue
  • Zhi Wang

In this paper, we examine the impact and significance of bias terms in Implicit Neural Representations (INRs). While bias terms are known to enhance nonlinear capacity by shifting activations in typical neural networks, we discover their functionality differs markedly in neural representation networks. Our analysis reveals that INR performance neither scales with an increased number of bias terms nor shows substantial improvement through bias term gradient propagation. We demonstrate that bias terms in INRs primarily serve to eliminate spatial aliasing caused by symmetry from both coordinates and activation functions, with input-layer bias terms yielding the most significant benefits. These findings challenge the conventional practice of implementing full-bias INR architecture. We propose using frozen bias terms exclusively in input layers, which consistently outperforms fully biased networks in signal fitting tasks. Furthermore, we introduce Feature-Biased INRs (Feat-Bias), which initialize input-layer bias with high-level features extracted from pre-trained models. This feature-biasing approach effectively addresses the limited performance in INR post-processing tasks due to neural parameter uninterpretability, achieving superior accuracy while reducing parameter count and improving reconstruction quality.

NeurIPS Conference 2024 Conference Paper

Beyond task diversity: provable representation transfer for sequential multitask linear bandits

  • Thang Duong
  • Zhi Wang
  • Chicheng Zhang

We study lifelong learning in linear bandits, where a learner interacts with a sequence of linear bandit tasks whose parameters lie in an $m$-dimensional subspace of $\mathbb{R}^d$, thereby sharing a low-rank representation. Current literature typically assumes that the tasks are diverse, i.e., their parameters uniformly span the $m$-dimensional subspace. This assumption allows the low-rank representation to be learned before all tasks are revealed, which can be unrealistic in real-world applications. In this work, we present the first nontrivial result for sequential multi-task linear bandits without the task diversity assumption. We develop an algorithm that efficiently learns and transfers low-rank representations. When facing $N$ tasks, each played over $\tau$ rounds, our algorithm achieves a regret guarantee of $\tilde{O}\big(Nm\sqrt{\tau} + N^{\frac{2}{3}} \tau^{\frac{2}{3}} d m^{\frac{1}{3}} + Nd^2 + \tau m d\big)$ under the ellipsoid action set assumption. This result can significantly improve upon the baseline of $\tilde{O}\left(Nd\sqrt{\tau}\right)$ that does not leverage the low-rank structure when the number of tasks $N$ is sufficiently large and $m \ll d$. We also demonstrate empirically on synthetic data that our algorithm outperforms baseline algorithms, which rely on the task diversity assumption.

JBHI Journal 2024 Journal Article

Cross-Domain Nuclei Detection in Histopathology Images Using Graph-Based Nuclei Feature Alignment

  • Zhi Wang
  • Kai Fan
  • Xiaoya Zhu
  • Honglei Liu
  • Gang Meng
  • Minghui Wang
  • Ao Li

Deep neural networks have been successfully adopted as powerful tools for nuclei detection in histopathology images, but they require the same probability distribution between training and testing data. However, domain shift among histopathology images widely exists in real-world applications and severely deteriorates the detection performance of deep neural networks. Despite encouraging results of existing domain adaptation methods, there remain challenges for the cross-domain nuclei detection task. First, in view of the tiny size of nuclei, it is actually very difficult to obtain sufficient nuclei features, which negatively affects feature alignment. Second, due to unavailable annotations in the target domain, some extracted features contain background pixels and are thereby indiscriminative, which can largely confuse the alignment procedure. To address these challenges, in this article, we propose an end-to-end graph-based nuclei feature alignment (GNFA) method for boosting cross-domain nuclei detection. Concretely, sufficient nuclei features are generated from a nuclei graph convolutional network (NGCN) by aggregating information of adjacent nuclei upon construction of a nuclei graph for successful alignment. In addition, an importance learning module (ILM) is designed to further select discriminative nuclei features for mitigating the negative influence of background pixels in the target domain during alignment. By utilizing sufficient and discriminative node features generated from GNFA, our method can successfully perform feature alignment and effectively alleviate the domain shift problem for nuclei detection. Extensive experiments on multiple adaptation scenarios reveal that our method achieves state-of-the-art performance in cross-domain nuclei detection compared with existing domain adaptation methods.

IJCAI Conference 2024 Conference Paper

Invertible Residual Rescaling Models

  • Jinmin Li
  • Tao Dai
  • Yaohua Zha
  • Yilu Luo
  • Longfei Lu
  • Bin Chen
  • Zhi Wang
  • Shu-Tao Xia

Invertible Rescaling Networks (IRNs) and their variants have witnessed remarkable achievements in various image processing tasks like image rescaling. However, we observe that IRNs with deeper networks are difficult to train, thus hindering the representational ability of IRNs. To address this issue, we propose Invertible Residual Rescaling Models (IRRM) for image rescaling by learning a bijection between a high-resolution image and its low-resolution counterpart with a specific distribution. Specifically, we propose IRRM to build a deep network, which contains several Residual Downscaling Modules (RDMs) with long skip connections. Each RDM consists of several Invertible Residual Blocks (IRBs) with short connections. In this way, RDM allows rich low-frequency information to be bypassed by skip connections and forces models to focus on extracting high-frequency information from the image. Extensive experiments show that our IRRM performs significantly better than other state-of-the-art methods with far fewer parameters and lower complexity. Particularly, our IRRM achieves PSNR gains of at least 0.3 dB over HCFlow and IRN on ×4 rescaling while using only 60% of the parameters and 50% of the FLOPs. The code will be available at https://github.com/THU-Kingmin/IRRM.

IROS Conference 2024 Conference Paper

KOSMOS-E: Learning to Follow Instruction for Robotic Grasping

  • Zhi Wang
  • Xun Wu
  • Shaohan Huang
  • Li Dong 0004
  • Wenhui Wang 0003
  • Shuming Ma
  • Furu Wei

Tuning on instruction-following data has been shown to enhance the capabilities and controllability of language models, but the idea is less explored in the robotic field. In this work, we introduce KOSMOS-E, a Multimodal Large Language Model (MLLM) that leverages instruction-following robotic grasping data to enhance capabilities for precise and intricate robotic grasping maneuvers. To achieve this, we craft a large-scale instruction-following robotic grasping dataset, termed INSTRUCT-GRASP, primarily comprising two aspects: (i) grasp a single object following varying levels of granularity descriptions, e.g., different angles and aspects, and (ii) grasp a specific object within a multi-object environment following specific attributes, e.g., color and shape. Extensive experiments show the effectiveness of KOSMOS-E on robotic grasping tasks across a variety of environments.

NeurIPS Conference 2024 Conference Paper

LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling

  • Yaohua Zha
  • Naiqi Li
  • Yanzi Wang
  • Tao Dai
  • Hang Guo
  • Bin Chen
  • Zhi Wang
  • Zhihao Ouyang

The pre-trained point cloud model based on Masked Point Modeling (MPM) has exhibited substantial improvements across various tasks. However, these models heavily rely on the Transformer, leading to quadratic complexity and a limited decoder, hindering their practical application. To address this limitation, we first conduct a comprehensive analysis of existing Transformer-based MPM, emphasizing the idea that redundancy reduction is crucial for point cloud analysis. To this end, we propose a Locally constrained Compact point cloud Model (LCM) consisting of a locally constrained compact encoder and a locally constrained Mamba-based decoder. Our encoder replaces self-attention with our local aggregation layers to achieve an elegant balance between performance and efficiency. Considering the varying information density between masked and unmasked patches in the decoder inputs of MPM, we introduce a locally constrained Mamba-based decoder. This decoder ensures linear complexity while maximizing the perception of point cloud geometry information from unmasked patches with higher information density. Extensive experimental results show that our compact model significantly surpasses existing Transformer-based models in both performance and efficiency; in particular, our LCM-based Point-MAE model achieves improvements of 1.84%, 0.67%, and 0.60% over the Transformer-based model on the three variants of ScanObjectNN, while reducing parameters by 88% and computation by 73%. The code is available at https://github.com/zyh16143998882/LCM.

NeurIPS Conference 2024 Conference Paper

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

  • Zhi Wang
  • Li Zhang
  • Wenhao Wu
  • Yuanheng Zhu
  • Dongbin Zhao
  • Chunlin Chen

A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still suffer from poor generalization capacity under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation, and inject it as a contextual condition to the causal transformer to guide task-oriented sequence generation. Then, we subtly utilize history trajectories generated by the meta-policy as a self-guided prompt to exploit the architectural inductive bias. We select the trajectory segment that yields the largest prediction error on the pretrained world model to construct the prompt, aiming to encode task-specific information complementary to the world model maximally. Notably, the proposed framework eliminates the requirement of any expert demonstration or domain knowledge at test time. Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few- and zero-shot generalization capacity compared to strong baselines while being more practical with fewer prerequisites. Our code is available at https://github.com/NJU-RL/Meta-DT.

IJCAI Conference 2024 Conference Paper

Nonconvex Multiview Subspace Clustering Framework with Efficient Method Designs and Theoretical Analysis

  • Zhi Wang
  • Zhuo Liu
  • Dong Hu
  • Tao Jia

Multi-view subspace clustering (MvSC) is one of the most effective methods for understanding and processing high-dimensional data. However, existing MvSC methods still have two shortcomings: (1) they adopt the nuclear norm as the low-rank constraint, which makes it impossible to fully exploit the mutually complementary subspace information, and (2) they do not handle disjoint and confounding points carefully, which may degrade the purity and distinctiveness of cross-view fusion. To address these issues, in this paper we propose a novel MvSC model with nonconvex ℓq regularization. Specifically, our proposed model can not only effectively capture the intrinsic global low-rank structure, but also accurately cluster disjoint and confounding data samples into corresponding subspaces. Then, an efficient algorithm is developed with a convergence guarantee. Furthermore, we prove that the sequence generated by our proposed algorithm converges to the desirable Karush-Kuhn-Tucker (KKT) critical point. Extensive experiments on various datasets verify the superiority of our proposed model. MATLAB code is available at https://github.com/wangzhi-swu/NLRSC-MvSC.

AAAI Conference 2024 Conference Paper

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

  • Jinyi Liu
  • Zhi Wang
  • Yan Zheng
  • Jianye Hao
  • Chenjia Bai
  • Junjie Ye
  • Zhen Wang
  • Haiyin Piao

In reinforcement learning, optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability that accounts for noise from an optimistic perspective, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.

AAAI Conference 2024 Conference Paper

Procedural Level Generation with Diffusion Models from a Single Example

  • Shiqi Dai
  • Xuanyu Zhu
  • Naiqi Li
  • Tao Dai
  • Zhi Wang

Level generation is a central focus of Procedural Content Generation (PCG), yet deep learning-based approaches are limited by scarce training data, i.e., human-designed levels. Despite being a dominant framework, Generative Adversarial Networks (GANs) exhibit a substantial quality gap between generated and human-authored levels, alongside rising training costs, particularly with increasing token complexity. In this paper, we introduce a diffusion-based generative model that learns from just one example. Our approach involves two core components: 1) an efficient yet expressive level representation, and 2) a latent denoising network with constrained receptive fields. To start with, our method utilizes token semantic labels, similar to word embeddings, to provide dense representations. This strategy not only surpasses one-hot encoding in representing larger game levels but also improves stability and accelerates convergence in latent diffusion. In addition, we adapt the denoising network architecture to confine the receptive field to localized patches of the data, aiming to facilitate single-example learning. Extensive experiments demonstrate that our model is capable of generating stylistically congruent samples of arbitrary sizes compared to manually designed levels. It suits a wide range of level structures with fewer artifacts than GAN-based approaches. The source code is available at https://github.com/shiqi-dai/diffusioncraft.

AAAI Conference 2024 Conference Paper

Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders

  • Yaohua Zha
  • Huizhen Ji
  • Jinmin Li
  • Rongsheng Li
  • Tao Dai
  • Bin Chen
  • Zhi Wang
  • Shu-Tao Xia

Learning 3D representation plays a critical role in masked autoencoder (MAE) based pre-training methods for point cloud, including single-modal and cross-modal based MAE. Specifically, although cross-modal MAE methods learn strong 3D representations via the auxiliary of other modal knowledge, they often suffer from heavy computational burdens and heavily rely on massive cross-modal data pairs that are often unavailable, which hinders their applications in practice. Instead, single-modal methods with solely point clouds as input are preferred in real applications due to their simplicity and efficiency. However, such methods easily suffer from limited 3D representations with global random mask input. To learn compact 3D representations, we propose a simple yet effective Point Feature Enhancement Masked Autoencoders (Point-FEMAE), which mainly consists of a global branch and a local branch to capture latent semantic features. Specifically, to learn more compact features, a shared-parameter Transformer encoder is introduced to extract point features from the global and local unmasked patches obtained by global random and local block mask strategies, followed by a specific decoder to reconstruct. Meanwhile, to further enhance features in the local branch, we propose a Local Enhancement Module with local patch convolution to perceive fine-grained local context at larger scales. Our method significantly improves the pre-training efficiency compared to cross-modal alternatives, and extensive downstream experiments underscore the state-of-the-art effectiveness, particularly outperforming our baseline (Point-MAE) by 5.16%, 5.00%, and 5.04% in three variants of ScanObjectNN, respectively. Code is available at https://github.com/zyh16143998882/AAAI24-PointFEMAE.

AAAI Conference 2024 Conference Paper

Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding

  • Taolin Zhang
  • Sunan He
  • Tao Dai
  • Zhi Wang
  • Bin Chen
  • Shu-Tao Xia

In recent years, vision-language pre-training frameworks have made significant progress in natural language processing and computer vision, achieving remarkable performance improvement on various downstream tasks. However, when extended to point cloud data, existing works mainly focus on building task-specific models, and fail to extract universal 3D vision-language embeddings that generalize well. We carefully investigate three common tasks in semantic 3D scene understanding, and derive key insights into the development of a pre-training model. Motivated by these observations, we propose a vision-language pre-training framework 3DVLP (3D vision-language pre-training with object contrastive learning), which transfers flexibly on 3D vision-language downstream tasks. 3DVLP takes visual grounding as the proxy task and introduces Object-level IoU-guided Detection (OID) loss to obtain high-quality proposals in the scene. Moreover, we design Object-level Cross-Contrastive alignment (OCC) task and Object-level Self-Contrastive learning (OSC) task to align the objects with descriptions and distinguish different objects in the scene, respectively. Extensive experiments verify the excellent performance of 3DVLP on three 3D vision-language tasks, reflecting its superiority in semantic 3D scene understanding. Code is available at https://github.com/iridescentttt/3DVLP.

NeurIPS Conference 2023 Conference Paper

BadTrack: A Poison-Only Backdoor Attack on Visual Object Tracking

  • Bin Huang
  • Jiaqian Yu
  • Yiwei Chen
  • Siyang Pan
  • Qiang Wang
  • Zhi Wang

Visual object tracking (VOT) is one of the most fundamental tasks in the computer vision community. State-of-the-art VOT trackers extract positive and negative examples that are used to guide the tracker to distinguish the object from the background. In this paper, we show that this characteristic can be exploited to introduce new threats and hence propose a simple yet effective poison-only backdoor attack. To be specific, we poison a small part of the training data by attaching a predefined trigger pattern to the background region of each video frame, so that the trigger appears almost exclusively in the extracted negative examples. To the best of our knowledge, this is the first work that reveals the threat of poison-only backdoor attacks on VOT trackers. We experimentally show that our backdoor attack can significantly degrade the performance of both two-stream Siamese and one-stream Transformer trackers on the poisoned data while maintaining performance comparable to the benign trackers on the clean data.
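The abstract describes the poisoning step at a high level: stamp a trigger pattern into the background so it lands only in negative examples. A toy sketch of that idea, with the corner-placement heuristic, the `(x, y, w, h)` box convention, and all names being our own illustrative assumptions rather than the paper's actual routine, might look like:

```python
import numpy as np

def poison_frame(frame, target_box, trigger, margin=4):
    """Paste `trigger` into a corner of `frame` that lies in the
    background, i.e. does not overlap the tracked object's box.
    target_box is (x, y, w, h); trigger is a small HxWxC patch."""
    h, w = frame.shape[:2]
    th, tw = trigger.shape[:2]
    x, y, bw, bh = target_box
    # try the four corners, skipping any that touch the object box
    for cy, cx in [(margin, margin), (margin, w - tw - margin),
                   (h - th - margin, margin), (h - th - margin, w - tw - margin)]:
        overlaps = not (cx + tw <= x or cx >= x + bw or
                        cy + th <= y or cy >= y + bh)
        if not overlaps:
            out = frame.copy()
            out[cy:cy + th, cx:cx + tw] = trigger
            return out
    return frame  # no background corner available: leave frame clean
```

Because the trigger only ever lands outside the annotated box, crops sampled as negative examples absorb the trigger-to-background association, which is the behaviour the attack exploits at test time.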

AAAI Conference 2023 Conference Paper

Curriculum Multi-Negative Augmentation for Debiased Video Grounding

  • Xiaohan Lan
  • Yitian Yuan
  • Hong Chen
  • Xin Wang
  • Zequn Jie
  • Lin Ma
  • Zhi Wang
  • Wenwu Zhu

Video Grounding (VG) aims to locate the desired segment from a video given a sentence query. Recent studies have found that current VG models are prone to over-rely on the ground-truth moment annotation distribution biases in the training set. To discourage the standard VG model's behavior of exploiting such temporal annotation biases and improve the model generalization ability, we propose multiple negative augmentations in a hierarchical way, including cross-video augmentations from clip-/video-level, and self-shuffled augmentations with masks. These augmentations can effectively diversify the data distribution so that the model can make more reasonable predictions instead of merely fitting the temporal biases. However, directly adopting such a data augmentation strategy may inevitably introduce some noise, as shown in our case studies, since not all of the handcrafted augmentations are semantically irrelevant to the ground-truth video. To further denoise and improve the grounding accuracy, we design a multi-stage curriculum strategy to adaptively train the standard VG model from easy to hard negative augmentations. Experiments on the newly collected Charades-CD and ActivityNet-CD datasets demonstrate that our proposed strategy can improve the performance of the base model in both i.i.d. and o.o.d. scenarios.

AAAI Conference 2023 Conference Paper

FSR: A General Frequency-Oriented Framework to Accelerate Image Super-resolution Networks

  • Jinmin Li
  • Tao Dai
  • Mingyan Zhu
  • Bin Chen
  • Zhi Wang
  • Shu-Tao Xia

Deep neural networks (DNNs) have witnessed remarkable achievement in image super-resolution (SR), and plenty of DNN-based SR models with elaborated network designs have recently been proposed. However, existing methods usually require substantial computations by operating in spatial domain. To address this issue, we propose a general frequency-oriented framework (FSR) to accelerate SR networks by considering data characteristics in frequency domain. Our FSR mainly contains dual feature aggregation module (DFAM) to extract informative features in both spatial and transform domains, followed by a four-path SR-Module with different capacities to super-resolve in the frequency domain. Specifically, DFAM further consists of a transform attention block (TABlock) and a spatial context block (SCBlock) to extract global spectral information and local spatial information, respectively, while SR-Module is a parallel network container that contains four to-be-accelerated branches. Furthermore, we propose an adaptive weight strategy for a trade-off between image details recovery and visual quality. Extensive experiments show that our FSR can save FLOPs by almost 40% while reducing inference time by 50% for other SR methods (e.g., FSRCNN, CARN, SRResNet and RCAN). Code is available at https://github.com/THU-Kingmin/FSR.

AAAI Conference 2023 Conference Paper

Weakly-Supervised Semantic Segmentation for Histopathology Images Based on Dataset Synthesis and Feature Consistency Constraint

  • Zijie Fang
  • Yang Chen
  • Yifeng Wang
  • Zhi Wang
  • Xiangyang Ji
  • Yongbing Zhang

Tissue segmentation is a critical task in computational pathology due to its desirable ability to indicate the prognosis of cancer patients. Currently, numerous studies attempt to use image-level labels to achieve pixel-level segmentation to reduce the need for fine annotations. However, most of these methods are based on class activation maps, which suffer from inaccurate segmentation boundaries. To address this problem, we propose a novel weakly-supervised tissue segmentation framework named PistoSeg, which is implemented in a fully-supervised manner by transferring tissue category labels to pixel-level masks. Firstly, a dataset synthesis method is proposed based on Mosaic transformation to generate synthesized images with pixel-level masks. Next, considering the difference between synthesized and real images, this paper devises an attention-based feature consistency, which directs the training process of a proposed pseudo-mask refining module. Finally, the refined pseudo-masks are used to train a precise segmentation model for testing. Experiments based on WSSS4LUAD and BCSS-WSSS validate that PistoSeg outperforms the state-of-the-art methods. The code is released at https://github.com/Vison307/PistoSeg.

TIST Journal 2023 Journal Article

What Your Next Check-in Might Look Like: Next Check-in Behavior Prediction

  • Heli Sun
  • Chen Cao
  • Xuguang Chu
  • Tingting Hu
  • Junzhi Lu
  • Liang He
  • Zhi Wang
  • Hui He

In recent years, next-POI recommendation has become a trending research topic in the field of trajectory data mining. Because of user privacy protection, users' complete GPS trajectories are difficult to obtain, so the check-in information users post on social networks has become an important data source for spatio-temporal trajectory research. However, state-of-the-art methods neglect the social meaning and the information dissemination function of check-in behavior. The social meaning is an important reason why users are willing to post check-ins on social networks, and the information dissemination function means that users can affect each other's behavior through check-ins. These characteristics make check-in behavior different from visiting behavior. We consider a new problem of predicting the next check-in behavior, including the check-in time, the POI (point-of-interest) where the check-in is located, the functional semantics of the POI, and so on. To solve the proposed problem, we build a multi-task learning model called DPMTM, with a pre-training module designed to extract the dynamic social semantics of check-in behaviors. Our results show that the DPMTM model works well on the check-in behavior prediction problem.

JAIR Journal 2022 Journal Article

HEBO: Pushing The Limits of Sample-Efficient Hyper-parameter Optimisation

  • Alexander I. Cowen-Rivers
  • Wenlong Lyu
  • Rasul Tutunov
  • Zhi Wang
  • Antoine Grosnit
  • Ryan Rhys Griffiths
  • Alexandre Max Maraval
  • Hao Jianye

In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO’s empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multiobjective acquisition ensembles with Pareto front solutions improve queried configurations, and robust acquisition maximisers afford empirical advantages relative to their non-robust counterparts. We hope these findings may serve as guiding principles for practitioners of Bayesian optimisation.

NeurIPS Conference 2021 Conference Paper

Provably efficient multi-task reinforcement learning with model transfer

  • Chicheng Zhang
  • Zhi Wang

We study multi-task reinforcement learning (RL) in tabular episodic Markov decision processes (MDPs). We formulate a heterogeneous multi-player RL problem, in which a group of players concurrently face similar but not necessarily identical MDPs, with a goal of improving their collective performance through inter-player information sharing. We design and analyze a model-based algorithm, and provide gap-dependent and gap-independent regret upper and lower bounds that characterize the intrinsic complexity of the problem.

AAAI Conference 2020 Conference Paper

A Spherical Convolution Approach for Learning Long Term Viewport Prediction in 360 Immersive Video

  • Chenglei Wu
  • Ruixiao Zhang
  • Zhi Wang
  • Lifeng Sun

Viewport prediction for 360 video forecasts a viewer's viewport when he/she watches a 360 video with a head-mounted display, which benefits many VR/AR applications such as 360 video streaming and mobile cloud VR. Existing studies based on planar convolutional neural networks (CNNs) suffer from the image distortion and split caused by the sphere-to-plane projection. In this paper, we start by proposing a spherical convolution based feature extraction network to distill spatial-temporal 360 information. We provide a solution for training such a network without a dedicated 360 image or video classification dataset. Unlike previous methods, which base their predictions on pixel-level image information, we propose a semantic content and preference based viewport prediction scheme. We adopt a recurrent neural network (RNN) to extract a user's personal preference for 360 video content from minutes of embedded viewing histories. We utilize this semantic preference as spatial attention to help the network find the "interested" regions in a future video. We further design a tailored mixture density network (MDN) based viewport prediction scheme, including viewport modeling, a tailored loss function, etc., to improve efficiency and accuracy. Our extensive experiments demonstrate the rationality and performance of our method, which outperforms state-of-the-art methods, especially in long-term prediction.

IS Journal 2020 Journal Article

SMSS: Secure Member Selection Strategy in Federated Learning

  • Kun Zhao
  • Wei Xi
  • Zhi Wang
  • Jizhong Zhao
  • Ruimeng Wang
  • Zhiping Jiang

Data security and user privacy have become important concerns. Because federated learning (FL) can address data security and privacy issues, it is increasingly applied to many machine learning tasks. However, FL does not verify the quality of the data contributed by different parties in the system; hence, low-quality datasets with few common entities can be co-trained with others. This can result in a huge waste of computing resources, as well as attacks on the FL model by malicious clients acting as federation members. To solve this problem, this article proposes a secure member selection strategy (SMSS), which can evaluate the data quality of members before training. With SMSS, only datasets that share more common entities than a certain threshold can be selected for learning, while malicious clients with fewer common objects cannot acquire any information about the model. This article implements SMSS and evaluates its performance via several extensive experiments. Experimental results demonstrate that SMSS is safe, efficient, and effective.
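The selection criterion the abstract describes, admitting only members whose datasets share more common entities than a threshold, reduces to a set-overlap test. A plain (non-secure) sketch is below; the real SMSS must compute this without revealing raw entity IDs (e.g., via a private set intersection protocol), which this illustration deliberately omits, and the function and parameter names are our own:

```python
def select_members(candidate_ids, reference_ids, threshold=0.5):
    """Keep only candidates whose entity-ID set overlaps the reference
    entity set by at least `threshold` (fraction of the reference)."""
    reference = set(reference_ids)
    selected = []
    for name, ids in candidate_ids.items():
        overlap = len(reference & set(ids)) / len(reference)
        if overlap >= threshold:
            selected.append(name)
    return selected
```

A client whose dataset has almost no entities in common with the federation falls below the threshold and is excluded before any model state is shared with it.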

AAAI Conference 2019 Conference Paper

Better Fine-Tuning via Instance Weighting for Text Classification

  • Zhi Wang
  • Wei Bi
  • Yan Wang
  • Xiaojiang Liu

Transfer learning for deep neural networks has achieved great success in many text classification applications. A simple yet effective transfer learning method is to fine-tune the pretrained model parameters. Previous fine-tuning works mainly focus on the pre-training stage and investigate how to pretrain a set of parameters that can help the target task most. In this paper, we propose an Instance Weighting based Finetuning (IW-Fit) method, which revises the fine-tuning stage to improve the final performance on the target domain. IW-Fit adjusts instance weights at each fine-tuning epoch dynamically to accomplish two goals: 1) identify and learn the specific knowledge of the target domain effectively; 2) well preserve the shared knowledge between the source and the target domains. The designed instance weighting metrics used in IW-Fit are model-agnostic, which are easy to implement for general DNN-based classifiers. Experimental results show that IW-Fit can consistently improve the classification accuracy on the target domain.
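The abstract states that IW-Fit recomputes model-agnostic instance weights each fine-tuning epoch, but does not spell out the metrics here. As a generic stand-in (not the paper's actual weighting), one simple model-agnostic choice is to up-weight instances the current model still misclassifies, normalised so the average weight stays 1; all names and the exponential form are our own assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def instance_weights(logits, labels, temperature=1.0):
    """Toy weighting metric: up-weight instances with high per-instance
    cross-entropy under the current model, renormalised to mean 1."""
    probs = softmax(logits)
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    w = np.exp(losses / temperature)
    return w * len(w) / w.sum()

def weighted_loss(logits, labels, w):
    """Instance-weighted cross-entropy, as would be minimised each epoch."""
    probs = softmax(logits)
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float((w * losses).mean())
```

In an IW-Fit-style loop the weights would be refreshed at the start of every epoch from the current model's predictions, then held fixed while that epoch's gradient steps minimise the weighted loss.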

AAAI Conference 2018 Conference Paper

Load Scheduling of Simple Temporal Networks Under Dynamic Resource Pricing

  • T. K. Satish Kumar
  • Zhi Wang
  • Anoop Kumar
  • Craig Rogers
  • Craig Knoblock

We study load scheduling of simple temporal networks (STNs) under dynamic pricing of resources. We are given a set of processes and a set of simple temporal constraints between their execution times, i.e., an STN. Each process uses a certain amount of resource for execution. The unit price of the resource is a function of time, f(t). The goal is to find a schedule of a given STN that trades off makespan minimization against cost minimization within a user-specified suboptimality bound. We provide a polynomial-time algorithm for solving the load scheduling problem when f(t) is piecewise constant. This has important applications in many real-world domains including the smart home and smart grid domains. We then study the dependency of the unit price of the resource on time as well as the total demand at that time. This leads to a further characterization of tractable, NP-hard, and conjectured tractable cases.