Arrow Research search

Author name cluster

Hang Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting?

  • Wenkai Huang
  • Yijia Guo
  • Gaolei Li
  • Lei Ma
  • Hang Zhang
  • Liwen Hu
  • Jiazheng Wang
  • Jianhua Li

3D Gaussian Splatting (3DGS) has emerged as a powerful representation for 3D scenes, widely adopted due to its exceptional efficiency and high-fidelity visual quality. Given the significant value of 3DGS assets, recent works have introduced specialized watermarking schemes to ensure copyright protection and ownership verification. However, can existing 3D Gaussian watermarking approaches genuinely guarantee robust protection of the 3D assets? In this paper, for the first time, we systematically explore and validate possible vulnerabilities of 3DGS watermarking frameworks. We demonstrate that conventional watermark removal techniques designed for 2D images do not effectively generalize to the 3DGS scenario due to the specialized rendering pipeline and the unique attributes of each Gaussian primitive. Motivated by this insight, we propose GSPure, the first watermark purification framework designed specifically for watermarked 3DGS representations. By analyzing view-dependent rendering contributions and exploiting geometrically accurate feature clustering, GSPure precisely isolates and effectively removes watermark-related Gaussian primitives while preserving scene integrity. Extensive experiments demonstrate that GSPure achieves the best watermark purification performance, reducing watermark PSNR by up to 16.34 dB while limiting degradation of original scene fidelity to less than 1 dB of PSNR loss. Moreover, it consistently outperforms existing methods in both effectiveness and generalization.

JBHI Journal 2026 Journal Article

Hierarchical Multi-View Graph Diffusion Weighted Model for Cancer Subtype Identification

  • Yunhe Wang
  • Hang Zhang
  • Zhengyu Du
  • Yanchi Su
  • Xiangtao Li

Accurate cancer subtype identification is crucial for personalized medicine, as it enables precise diagnosis based on molecular characteristics. With the advent of large-scale multi-omics data from various sources, researchers now have unprecedented opportunities to explore cancer subtypes comprehensively. However, the inherent complexity, high dimensionality, and heterogeneity of these datasets present significant statistical and computational challenges, often leading to suboptimal clustering performance when inter-omics heterogeneity is overlooked. To address these challenges, we propose a novel method called the Hierarchical Multi-view Graph Diffusion Weighted (HMGDW) model for cancer subtype identification. Our approach begins with the generation of multiple base clusterings through random feature sampling, effectively mitigating the impact of high dimensionality. These base clusterings are subsequently integrated via a late integration strategy to yield the consensus clustering result. Then, we introduce a graph diffusion weighted mechanism that prioritizes views with the most significant contributions to the unified graph representation. Lastly, we conduct extensive experiments on both generic multi-view datasets and multi-omics cancer datasets. The experimental results demonstrate that HMGDW consistently outperforms several state-of-the-art methods, achieving robust and accurate clustering. Additionally, a case study on the acute myeloid leukemia (AML) dataset validates the practical efficacy of our model in identifying clinically relevant subtypes.

AAAI Conference 2026 Conference Paper

Multi-level Style Preference Optimization: An Adaptive Detection Framework for Human-Machine Hybrid Text

  • Zehao Wang
  • Lianwei Wu
  • Wenbo An
  • Hang Zhang
  • Yaxiong Wang

Large language model (LLM)-generated texts now rival human quality, creating four text categories: purely machine-generated, machine-rewritten, machine-polished, and human-written content. Traditional detection methods face significant challenges in human-machine hybrid scenarios where LLMs perform rewriting or polishing, as existing approaches focus on single-level features and fail to capture subtle, multi-layered machine traces. To address this, we propose the Multi-level Style Preference Optimization (MSPO) framework, capturing machine style features at multiple granularities: sequence-level (overall consistency), phrase-level (distinctive n-gram patterns), and lexical-level (word selection distributions). We further incorporate four text complexity indicators (Type-Token Ratio, Average Sentence Length, Average Word Length, and Punctuation Ratio) to dynamically adjust optimization parameters based on human-machine text complexity differences, enhancing adaptability across diverse text types. Additionally, we construct a comprehensive detection dataset spanning three representative domains (scientific writing, news articles, and creative writing) across four text types (human-written, purely machine-generated, machine-rewritten, and machine-polished), generated using state-of-the-art LLMs for robust evaluation. Experimental results demonstrate that MSPO significantly outperforms existing methods across all text types. On the challenging rewritten texts, MSPO achieves up to 82.14% AUROC, representing an improvement of 11.15 percentage points over the strongest baseline ImBD, while maintaining robust cross-domain generalizability across scientific, news, and creative writing domains.
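The four complexity indicators named in the abstract are all surface statistics of the text. A minimal sketch using conventional definitions (the paper's exact formulas are not given in the abstract, so these definitions are assumptions):

```python
import re
import string

def complexity_indicators(text: str) -> dict:
    """Compute the four surface-level indicators named in the MSPO abstract:
    Type-Token Ratio, Average Sentence Length, Average Word Length, and
    Punctuation Ratio. Definitions here are conventional ones, not the
    paper's."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punct = [c for c in text if c in string.punctuation]
    n_words = max(len(words), 1)
    return {
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_length": n_words / max(len(sentences), 1),  # words per sentence
        "avg_word_length": sum(len(w) for w in words) / n_words,  # characters per word
        "punctuation_ratio": len(punct) / max(len(text), 1),
    }
```

In the framework described above, such indicators would feed the dynamic adjustment of optimization parameters; here they are simply computed per input text.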

AAAI Conference 2026 Conference Paper

Persistent Autoregressive Mapping with Traffic Rules for Autonomous Driving

  • Shiyi Liang
  • Xinyuan Chang
  • Changjie Wu
  • Huiyuan Yan
  • Yifan Bai
  • Xinran Liu
  • Hang Zhang
  • Yujian Yuan

Safe autonomous driving requires both accurate HD map construction and persistent awareness of traffic rules, even when their associated signs are no longer visible. However, existing methods either focus solely on geometric elements or treat rules as temporary classifications, failing to capture their persistent effectiveness across extended driving sequences. In this paper, we present PAMR (Persistent Autoregressive Mapping with Traffic Rules), a novel framework that performs autoregressive co-construction of lane vectors and traffic rules from visual observations. Our approach introduces two key mechanisms: Map-Rule Co-Construction for processing driving scenes in temporal segments, and Map-Rule Cache for maintaining rule consistency across these segments. To properly evaluate continuous and consistent map generation, we develop MapDRv2, featuring improved lane geometry annotations. Extensive experiments demonstrate that PAMR achieves superior performance in joint vector-rule mapping tasks, while maintaining persistent rule effectiveness throughout extended driving sequences.

AAAI Conference 2026 Conference Paper

Plug-and-Play Clarifier: A Zero-Shot Multimodal Framework for Egocentric Intent Disambiguation

  • Sicheng Yang
  • Yukai Huang
  • Weitong Cai
  • Shitong Sun
  • You He
  • Jiankang Deng
  • Hang Zhang
  • Jifei Song

The performance of egocentric AI agents is fundamentally limited by multimodal intent ambiguity. This challenge arises from a combination of underspecified language, imperfect visual data, and deictic gestures, which frequently leads to task failure. Existing monolithic Vision-Language Models (VLMs) struggle to resolve these multimodal ambiguous inputs, often failing silently or hallucinating responses. To address these ambiguities, we introduce the Plug-and-Play Clarifier, a zero-shot and modular framework that decomposes the problem into discrete, solvable sub-tasks. Specifically, our framework consists of three synergistic modules: (1) a text clarifier that uses dialogue-driven reasoning to interactively disambiguate linguistic intent, (2) a vision clarifier that delivers real-time guidance feedback, instructing users to adjust their positioning for improved capture quality, and (3) a cross-modal clarifier with a grounding mechanism that robustly interprets 3D pointing gestures and identifies the specific objects users are pointing to. Extensive experiments demonstrate that our framework improves the intent clarification performance of small language models (4-8B) by approximately 30%, making them competitive with significantly larger counterparts. We also observe consistent gains when applying our framework to these larger models. Furthermore, our vision clarifier increases corrective guidance accuracy by over 20%, and our cross-modal clarifier improves semantic answer accuracy for referential grounding by 5%. Overall, our method provides a plug-and-play framework that effectively resolves multimodal ambiguity and significantly enhances user experience in egocentric interaction.

AAAI Conference 2026 Conference Paper

SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection

  • Yang Xu
  • Hang Zhang
  • Yixiao Ma
  • Ye Zhu
  • Kai Ming Ting

The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches consider a representation of the local neighborhood in each view independently, and then capture the consistent neighbors across all views via a learning process. They suffer from two key issues. First, there is no guarantee that they can capture consistent neighbors well, especially when the same neighbors lie in regions of varied densities in different views, resulting in inferior detection accuracy. Second, the learning process has a high computational cost of O(N^2), rendering them inapplicable to large datasets. To address these issues, we propose a novel method termed Spherical Consistent Neighborhoods Ensemble (SCoNE). It has two unique features: (a) the consistent neighborhoods are represented with multi-view instances directly, requiring no intermediate representations as used in existing approaches; and (b) the neighborhoods have data-dependent properties, which lead to large neighborhoods in sparse regions and small neighborhoods in dense regions. The data-dependent properties enable local neighborhoods in different views to be represented well as consistent neighborhoods, without learning. This leads to O(N) time complexity. Empirical evaluations show that SCoNE has superior detection accuracy and runs orders of magnitude faster on large datasets than existing approaches.
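The "large neighborhoods in sparse regions, small neighborhoods in dense regions" property can be illustrated with a simple stand-in (not SCoNE's actual construction, which the abstract does not specify): take each point's distance to its k-th nearest neighbor as a data-dependent neighborhood radius.

```python
import numpy as np

def adaptive_radii(points: np.ndarray, k: int = 3) -> np.ndarray:
    """Illustrative data-dependent neighborhood radii: for each point in the
    (N, D) array, return the distance to its k-th nearest neighbor. Sparse
    regions yield large radii, dense regions small ones."""
    d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)          # sort each row of pairwise distances in place
    return d[:, k]          # column 0 is each point's zero distance to itself
```

This is only meant to convey the density-adaptive behavior the abstract attributes to SCoNE's neighborhoods; the actual method builds consistent neighborhoods across views.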

AAAI Conference 2026 Conference Paper

Splats in Splats: Robust and Effective 3D Steganography Towards Gaussian Splatting

  • Yijia Guo
  • Wenkai Huang
  • Yang Li
  • Gaolei Li
  • Hang Zhang
  • Liwen Hu
  • Jianhua Li
  • Tiejun Huang

3D Gaussian splatting (3DGS) has demonstrated impressive 3D reconstruction performance with explicit scene representations. Given the widespread application of 3DGS in 3D reconstruction and generation tasks, there is an urgent need to protect the copyright of 3DGS assets. However, existing copyright protection techniques for 3DGS overlook the usability of 3D assets, posing challenges for practical deployment. Here we describe splats in splats, the first 3DGS steganography framework that embeds 3D content in 3DGS itself without modifying any attributes. To achieve this, we take a close look at spherical harmonics (SH) and devise an importance-graded SH coefficient encryption strategy to embed the hidden SH coefficients. Furthermore, we employ a convolutional autoencoder to establish a mapping between the original Gaussian primitives' opacity and the hidden Gaussian primitives' opacity. Extensive experiments indicate that our method significantly outperforms existing 3D steganography techniques, with 5.31% higher scene fidelity and 3x faster rendering speed, while ensuring security, robustness, and user experience.

AAAI Conference 2026 Conference Paper

UniMapGen: A Generative Framework for Large-Scale Map Construction from Multi-modal Data

  • Yujian Yuan
  • Changjie Wu
  • Xinyuan Chang
  • Sijin Wang
  • Hang Zhang
  • Shiyi Liang
  • Shuang Zeng
  • Mu Xu

Large-scale map construction is foundational for critical applications such as autonomous driving and navigation systems. Traditional large-scale map construction approaches mainly rely on costly and inefficient special data collection vehicles and labor-intensive annotation processes. While existing satellite-based methods have demonstrated promising potential in enhancing the efficiency and coverage of map construction, they exhibit two major limitations: (1) inherent drawbacks of satellite data (e.g., occlusions, outdatedness) and (2) inefficient vectorization from perception-based methods, resulting in discontinuous and rough roads that require extensive post-processing. This paper presents a novel generative framework, UniMapGen, for large-scale map construction, offering three key innovations: (1) representing lane lines as discrete sequences and establishing an iterative strategy to generate more complete and smooth map vectors than traditional perception-based methods; (2) proposing a flexible architecture that supports multi-modal inputs, enabling dynamic selection among BEV, PV, and text prompts, to overcome the drawbacks of satellite data; (3) developing a state update strategy for global continuity and consistency of the constructed large-scale map. UniMapGen achieves state-of-the-art performance on the OpenSatMap dataset. Furthermore, UniMapGen can infer occluded roads and predict roads missing from dataset annotations.

IJCAI Conference 2025 Conference Paper

GraphProt: Certified Black-Box Shielding Against Backdoored Graph Models

  • Xiao Yang
  • Yuni Lai
  • Kai Zhou
  • Gaolei Li
  • Jianhua Li
  • Hang Zhang

Graph learning models have been empirically proven to be vulnerable to backdoor threats, wherein adversaries submit trigger-embedded inputs to manipulate the model predictions. Current graph backdoor defenses manifest several limitations: 1) dependence on model-related details, 2) necessitation of additional fine-tuning, and 3) reliance on extra explainability tools, all of which are infeasible under stringent privacy policies. To address these limitations, we propose GraphProt, a certified black-box defense method to suppress backdoor attacks on GNN-based graph classifiers. GraphProt operates in a model-agnostic manner and solely leverages the graph input. Specifically, GraphProt first applies a purpose-designed topology-feature filtration to mitigate graph anomalies. Subsequently, subgraphs are sampled via a formulated strategy integrating topology and features, followed by robust model inference through a majority-vote-based subgraph prediction ensemble. Our results across benchmark attacks and datasets show GraphProt effectively reduces attack success rates while preserving regular graph classification accuracy.

ICLR Conference 2025 Conference Paper

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

  • Zehan Wang 0001
  • Ziang Zhang
  • Minjie Hong
  • Hang Zhang
  • Luping Liu
  • Rongjie Huang 0001
  • Xize Cheng
  • Shengpeng Ji

Recently, human-computer interaction with various modalities has shown promising applications, like GPT-4o and Gemini. Meanwhile, multimodal representation models have emerged as the foundation for these versatile multimodal understanding and generation pipelines. Models like CLIP, CLAP and ImageBind can map their specialized modalities into respective joint spaces. To construct a high-quality omni representation space that can be shared across modalities and is expert in each, we propose merging these advanced models into a unified space at scale. With this insight, we present OmniBind, advanced multimodal joint representation models built via fusing the knowledge of 14 pre-trained spaces, which support 3D, audio, image, video and language inputs. To alleviate the interference between different knowledge sources in the integrated space, we dynamically assign weights to different spaces by learning routers with two objectives: cross-modal overall alignment and language representation decoupling. Notably, since binding and routing spaces only require lightweight networks, OmniBind is extremely training-efficient. Extensive experiments demonstrate the versatility and superiority of OmniBind as an omni representation model, highlighting its great potential for diverse applications, such as any-query and composable multimodal understanding.

NeurIPS Conference 2025 Conference Paper

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

  • Sicong Leng
  • Yun Xing
  • Zesen Cheng
  • Yang Zhou
  • Hang Zhang
  • Xin Li
  • Deli Zhao
  • Shijian Lu

Recent advancements in large multimodal models (LMMs) have significantly enhanced performance across diverse tasks, with ongoing efforts to further integrate additional modalities such as video and audio. However, most existing LMMs remain vulnerable to hallucinations, the discrepancy between the factual multimodal input and the generated textual output, which has limited their applicability in various real-world scenarios. This paper presents the first systematic investigation of hallucinations in LMMs involving the three most common modalities: language, visual, and audio. Our study reveals two key contributors to hallucinations: overreliance on unimodal priors and spurious inter-modality correlations. To address these challenges, we introduce the benchmark The Curse of Multi-Modalities (CMM), which comprehensively evaluates hallucinations in LMMs, providing a detailed analysis of their underlying issues. Our findings highlight key vulnerabilities, including imbalances in modality integration and biases from training data, underscoring the need for balanced cross-modal learning and enhanced hallucination mitigation strategies. Based on our observations and findings, we suggest potential research directions that could enhance the reliability of LMMs.

JAIR Journal 2025 Journal Article

Towards a Robust Persistence Diagram via Data-dependent Kernel

  • Hang Zhang
  • Kaifeng Zhang
  • Kai Ming Ting
  • Ye Zhu

Topological Data Analysis (TDA) is used to extract topological features such as rings from point clouds. Recent works have identified that existing methods, which construct persistence diagrams in TDA, are not robust to noise and varied densities in a point cloud. This causes these methods to obtain incorrect topological features. We analyze the necessary properties of an approach that can address these two issues, and propose a new filter function for TDA based on a new data-dependent kernel that possesses these properties. Our empirical evaluation reveals that (i) the proposed kernel provides a better means for UMAP dimensionality reduction; (ii) the proposed filter function can significantly improve the performance of Topological Point Cloud Clustering; and (iii) the proposed filter function is a more effective way of constructing a persistence diagram for t-SNE visualization and SVM classification than three existing TDA methods. In addition, we explore the proposed filter's performance on a more complex deformation named Riemannian stretching. Our proposed filter equipped with Sample Fermat distance outperforms all the other filters when noise and Riemannian stretching coexist. Code is available at https://github.com/IsolationKernel/Codes/tree/main/Lambda-kernel.

NeurIPS Conference 2024 Conference Paper

Empowering and Assessing the Utility of Large Language Models in Crop Science

  • Hang Zhang
  • Jiawei Sun
  • Renqi Chen
  • Wei Liu
  • Zhonghang Yuan
  • Xinzhe Zheng
  • Zhefan Wang
  • Zhiyuan Yang

Large language models (LLMs) have demonstrated remarkable efficacy across knowledge-intensive tasks. Nevertheless, their untapped potential in crop science presents an opportunity for advancement. To narrow this gap, we introduce CROP, which includes a novel instruction tuning dataset specifically designed to enhance LLMs' professional capabilities in the crop science sector, along with a benchmark that serves as a comprehensive evaluation of LLMs' understanding of the domain knowledge. The CROP dataset is curated through a task-oriented and LLM-human integrated pipeline, comprising 210,038 single-turn and 1,871 multi-turn dialogues related to crop science scenarios. The CROP benchmark includes 5,045 multiple-choice questions covering three difficulty levels. Our experiments based on the CROP benchmark demonstrate notable enhancements in crop science-related tasks when LLMs are fine-tuned with the CROP dataset. To the best of our knowledge, the CROP dataset is the first-ever instruction tuning dataset in the crop science domain. We anticipate that CROP will accelerate the adoption of LLMs in the domain of crop science, ultimately contributing to global food production.

NeurIPS Conference 2024 Conference Paper

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

  • Yongxin Zhu
  • Bocheng Li
  • Hang Zhang
  • Xin Li
  • Linli Xu
  • Lidong Bing

Latent-based image generative models, such as Latent Diffusion Models (LDMs) and Mask Image Models (MIMs), have achieved notable success in image generation tasks. These models typically leverage reconstructive autoencoders like VQGAN or VAE to encode pixels into a more compact latent space and learn the data distribution in the latent space instead of directly from pixels. However, this practice raises a pertinent question: Is it truly the optimal choice? In response, we begin with an intriguing observation: despite sharing the same latent space, autoregressive models significantly lag behind LDMs and MIMs in image generation. This finding contrasts sharply with the field of NLP, where the autoregressive model GPT has established a commanding presence. To address this discrepancy, we introduce a unified perspective on the relationship between latent space and generative models, emphasizing the stability of latent space in image generative modeling. Furthermore, we propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling by applying K-Means on the latent features of self-supervised learning models. Experimental results show that image autoregressive modeling with our tokenizer (DiGIT) benefits both image understanding and image generation with the next token prediction principle, which is inherently straightforward for GPT models but challenging for other generative models. Remarkably, for the first time, a GPT-style autoregressive model for images outperforms LDMs, which also exhibits substantial improvement akin to GPT when scaling up model size. Our findings underscore the potential of an optimized latent space and the integration of discrete tokenization in advancing the capabilities of image generative models. The code is available at https://github.com/DAMO-NLP-SG/DiGIT.
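The tokenizer recipe in the abstract, K-Means over latent features with the centroids serving as a discrete codebook, can be sketched as follows. In the paper the features would come from a frozen self-supervised encoder; here any (N, D) array stands in, and the farthest-point initialization is an implementation convenience, not the paper's choice.

```python
import numpy as np

def fit_codebook(features: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain K-Means (Lloyd's algorithm) on (N, D) latent features; the
    resulting (k, D) centroids act as the discrete codebook."""
    # Farthest-point initialization: deterministic and well spread out.
    centroids = features[:1].copy()
    for _ in range(k - 1):
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).min(1)
        centroids = np.vstack([centroids, features[d.argmax()]])
    for _ in range(iters):
        # Assign each feature to its nearest centroid, then update centroids.
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            if (assign == j).any():
                centroids[j] = features[assign == j].mean(0)
    return centroids

def tokenize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature vector to the index of its nearest codebook entry."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)
```

Image patches encoded by the self-supervised model would then be replaced by these integer tokens before autoregressive next-token training.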

NeurIPS Conference 2023 Conference Paper

Bounded rationality in structured density estimation

  • Tianyuan Teng
  • Kevin Li
  • Hang Zhang

Learning to accurately represent environmental uncertainty is crucial for adaptive and optimal behaviors in various cognitive tasks. However, it remains unclear how the human brain, constrained by finite cognitive resources, constructs an internal model from an infinite space of probability distributions. In this study, we explore how these learned distributions deviate from the ground truth, resulting in observable inconsistency in a novel structured density estimation task. During each trial, human participants were asked to form and report the latent probability distribution functions underlying sequentially presented independent observations. As the number of observations increased, the reported predictive density became closer to the ground truth. Nevertheless, we observed an intriguing inconsistency in human structure estimation, specifically a large error in the number of reported clusters. Such inconsistency is invariant to the scale of the distribution and persists across stimulus modalities. We modeled uncertainty learning as approximate Bayesian inference in a nonparametric mixture prior of distributions. Human reports were best explained under resource rationality embodied in a decaying tendency towards model expansion. Our study offers insights into human cognitive processes under uncertainty and lays the groundwork for further exploration of resource-rational representations in the brain under more complex tasks.

IJCAI Conference 2023 Conference Paper

Spatially Covariant Lesion Segmentation

  • Hang Zhang
  • Rongguang Wang
  • Jinwei Zhang
  • Dongdong Liu
  • Chao Li
  • Jiahao Li

Compared to natural images, medical images usually show stronger visual patterns, which adds flexibility and elasticity to resource-limited clinical applications by allowing proper priors to be injected into neural networks. In this paper, we propose the spatially covariant pixel-aligned classifier (SCP) to improve computational efficiency while maintaining or increasing accuracy for lesion segmentation. SCP relaxes the spatial invariance constraint imposed by convolutional operations and optimizes an underlying implicit function that maps image coordinates to network weights; the parameters of this function are obtained along with the backbone network training and later used for generating network weights to capture spatially covariant contextual information. We demonstrate the effectiveness and efficiency of the proposed SCP using two lesion segmentation tasks from different imaging modalities: white matter hyperintensity segmentation in magnetic resonance imaging and liver tumor segmentation in contrast-enhanced abdominal computerized tomography. The network using SCP achieves a 23.8%, 64.9%, and 74.7% reduction in GPU memory usage, FLOPs, and network size, respectively, with similar or better accuracy for lesion segmentation.
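The central idea, a learned function mapping pixel coordinates to per-pixel classifier weights, can be sketched as a toy forward pass. All names and sizes here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def coord_grid(h: int, w: int) -> np.ndarray:
    """Normalized (x, y) coordinates in [-1, 1] for every pixel, shape (h*w, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], 1)

class SpatiallyCovariantHead:
    """Toy sketch of the idea: a tiny MLP maps each pixel's coordinates to
    the weights of that pixel's own linear classifier, which is then applied
    to the backbone feature at that pixel. Weights are random here; in
    training they would be optimized with the backbone."""
    def __init__(self, feat_dim: int, n_classes: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.5, (2, hidden))
        self.w2 = rng.normal(0, 0.5, (hidden, feat_dim * n_classes))
        self.feat_dim, self.n_classes = feat_dim, n_classes

    def __call__(self, feats: np.ndarray) -> np.ndarray:  # feats: (H, W, feat_dim)
        h, w, _ = feats.shape
        hid = np.tanh(coord_grid(h, w) @ self.w1)
        pix_w = (hid @ self.w2).reshape(h * w, self.feat_dim, self.n_classes)
        # Per-pixel classifier: logits[p] = feats[p] @ pix_w[p]
        logits = np.einsum("pd,pdc->pc", feats.reshape(h * w, -1), pix_w)
        return logits.reshape(h, w, self.n_classes)
```

The efficiency argument in the abstract comes from replacing heavier decoding layers with such a lightweight coordinate-conditioned head; this sketch only shows the shape flow.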

JBHI Journal 2022 Journal Article

Symmetry-Aware Deep Learning for Cerebral Ventricle Segmentation With Intra-Ventricular Hemorrhage

  • Yineng Hua
  • Zengqiang Yan
  • Zhuo Kuang
  • Hang Zhang
  • Xianbo Deng
  • Li Yu

Cerebral ventricles are one of the prominent structures in the brain, segmenting which can provide rich information for brain-related disease diagnosis. Unfortunately, cerebral ventricle segmentation in complex clinical cases, such as in the coexistence with other lesions/hemorrhages, remains unexplored. In this paper, we, for the first time, focus on cerebral ventricle segmentation with the presence of intra-ventricular hemorrhages (IVH). To overcome the occlusions formed by IVH, we propose a symmetry-aware deep learning approach inspired by contrastive self-supervised learning. Specifically, for each slice, we jointly employ the raw slice and the horizontally flipped slice as inputs and penalize the consistency loss between the corresponding segmentation maps in addition to their segmentation losses. In this way, the symmetry of cerebral ventricles is enforced to eliminate the occlusions brought by IVH. Extensive experimental results show that the proposed symmetry-aware deep learning approach achieves consistent performance improvements for ventricle segmentation in both normal (i.e., without IVH) and challenging cases (i.e., with IVH). Through evaluation of multiple backbone networks, we demonstrate the architecture-independence of the proposed approach for performance improvements. Moreover, we re-design an end-to-end version of symmetry-aware deep learning, making it more extendable to other approaches for brain-related analysis.
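The consistency penalty described in the abstract, between the segmentation of a slice and the flipped-back segmentation of its horizontal mirror, can be sketched framework-free. The mean-squared form is an assumption; the paper may use a different distance:

```python
import numpy as np

def symmetry_consistency_loss(model, image: np.ndarray) -> float:
    """Run the model on a 2D slice and on its horizontal mirror, flip the
    second prediction back, and penalize their disagreement (MSE here).
    `model` is any callable mapping an (H, W) array to an (H, W) map."""
    pred = model(image)
    pred_flip = model(image[:, ::-1])[:, ::-1]  # flip input, un-flip output
    return float(((pred - pred_flip) ** 2).mean())
```

In training this term would be added to the ordinary segmentation losses of both inputs; a perfectly flip-equivariant model incurs zero penalty.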

AAAI Conference 2021 Conference Paper

Efficient Folded Attention for Medical Image Reconstruction and Segmentation

  • Hang Zhang
  • Jinwei Zhang
  • Rongguang Wang
  • Qihao Zhang
  • Pascal Spincemaille
  • Thanh D. Nguyen
  • Yi Wang

Recently, 3D medical image reconstruction (MIR) and segmentation (MIS) based on deep neural networks have been developed with promising results, and attention mechanism has been further designed for performance enhancement. However, the large size of 3D volume images poses a great computational challenge to traditional attention methods. In this paper, we propose a folded attention (FA) approach to improve the computational efficiency of traditional attention methods on 3D medical images. The main idea is that we apply tensor folding and unfolding operations to construct four small sub-affinity matrices to approximate the original affinity matrix. Through four consecutive sub-attention modules of FA, each element in the feature tensor can aggregate spatial-channel information from all other elements. Compared to traditional attention methods, with the moderate improvement of accuracy, FA can substantially reduce the computational complexity and GPU memory consumption. We demonstrate the superiority of our method on two challenging tasks for 3D MIR and MIS, which are quantitative susceptibility mapping and multiple sclerosis lesion segmentation.

JMLR Journal 2020 Journal Article

GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing

  • Jian Guo
  • He He
  • Tong He
  • Leonard Lausen
  • Mu Li
  • Haibin Lin
  • Xingjian Shi
  • Chenguang Wang

We present GluonCV and GluonNLP, the deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs, to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks to enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed onto a variety of platforms with different programming languages. The Apache 2.0 license has been adopted by GluonCV and GluonNLP to allow for software distribution, modification, and usage.

RLDM Conference 2013 Conference Abstract

Learned Myopic or Far-Sighted: Experience Shapes Human Temporal Horizon in Sequential

  • Hang Zhang
  • Hyoseok Kim
  • Nathaniel Daw
  • Laurence Maloney

We investigated how well people make sequential decisions to achieve the long-term goal. In video-game-like settings, a spaceship flew across a row of three mountains of increasing heights. Before each mountain, subjects could elevate the spaceship by either a constant and small height (CS) or a variant but on average larger height (VL) to avoid crashing. The goal was to survive beyond the last mountain. The optimal choice before a specific mountain depended on the heights of all future mountains. We tested whether subjects could learn the optimal policy or base their choices only on a short horizon, i.e., on the immediate mountain. Methods: We constructed two combinations of mountain heights, A and B, which differed in how early a short horizon would be penalized. For A, a short horizon would yield the optimal choice before the first mountain and not increase crash rate until the last mountain. In contrast, for B, a short horizon would increase crash rate as early as the second mountain. Each subject completed 4 blocks of 60 trials, in the block order of ABAB or BABA. Sixteen naïve subjects were evenly assigned to the two groups. Results: The two groups differed in their learning trajectories. (1) The ABAB group achieved a higher probability of survival in the last (.63) than in the first two blocks (.50), but the BABA did not (both .54). (2) The ABAB had a shorter horizon than the BABA: When VL was the optimal choice and involved long-term considerations, ABAB chose VL less than BABA did (53% vs. 67%). (3) The BABA appeared to be far-sighted: When CS was the optimal choice and reduced crash at the immediate mountain, BABA chose CS less than ABAB did (60% vs. 81%). Conclusion: Human individuals' temporal horizon in a sequential-decision task depends on their initial experience with the task. People may learn to be myopic or far-sighted.