Arrow Research Search

Author name cluster

Xin Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers (22)

EAAI Journal 2026 Journal Article

Distribution-aware neural network: A novel patient representation learning algorithm for Traditional Chinese medicine diagnosis prediction

  • Zongyao Zhao
  • Xin Dong
  • Xinpeng Song
  • Chenxi Zhao
  • Weiyu Li
  • Zuoyuan Luo
  • Geyan Pan
  • Sicen Wang

Traditional Chinese medicine (TCM) diagnosis involves complex and implicit associations between heterogeneous symptoms and diagnostic patterns, as well as distributional heterogeneity across diseases and patients. Existing intelligent diagnostic models focus primarily on architectural optimization but lack explicit modeling of the underlying symptom-diagnosis distributions, resulting in limited robustness, cross-disease generalization, and interpretability. To address these challenges, we propose a distribution-aware neural network (DANN) for diagnostic representation learning. The proposed framework incorporates explicit representations of both global and class-conditional feature distributions and integrates discriminative pruning and latent structure decomposition to capture population-level diagnostic regularities and fine-grained differential variations. In addition, we introduce a cross-disease clinical dataset (TCM-Chronic) covering 15 chronic diseases, 5876 clinical cases, and 97 diagnostic labels to simulate real-world comorbidity scenarios. Experiments on both a public multilabel dataset (TCM-Lung) and the cross-disease dataset demonstrate that the DANN consistently outperforms state-of-the-art machine learning, deep learning, and large language model baselines. On TCM-Lung, the DANN attains an F1-score of 0.5112, which is 3.6 percentage points higher than that of the strongest baseline. On TCM-Chronic, the DANN achieves an F1-score of 0.7146, outperforming the random forest by 7.21 percentage points. Ablation and expert evaluations further confirm that distribution-aware modeling contributes to increased diagnostic robustness and better interpretability. These results indicate that explicitly modeling diagnostic feature distributions provides an effective paradigm for intelligent diagnosis, with potential applicability beyond TCM to broader clinical decision-support tasks.
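
A rough sketch of the core idea, augmenting a patient representation with class-conditional distribution features, is given below. All names, shapes, and the distance-to-prototype design are illustrative assumptions; the paper's DANN architecture is more elaborate and is not reproduced here.

```python
import numpy as np

# Hypothetical sketch of "distribution-aware" features: augment each patient
# vector with a globally normalized view plus distances to class-conditional
# prototypes. Design choices here are assumptions, not the paper's method.

def fit_distribution_stats(X, y, n_classes):
    """Estimate global and class-conditional feature distributions."""
    global_mean, global_std = X.mean(axis=0), X.std(axis=0) + 1e-8
    class_means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    return global_mean, global_std, class_means

def distribution_aware_features(x, global_mean, global_std, class_means):
    """Concatenate the normalized sample with its distances to prototypes."""
    z = (x - global_mean) / global_std               # global distribution view
    dists = np.linalg.norm(class_means - x, axis=1)  # class-conditional cues
    return np.concatenate([z, dists])

# Toy usage: 100 patients, 32 symptom features, 5 diagnostic labels.
X, y = np.random.rand(100, 32), np.random.randint(0, 5, 100)
stats = fit_distribution_stats(X, y, n_classes=5)
phi = distribution_aware_features(X[0], *stats)      # shape: (37,)
```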

IROS Conference 2025 Conference Paper

An Easy Method for Extrinsic Calibration of Camera and Time-of-Flight Sensor

  • Tianyou Zhang
  • Jing Liu
  • Dragos Axinte
  • Xin Dong

A multi-zone (typically 8×8) time-of-flight (ToF) sensor offers a low-cost, low-power, and compact solution for range measurement, making it ideal for specialized robotic applications. However, its low resolution limits its usability. Pairing a ToF sensor with a camera enhances depth perception and can solve the unscaled metric problem in mono depth estimation. Advances in deep learning further enable high-quality depth map reconstruction from ToF-camera data, providing a cost-effective alternative. However, accurate ToF-camera calibration remains a challenge due to the ToF sensor's coarse depth output. This work presents a simple yet effective method for the extrinsic calibration of a ToF sensor with an RGB camera using only a chessboard and two whiteboards. A tailored two-plane fitting algorithm is proposed specifically for the ToF sensor. Moreover, our approach leverages parallel lines with vanishing points and geometric constraints from plane intersections. This eliminates the need for robotic arm movements or SLAM-based sensor pose reconstruction, significantly reducing complexity while maintaining high accuracy. Experimental results demonstrate that our method lowers the root mean square (RMS) depth difference from 96.59 mm to 67.89 mm, underscoring its effectiveness in practical applications. Code is publicly available at https://github.com/Tianyou-Nottingham/ToF-Camera-Calibration.
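
The tailored two-plane fitting algorithm is not detailed in the abstract, but the least-squares plane fit that such plane-based calibration pipelines build on can be sketched as follows; the function name and toy data are illustrative, not the paper's code.

```python
import numpy as np

# Minimal SVD-based fit of a plane n·p + d = 0 to 3D points, the generic
# building block behind plane-based ToF-camera calibration. This is not the
# paper's tailored two-plane algorithm, only the underlying least-squares fit.

def fit_plane(points):
    """points: (N, 3) array. Returns a unit normal n and offset d."""
    centroid = points.mean(axis=0)
    # The normal is the right singular vector with the smallest singular
    # value, i.e. the direction of least variance of the centered points.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    return n, -n.dot(centroid)

# Toy usage: noisy samples from the plane z = 0.5x + 0.2y + 1.
xy = np.random.rand(64, 2)
z = 0.5 * xy[:, 0] + 0.2 * xy[:, 1] + 1 + 0.001 * np.random.randn(64)
n, d = fit_plane(np.column_stack([xy, z]))
```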

EAAI Journal 2025 Journal Article

Consistency-based decision-making method with linguistic Q-rung orthopair fuzzy preference relation for power battery selection of new energy vehicles

  • Xin Dong
  • Peide Liu
  • Peng Wang
  • Xiaoming Wu

In an era of global petrochemical depletion and increasingly serious environmental pollution, new energy vehicles, as a key industry for building a sustainable low-carbon society, have attracted growing attention from countries around the world. As the "heart" of a new energy vehicle, the power battery is central to an enterprise's core competitiveness. To handle the fuzziness and uncertainty inherent in complex power battery selection, this paper establishes a two-stage consistency optimization model based on preference relations together with an interactive consistency improvement process. First, by considering the interaction between membership and non-membership, we propose an improved linguistic q-rung orthopair fuzzy weighted averaging operator. Then, the concept of the linguistic q-rung orthopair fuzzy preference relation (Lq-ROFPR) is introduced, and its additive consistency index is defined based on a linguistic scaling function. Next, for Lq-ROFPRs with unacceptable consistency, an interactive mechanism is proposed to improve the consistency level, which considers, in turn, the minimum adjustment size of preference modifications and the minimum number of adjusted elements. Moreover, a method for solving multi-attribute decision-making problems is developed and applied to power battery selection at the XP automobile company. Finally, simulation experiments and comparative analysis against other methods demonstrate the effectiveness and rationality of the proposed method in consistency optimization.
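
The defining constraint behind q-rung orthopair fuzzy values is that the membership and non-membership grades satisfy mu^q + nu^q <= 1; a minimal admissibility check is sketched below. The linguistic variant and the additive consistency index proposed in the paper are not reproduced.

```python
# Minimal check of the q-rung orthopair constraint: a (membership,
# non-membership) pair (mu, nu) is admissible iff mu**q + nu**q <= 1.
# The paper's linguistic Lq-ROFPR and its consistency index build on this.

def is_q_rung_orthopair(mu, nu, q):
    assert 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and q >= 1
    return mu**q + nu**q <= 1.0

# (0.8, 0.7) is inadmissible in the intuitionistic case (q = 1) but becomes
# admissible for q = 3, illustrating why larger q widens the decision space.
print(is_q_rung_orthopair(0.8, 0.7, q=1))  # False: 0.8 + 0.7 > 1
print(is_q_rung_orthopair(0.8, 0.7, q=3))  # True:  0.512 + 0.343 <= 1
```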

AAAI Conference 2025 Conference Paper

DOGR: Leveraging Document-Oriented Contrastive Learning in Generative Retrieval

  • Penghao Lu
  • Xin Dong
  • Yuansheng Zhou
  • Lei Cheng
  • Chuan Yuan
  • Linjian Mo

Generative retrieval constitutes an innovative approach in information retrieval, leveraging generative language models (LMs) to generate a ranked list of document identifiers (docids) for a given query. It simplifies the retrieval pipeline by replacing the large external index with model parameters. However, existing works merely learn the relationship between queries and document identifiers, which cannot directly represent the relevance between queries and documents. To address this problem, we propose a novel and general generative retrieval framework, namely Leveraging Document-Oriented Contrastive Learning in Generative Retrieval (DOGR), which leverages contrastive learning to improve generative retrieval tasks. It adopts a two-stage learning strategy that captures the relationship between queries and documents comprehensively through direct interactions. Furthermore, negative sampling methods and corresponding contrastive learning objectives are implemented to enhance the learning of semantic representations, thereby promoting a thorough comprehension of the relationship between queries and documents. Experimental results demonstrate that DOGR achieves state-of-the-art performance compared to existing generative retrieval methods on two public benchmark datasets. Further experiments show that our framework is generally effective for common identifier construction techniques.
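
The abstract does not specify the exact contrastive objective, but an in-batch InfoNCE loss between query and document embeddings, the generic form such document-oriented objectives take, can be sketched as follows; shapes and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

# Sketch of an in-batch InfoNCE contrastive loss between query and document
# embeddings. doc_emb[i] is the positive for query i; the other in-batch
# documents serve as negatives. Temperature and shapes are illustrative.

def info_nce(query_emb, doc_emb, temperature=0.05):
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature        # (B, B) cosine-similarity matrix
    targets = torch.arange(q.size(0))     # the diagonal holds the positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 256), torch.randn(8, 256))
```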

AAAI Conference 2025 Conference Paper

DreamFit: Garment-Centric Human Generation via a Lightweight Anything-Dressing Encoder

  • Ente Lin
  • Xujie Zhang
  • Fuwei Zhao
  • Yuxuan Luo
  • Xin Dong
  • Long Zeng
  • Xiaodan Liang

Diffusion models for garment-centric human generation from text or image prompts have garnered emerging attention for their great application potential. However, existing methods often face a dilemma: lightweight approaches, such as adapters, are prone to generating inconsistent textures, while finetune-based methods involve high training costs and struggle to maintain the generalization capabilities of pretrained diffusion models, limiting their performance across diverse scenarios. To address these challenges, we propose DreamFit, which incorporates a lightweight Anything-Dressing Encoder specifically tailored for garment-centric human generation. DreamFit has three key advantages: (1) Lightweight training: with the proposed adaptive attention and LoRA modules, DreamFit significantly reduces model complexity to 83.4M trainable parameters. (2) Anything-Dressing: our model generalizes surprisingly well to a wide range of (non-)garments, creative styles, and prompt instructions, consistently delivering high-quality results across diverse scenarios. (3) Plug-and-play: DreamFit is engineered for smooth integration with any community control plugins for diffusion models, ensuring easy compatibility and minimizing adoption barriers. To further enhance generation quality, DreamFit leverages pretrained large multi-modal models (LMMs) to enrich the prompt with fine-grained garment descriptions, thereby reducing the prompt gap between training and inference. We conduct comprehensive experiments on both 768 × 512 high-resolution benchmarks and in-the-wild images. DreamFit surpasses all existing methods, highlighting its state-of-the-art capability in garment-centric human generation.
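
The abstract credits LoRA modules for the small trainable-parameter count; a minimal LoRA-adapted linear layer in its standard form (frozen base weight plus a trained low-rank update) is sketched below. Rank and scaling are illustrative, and DreamFit's adaptive-attention integration is not shown.

```python
import torch
import torch.nn as nn

# Standard LoRA linear layer: y = Wx + (alpha / r) * B(Ax), with the
# pretrained weight W frozen and only the low-rank factors A, B trained.

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)        # update starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

out = LoRALinear(nn.Linear(512, 512))(torch.randn(2, 512))
```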

NeurIPS Conference 2025 Conference Paper

Event-based HDR Structured Light

  • Jiacheng Fu
  • Yue Li
  • Xin Dong
  • Wenming Weng
  • Yueyi Zhang
  • Zhiwei Xiong

Event-based structured light (SL) systems have attracted increasing attention for their potential in high-performance 3D measurement. Despite the inherent HDR capability of event cameras, reflective and absorptive surfaces still cause event cluttering and absence, which produce overexposed and underexposed regions that degrade the reconstruction quality. In this work, we present the first HDR 3D measurement framework specifically designed for event-based SL systems. First, we introduce a multi-contrast HDR coding strategy that facilitates imaging of areas with different reflectance. Second, to alleviate inter-frame interference caused by overexposed and underexposed areas, we propose a universal confidence-driven stereo matching strategy. Specifically, we estimate a confidence map as the fusion weight for features via an energy-guided confidence estimation. Further, we propose the confidence propagation volume, an innovative cost volume that offers both effective suppression of inter-frame interference and strong representation capability. Third, we contribute an event-based SL simulator and propose the first event-based HDR SL dataset. We also collect a real-world benchmarking dataset with ground truth. We validate the effectiveness of our method with the proposed confidence-driven strategy on both synthetic and real-world datasets. Experimental results demonstrate that our proposed HDR framework enables accurate 3D measurement even under extreme conditions.
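
The confidence-driven matching strategy uses an estimated confidence map as a fusion weight for features; a minimal per-pixel weighted fusion of the kind described can be sketched as follows. Shapes and the normalization are assumptions, and the confidence propagation volume is not reproduced.

```python
import torch

# Minimal confidence-weighted feature fusion: per-pixel confidence maps act
# as weights when combining two feature maps, down-weighting unreliable
# (over/under-exposed) regions. Shapes and normalization are illustrative.

def confidence_fusion(feat_a, feat_b, conf_a, conf_b, eps=1e-6):
    """feat_*: (C, H, W) features; conf_*: (1, H, W) confidence maps."""
    return (conf_a * feat_a + conf_b * feat_b) / (conf_a + conf_b + eps)

fused = confidence_fusion(torch.randn(32, 64, 64), torch.randn(32, 64, 64),
                          torch.rand(1, 64, 64), torch.rand(1, 64, 64))
```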

NeurIPS Conference 2025 Conference Paper

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

  • Shizhe Diao
  • Yu Yang
  • Yonggan Fu
  • Xin Dong
  • Dan Su
  • Markus Kliegl
  • Zijia Chen
  • Peter Belcak

Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLustering-based Iterative Data Mixture Bootstrapping (Nemotron-CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, Nemotron-CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. This strategy enables effective domain adaptation without relying solely on curated data. When continuously trained on 400B tokens with this mixture, our 1B model exceeds the state-of-the-art Llama-3.2-1B by 2.0%. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling. Finally, we introduce Nemotron-ClimbLab, a filtered 1.2-trillion-token corpus with 20 clusters as a research playground, and Nemotron-ClimbMix, a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token budget. We analyze the final data mixture, elucidating the characteristics of an optimal data mixture.
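
The iterative search loop can be caricatured as follows: sample cluster mixtures, score each (here with a synthetic stand-in for training and evaluating a proxy model), and concentrate the next round of sampling on the winners. Every function is a placeholder, and the predictor stage is folded into direct scoring for brevity.

```python
import numpy as np

# Toy caricature of CLIMB-style mixture bootstrapping over K clusters.
# proxy_score is a synthetic stand-in for "train a proxy model on the
# mixture and evaluate it"; the real pipeline also fits a predictor.

rng = np.random.default_rng(0)
K = 20                                    # number of semantic clusters

def proxy_score(w):
    target = np.linspace(0.01, 0.09, K); target /= target.sum()
    return -np.abs(w - target).sum()      # pretend some mixture is optimal

concentration = np.ones(K)                # start from a uniform Dirichlet
for _ in range(3):                        # iterative bootstrapping rounds
    W = rng.dirichlet(concentration, size=32)
    scores = np.array([proxy_score(w) for w in W])
    best = W[scores.argsort()[-8:]]       # keep the top-scoring mixtures
    concentration = 1.0 + 50.0 * best.mean(axis=0)  # refine the sampler

best_mixture = best[-1]                   # highest-scoring mixture found
```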

NeurIPS Conference 2025 Conference Paper

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

  • Yonggan Fu
  • Xin Dong
  • Shizhe Diao
  • Matthijs Van Keirsbilck
  • Hanrong Ye
  • Wonmin Byeon
  • Yashaswi Karnati
  • Lucas Liebenwein

Efficient deployment of small language models (SLMs) is essential for numerous real-world applications with stringent latency constraints. While previous work on SLM design has primarily focused on reducing the number of parameters to achieve parameter-optimal SLMs, parameter efficiency does not necessarily translate into proportional real-device speed-ups. This work aims to identify the key determinants of SLMs' real-device latency and offer generalizable principles and methodologies for SLM design and training when real-device latency is the primary consideration. Specifically, we identify two central architectural factors: depth–width ratios and operator choices. The former is crucial for small-batch-size latency, while the latter affects both latency and large-batch-size throughput. In light of this, we first study latency-optimal depth–width ratios, with the key finding that although deep–thin models generally achieve better accuracy under the same parameter budget, they may not lie on the accuracy–latency trade-off frontier. Next, we explore emerging efficient attention alternatives to evaluate their potential as candidate building operators. Using the identified promising operators, we construct an evolutionary search framework to automatically discover latency-optimal combinations of these operators within hybrid SLMs, thereby advancing the accuracy–latency frontier. In addition to architectural improvements, we further enhance SLM training using a weight normalization technique that enables more effective weight updates and improves final convergence. This technique can serve as a generalizable component for future SLMs. Combining these methods, we introduce a new family of hybrid SLMs, called Nemotron-Flash, which significantly advances the accuracy–efficiency frontier of state-of-the-art SLMs, e.g., achieving over +5.5% average accuracy, 1.3×/1.9× lower latency, and 18.7×/45.6× higher throughput compared to Qwen3-1.7B/0.6B, respectively.
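
The abstract names a weight normalization technique without specifying its form; the classic formulation (w = g · v / ||v||, decoupling magnitude from direction) is sketched below as a plausible reference point, not the paper's exact variant.

```python
import torch
import torch.nn as nn

# Classic weight normalization: each weight row is reparameterized as
# w = g * v / ||v||, so the gain g and the direction v receive separate,
# better-conditioned gradient updates. The paper's variant may differ.

class WeightNormLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.v = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.g = nn.Parameter(torch.ones(out_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.g.unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return x @ w.T + self.bias

y = WeightNormLinear(64, 32)(torch.randn(4, 64))
```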

JBHI Journal 2025 Journal Article

NRAG: A Knowledge-Enhanced LLM Framework for Interpretable Neurosurgical Disease Diagnosis in Outpatient and Emergency Settings

  • Haoyu Tian
  • Yiming Liu
  • Xinyu Dai
  • Xin Dong
  • Jian Yu
  • Wei Wei
  • Boran Wang
  • Xuezhong Zhou

Large language models (LLMs) have achieved state-of-the-art performance in numerous domains, yet their clinical deployment faces critical barriers, particularly insufficient reasoning in complex scenarios and limited interpretability. These challenges are exacerbated in neurosurgical diagnosis for outpatient and emergency settings, where time-sensitive decision-making, fragmented data, and complex comorbidities render conventional free-text-based modeling approaches unreliable. To address the limitations of existing LLMs in medical auxiliary diagnosis, particularly in interpretability and predictive performance, this study proposes NRAG, an auxiliary diagnosis method that combines LLMs with knowledge graphs (KGs). It extracts symptom descriptions from clinical records, performs personalized retrieval of associated paths in the KG, and supplements potential patient symptoms to optimize the diagnosis model. Comparative experiments involving multiple general-domain and medical-domain LLMs, along with case studies, were conducted to validate NRAG's effectiveness. Experimental results demonstrate that integrating the KG significantly improves diagnosis accuracy, achieving an F1-score of 0.8150. It also substantially improves model interpretability and performs excellently in expert evaluations. Ablation studies and comparative experiments with other general-domain and medical-domain LLMs confirm the superior performance of the proposed NRAG. NRAG effectively supplements missing symptom information and provides knowledge-path-based evidence for diagnosis results, while improving the precision and interpretability of intelligent diagnosis. Furthermore, this approach lays the foundation for intelligent diagnosis in neurosurgery while providing a methodological framework for integrating in-depth clinical data mining with medical knowledge base resources.
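
The knowledge-path retrieval step can be illustrated with a toy graph walk: starting from extracted symptoms, collect bounded-depth paths as candidate evidence. The graph, node names, and depth bound below are all hypothetical.

```python
from collections import deque

# Toy sketch of knowledge-path retrieval: breadth-first walk from an
# extracted symptom through a (hypothetical) symptom-disease graph,
# collecting paths to serve as interpretable evidence for the LLM.

KG = {
    "headache": ["increased ICP", "migraine"],
    "vomiting": ["increased ICP"],
    "increased ICP": ["brain tumor", "hydrocephalus"],
}

def retrieve_paths(symptom, max_edges=2):
    paths, queue = [], deque([[symptom]])
    while queue:
        path = queue.popleft()
        for nxt in KG.get(path[-1], []):
            new_path = path + [nxt]
            paths.append(new_path)
            if len(new_path) <= max_edges:    # bound the walk depth
                queue.append(new_path)
    return paths

evidence = retrieve_paths("headache")   # e.g. headache -> increased ICP -> ...
```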

NeurIPS Conference 2025 Conference Paper

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

  • Mingjie Liu
  • Shizhe Diao
  • Ximing Lu
  • Jian Hu
  • Xin Dong
  • Yejin Choi
  • Jan Kautz
  • Yi Dong

Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whether RL truly expands a model's reasoning capabilities or merely amplifies high-reward outputs already latent in the base model's distribution, and whether continually scaling up RL compute reliably leads to improved reasoning performance. In this work, we challenge prevailing assumptions by demonstrating that prolonged RL (ProRL) training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling. We introduce ProRL, a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. Our empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@$k$ evaluations, including scenarios where base models fail entirely regardless of the number of attempts. We further show that reasoning boundary improvements correlate strongly with the base model's task competence and with training duration, suggesting that RL can explore and populate new regions of the solution space over time. These findings offer new insights into the conditions under which RL meaningfully expands reasoning boundaries in language models and establish a foundation for future work on long-horizon RL for reasoning. We will release model weights and data to support further research.
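
The two stabilizers named in the abstract, KL divergence control and reference policy resetting, can be sketched schematically; the surrogate loss form, the k1 KL estimator, and the reset interval are assumptions, not the paper's exact recipe.

```python
import copy
import torch

# Schematic of a KL-controlled RL objective with periodic reference resets.
# logp / ref_logp: (B,) log-probs of sampled responses under the current
# and reference policies; advantages come from verifiable rewards.

def kl_regularized_loss(logp, ref_logp, advantages, beta=0.01):
    pg = -(advantages * logp).mean()      # policy-gradient surrogate
    kl = (logp - ref_logp).mean()         # k1 estimator of KL(pi || pi_ref)
    return pg + beta * kl

loss = kl_regularized_loss(-torch.rand(16), -torch.rand(16), torch.randn(16))

# Reference policy resetting inside the training loop (schematic):
# if step % reset_interval == 0:
#     ref_policy = copy.deepcopy(policy)  # re-anchor the KL penalty
```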

NeurIPS Conference 2025 Conference Paper

VisualLens: Personalization through Task-Agnostic Visual History

  • Wang Bill Zhu
  • Deqing Fu
  • Kai Sun
  • Yi Lu
  • Zhaojiang Lin
  • Seungwhan Moon
  • Kanika Narang
  • Mustafa Canim

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible or generalizable for multimodal recommendation. We hypothesize that a user's visual history, comprising images from daily life, can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum of user-profile signals from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10% on Hit@3, and outperforms GPT-4o by 2-5%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.

ICLR Conference 2024 Conference Paper

The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory before In-context Learning

  • Tian Jin
  • Nolan Clement
  • Xin Dong
  • Vaishnavh Nagarajan
  • Michael Carbin
  • Jonathan Ragan-Kelley
  • Gintare Karolina Dziugaite

We study how down-scaling large language model (LLM) size impacts LLM capabilities. We begin by measuring the effects of weight pruning, a popular technique for reducing model size, on two abilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in context. Surprisingly, we find that existing pruning techniques affect these two abilities of LLMs differently. For example, pruning more than 30% of weights significantly decreases an LLM's ability to recall facts presented during pre-training. Yet pruning 60-70% of weights largely preserves an LLM's ability to process information in context, ranging from retrieving answers based on information presented in context to learning parameterized functions such as a linear classifier based on a few examples. In short, moderate pruning impairs an LLM's ability to recall facts learned during pre-training, but its effect on the model's ability to process information presented in context is much less pronounced. The same disparate effects arise when the original model is replaced with a smaller dense one of reduced width and depth, which suggests that model size reduction in general underpins this disparity.
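
The simplest member of the pruning family such a study examines is global magnitude pruning; a minimal helper at a given sparsity level is sketched below. The techniques evaluated in the paper may differ in detail.

```python
import torch
import torch.nn as nn

# Global magnitude pruning: zero out the smallest-magnitude fraction of
# weights across all linear layers, e.g. sparsity=0.3 or 0.6-0.7 as in the
# regimes discussed above. This is the simplest variant of such methods.

@torch.no_grad()
def magnitude_prune(model: nn.Module, sparsity: float):
    weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
    all_mags = torch.cat([w.abs().flatten() for w in weights])
    threshold = all_mags.quantile(sparsity)      # global magnitude cutoff
    for w in weights:
        w.mul_((w.abs() > threshold).float())    # zero the smallest weights

model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
magnitude_prune(model, sparsity=0.6)             # keep ~40% of the weights
```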

NeurIPS Conference 2023 Conference Paper

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

  • Yue Tan
  • Chen Chen
  • Weiming Zhuang
  • Xin Dong
  • Lingjuan Lyu
  • Guodong Long

Federated learning (FL) is an effective machine learning paradigm in which multiple clients can train models on heterogeneous data in a decentralized manner without exposing their private data. However, existing FL systems suffer performance deterioration due to feature-level test-time shifts, which are well investigated in centralized settings but rarely studied in FL. The common non-IID issue in FL usually refers to inter-client heterogeneity during the training phase, while the test-time shift refers to intra-client heterogeneity during the test phase. Although the former is usually deemed notorious for FL, there is still a wealth of useful information in heterogeneous data sources that may help alleviate the latter issue. To explore the possibility of using inter-client heterogeneity to handle intra-client heterogeneity, we first propose a contrastive learning-based FL framework, namely FedICON, to capture invariant knowledge among heterogeneous clients and consistently tune the model to adapt to test data. In FedICON, each client performs sample-wise supervised contrastive learning during the local training phase, which enhances its sample-wise invariance encoding ability. Through global aggregation, the invariance extraction ability is mutually boosted across heterogeneous clients. During the test phase, our test-time adaptation procedure leverages unsupervised contrastive learning to guide the model to smoothly generalize to test data under intra-client heterogeneity. Extensive experiments validate the effectiveness of the proposed FedICON in taming heterogeneity to handle test-time shift problems.

AAAI Conference 2023 Conference Paper

REMIT: Reinforced Multi-Interest Transfer for Cross-Domain Recommendation

  • Caiqi Sun
  • Jiewei Gu
  • Binbin Hu
  • Xin Dong
  • Hai Li
  • Lei Cheng
  • Linjian Mo

The cold-start problem is one of the most challenging problems for recommender systems. One promising solution is cross-domain recommendation (CDR), which leverages rich information from an auxiliary source domain to improve the performance of the recommender system in the target domain. In particular, the family of embedding-and-mapping methods for CDR is very effective; these methods explicitly learn a mapping function from source embeddings to target embeddings to transfer users' preferences. Recent works usually transfer an overall source embedding by modeling a common or personalized preference bridge for all users. However, a unified user embedding cannot reflect a user's multiple interests in the auxiliary source domain. In this paper, we propose a novel framework called reinforced multi-interest transfer for CDR (REMIT). Specifically, we first construct a heterogeneous information network and employ different meta-path-based aggregations to obtain a user's multiple interests in the source domain, then transform the different interest embeddings with per-user meta-generated personalized bridge functions. To better coordinate the transformed user interest embeddings and the item embeddings in the target domain, we systematically develop a reinforced method that dynamically assigns weights to the transformed interests for different training instances and optimizes the performance of the target model. In addition, REMIT is a general framework that can be applied on top of various base models in the target domain. Our extensive experimental results on large real-world datasets demonstrate the superior performance and compatibility of REMIT.

AAAI Conference 2021 Conference Paper

Multi-Task Recurrent Modular Networks

  • Dongkuan Xu
  • Wei Cheng
  • Xin Dong
  • Bo Zong
  • Wenchao Yu
  • Jingchao Ni
  • Dongjin Song
  • Xuchao Zhang

We consider deep multi-task learning models with recurrent architectures that exploit regularities across tasks to jointly improve the performance of multiple sequence processing tasks. Most existing architectures are painstakingly customized to learn task relationships for specific problems, which makes them insufficiently flexible for modeling dynamic task relationships and limits their ability to generalize to novel test-time scenarios. We propose multi-task recurrent modular networks (MT-RMN), which can be incorporated into any multi-task recurrent model to address these drawbacks. MT-RMN consists of a shared encoder and multiple task-specific decoders, and operates recurrently over time. For better flexibility, it modularizes the encoder into multiple layers of sub-networks and dynamically controls the connections between these sub-networks and the decoders at different time steps, which provides the recurrent networks with varying degrees of parameter sharing for tasks with dynamic relatedness. For generalization ability, MT-RMN aims to discover a set of generalizable sub-networks in the encoder that can be assembled in different ways for different tasks. Policy networks augmented with differentiable routers are utilized to make the binary connection decisions between the sub-networks. Experimental results on three multi-task sequence processing datasets consistently demonstrate the effectiveness of MT-RMN.

AAAI Conference 2020 Conference Paper

ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs

  • Zuohui Fu
  • Yikun Xian
  • Shijie Geng
  • Yingqiang Ge
  • Yuting Wang
  • Xin Dong
  • Guang Wang
  • Gerard de Melo

A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bidirectional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.

AAAI Conference 2020 Conference Paper

Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation

  • Xin Dong
  • Jingchao Ni
  • Wei Cheng
  • Zhengzhang Chen
  • Bo Zong
  • Dongjin Song
  • Yanchi Liu
  • Haifeng Chen

Recently, recommender systems have been able to emit substantially improved recommendations by leveraging user-provided reviews. Existing methods typically merge all reviews of a given user (item) into a long document, and then process user and item documents in the same manner. In practice, however, these two sets of reviews are notably different: users' reviews reflect a variety of items that they have bought and are hence very heterogeneous in their topics, while an item's reviews pertain only to that single item and are thus topically homogeneous. In this work, we develop a novel neural network model that properly accounts for this important difference by means of asymmetric attentive modules. The user module learns to attend to only those signals that are relevant with respect to the target item, whereas the item module learns to extract the most salient contents with regard to properties of the item. Our multi-hierarchical paradigm accounts for the fact that neither are all reviews equally useful, nor are all sentences within each review equally pertinent. Extensive experimental results on a variety of real datasets demonstrate the effectiveness of our method.

AAAI Conference 2020 Conference Paper

RTN: Reparameterized Ternary Network

  • Yuhang Li
  • Xin Dong
  • Sai Qian Zhang
  • Haoli Bai
  • Yuanpeng Chen
  • Wei Wang

To deploy deep neural networks on resource-limited devices, quantization has been widely explored. In this work, we study extremely low-bit networks, which offer tremendous speed-ups and memory savings with quantized activations and weights. We first raise three overlooked issues in extremely low-bit networks: the squashing range of quantized values, gradient vanishing during backpropagation, and the unexploited hardware acceleration of ternary networks. By reparameterizing the quantized activation and weight vectors with a full-precision scale and offset over a fixed ternary vector, we decouple range and magnitude from direction to mitigate the above problems. The learnable scale and offset can automatically adjust the range and sparsity of quantized values without gradient vanishing. A novel encoding and computation pattern is designed to support efficient computing for our reparameterized ternary network (RTN). Experiments on ResNet-18 for ImageNet demonstrate that the proposed RTN strikes a much better balance between bitwidth and accuracy and achieves up to 26.76% relative accuracy improvement compared with state-of-the-art methods. Moreover, we validate the proposed computation pattern on Field Programmable Gate Arrays (FPGAs), where it brings 46.46× and 89.17× savings in power and area compared with full-precision convolution.
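
The reparameterization idea, a fixed ternary code given a learnable full-precision scale and offset, can be sketched with a straight-through estimator; the threshold and STE details below are assumptions rather than the paper's exact scheme.

```python
import torch

# Sketch of RTN-style reparameterization: weights are ternarized to
# t in {-1, 0, +1}, then given a learnable full-precision scale and offset,
# w_hat = scale * t + offset. Threshold and STE wiring are illustrative.

def ste_ternarize(w, threshold=0.05):
    t = torch.sign(w) * (w.abs() > threshold).float()
    # Straight-through estimator: forward uses t, gradients flow through w.
    return w + (t - w).detach()

def rtn_forward(w, scale, offset):
    return scale * ste_ternarize(w) + offset

w = torch.randn(64, 64, requires_grad=True)
scale = torch.tensor(0.1, requires_grad=True)
offset = torch.tensor(0.0, requires_grad=True)
w_hat = rtn_forward(w, scale, offset)   # differentiable in w, scale, offset
```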

AAAI Conference 2018 Conference Paper

Cross-Lingual Propagation for Deep Sentiment Analysis

  • Xin Dong
  • Gerard de Melo

Across the globe, people are voicing their opinion in social media and various other online fora. Given such data, modern deep learning-based sentiment analysis methods excel at determining the sentiment polarity of what is being said about companies, products, etc. Unfortunately, such deep methods require significant training data, while for many languages, resources and training data are scarce. In this work, we present a cross-lingual propagation algorithm that yields sentiment embedding vectors for numerous languages. We then rely on a dual-channel convolutional neural architecture to incorporate them into the network. This allows us to achieve gains in deep sentiment analysis across a range of languages and domains.

AAAI Conference 2017 Conference Paper

A Hybrid Collaborative Filtering Model with Deep Structure for Recommender Systems

  • Xin Dong
  • Lei Yu
  • Zhonghuo Wu
  • Yuxia Sun
  • Lingfeng Yuan
  • Fangxi Zhang

Collaborative filtering (CF) is a widely used approach in recommender systems for solving many real-world problems. Traditional CF-based methods employ the user-item matrix, which encodes users' individual preferences for items, to learn to make recommendations. In real applications, the rating matrix is usually very sparse, causing CF-based methods to degrade significantly in recommendation performance. In this case, some improved CF methods utilize the increasing amount of side information to address the data sparsity problem as well as the cold-start problem. However, the learned latent factors may not be effective due to the sparse nature of the user-item matrix and the side information. To address this problem, we draw on advances in learning effective representations in deep learning and propose a hybrid model that jointly performs deep latent factor learning for users and items from side information and collaborative filtering from the rating matrix. Extensive experimental results on three real-world datasets show that our hybrid model outperforms other methods in effectively utilizing side information and achieves performance improvements.

NeurIPS Conference 2017 Conference Paper

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

  • Xin Dong
  • Shangyu Chen
  • Sinno Pan

How to develop slim and accurate deep neural networks has become crucial for real-world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most existing methods either fail to significantly compress a well-trained deep network or require a heavy retraining process for the pruned network to re-boost its prediction performance. In this paper, we propose a new layer-wise pruning method for deep neural networks. In our proposed method, the parameters of each individual layer are pruned independently based on the second-order derivatives of a layer-wise error function with respect to the corresponding parameters. We prove that the final prediction performance drop after pruning is bounded by a linear combination of the reconstruction errors incurred at each layer. By controlling layer-wise errors properly, one only needs to perform a light retraining process on the pruned network to restore its original prediction performance. We conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our pruning method compared with several state-of-the-art baseline methods. The code for our work is released at https://github.com/csyhhu/L-OBS.
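
The classic OBS saliency underlying this family of methods ranks each weight by w_q^2 / (2 [H^-1]_qq); for a linear layer with squared reconstruction error, the layer-wise Hessian reduces to the input covariance. The sketch below simplifies damping and omits the compensating weight update.

```python
import numpy as np

# Layer-wise OBS saliency: for weight w_q, saliency = w_q**2 / (2*[H^-1]_qq),
# where H is the Hessian of the layer-wise squared reconstruction error.
# For a linear layer this reduces to the input covariance. Weights with the
# lowest saliency are pruned first; the compensating update is omitted here.

def lobs_saliency(W, X, damping=1e-4):
    """W: (out, in) layer weights; X: (n, in) inputs to the layer."""
    H = X.T @ X / len(X) + damping * np.eye(X.shape[1])
    H_inv_diag = np.diag(np.linalg.inv(H))       # [H^-1]_qq per input dim
    return W**2 / (2.0 * H_inv_diag)

W, X = np.random.randn(16, 64), np.random.randn(1000, 64)
prune_order = np.argsort(lobs_saliency(W, X), axis=None)  # ascending saliency
```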

EAAI Journal 2013 Journal Article

Performance evaluation of subsea BOP control systems using dynamic Bayesian networks with imperfect repair and preventive maintenance

  • Baoping Cai
  • Yonghong Liu
  • Qian Fan
  • Yunwei Zhang
  • Shilin Yu
  • Zengkai Liu
  • Xin Dong

This work presents dynamic Bayesian network (DBN) models of series, parallel, and 2-out-of-3 (2oo3) voting systems, taking into account common-cause failure, imperfect coverage, imperfect repair, and preventive maintenance. Seven basic events of one-, two-, or three-component failure are proposed to model the common-cause failure of three-component systems. Imperfect coverage is modeled in the conditional probability table by defining a coverage factor. A multi-state degraded component is used to model imperfect repair and preventive maintenance. Using the proposed method, a DBN model of a subsea blowout preventer (BOP) control system is built, and its reliability and availability are evaluated. Mutual information is analyzed to assess the importance of basic events, and the effects of degradation probability, failure rate, and mean time to repair (MTTR) on performance are studied. The results show that repair and maintenance can improve system performance significantly; imperfect repair does not degrade system performance significantly in comparison with perfect repair, and preventive maintenance improves system performance slightly in comparison with imperfect repair alone. To improve the performance of the subsea BOP control system, the single surface components and the components subject to all-common-cause failure should be given more attention. The influence of degradation probability on performance is, in decreasing order, PLC, PC, and ES; the influence of failure rate and MTTR on performance is, in decreasing order, PLC, ES, PC, DO, DI, and AI.
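
The static availability formulas behind the three system structures modeled in the abstract are standard; a sketch for identical components follows (the DBN itself adds temporal dynamics, common-cause failure, imperfect coverage, and repair, none of which are reproduced here).

```python
# Static availability of the three structures modeled in the paper, for
# identical components with availability a. The DBN layers time dynamics,
# common-cause failure, coverage, and repair on top of these basics.

def series(a, n):            # all n components must work
    return a ** n

def parallel(a, n):          # at least one of n must work
    return 1 - (1 - a) ** n

def two_out_of_three(a):     # 2oo3 voting: at least 2 of 3 must work
    return 3 * a**2 - 2 * a**3

a = 0.95
print(series(a, 3), parallel(a, 3), two_out_of_three(a))
```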