Arrow Research search

Author name cluster

Junjie Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems

  • Zengyu Zou
  • Jingyuan Wang
  • Yixuan Huang
  • Junjie Wu

This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end centralized decision-making framework based on a sequence-to-sequence architecture, named Multi-Agent Pointer Transformer (MAPT). MVDPDPSR is an extension of the vehicle routing problem and a spatio-temporal system optimization problem, widely applied in scenarios such as on-demand delivery. Classical operations research methods face bottlenecks in computational complexity and time efficiency when handling large-scale dynamic problems. Although existing reinforcement learning methods have achieved some progress, they still encounter several challenges: 1) independent decoding across multiple vehicles fails to model joint action distributions; 2) the feature extraction network struggles to capture inter-entity relationships; 3) the joint action space is exponentially large. To address these issues, we designed the MAPT framework, which employs a Transformer Encoder to extract entity representations, combines a Transformer Decoder with a Pointer Network to generate joint action sequences in an autoregressive manner, and introduces a Relation-Aware Attention module to capture inter-entity relationships. Additionally, we guide the model's decision-making with informative priors to facilitate effective exploration. Experiments on 8 datasets demonstrate that MAPT significantly outperforms existing baseline methods in performance and offers substantial computational time advantages over classical operations research methods.
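The pointer-style joint decoding described in the abstract can be illustrated with a minimal sketch (function names, dimensions, and values here are hypothetical, not taken from the paper): at each autoregressive step, the decoder query scores every entity, already-selected entities are masked out, and a softmax yields the distribution over next actions.

```python
import math


def pointer_step(query, entities, visited):
    """One autoregressive decoding step of a pointer network: score each
    entity by dot product with the decoder query, mask entities already
    chosen, and return a softmax distribution over the remainder."""
    scores = []
    for i, e in enumerate(entities):
        if i in visited:
            scores.append(float("-inf"))  # mask already-selected entities
        else:
            scores.append(sum(q * x for q, x in zip(query, e)))
    mx = max(s for s in scores if s != float("-inf"))  # for numerical stability
    exps = [math.exp(s - mx) if s != float("-inf") else 0.0 for s in scores]
    z = sum(exps)
    return [v / z for v in exps]


# Hypothetical toy step: entity 1 was already visited, so it gets zero mass.
probs = pointer_step([1.0, 0.0], [[2.0, 0.0], [1.0, 0.0], [0.0, 5.0]], {1})
print(probs)  # probs[1] == 0.0; probs[0] ≈ 0.88
```

A full MAPT-style decoder would condition the query on previously emitted actions and all vehicles; this sketch shows only the masked pointer-attention step.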

ICLR Conference 2025 Conference Paper

Asymmetric Factorized Bilinear Operation for Vision Transformer

  • Junjie Wu
  • Qilong Wang 0001
  • Jiangtao Xie
  • Pengfei Zhu 0001
  • Qinghua Hu

As a core component of Transformer-like deep architectures, a feed-forward network (FFN) for channel mixing is responsible for learning features of each token. Recent works show channel mixing can be enhanced by increasing computational burden or can be slimmed at the cost of performance. Although some efforts have been made, existing works still struggle to resolve the paradox of performance-complexity trade-offs. In this paper, we propose an Asymmetric Factorized Bilinear Operation (AFBO) to replace the FFN of the vision transformer (ViT), which attempts to efficiently explore rich statistics of token features for achieving a better performance-complexity trade-off. Specifically, our AFBO computes second-order statistics via a spatial-channel factorized bilinear operation for feature learning, which replaces a simple linear projection in the FFN and enhances the feature learning ability of ViT by modeling second-order correlation among token features. Furthermore, our AFBO presents two structured-sparsity channel mapping strategies, namely Grouped Cross Channel Mapping (GCCM) and Overlapped Cycle Channel Mapping (OCCM). They decompose the bilinear operation into grouped channel features while considering information interaction between groups, significantly reducing computational complexity while guaranteeing model performance. Finally, our AFBO is built with GCCM and OCCM in an asymmetric way, aiming to achieve a better trade-off. Note that our AFBO is model-agnostic and can be flexibly integrated with existing ViTs. Experiments are conducted with twenty ViTs on various tasks, and the results show our AFBO is superior to its counterparts while improving existing ViTs in terms of generalization and robustness.

AAAI Conference 2025 Conference Paper

Bridging Traffic State and Trajectory for Dynamic Road Network and Trajectory Representation Learning

  • Chengkai Han
  • Jingyuan Wang
  • Yongyao Wang
  • Xie Yu
  • Hao Lin
  • Chao Li
  • Junjie Wu

Effective urban traffic management is vital for sustainable city development, relying on intelligent systems with machine learning tasks such as traffic flow prediction and travel time estimation. Traditional approaches usually focus on static road network and trajectory representation learning, and overlook the dynamic nature of traffic states and trajectories, which is crucial for downstream tasks. To address this gap, we propose TRACK, a novel framework to bridge traffic state and trajectory data for dynamic road network and trajectory representation learning. TRACK leverages graph attention networks (GAT) to encode static and spatial road segment features, and introduces a transformer-based model for trajectory representation learning. By incorporating transition probabilities from trajectory data into GAT attention weights, TRACK captures dynamic spatial features of road segments. Meanwhile, TRACK designs a traffic transformer encoder to capture the spatial-temporal dynamics of road segments from traffic state data. To further enhance dynamic representations, TRACK proposes a co-attentional transformer encoder and a trajectory-traffic state matching task. Extensive experiments on real-life urban traffic datasets demonstrate the superiority of TRACK over state-of-the-art baselines. Case studies confirm TRACK’s ability to capture spatial-temporal dynamics effectively.

TMLR Journal 2025 Journal Article

Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

  • Junjie Wu
  • Tsz Ting Chung
  • Kai Chen
  • Dit-Yan Yeung

Despite their outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) may generate hallucinated content that does not exist in the given image. Most existing LVLM hallucination benchmarks are constrained to evaluating object-related hallucinations. However, the potential hallucination on the relations between two objects, i.e., relation hallucination, still lacks investigation. To remedy that, we design a unified framework to measure object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to evaluate hallucinations in (object, relation, object) triplets extracted from LVLMs’ responses, making it easily generalizable to various vision-language tasks. Based on our framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark which can be used to study both object and relation hallucination at the same time. With comprehensive evaluations on Tri-HE, we observe that the relation hallucination issue is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem towards reliable LVLMs. Moreover, based on our findings, we design a simple training-free approach that effectively mitigates hallucinations for LVLMs.
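The triplet-level idea lends itself to a very small sketch (a hypothetical helper, not the Tri-HE implementation): once (object, relation, object) triplets are extracted from a response, hallucination can be scored as the fraction unsupported by a reference scene graph.

```python
def hallucination_rate(response_triplets, scene_graph):
    """Fraction of (object, relation, object) triplets in a model
    response that are unsupported by the reference scene graph.
    Both object- and relation-level errors surface the same way:
    the triplet is simply absent from the reference set."""
    if not response_triplets:
        return 0.0
    unsupported = [t for t in response_triplets if t not in scene_graph]
    return len(unsupported) / len(response_triplets)


# Hypothetical example: one supported triplet, one hallucinated one.
scene_graph = {("cat", "on", "mat"), ("mat", "near", "door")}
response = [("cat", "on", "mat"), ("dog", "under", "mat")]
print(hallucination_rate(response, scene_graph))  # → 0.5
```

The benchmark itself also has to extract the triplets from free-form text (e.g., with an LLM parser); this sketch assumes that step has already happened.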

NeurIPS Conference 2025 Conference Paper

UniGTE: Unified Graph–Text Encoding for Zero-Shot Generalization across Graph Tasks and Domains

  • Duo Wang
  • Yuan Zuo
  • Guangyue Lu
  • Junjie Wu

Generalizing to unseen graph tasks without task-specific supervision is challenging: conventional graph neural networks are typically tied to a fixed label space, while large language models (LLMs) struggle to capture graph structure. We introduce UniGTE, an instruction-tuned encoder–decoder framework that unifies structural and semantic reasoning. The encoder augments a pretrained autoregressive LLM with learnable alignment tokens and a structure-aware graph–text attention mechanism, enabling it to attend jointly to a tokenized graph and a natural-language task prompt while remaining permutation-invariant to node order. This yields compact, task-aware graph representations. Conditioned solely on these representations, a frozen LLM decoder predicts and reconstructs: it outputs the task answer and simultaneously paraphrases the input graph in natural language. The reconstruction objective regularizes the encoder to preserve structural cues. UniGTE is instruction-tuned on five datasets spanning node-, edge-, and graph-level tasks across diverse domains, yet requires no fine-tuning at inference. It achieves new state-of-the-art zero-shot results on node classification, link prediction, graph classification and graph regression under cross-task and cross-domain settings, demonstrating that tight integration of graph structure with LLM semantics enables robust, transferable graph reasoning.

NeurIPS Conference 2024 Conference Paper

LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings

  • Duo Wang
  • Yuan Zuo
  • Fengzhi Li
  • Junjie Wu

Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method's effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors. Our code is available at https://github.com/W-rudder/TEA-GLM.
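The projector described above is, at its core, a single linear map whose output is reshaped into a fixed number of token embeddings. A minimal sketch (names, shapes, and values are illustrative assumptions, not the TEA-GLM code):

```python
def project(gnn_repr, W, num_tokens, d_llm):
    """Hypothetical linear projector: maps a pooled GNN representation
    (length d_gnn) to num_tokens graph-token embeddings of size d_llm,
    via a single learned matrix W of shape (d_gnn, num_tokens * d_llm).
    The LLM itself stays frozen; only W would be trained."""
    d_gnn = len(gnn_repr)
    flat = [sum(gnn_repr[i] * W[i][j] for i in range(d_gnn))
            for j in range(num_tokens * d_llm)]
    # Reshape the flat projection into num_tokens rows of size d_llm.
    return [flat[t * d_llm:(t + 1) * d_llm] for t in range(num_tokens)]


# Toy dimensions: d_gnn=2 projected to 2 tokens of size 2.
W = [[1, 0, 0, 1],
     [0, 1, 1, 0]]
print(project([2.0, 3.0], W, num_tokens=2, d_llm=2))  # → [[2.0, 3.0], [3.0, 2.0]]
```

In practice the resulting token embeddings would be prepended to the instruction's token sequence before it is fed to the frozen LLM.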

AAAI Conference 2023 Conference Paper

Spatio-Temporal Self-Supervised Learning for Traffic Flow Prediction

  • Jiahao Ji
  • Jingyuan Wang
  • Chao Huang
  • Junjie Wu
  • Boren Xu
  • Zhenhe Wu
  • Junbo Zhang
  • Yu Zheng

Robust prediction of citywide traffic flows at different time periods plays a crucial role in intelligent transportation systems. While previous work has made great efforts to model spatio-temporal correlations, existing methods still suffer from two key limitations: i) Most models collectively predict all regions' flows without accounting for spatial heterogeneity, i.e., different regions may have skewed traffic flow distributions. ii) These models fail to capture the temporal heterogeneity induced by time-varying traffic patterns, as they typically model temporal correlations with a shared parameterized space for all time periods. To tackle these challenges, we propose a novel Spatio-Temporal Self-Supervised Learning (ST-SSL) traffic prediction framework which enhances the traffic pattern representations to be reflective of both spatial and temporal heterogeneity, with auxiliary self-supervised learning paradigms. Specifically, our ST-SSL is built over an integrated module with temporal and spatial convolutions for encoding the information across space and time. To achieve the adaptive spatio-temporal self-supervised learning, our ST-SSL first performs the adaptive augmentation over the traffic flow graph data at both attribute- and structure-levels. On top of the augmented traffic graph, two SSL auxiliary tasks are constructed to supplement the main traffic prediction task with spatial and temporal heterogeneity-aware augmentation. Experiments on four benchmark datasets demonstrate that ST-SSL consistently outperforms various state-of-the-art baselines. Since spatio-temporal heterogeneity widely exists in practical datasets, the proposed framework may also cast light on other spatial-temporal applications. Model implementation is available at https://github.com/Echo-Ji/ST-SSL.

TIST Journal 2020 Journal Article

Dancing with Trump in the Stock Market

  • Kun Yuan
  • Guannan Liu
  • Junjie Wu
  • Hui Xiong

It is always deemed crucial to identify the key factors that could have significant impact on the stock market trend. Recently, an interesting phenomenon has emerged in which some of President Trump’s posts on Twitter can surge into a dominant role on the stock market for a certain time period, although studies along this line are still in their infancy. Therefore, in this article, we study whether and how this newly emerging information can help boost the performance of stock market prediction. Specifically, we have found that the echoing reinforced effect of financial news with Trump’s market-related tweets can influence the market movement—that is, some of Trump’s tweets directly impact the stock market in a short time, and the impact can be further intensified when it echoes with other financial news reports. Along this line, we propose a deep information echoing model to predict the hourly stock market trend, such as the rise and fall of the Dow Jones Industrial Average. In particular, to model the discovered echoing reinforced impact, we design a novel information echoing module with a gating mechanism in a sequential deep learning framework to capture the fused knowledge from both Trump’s tweets and financial news. Extensive experiments have been conducted on real-world U.S. stock market data to validate the effectiveness of our model and its interpretability in understanding the usability of Trump’s posts. Our proposed deep echoing model outperforms other baselines by achieving the best accuracy of 60.42% and obtains remarkable accumulated profits in a trading simulation, which confirms our assumption that Trump’s tweets contain indicative information for short-term market trends. Furthermore, we find that Trump’s tweets about trade and political events are more likely to be associated with short-term market movement, and, interestingly, the impact does not appear to degrade as time passes.

TIST Journal 2020 Journal Article

Exploring Correlation Network for Cheating Detection

  • Ping Luo
  • Kai Shu
  • Junjie Wu
  • Li Wan
  • Yong Tan

The correlation network, typically formed by computing pairwise correlations between variables, has recently become a competitive paradigm to discover insights in various application domains, such as climate prediction, financial marketing, and bioinformatics. In this study, we adopt this paradigm to detect cheating behavior hidden in business distribution channels, where falsified big deals are often made by collusive partners to obtain lower product prices—a behavior deemed to be extremely harmful to the sale ecosystem. To this end, we assume that abnormal deals are likely to occur between two partners if their purchase-volume sequences have a strong negative correlation. This seemingly intuitive rule, however, imposes several research challenges. First, existing correlation measures are usually symmetric and thus cannot distinguish the different roles of partners in cheating. Second, the tick-to-tick correspondence between two sequences might be violated due to the possible delay of purchase behavior, which should also be captured by correlation measures. Finally, the fact that any pair of sequences could be correlated may result in a number of false-positive cheating pairs, which need to be corrected in a systematic manner. To address these issues, we propose a correlation network analysis framework for cheating detection. In the framework, we adopt an asymmetric correlation measure to distinguish the two roles, namely, cheating seller and cheating buyer, in a cheating alliance. Dynamic Time Warping is employed to address the time offset between two sequences in computing the correlation. We further propose two graph-cut methods to convert the correlation network into a bipartite graph to rank cheating partners, which simultaneously helps to remove false-positive correlation pairs. Based on a 4-year real-world channel dataset from a worldwide IT company, we demonstrate the effectiveness of the proposed method in comparison to competitive baseline methods.
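The combination of sign-sensitive correlation and Dynamic Time Warping can be sketched as follows (a toy stdlib-only illustration under assumed details, not the paper's actual asymmetric measure): a strong negative coupling between two purchase-volume sequences shows up as a small DTW distance between one mean-centered sequence and the sign-flipped other, even when their movements are offset in time.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance, letting the two
    sequences align despite small time offsets in purchase behavior."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def negative_coupling(seller, buyer):
    """Hypothetical score: DTW distance between the mean-centered
    seller sequence and the sign-flipped, mean-centered buyer
    sequence. Smaller values suggest stronger negative correlation."""
    def center(x):
        mu = sum(x) / len(x)
        return [v - mu for v in x]
    s, b = center(seller), center(buyer)
    return dtw_distance(s, [-v for v in b])


# Toy sequences that move in perfect opposition: the score is 0.0.
print(negative_coupling([10, 2, 9, 1], [1, 9, 2, 10]))  # → 0.0
```

The paper's measure is additionally asymmetric, so it can distinguish the cheating-seller role from the cheating-buyer role; this symmetric sketch only shows the DTW-plus-negation ingredient.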

AAAI Conference 2019 Conference Paper

SVM-Based Deep Stacking Networks

  • Jingyuan Wang
  • Kai Feng
  • Junjie Wu

The deep network model, with the majority built on neural networks, has proven to be a powerful framework to represent complex data for high-performance machine learning. In recent years, more and more studies turn to non-neural-network approaches to build diverse deep structures, and the Deep Stacking Network (DSN) model is one such approach that uses stacked easy-to-learn blocks to build a deep network whose parameter training is parallelizable. In this paper, we propose a novel SVM-based Deep Stacking Network (SVM-DSN), which uses the DSN architecture to organize linear SVM classifiers for deep learning. A BP-like layer tuning scheme is also proposed to ensure holistic and local optimization of the stacked SVMs simultaneously. Favorable mathematical properties of SVMs, such as convex optimization, are introduced into the DSN framework by our model. From a global view, SVM-DSN can iteratively extract data representations layer by layer like a deep neural network, but with parallelizability; from a local view, each stacked SVM can converge to its optimal solution and obtain its support vectors, which, compared with neural networks, could lead to interesting improvements in anti-saturation and interpretability. Experimental results on both image and text data sets demonstrate the excellent performance of SVM-DSN compared with competitive benchmark models.

AAAI Conference 2018 Conference Paper

CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition

  • Jingyuan Wang
  • Xu He
  • Ze Wang
  • Junjie Wu
  • Nicholas Jing Yuan
  • Xing Xie
  • Zhang Xiong

Driven by the wave of urbanization in recent decades, the research topic of migrant behavior analysis has drawn great attention from both academia and the government. Nevertheless, subject to the cost of data collection and the lack of modeling methods, most existing studies use only questionnaire surveys with sparse samples and non-individual-level statistical data to achieve coarse-grained studies of migrant behaviors. In this paper, a partially supervised cross-domain deep learning model named CD-CNN is proposed for migrant/native recognition using mobile phone signaling data as behavioral features and questionnaire survey data as incomplete labels. Specifically, CD-CNN decomposes the mobile data into a location domain and a communication domain, and adopts a joint learning framework that combines two convolutional neural networks with a feature balancing scheme. Moreover, CD-CNN employs a three-step algorithm for training, in which the co-training step is of great value to partially supervised cross-domain learning. Comparative experiments on the city of Wuxi demonstrate the high predictive power of CD-CNN. Two interesting applications further highlight the ability of CD-CNN for in-depth migrant behavioral analysis.

AAAI Conference 2017 Conference Paper

Collaborative Company Profiling: Insights from an Employee’s Perspective

  • Hao Lin
  • Hengshu Zhu
  • Yuan Zuo
  • Chen Zhu
  • Junjie Wu
  • Hui Xiong

Company profiling is an analytical process to build an in-depth understanding of a company’s fundamental characteristics. It serves as an effective way to gain vital information about the target company and acquire business intelligence. Traditional approaches for company profiling rely heavily on the availability of rich financial information about the company, such as finance reports and SEC filings, which may not be readily available for many private companies. However, the rapid prevalence of online employment services enables a new paradigm — obtaining a variety of company information from employees’ online ratings and comments. This, in turn, raises the challenge of developing company profiles from an employee’s perspective. To this end, in this paper, we propose a method named Company Profiling based Collaborative Topic Regression (CPCTR) for learning the latent structural patterns of companies. By formulating a joint optimization framework, CPCTR has the ability to collaboratively model both textual (e.g., reviews) and numerical information (e.g., salaries and ratings). Indeed, with the identified patterns, including the positive/negative opinions and the latent variable that influences salary, we can effectively carry out opinion analysis and salary prediction. Extensive experiments were conducted on a real-world data set to validate the effectiveness of CPCTR. The results show that our method provides a comprehensive understanding of company characteristics and delivers a more effective prediction of salaries than other baselines.

AAAI Conference 2017 Conference Paper

Coupling Implicit and Explicit Knowledge for Customer Volume Prediction

  • Jingyuan Wang
  • Yating Lin
  • Junjie Wu
  • Zhong Wang
  • Zhang Xiong

Customer volume prediction, which predicts the volume from a customer source to a service place, is a very important technique for location selection, market investigation, and other related applications. Most traditional methods make use of only partial information for either supervised or unsupervised modeling, and thus cannot well integrate all available knowledge. In this paper, we propose a method titled GR-NMF for jointly modeling both implicit correlations hidden inside customer volumes and explicit geographical knowledge via an integrated probabilistic framework. The effectiveness of GR-NMF in coupling all-round knowledge is verified over a real-life outpatient dataset under different scenarios. GR-NMF shows particularly evident advantages over all baselines in location selection under the cold-start challenge.

ECAI Conference 2016 Conference Paper

Adaptive Binary Quantization for Fast Nearest Neighbor Search

  • Zhujin Li
  • Xianglong Liu 0001
  • Junjie Wu
  • Hao Su

Hashing has proven to be an attractive technique for fast nearest neighbor search over big data. Compared to projection-based hashing methods, prototype-based ones have a stronger capability of generating discriminative binary codes for data with complex inherent structure. However, our observation indicates that they still suffer from insufficient coding that usually utilizes the complete binary codes in a hypercube. To address this problem, we propose an adaptive binary quantization method that learns a discriminative hash function with prototypes correspondingly associated with small unique binary codes. Our alternating optimization adaptively discovers the prototype set and the code set of a varying size in an efficient way, which together robustly approximate the data relations. Our method can be naturally generalized to the product space for long hash codes. We believe that our idea serves as a very helpful insight for hashing research. Extensive experiments on four large-scale (up to 80 million) datasets demonstrate that our method significantly outperforms state-of-the-art hashing methods, with up to 58.84% relative performance gains.
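Regardless of how the binary codes are learned, querying them reduces to nearest-neighbor search in Hamming space, which the following stdlib-only sketch illustrates (the codes are made up, not from the paper):

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as ints:
    XOR leaves a 1 bit exactly where the codes disagree."""
    return bin(a ^ b).count("1")


def nearest(query_code, db_codes):
    """Brute-force nearest neighbor over a database of binary hash
    codes; real systems replace this scan with multi-index lookups."""
    return min(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


# Toy database of 4-bit codes; 0b1000 differs from the query in one bit.
print(nearest(0b1010, [0b0101, 0b1000, 0b1111]))  # → 1
```

The speed of hashing-based search comes from exactly this: XOR and a popcount per comparison, instead of a floating-point distance in the original feature space.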

IJCAI Conference 2013 Conference Paper

A Theoretic Framework of K-Means-Based Consensus Clustering

  • Junjie Wu
  • Hongfu Liu
  • Hui Xiong
  • Jie Cao

Consensus clustering emerges as a promising solution to find cluster structures from data. As an efficient approach for consensus clustering, the K-means-based method has garnered attention in the literature, but the existing research is still preliminary and fragmented. In this paper, we provide a systematic study of the framework of K-means-based Consensus Clustering (KCC). We first formulate the general definition of KCC, and then reveal a necessary and sufficient condition for utility functions that work for KCC, on both complete and incomplete basic partitionings. Experimental results on various real-world data sets demonstrate that KCC is highly efficient and is comparable to the state-of-the-art methods in terms of clustering quality. In addition, KCC shows high robustness to incomplete basic partitionings with substantial missing values.
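One concrete member of the KCC family can be sketched in a few lines (a toy stdlib-only illustration of the squared-Euclidean special case; the paper develops a whole family of utility functions): encode each object by the one-hot concatenation of its labels across the basic partitionings, then run plain K-means on that binary matrix.

```python
def kcc(partitionings, k, iters=20):
    """Sketch of K-means-based consensus clustering: each object is
    encoded by the one-hot concatenation of its labels across all
    basic partitionings, then clustered with plain K-means."""
    n = len(partitionings[0])
    label_sets = [sorted(set(p)) for p in partitionings]
    feats = []
    for i in range(n):
        row = []
        for p, labels in zip(partitionings, label_sets):
            row.extend(1.0 if p[i] == c else 0.0 for c in labels)
        feats.append(row)

    def sqdist(f, c):
        return sum((x - y) ** 2 for x, y in zip(f, c))

    # Deterministic farthest-point initialization of the k centers.
    centers = [list(feats[0])]
    while len(centers) < k:
        far = max(feats, key=lambda f: min(sqdist(f, c) for c in centers))
        centers.append(list(far))

    assign = [0] * n
    for _ in range(iters):
        for i, f in enumerate(feats):
            assign[i] = min(range(k), key=lambda c: sqdist(f, centers[c]))
        for c in range(k):
            members = [feats[i] for i in range(n) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical toy input: three basic partitionings of six objects
# that agree (up to label renaming) on two underlying clusters.
parts = [[0, 0, 0, 1, 1, 1],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 1, 1, 1, 1]]
print(kcc(parts, k=2))  # → [0, 0, 0, 1, 1, 1]
```

Note how the one-hot encoding makes the consensus invariant to label renaming across basic partitionings, and how a missing label in an incomplete partitioning would simply become an all-zero block.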