EAAI Journal 2026 Journal Article
Graph channel receptive field transformer for multi-agent trajectory prediction
- Jiankun Peng
- Jiakang Wang
- Nan Zhang
- Di Wu
- Chunye Ma
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Streaming video question answering (Streaming Video QA) poses distinct challenges for multimodal large language models (MLLMs), as video frames arrive sequentially and user queries can be issued at arbitrary timepoints. Existing solutions relying on fixed-size memory or naive compression often suffer from context loss or memory overflow, limiting their effectiveness in long-form, real-time scenarios. We present Vista, a novel framework for scene-aware streaming video QA that enables efficient and scalable reasoning over continuous video streams. The innovation of Vista can be summarized in three aspects: (1) Scene-aware segmentation. Vista dynamically clusters incoming frames into temporally and visually coherent scene units. (2) Scene-aware compression. Each scene is compressed into a compact token representation and stored in GPU memory for efficient index-based retrieval, while the full-resolution frames are offloaded to CPU memory. (3) Scene-aware recall. Upon receiving a question, relevant scenes are selectively recalled and reintegrated into the model’s input space, enabling both efficiency and completeness. Vista is model-agnostic and integrates seamlessly with a variety of vision-language backbones, enabling long-context reasoning without compromising latency or memory efficiency. Extensive experiments on StreamingBench demonstrate that Vista achieves state-of-the-art performance, establishing a strong baseline for real-world streaming video understanding.
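The three stages described in this abstract (segmentation, compression, recall) can be illustrated with a toy sketch. This is not Vista's implementation: the cosine-similarity threshold, the mean-pooled scene token, and the function names `segment_scenes`, `compress`, and `recall` are all assumptions made for illustration, and frames are stand-in feature vectors rather than real video features.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def compress(frames, scene):
    """Assumed compression: mean-pool the frames of a scene into one token."""
    n = len(scene)
    return [sum(frames[i][d] for i in scene) / n for d in range(len(frames[0]))]

def segment_scenes(frames, threshold=0.9):
    """Assumed segmentation: start a new scene when a frame's similarity
    to the current scene's mean falls below the threshold."""
    scenes = []
    for idx, feat in enumerate(frames):
        if scenes and cosine(feat, compress(frames, scenes[-1])) >= threshold:
            scenes[-1].append(idx)
        else:
            scenes.append([idx])
    return scenes

def recall(query, frames, scenes, k=1):
    """Assumed recall: return the k scenes whose tokens best match the query."""
    tokens = [compress(frames, s) for s in scenes]
    ranked = sorted(range(len(scenes)),
                    key=lambda j: cosine(query, tokens[j]), reverse=True)
    return [scenes[j] for j in ranked[:k]]

# Four toy frame features: two near [1, 0], then two near [0, 1].
frames = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.05, 0.98]]
scenes = segment_scenes(frames)
recalled = recall([0.0, 1.0], frames, scenes, k=1)
```

In this sketch only the compact per-scene tokens are compared against the query; the frame indices in each recalled scene stand in for the full-resolution frames that the abstract says are kept in CPU memory.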
TCS Journal 2025 Journal Article
NeurIPS Conference 2025 Conference Paper
Mamba, a lightweight sequence modeling framework offering near-linear complexity, presents a promising alternative to Transformers. In this work, we introduce MOGO (Mamba Only Glances Once), an end-to-end framework for efficient video action detection built entirely on the Mamba architecture. In MOGO, our newly designed Mamba-based decoder can even use just one Mamba layer to effectively perform action detection. It uses neither Transformer structures nor RCNN-like methods for proposal detection. Our framework introduces two key innovations. First, we propose a pure Mamba-based encoder-decoder architecture. The encoder processes cross-frame video information, while the decoder incorporates two novel Mamba-based structures that leverage Mamba’s intrinsic capabilities to detect actions. Theoretical analysis and ablation experiments confirm their synergy and the necessity of each structure. Second, we design a video token construction mechanism to improve the model's performance. The token importance block can ensure that the retained token information is highly relevant to the predicted targets. These two innovations make MOGO both efficient and accurate, as demonstrated on the JHMDB and UCF101-24 benchmark datasets. Compared to SOTA action detection methods, MOGO achieves superior performance in terms of GFLOPs, model parameters, and inference speed (latency) with comparable detection precision. Additionally, it requires significantly less GPU memory than some SOTA token reconstruction methods. Code is available at https://github.com/YunqingLiu-ML/MOGO.
AAAI Conference 2025 Conference Paper
Utilizing uniformly distributed sparse annotations, weakly supervised learning alleviates the heavy reliance on fine-grained annotations in point cloud semantic segmentation tasks. However, few works discuss the inhomogeneity of sparse annotations, although it is common in real-world scenarios. Therefore, this work introduces the probability density function into the gradient sampling approximation method to qualitatively analyze the impact of annotation sparsity and inhomogeneity under weakly supervised learning. Based on our analysis, we propose an Adaptive Annotation Distribution Network (AADNet) capable of robust learning on arbitrarily distributed sparse annotations. Specifically, we propose a label-aware point cloud downsampling strategy to increase the proportion of annotations involved in the training stage. Furthermore, we design the multiplicative dynamic entropy as the gradient calibration function to mitigate the gradient bias caused by non-uniformly distributed sparse annotations and explicitly reduce the epistemic uncertainty. Without any prior restrictions or additional information, our proposed method achieves comprehensive performance improvements at multiple label rates and different annotation distributions.
TCS Journal 2025 Journal Article
ICLR Conference 2025 Conference Paper
Indexing is an important step towards strong performance in retrieval-augmented generation (RAG) systems. However, existing methods organize data based on either semantic similarity (similarity) or related information (relatedness), but do not cover both perspectives comprehensively. Our analysis reveals that modeling only one perspective results in insufficient knowledge synthesis, leading to suboptimal performance on complex tasks requiring multihop reasoning. In this paper, we propose SiReRAG, a novel RAG indexing approach that explicitly considers both similar and related information. On the similarity side, we follow existing work and explore some variances to construct a similarity tree based on recursive summarization. On the relatedness side, SiReRAG extracts propositions and entities from texts, groups propositions via shared entities, and generates recursive summaries to construct a relatedness tree. We index and flatten both similarity and relatedness trees into a unified retrieval pool. Our experiments demonstrate that SiReRAG consistently outperforms state-of-the-art indexing methods on three multihop datasets (MuSiQue, 2WikiMultiHopQA, and HotpotQA), with an average 1.9% improvement in F1 scores. As a reasonably efficient solution, SiReRAG enhances existing reranking methods significantly, with up to 7.8% improvement in average F1 scores. Our code is available at https://github.com/SalesforceAIResearch/SiReRAG.
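The relatedness-side step of grouping propositions via shared entities can be sketched as follows. This is a minimal illustration, not the SiReRAG code: the function name `group_by_shared_entity`, the `min_size` cutoff, and the example propositions are hypothetical, and entity extraction and the recursive summarization that follows are assumed to happen elsewhere (for example, with a language model).

```python
from collections import defaultdict

def group_by_shared_entity(propositions, min_size=2):
    """Bucket proposition texts by every entity they mention.
    Only entities shared by at least `min_size` propositions are kept
    (the cutoff is an assumption for this sketch)."""
    buckets = defaultdict(list)
    for text, entities in propositions:
        for entity in entities:
            buckets[entity].append(text)
    return {e: texts for e, texts in buckets.items() if len(texts) >= min_size}

# Hypothetical (proposition, entities) pairs, as if already extracted.
propositions = [
    ("Marie Curie won the Nobel Prize in Physics.", ["Marie Curie", "Nobel Prize"]),
    ("Marie Curie was born in Warsaw.", ["Marie Curie", "Warsaw"]),
    ("Warsaw is the capital of Poland.", ["Warsaw", "Poland"]),
]
groups = group_by_shared_entity(propositions)
```

Here the "Warsaw" group links a fact about a person to a fact about a country, which is the kind of connection a multihop question needs and that pure semantic-similarity clustering may not surface.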
UAI Conference 2024 Conference Paper
Boolean matrix factorization (BMF) has been widely utilized in fields such as recommendation systems, graph learning, text mining, and -omics data analysis. Traditional BMF methods decompose a binary matrix into the Boolean product of two lower-rank Boolean matrices plus homoscedastic random errors. However, real-world binary data typically involves biases arising from heterogeneous row- and column-wise signal distributions. Such biases can lead to suboptimal fitting and unexplainable predictions if not accounted for. In this study, we reconceptualize the binary data generation as the Boolean sum of three components: a binary pattern matrix, a background bias matrix influenced by heterogeneous row or column distributions, and random flipping errors. We introduce a novel Disentangled Representation Learning for Binary matrices (DRLB) method, which employs a dual auto-encoder network to reveal the true patterns. DRLB can be seamlessly integrated with existing BMF techniques to facilitate bias-aware BMF. Our experiments with both synthetic and real-world datasets show that DRLB significantly enhances the precision of traditional BMF methods while offering high scalability. Moreover, the bias matrix detected by DRLB accurately reflects the inherent biases in synthetic data, and the patterns identified in the bias-corrected real-world data exhibit enhanced interpretability.
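For readers unfamiliar with the Boolean operations mentioned in this abstract, the sketch below shows a Boolean matrix product and an element-wise Boolean sum (logical OR) on small hand-written matrices. It only illustrates the data model (pattern OR bias); it is not the DRLB method, the random flipping errors are omitted, and the matrices and function names are made up for the example.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 if any k has A[i][k] and B[k][j]."""
    inner, cols = len(B), len(B[0])
    return [[int(any(row[k] and B[k][j] for k in range(inner)))
             for j in range(cols)] for row in A]

def boolean_sum(X, Y):
    """Element-wise Boolean sum (logical OR) of two equally shaped matrices."""
    return [[x | y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Rank-2 Boolean factors (hypothetical data).
A = [[1, 0],
     [0, 1],
     [1, 1]]
B = [[1, 1, 0],
     [0, 1, 1]]
pattern = boolean_product(A, B)

# A background bias concentrated in one row, as a stand-in for
# heterogeneous row-wise signal.
bias = [[0, 0, 0],
        [1, 0, 0],
        [0, 0, 0]]
observed = boolean_sum(pattern, bias)
```

Factorizing `observed` directly would have to explain the extra 1 in the second row as part of a pattern, which is the kind of distortion the abstract attributes to unmodeled bias.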
YNICL Journal 2024 Journal Article
TCS Journal 2024 Journal Article
AAAI Conference 2024 Conference Paper
Weak supervision has proven to be an effective strategy for reducing the burden of annotating semantic segmentation tasks in 3D space. However, unconstrained or heuristic weakly supervised annotation forms may lead to suboptimal label efficiency. To address this issue, we propose a novel label recommendation framework for weakly supervised point cloud semantic segmentation. Distinct from pre-training and active learning, the label recommendation framework consists of three stages: inductive bias learning, recommendations for points to be labeled, and point cloud semantic segmentation learning. In practice, we first introduce the point cloud upsampling task to learn inductive bias from structural information. During the recommendation stage, we present a cross-scene clustering strategy that generates cluster centers as recommended points. Then we introduce LabelAttention, an attention module over recommended point positions, to model the long-range dependency under sparse annotations. Additionally, we employ position encoding to enhance the spatial awareness of semantic features. Throughout the framework, the useful information obtained from inductive bias learning is propagated to subsequent semantic segmentation networks in the form of label positions. Experimental results demonstrate that our framework outperforms weakly supervised point cloud semantic segmentation methods and other methods for labeling efficiency on S3DIS and ScanNetV2, even at an extremely low label rate.
EAAI Journal 2024 Journal Article
TCS Journal 2023 Journal Article
IJCAI Conference 2023 Conference Paper
Point cloud completion aims at estimating the complete data of objects from degraded observations. Although existing completion methods achieve impressive performance, they rely heavily on degraded-complete data pairs for supervision. In this work, we propose a novel framework named Null-Space Diffusion Sampling (NSDS) to solve the point cloud completion task in a zero-shot manner. By leveraging a pre-trained point cloud diffusion model as the off-the-shelf generator, our sampling approach can generate desired completion outputs with the guidance of the observed degraded data without any extra training. Furthermore, we propose a tolerant loop mechanism to improve the quality of completion results for hard cases. Experimental results demonstrate that our zero-shot framework achieves better completion performance than unsupervised methods and comparable performance to supervised methods in various degraded situations.
TCS Journal 2022 Journal Article
TCS Journal 2021 Journal Article
TCS Journal 2020 Journal Article
TCS Journal 2020 Journal Article
TCS Journal 2020 Journal Article
TCS Journal 2020 Journal Article
TCS Journal 2019 Journal Article
YNICL Journal 2019 Journal Article
TCS Journal 2018 Journal Article
EAAI Journal 2018 Journal Article
ICRA Conference 2018 Conference Paper
Vision-based navigation is extremely susceptible to natural scene changes. This can result in localization failures in less than a few hours after map creation. To combat short-term illumination changes as well as long-term seasonal variations, we propose using a place-and-time-dependent binary descriptor that adapts to different scenarios in an online fashion. This is achieved by extending the GRIEF [6] evolution algorithm in two ways: correspondence generation using a known pose change and the inclusion of LATCH triplets in addition to BRIEF comparisons for descriptor generation. We show the adaptive descriptor outperforms a single descriptor scheme for localization within a single-experience Visual Teach and Repeat (VT&R) system while maintaining the efficiency of binary descriptors. By adapting the description function to different environmental conditions, it allows the system to operate for a longer period before a new experience is required. In the presence of extreme illumination changes from day to night, we obtain 40% more inlier matches compared to SURF. In the case of seasonal variations, a 70% increase is demonstrated. The increased correspondences result in more localizable sections along the paths, amounting to a 25% and 150% increase in the lighting and seasonal cases, respectively.
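As background for the binary descriptors discussed in this abstract, the sketch below builds a BRIEF-style descriptor from pairwise intensity comparisons and compares descriptors by Hamming distance. It is a generic illustration, not the paper's adaptive descriptor: the comparison pairs and patches are hypothetical, and the evolution of pairs (as in GRIEF) and the LATCH triplets are not shown.

```python
def brief_descriptor(patch, pairs):
    """Pack one bit per comparison: 1 if the first pixel is darker than the second."""
    bits = 0
    for (r1, c1), (r2, c2) in pairs:
        bits = (bits << 1) | int(patch[r1][c1] < patch[r2][c2])
    return bits

def hamming(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")

# Hypothetical comparison pairs inside a 3x3 patch, given as (row, col).
pairs = [((0, 0), (2, 2)), ((1, 1), (0, 1)), ((2, 0), (0, 2)), ((0, 1), (1, 0))]

patch = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
# Same patch under uniformly dimmer lighting (all intensities halved).
dimmed = [[v / 2 for v in row] for row in patch]
# A different patch with the intensity order reversed.
other = [[90, 80, 70],
         [60, 50, 40],
         [30, 20, 10]]

d_patch = brief_descriptor(patch, pairs)
d_dimmed = brief_descriptor(dimmed, pairs)
d_other = brief_descriptor(other, pairs)
```

Because each bit depends only on the order of two intensities, a uniform brightness change leaves the descriptor unchanged, while matching cost stays a cheap XOR and bit count. Which pairs remain informative under real lighting or seasonal change is what an adaptive scheme would have to select.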
TCS Journal 2016 Journal Article
TCS Journal 2016 Journal Article
TCS Journal 2016 Journal Article
EAAI Journal 2016 Journal Article
TCS Journal 2014 Journal Article
TCS Journal 2013 Journal Article
TCS Journal 2013 Journal Article
TCS Journal 2012 Journal Article
IS Journal 2008 Journal Article
Accurate, reliable, and timely traffic information is critical for deployment and operation of intelligent transportation systems (ITSs). Traffic forecasting for travelers and traffic operators should become at least as useful and convenient as weather reports. In the US, the Federal Highway Administration (FHWA) has envisioned a real-time traffic estimation and prediction system (TrEPS) as an ITS support platform that resides at traffic management centers (TMCs) for dynamic route assignment (DRA) and other transportation operations.