AIIM Journal 2025 Journal Article
Multiplex aggregation combining sample reweight composite network for pathology image segmentation
- Dawei Fan
- Zhuo Chen
- Yifan Gao
- Jiaming Yu
- Kaibin Li
- Yi Wei
- Yanping Chen
- Riqing Chen
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
NeurIPS Conference 2024 Conference Paper
In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach that can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB of GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not exploit explicit geometric relationships between 3D structure and 2D images. This limits these methods to low-resolution representations and makes it difficult to scale up to dense views for better quality. GeoLRM tackles these issues with a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: first, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; then, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model on 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications. The project page is at https://linshan-bin.github.io/GeoLRM/.
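The geometric link the abstract relies on, locating a 3D point in each posed view and reading image features there, can be sketched in plain Python. This is a minimal single-channel illustration assuming a pinhole camera with intrinsics `K` and world-to-camera rotation `R` and translation `t`. The function names, the fixed sampling offsets, and the softmax weighting over `logits` are hypothetical stand-ins for the learned quantities of a deformable cross-attention layer, not GeoLRM's actual implementation.

```python
import math

def project_point(point, K, R, t):
    """Pinhole projection of a 3D world point; returns (u, v, depth)."""
    # World -> camera coordinates: X_c = R @ X_w + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Camera -> homogeneous pixel coordinates: K @ X_c, then divide by the last entry
    p = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    return p[0] / p[2], p[1] / p[2], xc[2]

def bilinear_sample(fmap, u, v):
    """Bilinearly interpolate fmap[row][col] at column u, row v; None if outside the map."""
    h, w = len(fmap), len(fmap[0])
    if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
        return None
    x0, y0 = int(math.floor(u)), int(math.floor(v))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    top = (1 - ax) * fmap[y0][x0] + ax * fmap[y0][x1]
    bottom = (1 - ax) * fmap[y1][x0] + ax * fmap[y1][x1]
    return (1 - ay) * top + ay * bottom

def deformable_sample(fmap, u, v, offsets, logits):
    """Softmax-weighted sum of samples taken at (u, v) shifted by each offset.
    Hypothetical: offsets/logits would be predicted per point by a network.
    Offsets landing outside the map contribute nothing (no renormalisation)."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    total = 0.0
    for (du, dv), e in zip(offsets, exps):
        s = bilinear_sample(fmap, u + du, v + dv)
        if s is not None:
            total += (e / z) * s
    return total

def point_feature(point, views, offsets, logits):
    """Average the deformable samples of a 3D point over every view in which
    it lies in front of the camera (depth > 0)."""
    feats = []
    for K, R, t, fmap in views:
        u, v, depth = project_point(point, K, R, t)
        if depth > 0:
            feats.append(deformable_sample(fmap, u, v, offsets, logits))
    return sum(feats) / len(feats) if feats else 0.0

# Usage: identity camera, a 3x3 feature map whose value is 10*row + col.
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fmap = [[10.0 * y + x for x in range(3)] for y in range(3)]
view = (I, I, [0.0, 0.0, 0.0], fmap)
print(point_feature([1.0, 1.0, 2.0], [view], [(0.0, 0.0), (1.0, 0.0)], [0.0, 0.0]))  # 6.0
```

Because only the projected neighbourhood of each 3D point is read, the cost scales with the number of points and sampling offsets rather than with full image resolution, which is consistent with the sparsity argument above.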
JBHI Journal 2023 Journal Article
In recent years, deep learning has gained widespread attention in electroencephalogram (EEG)-based emotion recognition. However, deep learning methods are usually time-consuming and memory-intensive, which hinders their practical use on resource-constrained devices. In this paper, we propose a binary capsule network (Bi-CapsNet) for EEG emotion recognition with low computational cost and memory usage. Bi-CapsNet binarizes 32-bit weights and activations to 1 bit and replaces floating-point operations with efficient bitwise operations. To address the discontinuity of the binarization function in backward propagation, we use a continuous function to approximate the binarization process. Two popular EEG emotion databases, DEAP and DREAMER, are used for performance evaluation. Compared with its full-precision counterpart, Bi-CapsNet achieves a $>25\times$ reduction in computational cost and a $>5\times$ reduction in memory usage, with a $<1\%$ drop in recognition accuracy. Compared with several state-of-the-art EEG emotion recognition methods, the proposed method achieves more competitive performance. In addition, Bi-CapsNet is implemented on a mobile phone via an open-source binary inference framework named Bolt, where it achieves an approximately $5\times$ inference acceleration over its full-precision counterpart.
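A minimal sketch of the two ingredients described above, assuming $\pm 1$ binarization with `sign` in the forward pass and, as one possible continuous approximation, the derivative of $\tanh(kx)$ in the backward pass. The abstract does not say which function Bi-CapsNet uses; `k` and all function names here are hypothetical. Packing signs into integer bits turns a dot product into XNOR followed by a population count: if `m` of the `n` positions agree, the dot product is $m - (n - m) = 2m - n$.

```python
import math

def binarize(xs):
    """Forward pass: sign binarisation to {-1, +1} (zero maps to +1)."""
    return [1 if x >= 0 else -1 for x in xs]

def surrogate_grad(x, k=2.0):
    """Backward pass: d/dx tanh(k*x), a smooth stand-in for the derivative of
    sign(x), which is zero almost everywhere. k is a hypothetical sharpness."""
    return k * (1.0 - math.tanh(k * x) ** 2)

def pack_bits(signs):
    """Pack a {-1, +1} vector into an int: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n using only bitwise ops."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")  # popcount of XNOR
    return 2 * matches - n  # each match contributes +1, each mismatch -1

a = binarize([0.3, -1.2, 0.8, -0.1])
b = binarize([0.5, 0.4, -2.0, -0.7])
print(xnor_popcount_dot(pack_bits(a), pack_bits(b), len(a)))  # 0
print(sum(x * y for x, y in zip(a, b)))                        # 0, same result
```

On hardware, one machine word holds 32 or 64 binary values, so a single XNOR and popcount replaces that many floating-point multiply-adds, which is the source of the savings the abstract reports.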
AAAI Conference 2022 Conference Paper
Cross-Lingual Information Retrieval (CLIR) aims to rank documents written in a language different from that of the user’s query. The intrinsic gap between languages is an essential challenge for CLIR. In this paper, we introduce a multilingual knowledge graph (KG) into the CLIR task because it provides rich information about entities in multiple languages. The KG is regarded as a “silver bullet” that simultaneously performs explicit alignment between queries and documents and broadens the representations of queries. We propose a model named CLIR with hierarchical knowledge enhancement (HIKE) for this task. The proposed model encodes the textual information in queries, documents, and the KG with multilingual BERT, and incorporates KG information into the query-document matching process through a hierarchical information fusion mechanism. In particular, HIKE first integrates the entities and their neighborhoods in the KG into query representations with a knowledge-level fusion, and then combines the knowledge from both the source and target languages with a language-level fusion to further mitigate the linguistic gap. Experimental results demonstrate that HIKE achieves substantial improvements over state-of-the-art competitors.
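The two fusion stages can be illustrated with scaled dot-product attention in plain Python. This is a hypothetical sketch, not HIKE's published architecture: it assumes the query, entity, and neighbour embeddings (for example, from multilingual BERT) are already computed as equal-length vectors, and it uses attention plus a residual sum as the fusion operator. All function names are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, keys):
    """Scaled dot-product attention of one query over key vectors (keys also act as values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]

def knowledge_level_fusion(query, entity_vecs):
    """Hypothetical: enrich the query with attended KG entity/neighbour vectors (residual sum)."""
    knowledge = attend(query, entity_vecs)
    return [q + k for q, k in zip(query, knowledge)]

def language_level_fusion(query, source_knowledge, target_knowledge):
    """Hypothetical: attend over source- and target-language knowledge, then add to the query."""
    mixed = attend(query, [source_knowledge, target_knowledge])
    return [q + m for q, m in zip(query, mixed)]

# Usage with toy 2-dimensional embeddings.
query = [1.0, 0.0]
enriched = knowledge_level_fusion(query, [[0.0, 1.0], [1.0, 1.0]])
fused = language_level_fusion(enriched, [2.0, 2.0], [0.0, 1.0])
print(enriched, fused)
```

Ordering the stages this way mirrors the hierarchy in the abstract: entity-level evidence is gathered first, and only then are the two languages' views of that evidence weighed against each other.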
AAAI Conference 2020 Conference Paper
We aim to detect real-world concurrent activities performed by a single person from a streaming 3D skeleton sequence. Unlike most existing works, which deal with concurrent activities performed by multiple persons that are seldom correlated, we focus on concurrent activities that are spatiotemporally or causally correlated and performed by a single person. For the sake of generalization, we propose an approach based on a decompositional design that learns a dedicated feature representation for each activity class. To address scalability, we further extend the class-level decompositional design to the postural-primitive level, so that each class-wise representation need not be extracted by an independent backbone but is instead obtained through a dedicated weighted aggregation of a shared pool of postural primitives. Each decomposition yields multiple interdependent instances. We therefore propose Stacked Relation Networks (SRN), with a specialized relation network for each decomposition, to enhance the expressiveness of instance-wise representations by modeling inter-instance relationships. SRN achieves state-of-the-art performance on a public dataset and a newly collected dataset. The relation weights within SRN are interpretable in terms of the activity contexts. The new dataset and code are available at https://github.com/weiyi1991/UA_Concurrent/
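The postural-primitive idea, in which each class representation is a dedicated weighted aggregation of one shared pool, and a simple relation step can be sketched as follows. The softmax-normalised class weights and the dot-product relation weights are assumptions for illustration; SRN's actual relation networks are learned modules whose exact form the abstract does not specify, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_representations(primitives, class_logits):
    """Each class gets its own softmax weights over ONE shared pool of K primitive
    feature vectors, so no per-class backbone is needed."""
    dim = len(primitives[0])
    reps = []
    for logits in class_logits:
        w = softmax(logits)
        reps.append([sum(wk * p[i] for wk, p in zip(w, primitives)) for i in range(dim)])
    return reps

def relation_refine(instances):
    """Add to each instance a relation-weighted sum of the other instances.
    Returns (refined vectors, relation weights per instance)."""
    refined, relations = [], []
    for i, x in enumerate(instances):
        others = [y for j, y in enumerate(instances) if j != i]
        if not others:  # a lone instance has nothing to relate to
            refined.append(list(x))
            relations.append([])
            continue
        w = softmax([dot(x, y) for y in others])
        context = [sum(wj * y[d] for wj, y in zip(w, others)) for d in range(len(x))]
        refined.append([a + c for a, c in zip(x, context)])
        relations.append(w)
    return refined, relations

# Usage: two primitives, two classes with different mixing logits.
reps = class_representations([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [10.0, -10.0]])
refined, relations = relation_refine(reps)
print(reps, relations)
```

Adding a class then costs only one more row of mixing logits over the shared pool, which is the scalability gain the abstract describes; the per-instance relation weights are the quantities one would inspect for interpretability.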
KER Journal 2019 Journal Article
One way to address the low sample efficiency of reinforcement learning (RL) is to employ human expert demonstrations to speed up the RL process (RL from demonstration, or RLfD). Research so far has focused on demonstrations from a single expert; little attention has been given to the case where demonstrations are collected from multiple experts, whose expertise may vary across different aspects of the task. In such scenarios, the demonstrations are likely to contain conflicting advice in many parts of the state space. We propose a two-level Q-learning algorithm in which the RL agent learns not only a policy for choosing the optimal action but also which expert is most trustworthy in the current state. Our approach thus removes the traditional assumption that demonstrations come from a single source and are mostly conflict-free. We evaluate our technique on three different domains; the results show that the state-of-the-art RLfD baseline either fails to converge or performs similarly to conventional Q-learning. In contrast, the performance of our algorithm improves as more experts are involved in the learning process, and the approach handles demonstration conflicts well.
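A tabular sketch of the two-level idea, assuming a discrete state space: one Q-table scores actions, a second scores how trustworthy each expert is in each state, and both are updated toward the same one-step Q-learning target after the agent follows the selected expert's advice. The class name, the shared target, and the $\epsilon$-greedy expert selection are hypothetical choices made for illustration, not the published algorithm.

```python
import random
from collections import defaultdict

class TwoLevelQ:
    """Hypothetical two-level tabular learner: q_action[s][a] scores actions,
    q_expert[s][e] scores how trustworthy expert e is in state s."""

    def __init__(self, n_actions, n_experts, alpha=0.5, gamma=0.9):
        self.q_action = defaultdict(lambda: [0.0] * n_actions)
        self.q_expert = defaultdict(lambda: [0.0] * n_experts)
        self.alpha, self.gamma = alpha, gamma

    def select_expert(self, state, epsilon, rng):
        """Epsilon-greedy choice of which expert's advice to follow in this state."""
        scores = self.q_expert[state]
        if rng.random() < epsilon:
            return rng.randrange(len(scores))
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, state, expert, action, reward, next_state, done):
        """One-step Q-learning update of both levels toward the same target."""
        target = reward if done else reward + self.gamma * max(self.q_action[next_state])
        qa = self.q_action[state]
        qa[action] += self.alpha * (target - qa[action])
        qe = self.q_expert[state]
        qe[expert] += self.alpha * (target - qe[expert])

# Usage: expert 1 advised action 0 in state "s0", which earned reward 1.0 and ended the episode.
agent = TwoLevelQ(n_actions=2, n_experts=3)
agent.update("s0", expert=1, action=0, reward=1.0, next_state="s1", done=True)
print(agent.select_expert("s0", epsilon=0.0, rng=random.Random(0)))  # 1
```

Keeping trust per state, rather than one global score per expert, is what lets such an agent prefer different experts in different regions of the state space when their advice conflicts.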
ICML Conference 2015 Conference Paper
We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show that this direction is promising and that modelling the structure of source code helps on the retrieval tasks.
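As a much simpler stand-in for the paper's probabilistic models, code retrieval can be framed as ranking snippets by $\log p(\text{query} \mid \text{snippet})$. The sketch below swaps in classic query-likelihood retrieval with an add-$\alpha$ smoothed unigram model over identifier sub-tokens. It ignores code structure entirely, which the abstract reports as helpful, so it should be read only as an illustration of the ranking step; all function names are hypothetical.

```python
import math
import re
from collections import Counter

def code_tokens(snippet):
    """Lower-cased identifier sub-tokens: split on non-letters, then at camelCase boundaries."""
    tokens = []
    for ident in re.findall(r"[A-Za-z]+", snippet):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", ident)
        tokens.extend(p.lower() for p in parts)
    return tokens

def log_likelihood(query, snippet, vocab_size, alpha=1.0):
    """log p(query words | snippet) under an add-alpha smoothed unigram model of the snippet."""
    counts = Counter(code_tokens(snippet))
    total = sum(counts.values())
    return sum(math.log((counts[w] + alpha) / (total + alpha * vocab_size))
               for w in query.lower().split())

def retrieve(query, snippets):
    """Rank snippets from most to least likely to have generated the query."""
    vocab = {t for s in snippets for t in code_tokens(s)} | set(query.lower().split())
    return sorted(snippets, key=lambda s: log_likelihood(query, s, len(vocab)), reverse=True)

snippets = ["def read_file(path): return open(path).read()", "def add(a, b): return a + b"]
print(retrieve("read file", snippets)[0])  # the read_file snippet
```

Splitting identifiers such as `read_file` or `readFile` into sub-tokens is what lets natural-language query words match code at all in this toy model; the captioning direction would instead score candidate descriptions given a code snippet.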