Arrow Research search

Author name cluster

Wynne Hsu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

AAAI Conference 2026 Conference Paper

Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

  • Wei Liu
  • Shengqiong Wu
  • Bobo Li
  • Haoyu Zhao
  • Hao Fei
  • Mong-Li Lee
  • Wynne Hsu

In the AIGC era, generating high-quality 4D content has garnered increasing research attention. Unfortunately, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets, preventing models from adequately learning the critical spatial-temporal features necessary for high-quality 4D generation, thus hindering progress in this domain. To combat this, we propose a novel framework that transfers rich spatial priors from existing 3D diffusion models and temporal priors from video diffusion models to enhance 4D synthesis. We develop a spatial-temporal-disentangled 4D (STD-4D) Diffusion model, which synthesizes 4D-aware videos through disentangled spatial and temporal latents. To facilitate the best feature transfer, we design a novel Orthogonal Spatial-temporal Distributional Transfer (Orster) mechanism, where the spatiotemporal feature distributions are carefully modeled and injected into the STD-4D Diffusion. Further, during the 4D construction, we devise a spatial-temporal-aware HexPlane (ST-HexPlane) to integrate the transferred spatiotemporal features for better 4D deformation and 4D Gaussian feature modeling. Experiments demonstrate that our method significantly outperforms existing approaches, achieving superior spatial-temporal consistency and higher-quality 4D synthesis.

IJCAI Conference 2025 Conference Paper

ChronoFact: Timeline-based Temporal Fact Verification

  • Anab Maulana Barik
  • Wynne Hsu
  • Mong Li Lee

Temporal claims, often riddled with inaccuracies, are a significant challenge in the digital misinformation landscape. Fact-checking systems that can accurately verify such claims are crucial for combating misinformation. Current systems struggle with the complexities of evaluating the accuracy of these claims, especially when they include multiple, overlapping, or recurring events. We introduce a novel timeline-based fact verification framework that identifies events in both the claim and the evidence and organizes them into their respective chronological timelines. The framework systematically examines the relationships between the events in both claim and evidence to predict the veracity and chronological accuracy of each claim event. This allows us to accurately determine the overall veracity of the claim. We also introduce a new dataset of complex temporal claims involving timeline-based reasoning for the training and evaluation of our proposed framework. Experimental results demonstrate the effectiveness of our approach in handling the intricacies of temporal claim verification.

ECAI Conference 2025 Conference Paper

Hybrid Multi-View Approach Towards Augmenting Large Language Models for Human Activity Recognition

  • Suman Bhoi
  • Varsha Suresh
  • Wynne Hsu
  • Mong-Li Lee

Human Activity Recognition (HAR) is pivotal for behavior monitoring in public healthcare, supporting tasks like medical rehabilitation and targeted wellness campaigns. Initially utilizing hand-crafted features and traditional machine learning models such as support vector machines and decision trees, HAR has since evolved to include deep learning techniques, especially convolutional neural networks and Transformers, which excel at modeling temporal and task-specific information from sensor data. Despite these advancements, challenges related to high task dependency and data scarcity persist. To address these issues, there have been efforts to harness the vast pre-trained knowledge of Large Language Models (LLMs) for HAR. Yet, LLMs often fail to fully capture the temporal dynamics inherent in sensor data. We introduce HyMv, a novel hybrid approach that combines an auxiliary HAR model with soft-prompt tuning of LLMs. This approach leverages the auxiliary model’s proficiency in processing sensor data to guide the parameter optimization of LLMs during prompt tuning. Importantly, the auxiliary HAR model is only active during training, augmenting the LLM’s parameters without adding computational overhead during testing. We evaluate the performance of HyMv for various HAR tasks on diverse datasets and demonstrate its adaptability to different sensor modalities, further showcasing its broad applicability.

ECAI Conference 2025 Conference Paper

Multi-Modal Continual Learning via Cross-Modality Adapters and Representation Alignment with Knowledge Preservation

  • Evelyn Chee
  • Wynne Hsu
  • Mong-Li Lee

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing diverse sensory inputs, akin to human perception. However, multi-modal continual learning presents additional challenges, as the model must effectively integrate new information from various modalities while preventing catastrophic forgetting. In this work, we propose a pre-trained model-based framework for multi-modal continual learning. Our framework includes a novel cross-modality adapter with a mixture-of-experts structure to facilitate effective integration of multi-modal information across tasks. We also introduce a representation alignment loss that fosters learning of robust multi-modal representations, and regularize relationships between learned representations to preserve knowledge from previous tasks. Experiments on several multi-modal datasets demonstrate that our approach consistently outperforms baselines in both class-incremental and domain-incremental learning, achieving higher accuracy and reduced forgetting.

NeurIPS Conference 2025 Conference Paper

MuSLR: Multimodal Symbolic Logical Reasoning

  • Jundong Xu
  • Hao Fei
  • Yuhui Zhang
  • Liangming Pan
  • Qijun Huang
  • Qian Liu
  • Preslav Nakov
  • Min-Yen Kan

Multimodal symbolic logical reasoning, which aims to deduce new facts from multimodal input via formal logic, is critical in high-stakes applications such as autonomous driving and medical diagnosis, as its rigorous, deterministic reasoning helps prevent serious consequences. To evaluate such capabilities of current state-of-the-art vision language models (VLMs), we introduce the first benchmark MuSLR for multimodal symbolic logical reasoning grounded in formal logical rules. MuSLR comprises 1,093 instances across 7 domains, including 35 atomic symbolic logic and 976 logical combinations, with reasoning depths ranging from 2 to 9. We evaluate 7 state-of-the-art VLMs on MuSLR and find that they all struggle with multimodal symbolic reasoning, with the best model, GPT-4.1, achieving only 46.8%. Thus, we propose LogiCAM, a modular framework that applies formal logical rules to multimodal inputs, boosting GPT-4.1’s Chain-of-Thought performance by 14.13%, and delivering even larger gains on complex logics such as first-order logic. We also conduct a comprehensive error analysis, showing that around 70% of failures stem from logical misalignment between modalities, offering key insights to guide future improvements.

NeurIPS Conference 2025 Conference Paper

Test-Time Adaptation by Causal Trimming

  • Yingnan Liu
  • Rui Qiao
  • Mong-Li Lee
  • Wynne Hsu

Test-time adaptation aims to improve model robustness under distribution shifts by adapting models with access to unlabeled target samples. A primary cause of performance degradation under such shifts is the model’s reliance on features that lack a direct causal relationship with the prediction target. We introduce Test-time Adaptation by Causal Trimming (TACT), a method that identifies and removes non-causal components from representations for test distributions. TACT applies data augmentations that preserve causal features while varying non-causal ones. By analyzing the changes in the representations using Principal Component Analysis, TACT identifies the highest variance directions associated with non-causal features. It trims the representations by removing their projections on the identified directions, and uses the trimmed representations for the predictions. During adaptation, TACT continuously tracks and refines these directions to get a better estimate of non-causal features. We theoretically analyze the effectiveness of this approach and empirically validate TACT on real-world out-of-distribution benchmarks. TACT consistently outperforms state-of-the-art methods by a significant margin.
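The trimming step described in the abstract above can be sketched generically: apply augmentations, treat the resulting representation changes as non-causal variation, find its top principal directions, and subtract the projections onto them. This is an illustrative sketch under those assumptions, not the authors' released implementation; the function and variable names are hypothetical.

```python
import numpy as np

def trim_non_causal(reps_orig, reps_aug, k=2):
    """Illustrative sketch of PCA-based representation trimming.

    reps_orig: (n, d) representations of the original test samples.
    reps_aug:  (n, d) representations of augmented versions that are
               assumed to preserve causal features and vary non-causal ones.
    k:         number of top-variance directions to remove.
    """
    # Representation changes induced by causality-preserving
    # augmentations are taken as evidence of non-causal features.
    deltas = reps_aug - reps_orig                       # (n, d)
    deltas = deltas - deltas.mean(axis=0)
    # PCA via SVD: rows of vt are orthonormal principal directions,
    # sorted by explained variance.
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    directions = vt[:k]                                 # (k, d)
    # Trim: remove the projection of each representation onto
    # the identified non-causal directions.
    proj = reps_orig @ directions.T                     # (n, k)
    return reps_orig - proj @ directions                # (n, d)
```

After trimming, the returned representations have zero component along the removed directions, so a downstream classifier sees only the remaining (presumed causal) subspace.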

ICML Conference 2025 Conference Paper

Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models

  • Tianjie Ju
  • Yi Hua
  • Hao Fei 0001
  • Zhenyu Shao
  • Yubin Zheng
  • Haodong Zhao
  • Mong-Li Lee
  • Wynne Hsu

Multi-Modal Large Language Models (MLLMs) have exhibited remarkable performance on various vision-language tasks such as Visual Question Answering (VQA). Despite accumulating evidence of privacy concerns associated with task-relevant content, it remains unclear whether MLLMs inadvertently memorize private content that is entirely irrelevant to the training tasks. In this paper, we investigate how randomly generated task-irrelevant private content can become spuriously correlated with downstream objectives due to partial mini-batch training dynamics, thus causing inadvertent memorization. Concretely, we randomly generate task-irrelevant watermarks into VQA fine-tuning images at varying probabilities and propose a novel probing framework to determine whether MLLMs have inadvertently encoded such content. Our experiments reveal that MLLMs exhibit notably different training behaviors in partial mini-batch settings with task-irrelevant watermarks embedded. Furthermore, through layer-wise probing, we demonstrate that MLLMs trigger distinct representational patterns when encountering previously seen task-irrelevant knowledge, even if this knowledge does not influence their output during prompting. Our code is available at https://github.com/illusionhi/ProbingPrivacy.

IJCAI Conference 2024 Conference Paper

Cross-Domain Feature Augmentation for Domain Generalization

  • Yingnan Liu
  • Yingtian Zou
  • Rui Qiao
  • Fusheng Liu
  • Mong Li Lee
  • Wynne Hsu

Domain generalization aims to develop models that are robust to distribution shifts. Existing methods focus on learning invariance across domains to enhance model robustness, and data augmentation has been widely used to learn invariant predictors, with most methods performing augmentation in the input space. However, augmentation in the input space has limited diversity, whereas augmentation in the feature space is more versatile and has shown promising results. Nonetheless, feature semantics is seldom considered and existing feature augmentation methods suffer from a limited variety of augmented features. We decompose features into class-generic, class-specific, domain-generic, and domain-specific components. We propose a cross-domain feature augmentation method named XDomainMix that enables us to increase sample diversity while emphasizing the learning of invariant representations to achieve domain generalization. Experiments on widely used benchmark datasets demonstrate that our proposed method is able to achieve state-of-the-art performance. Quantitative analysis indicates that our feature augmentation approach facilitates the learning of effective models that are invariant across different domains.

ICLR Conference 2024 Conference Paper

Towards Robust Out-of-Distribution Generalization Bounds via Sharpness

  • Yingtian Zou
  • Kenji Kawaguchi
  • Yingnan Liu 0002
  • Jiashuo Liu
  • Mong-Li Lee
  • Wynne Hsu

Generalizing to out-of-distribution (OOD) data or unseen domains, termed OOD generalization, still lacks appropriate theoretical guarantees. Canonical OOD bounds focus on different distance measurements between source and target domains but fail to consider the optimization properties of the learned model. As empirically shown in recent work, the sharpness of the learned minimum influences OOD generalization. To bridge this gap between optimization and OOD generalization, we study the effect of sharpness on how a model tolerates data change under domain shift, which is usually captured by "robustness" in generalization. In this paper, we give a rigorous connection between sharpness and robustness, which yields better OOD guarantees for robust algorithms. It also provides theoretical backing for the claim that "flat minima lead to better OOD generalization". Overall, we propose a sharpness-based OOD generalization bound that takes robustness into consideration, resulting in a tighter bound than non-robust guarantees. Our findings are supported by experiments on a ridge regression model, as well as on deep learning classification tasks.

ICML Conference 2024 Conference Paper

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

  • Hao Fei 0001
  • Shengqiong Wu
  • Wei Ji 0008
  • Hanwang Zhang
  • Meishan Zhang
  • Mong-Li Lee
  • Wynne Hsu

Existing research on video understanding still struggles to achieve in-depth comprehension and reasoning in complex videos, primarily due to the under-exploration of two key bottlenecks: fine-grained spatial-temporal perceptive understanding and cognitive-level video scene comprehension. This paper bridges the gap by presenting a novel solution. We first introduce a novel video Multimodal Large Language Model (MLLM), MotionEpic, which achieves fine-grained pixel-level spatial-temporal video grounding by integrating video spatial-temporal scene graph (STSG) representation. Building upon MotionEpic, we then develop a Video-of-Thought (VoT) reasoning framework. VoT inherits the Chain-of-Thought (CoT) core, breaking down a complex task into simpler and manageable sub-problems, and addressing them step-by-step from low-level pixel perception to high-level cognitive interpretation. Extensive experiments across various complex video QA benchmarks demonstrate that our overall framework strikingly boosts the existing state of the art. To our knowledge, this is the first attempt at successfully implementing the CoT technique for achieving human-level video reasoning, where we show great potential in extending it to a wider range of video understanding scenarios. Systems and code will be released later.

AAAI Conference 2023 Conference Paper

Leveraging Old Knowledge to Continually Learn New Classes in Medical Images

  • Evelyn Chee
  • Mong Li Lee
  • Wynne Hsu

Class-incremental continual learning is a core step towards developing artificial intelligence systems that can continuously adapt to changes in the environment by learning new concepts without forgetting those previously learned. This is especially needed in the medical domain where continually learning from new incoming data is required to classify an expanded set of diseases. In this work, we focus on how old knowledge can be leveraged to learn new classes without catastrophic forgetting. We propose a framework that comprises two main components: (1) a dynamic architecture with expanding representations to preserve previously learned features and accommodate new features; and (2) a training procedure alternating between two objectives to balance the learning of new features while maintaining the model’s performance on old classes. Experiment results on multiple medical datasets show that our solution is able to achieve superior performance over state-of-the-art baselines in terms of class accuracy and forgetting.

NeurIPS Conference 2023 Conference Paper

Multi-Object Representation Learning via Feature Connectivity and Object-Centric Regularization

  • Alex Foo
  • Wynne Hsu
  • Mong Li Lee

Discovering object-centric representations from images has the potential to greatly improve the robustness, sample efficiency and interpretability of machine learning algorithms. Current works on multi-object images typically follow a generative approach that optimizes for input reconstruction and fail to scale to real-world datasets despite significant increases in model capacity. We address this limitation by proposing a novel method that leverages feature connectivity to cluster neighboring pixels likely to belong to the same object. We further design two object-centric regularization terms to refine object representations in the latent space, enabling our approach to scale to complex real-world images. Experimental results on simulated, real-world, complex texture and common object images demonstrate a substantial improvement in the quality of discovered objects compared to state-of-the-art methods, as well as the sample efficiency and generalizability of our approach. We also show that the discovered object-centric representations can accurately predict key object properties in downstream tasks, highlighting the potential of our method to advance the field of multi-object representation learning.

NeurIPS Conference 2023 Conference Paper

REFINE: A Fine-Grained Medication Recommendation System Using Deep Learning and Personalized Drug Interaction Modeling

  • Suman Bhoi
  • Mong Li Lee
  • Wynne Hsu
  • Ngiap Chuan Tan

Patients with co-morbidities often require multiple medications to manage their conditions. However, existing medication recommendation systems only offer class-level medications and regard all interactions among drugs to have the same level of severity. This limits their ability to provide personalized and safe recommendations tailored to individual needs. In this work, we introduce a deep learning-based fine-grained medication recommendation system called REFINE, which is designed to improve treatment outcomes and minimize adverse drug interactions. In order to better characterize patients’ health conditions, we model the trend in medication dosage titrations and lab test responses, and adapt the vision transformer to obtain effective patient representations. We also model drug interaction severity levels as weighted graphs to learn safe drug combinations, and design a balanced loss function to avoid overly conservative recommendations that miss medications needed for certain conditions. Extensive experiments on two real-world datasets show that REFINE outperforms state-of-the-art techniques.

IJCAI Conference 2022 Conference Paper

Chronic Disease Management with Personalized Lab Test Response Prediction

  • Suman Bhoi
  • Mong Li Lee
  • Wynne Hsu
  • Hao Sen Andrew Fang
  • Ngiap Chuan Tan

Chronic disease management involves frequent administration of invasive lab procedures in order for clinicians to determine the best course of treatment regimes for these patients. However, patients are often put off by these invasive lab procedures and do not follow the appointment schedules. This has resulted in poor management of their chronic conditions leading to unnecessary disease complications. An AI system that is able to personalize the prediction of individual patient lab test responses will enable clinicians to titrate the medications to achieve the desired therapeutic outcome. Accurate prediction of lab test response is a challenge because these patients typically have co-morbidities and their treatments might influence the target lab test response. To address this, we model the complex interactions among different medications, diseases, lab test response, and fine-grained dosage information to learn a strong patient representation. Together with information from similar patients and external knowledge such as drug-lab interactions and diagnosis-lab interactions, we design a system called KALP to perform personalized prediction of patients’ response for a target lab result and identify the top influencing factors for the prediction. Experiment results on real-world datasets demonstrate the effectiveness of KALP in reducing prediction errors by a significant margin. Case studies show that the identified factors are consistent with clinicians’ understanding.

NeurIPS Conference 2021 Conference Paper

Explanation-based Data Augmentation for Image Classification

  • Sandareka Wickramanayake
  • Wynne Hsu
  • Mong Li Lee

Existing works have generated explanations for deep neural network decisions to provide insights into model behavior. We observe that these explanations can also be used to identify concepts that caused misclassifications. This allows us to understand the possible limitations of the dataset used to train the model, particularly the under-represented regions in the dataset. This work proposes a framework that utilizes concept-based explanations to automatically augment the dataset with new images that can cover these under-represented regions to improve the model performance. The framework is able to use the explanations generated by both interpretable classifiers and post-hoc explanations from black-box classifiers. Experiment results demonstrate that the proposed approach improves the accuracy of classifiers compared to state-of-the-art augmentation strategies.

NeurIPS Conference 2020 Conference Paper

Towards Maximizing the Representation Gap between In-Domain & Out-of-Distribution Examples

  • Jay Nandy
  • Wynne Hsu
  • Mong Li Lee

Among existing uncertainty estimation approaches, Dirichlet Prior Network (DPN) distinctly models different predictive uncertainty types. However, for in-domain examples with high data uncertainties among multiple classes, even a DPN model often produces indistinguishable representations from the out-of-distribution (OOD) examples, compromising their OOD detection performance. We address this shortcoming by proposing a novel loss function for DPN to maximize the representation gap between in-domain and OOD examples. Experimental results demonstrate that our proposed approach consistently improves OOD detection performance.

AAAI Conference 2019 Conference Paper

FLEX: Faithful Linguistic Explanations for Neural Net Based Model Decisions

  • Sandareka Wickramanayake
  • Wynne Hsu
  • Mong Li Lee

Explaining the decisions of a Deep Learning Network is imperative to safeguard end-user trust. Such explanations must be intuitive, descriptive, and faithfully explain why a model makes its decisions. In this work, we propose a framework called FLEX (Faithful Linguistic EXplanations) that generates post-hoc linguistic justifications to rationalize the decision of a Convolutional Neural Network. FLEX explains a model’s decision in terms of features that are responsible for the decision. We derive a novel way to associate such features to words, and introduce a new decision-relevance metric that measures the faithfulness of an explanation to a model’s reasoning. Experiment results on two benchmark datasets demonstrate that the proposed framework can generate discriminative and faithful explanations compared to state-of-the-art explanation generators. We also show how FLEX can generate explanations for images of unseen classes as well as automatically annotate objects in images.

IJCAI Conference 2017 Conference Paper

Quantifying Aspect Bias in Ordinal Ratings using a Bayesian Approach

  • Lahari Poddar
  • Wynne Hsu
  • Mong Li Lee

User opinions expressed in the form of ratings can influence an individual's view of an item. However, the true quality of an item is often obfuscated by user biases, and the importance different users place on different aspects of an item is not obvious from the observed ratings. We propose a probabilistic modeling of the observed aspect ratings to infer (i) each user's aspect bias and (ii) latent intrinsic quality of an item. We model multi-aspect ratings as ordered discrete data and encode the dependency between different aspects by using a latent Gaussian structure. We handle the Gaussian-Categorical non-conjugacy using a stick-breaking formulation coupled with Pólya-Gamma auxiliary variable augmentation for a simple, fully Bayesian inference. On two real world datasets, we demonstrate the predictive ability of our model and its effectiveness in learning explainable user biases to provide insights towards a more reliable product quality estimation.

AAMAS Conference 2012 Conference Paper

Coordination Guided Reinforcement Learning

  • Qiangfeng Peter Lau
  • Mong Li Lee
  • Wynne Hsu

In this paper, we propose to guide reinforcement learning (RL) with expert coordination knowledge for multi-agent problems managed by a central controller. The aim is to learn to use expert coordination knowledge to restrict the joint action space and to direct exploration towards more promising states, thereby improving the overall learning rate. We model such coordination knowledge as constraints and propose a two-level RL system that utilizes these constraints for online applications. Our declarative approach towards specifying coordination in multi-agent learning allows knowledge sharing between constraints and features (basis functions) for function approximation. Results on a soccer game and a tactical real-time strategy game show that coordination constraints improve the learning rate compared to using only unary constraints. The two-level RL system also outperforms the existing single-level approach that utilizes joint action selection via coordination graphs.

IJCAI Conference 1997 Conference Paper

Discovering Interesting Holes in Data

  • Bing Liu
  • Liang-Ping Ku
  • Wynne Hsu

Current machine learning and discovery techniques focus on discovering rules or regularities that exist in data. An important aspect of the research that has been ignored in the past is the learning or discovering of interesting holes in the database. If we view each case in the database as a point in a k-dimensional space, then a hole is simply a region in the space that contains no data point. Clearly, not every hole is interesting. Some holes are obvious because it is known that certain value combinations are not possible. Some holes exist because there are insufficient cases in the database. However, in some situations, empty regions do carry important information. For instance, they could warn us about some missing value combinations that are either not known before or are unexpected. Knowing these missing value combinations may lead to significant discoveries. In this paper, we propose an algorithm to discover holes in databases.
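The notion of a "hole" as an empty region in a k-dimensional data space can be made concrete with a simple grid-based illustration. This is a minimal sketch of the concept only, not the paper's algorithm; the function name and the choice of a fixed-bin discretization are assumptions for illustration.

```python
from itertools import product

def find_holes(points, bins=4):
    """Illustrative sketch: discretize each dimension into `bins`
    intervals and report the grid cells containing no data point,
    i.e. candidate "holes" in the data space."""
    dims = len(points[0])
    mins = [min(p[d] for p in points) for d in range(dims)]
    maxs = [max(p[d] for p in points) for d in range(dims)]

    def cell(p):
        # Map a point to its grid cell index along each dimension;
        # the epsilon guards against division by zero, and the min()
        # keeps points at the upper boundary inside the last bin.
        return tuple(
            min(int((p[d] - mins[d]) / (maxs[d] - mins[d] + 1e-12) * bins),
                bins - 1)
            for d in range(dims)
        )

    occupied = {cell(p) for p in points}
    return [c for c in product(range(bins), repeat=dims) if c not in occupied]
```

For example, three points at the corners of the unit square with `bins=2` leave exactly one empty cell, the hole at grid index `(1, 1)`. Real hole discovery must of course distinguish interesting holes from those explained by impossible value combinations or sparse data, as the abstract notes.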

ICRA Conference 1996 Conference Paper

Conceptual level design for assembly analysis using state transitional approach

  • Wynne Hsu
  • Andrew Lim 0001
  • C. S. George Lee

Traditionally, design for assembly is done during the detailed design phase. A designer first maps a set of design requirements into a set of components or subassemblies that can satisfy the given set of requirements. The components and subassemblies are then examined individually to determine whether they conform to the principles of design for assembly. Usually, local changes are performed so that the resultant components/subassemblies are better for assembly. In this paper, we propose to bring the design for assembly analysis into an even earlier phase: the conceptual design phase. We argue that by incorporating the design for assembly analysis at the conceptual design phase, we can achieve more substantial savings compared to the savings obtained when the design for assembly analysis is only performed as late as the detailed design phase. The basic idea is to select a combination of design concepts (previously stored in a library) such that together they can achieve the stated functional requirements (in the form of a state transitional graph) at the minimum cost for assembly. This problem of selecting the right combination of design concepts is reduced to the well-known set covering problem. With this reduction, many existing graph algorithms can be applied to aid in the design for assembly analysis.

ICRA Conference 1995 Conference Paper

Paradigm Shift and the Integrated Feedback Approach

  • Wynne Hsu
  • C. S. George Lee

With the increased competition in today's world market, emphasis has been on the ability to shift from an existing paradigm to a new paradigm so as to create new opportunities and to gain new markets. A successful paradigm shift is dependent on two factors: (1) the ability to pinpoint the inherent weaknesses in the existing paradigm, and (2) the ability to find a paradigm that can replace the old paradigm. In this paper, we show how the integrated feedback approach is able to address these two concerns. The integrated feedback approach was proposed to integrate the design phase with the downstream activities so as to achieve a design that is better for assembly. The approach operates in two phases: an evaluation phase and a redesign suggestion generation phase. A number of objective criteria have been proposed to evaluate a given design from the functional perspective, from the assembly plan perspective, and from the tolerance perspective. Through the evaluation process, weaknesses in the design are identified and techniques for generating feasible redesign suggestions are examined. By encouraging greater participation from the designer, a systematic aid to paradigm shifting can be obtained. A prototype system implementing the integrated feedback approach has been developed on a Sun Sparcstation with graphics simulation on a Silicon Graphics Iris workstation. A real-life product, the telephone, is used as an illustration.

ICRA Conference 1992 Conference Paper

Feedback evaluation of assembly plans

  • Wynne Hsu
  • C. S. George Lee
  • Shun-Feng Su

The authors examine issues involved in integrating the design level and the assembly planning level with a feedback loop. The integration is performed in two stages. The first stage focuses on the evaluation of an assembly plan. Evaluation criteria that can pinpoint areas which need redesign are defined. The second stage is to use the evaluation results to come up with the actual redesign. Algorithms are developed for performing evaluation of assembly plans. From the evaluation results, means of generating hints for redesign are discussed. The hints are then processed and calls are made to the redesign operators to perform the actual redesign of components. The integrated design-planning system has the ability to identify parts that need redesign and the ability to come up with feasible redesign options in polynomial time.