Author name cluster

Yan Gao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers

2 author rows

EAAI Journal 2026 Journal Article

Forecast-enhanced bilevel real-time pricing for microgrids via hybrid-action reinforcement learning

Jingqi Wang
Yan Gao
Youmeng He

The integration of distributed energy resources into microgrids faces many complex challenges, including renewable intermittency, hybrid decision-making, and hierarchical coordination. This paper presents a forecast-enhanced bilevel real-time pricing framework using a hybrid-action deep reinforcement learning (DRL) algorithm with Gumbel-Softmax reparameterization. The framework manages both discrete generator commitment and continuous pricing decisions through integrated optimization. Our approach integrates Long Short-Term Memory (LSTM) forecasting to enhance proactive scheduling, while coordinating microgrid agents through a bilevel optimization architecture. The main innovations include: a hybrid-action DRL algorithm integrating Gumbel-Softmax reparameterization for joint discrete–continuous optimization; LSTM-based renewable forecasting integrated into state representation. Our DRL approach shows enhanced system performance with improved constraint satisfaction and operational efficiency, offering a practical solution for complex hybrid-action energy optimization problems.
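As a rough illustration of the hybrid-action idea in the abstract above, the sketch below draws one discrete commitment choice through a Gumbel-Softmax relaxation and one continuous price from a Gaussian. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the two-way commitment logits, the Gaussian price head, and the temperature value are all hypothetical, and no training loop, LSTM forecast, or bilevel structure is modelled.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed one-hot sample over len(logits) categories."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_action(commit_logits, price_mean, price_std, temperature=0.5, rng=random):
    """Sample one (discrete, continuous) action pair (hypothetical helper)."""
    probs = gumbel_softmax(commit_logits, temperature, rng)
    # Hard decision taken from the relaxed sample (argmax).
    commit = max(range(len(probs)), key=probs.__getitem__)
    # Continuous pricing decision from an assumed Gaussian policy head.
    price = rng.gauss(price_mean, price_std)
    return commit, probs, price


if __name__ == "__main__":
    print(hybrid_action([0.2, 1.5], price_mean=0.30, price_std=0.05))
```

Lower temperatures push the relaxed sample toward a one-hot vector; in a differentiable framework the same relaxation lets gradients flow through the discrete choice.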

Details DOI

EAAI Journal 2026 Journal Article

Regression prediction of non-metallic pipeline damage degree based on time-frequency domain acoustic sequence processing and transformer-Newton-Raphson-based optimizer-categorical boosting

Xiaojuan Han
Ruiqi Fan
Xiwang Cui
Yan Gao

Predicting the degree of pipeline leak is fundamental for ensuring the safe and stable operation of pipelines. To improve the accuracy of predicting leak degree in non-metallic pipelines, this paper proposes a leak degree prediction method based on a Newton-Raphson-Based Optimizer (NRBO)-Categorical Boosting (CatBoost) optimized Transformer classifier. Signal features extracted via wavelet packet decomposition under different leak levels are divided into sample sets segmented at fixed time intervals. The Short-Time Fourier Transform is applied to convert pipeline leak time-series signals of varying sizes into time-frequency images, which are then binarized. Polynomial fitting is used to establish the relationship between leak sizes and leak signals, thereby expanding the leak sample set. A Transformer-based leak degree prediction model is constructed using two-dimensional time-frequency binary images as input, and an NRBO-CatBoost optimized Transformer classifier (TF-NRCB) is employed to enhance the prediction accuracy of pipeline leak degree. The effectiveness of the proposed method is validated through an experimental water pipeline leak testing platform based on acoustic detection. Numerical results demonstrate that the leak degree prediction accuracy achieved by the TF-NRCB method is the highest, providing a solid theoretical foundation for the safe operation of pipeline systems.

Details DOI

EAAI Journal 2025 Journal Article

A bidirectional bi-objective graph search model for sustainable urban railway alignment optimization

Tianlong Zhang
Yan Gao
Shuangting Xu
Ting Deng
Qing He
Paul Schonfeld
Yang Zou
Dong Liang

Designing railway alignments in building-dense urban areas is a challenging task, requiring consideration of both costs and impacts on existing buildings and the environment. Achieving a viable solution necessitates the application of computer-aided techniques for three-dimensional (3D) global path searches while simultaneously optimizing multiple objectives. To tackle this challenge, this study proposes a bidirectional bi-objective graph search model. This model efficiently searches the 3D space to generate high-quality railway alignment solutions that simultaneously consider both comprehensive costs (including railway construction, ecological, and affected building costs) and carbon emissions (covering emissions from railways and buildings). It provides valuable reference solutions for designers, enhancing the design efficiency. The model includes two main innovations: (1) the ability to quickly search the entire 3D space using a graph-based strategy, generating multiple alignment solutions that meet design constraints in a single optimization process, and (2) the ability to accurately and efficiently account for the impact of railway alignments on existing buildings during optimization. Testing the model on a real-world urban case demonstrates its capability to generate multiple alternative railway alignments within minutes. The Pareto balanced solution achieves an 18.91% reduction in comprehensive costs and a 13.46% decrease in carbon emissions compared to manual design. The estimation error of affected building areas is approximately 2%–4% along the approximately 40 km alignment. Overall, the significance of this study lies in exploring the application of efficient graph search algorithms in the multi-objective optimization design of railway alignments in urban areas, advancing ongoing research in this field.

Details DOI

NeurIPS Conference 2025 Conference Paper

FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models

Yan Gao
Massimo R. Scamarcia
Javier Fernandez-Marques
Mohammad Naseri
Chong Ng
Dimitris Stripelis
Zexi Li
Tao Shen

Large Language Models (LLMs) have achieved state-of-the-art results across diverse domains, yet their development remains reliant on vast amounts of publicly available data, raising concerns about data scarcity and the lack of access to domain-specific, sensitive information. Federated Learning (FL) presents a compelling framework to address these challenges by enabling decentralized fine-tuning on pre-trained LLMs without sharing raw data. However, the compatibility and performance of pre-trained LLMs in FL settings remain largely underexplored. We introduce the FlowerTune LLM Leaderboard, a first-of-its-kind benchmarking suite designed to evaluate federated fine-tuning of LLMs across four diverse domains: general NLP, finance, medical, and coding. Each domain includes federated instruction-tuning datasets and domain-specific evaluation metrics. Our results, obtained through a collaborative, open-source and community-driven approach, provide the first comprehensive comparison across 26 pre-trained LLMs with different aggregation and fine-tuning strategies under federated settings, offering actionable insights into model performance, resource constraints, and domain adaptation. This work lays the foundation for developing privacy-preserving, domain-specialized LLMs for real-world applications.

PDF Details

EAAI Journal 2025 Journal Article

Multi-condition pipeline leak diagnosis based on acoustic image fusion and whale-optimized evolutionary convolutional neural network

Yuan Yuan
Xiwang Cui
Xiaojuan Han
Yan Gao
Fangcheng Lu
Xianhong Liu

Details DOI

NeurIPS Conference 2025 Conference Paper

RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering

Rongyang Zhang
Yuqing Huang
Chengqiang Lu
Qimeng Wang
Yan Gao
Yao Hu
Yin Xu
Wei Wang

In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text generation. Despite recent progress, like the visual autoregressive model that unifies text and image processing in a single transformer architecture, generating high-quality interleaved content remains challenging. Moreover, evaluations of these interleaved sequences largely remain underexplored, with existing benchmarks often limited by unimodal metrics that inadequately assess the intricacies of combined image-text outputs. To address these issues, we present RAG-IGBench, a thorough benchmark designed specifically to evaluate the task of Interleaved Generation based on Retrieval-Augmented Generation (RAG-IG) in open-domain question answering. RAG-IG integrates multimodal large language models (MLLMs) with retrieval mechanisms, enabling the models to access external image-text information for generating coherent multimodal content. Distinct from previous datasets, RAG-IGBench draws on the latest publicly available content from social platforms and introduces innovative evaluation metrics that measure the quality of text and images, as well as their consistency. Through extensive experiments with state-of-the-art MLLMs (both open-source and proprietary) on RAG-IGBench, we provide an in-depth analysis examining the capabilities and limitations of these models. Additionally, we validate our evaluation metrics by demonstrating their high correlation with human assessments. Models fine-tuned on RAG-IGBench's training set exhibit improved performance across multiple benchmarks, confirming both the quality and practical utility of our dataset. Our benchmark is available at https://github.com/zry13/RAG-IGBench.

PDF Details

EAAI Journal 2025 Journal Article

Real-time pricing strategy considering carbon emissions and time coupling in smart grid: A binary integer bilevel optimization model with decision-making

Yiling Luo
Yan Gao

Details DOI

AILAW Journal 2025 Journal Article

Summarizing judicial documents: a hybrid extractive-abstractive model with legal domain knowledge

Yan Gao
Jie Wu
Zhengtao Liu
Juan Li

The automatic summarization of judgment documents is a challenging task due to their length and the dispersed nature of the important information they contain. The prevailing approach to tackling the summarization of lengthy documents involves the integration of both extractive and abstractive summarization models. However, current extractive models face challenges in capturing all essential details due to the scattered distribution of pertinent information within judgment documents. Additionally, the existing abstractive models still grapple with the problem of "hallucinations" which leads to generating inaccurate information. In our work, we proposed a novel hybrid legal summarization method that incorporates legal domain knowledge into both the extractive model and abstractive model. The method consists of two parts: (1) The rhetorical role of sentences is identified by the sentence-level sequence labeling method, and the rhetorical information is integrated into the extractive model based on WoBERT through the conditional normalization to ensure that the identification of key sentences is both precise and complete. (2) The pre-trained model RoFormer is combined with Seq2Seq to construct a long text summarization model, and the prior knowledge in the external resources and the document itself is introduced into the decoding process to improve the faithfulness and coherence of the composed summary. In addition, the contrastive learning strategy is employed during the training process to enhance the robustness of the abstractive model. Experimental results on the CAIL2020 dataset show that the proposed model is superior to the baseline methods. Furthermore, our method outperforms GPT and other LLMs in processing judgment documents.

Details DOI

EAAI Journal 2025 Journal Article

The end-to-end chip surface defect segmentation method based on the diffusion model and attention mechanism

Zilin Xia
Yufan Zhao
Jinan Gu
Wenbo Wang
Zedong Huang
Yan Gao
Peiyue Sun

Details DOI

NeurIPS Conference 2025 Conference Paper

Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints

Dongjie Yang
Chengqiang Lu
Qimeng Wang
Xinbei Ma
Yan Gao
Yao Hu
Hai Zhao

Unlike reasoning, which often entails a deep sequence of deductive steps, complex real-world planning is characterized by the need to synthesize a broad spectrum of parallel and potentially conflicting information and constraints. For example, in travel planning scenarios, it requires the integration of diverse real-world information and user preferences. While LLMs show promise, existing methods with long-horizon thinking struggle with handling multifaceted constraints, leading to suboptimal solutions. Motivated by the challenges of real-world travel planning, this paper introduces the Multiple Aspects of Planning (MAoP), empowering LLMs with "wide-horizon thinking" to solve planning problems with multifaceted constraints. Instead of direct planning, MAoP leverages the strategist to conduct pre-planning from various aspects and provide the planning blueprint for planners, enabling strong inference-time scalability by scaling aspects to consider various constraints. In addition, existing benchmarks for multi-constraint planning are flawed because they assess constraints in isolation, ignoring causal dependencies within the constraints, e.g., travel planning, where past activities dictate future itinerary. To address this, we propose Travel-Sim, an agent-based benchmark assessing plans via real-world simulation, thereby inherently resolving these causal dependencies. This paper advances LLM capabilities in complex planning and offers novel insights for evaluating sophisticated scenarios through simulation.

PDF Details

AAAI Conference 2024 Conference Paper

AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries

Runqi Wang
Huixin Sun
Linlin Yang
Shaohui Lin
Chuanjian Liu
Yan Gao
Yao Hu
Baochang Zhang

DEtection TRansformer (DETR)-based models have achieved remarkable performance. However, they are accompanied by a large computation overhead cost, which significantly prevents their applications on resource-limited devices. Prior arts attempt to reduce the computational burden of DETR using low-bit quantization, but these methods suffer a severe performance drop under weight-activation-attention low-bit quantization. We observe that the number of matching queries and positive samples strongly affects the representation capacity of queries in DETR, while quantizing the queries of DETR further reduces its representational capacity, thus leading to a severe performance drop. We introduce a new quantization strategy based on Auxiliary Queries for DETR (AQ-DETR), aiming to enhance the capacity of quantized queries. In addition, a layer-by-layer distillation is proposed to reduce the quantization error between quantized attention and full-precision counterpart. Through our extensive experiments on large-scale open datasets, the performance of the 4-bit quantization of DETR and Deformable DETR models is comparable to full-precision counterparts.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Multi-Scene Generalized Trajectory Global Graph Solver with Composite Nodes for Multiple Object Tracking

Yan Gao
Haojun Xu
Jie Li
Nannan Wang
Xinbo Gao

The global multi-object tracking (MOT) system can consider interaction, occlusion, and other "visual blur" scenarios to ensure effective object tracking in long videos. Among them, graph-based tracking-by-detection paradigms achieve surprising performance. However, their fully-connected nature poses storage space requirements that challenge algorithms handling long videos. Currently, commonly used methods still generate trajectories by building one-forward associations across frames. Such matches produced under the guidance of first-order similarity information may not be optimal from a longer-time perspective. Moreover, they often lack an end-to-end scheme for correcting mismatches. This paper proposes the Composite Node Message Passing Network (CoNo-Link), a multi-scene generalized framework for modeling ultra-long frames information for association. CoNo-Link's solution is a low-storage overhead method for building constrained connected graphs. In addition to the previous method of treating objects as nodes, the network innovatively treats object trajectories as nodes for information interaction, improving the graph neural network's feature representation capability. Specifically, we formulate the graph-building problem as a top-k selection task for some reliable objects or trajectories. Our model can learn better predictions on longer-time scales by adding composite nodes. As a result, our method outperforms the state-of-the-art in several commonly used datasets.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

Xinyi He
Mengyu Zhou
Xinrun Xu
Xiaojun Ma
Rui Ding
Lun Du
Yan Gao
Ran Jia

Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods, harnessing the capabilities of large language models to enhance data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions to test how well models can understand and tackle such challenges. Finally, we collect 2249 query-result pairs with 347 tables. We evaluate five state-of-the-art models using three different metrics and the results show that our benchmark presents a considerable challenge in the field of tabular data analysis, paving the way for more advanced research opportunities.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation

Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu

A well-known dilemma in large vision-language models (e.g., GPT-4, LLaVA) is that while increasing the number of vision tokens generally enhances visual understanding, it also significantly raises memory and computational costs, especially in long-term, dense video frame streaming scenarios. Although learnable approaches like Q-Former and Perceiver Resampler have been developed to reduce the vision token burden, they overlook the context causally modeled by LLMs (i.e., key-value cache), potentially leading to missed visual cues when addressing user queries. In this paper, we introduce a novel approach to reduce vision compute by leveraging redundant vision tokens "skipping layers" rather than decreasing the number of vision tokens. Our method, VideoLLM-MoD, is inspired by mixture-of-depths LLMs and addresses the challenge of numerous vision tokens in long-term or streaming video. Specifically, for certain transformer layers, we learn to skip the computation for a high proportion (e.g., 80%) of vision tokens, passing them directly to the next layer. This approach significantly enhances model efficiency, achieving approximately 42% time and 30% memory savings for the entire training. Moreover, our method reduces the computation in the context and avoids decreasing the vision tokens, thus preserving or even improving performance compared to the vanilla model. We conduct extensive experiments to demonstrate the effectiveness of VideoLLM-MoD, showing its state-of-the-art results on multiple benchmarks, including narration, forecasting, and summarization tasks in COIN, Ego4D, and Ego-Exo4D datasets. The code and checkpoints will be made available at github.com/showlab/VideoLLM-online.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Vript: A Video Is Worth Thousands of Words

Dongjie Yang
Suyuan Huang
Chengqiang Lu
Xiaodong Han
Haoxin Zhang
Yan Gao
Yao Hu
Hai Zhao

Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution videos, offering detailed, dense, and script-like captions for over 420K clips. Each clip has a caption of ~145 words, which is over 10x longer than most video-text datasets. Unlike captions only documenting static content in previous datasets, we enhance video captioning to video scripting by documenting not just the content, but also the camera operations, which include the shot types (medium shot, close-up, etc.) and camera movements (panning, tilting, etc.). By utilizing the Vript, we explore three training paradigms of aligning more text with the video modality rather than clip-caption pairs. This results in Vriptor, a top-performing video captioning model among open-source models, comparable to GPT-4V in performance. Vriptor is also a powerful model capable of end-to-end generation of dense and detailed captions for long videos. Moreover, we introduce Vript-Hard, a benchmark consisting of three video understanding tasks that are more challenging than existing benchmarks: Vript-HAL is the first benchmark evaluating action and object hallucinations in video LLMs, Vript-RR combines reasoning with retrieval resolving question ambiguity in long-video QAs, and Vript-ERO is a new task to evaluate the temporal understanding of events in long videos rather than actions in short videos in previous works. All code, models, and datasets are available in https://github.com/mutonix/Vript.

PDF Details DOI

JMLR Journal 2023 Journal Article

A First Look into the Carbon Footprint of Federated Learning

Xinchi Qiu
Titouan Parcollet
Javier Fernandez-Marques
Pedro P. B. Gusmao
Yan Gao
Daniel J. Beutel
Taner Topal
Akhil Mathur

Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. FL is now starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. However, the potential environmental impact related to FL remains unclear and unexplored. This article offers the first-ever systematic study of the carbon footprint of FL. We propose a rigorous model to quantify the carbon footprint, hence facilitating the investigation of the relationship between FL design and carbon emissions. We also compare the carbon footprint of FL to traditional centralized learning. Our findings show that, depending on the configuration, FL can emit up to two orders of magnitude more carbon than centralized training. However, in certain settings, it can be comparable to centralized learning due to the reduced energy consumption of embedded devices. Finally, we highlight and connect the results to the future challenges and trends in FL to reduce its environmental impact, including algorithm efficiency, hardware capabilities, and stronger industry transparency.

PDF Details

EAAI Journal 2023 Journal Article

Few-shot learning for image-based bridge damage detection

Yan Gao
Haijiang Li
Weiqi Fu

Details DOI

AAAI Conference 2023 Conference Paper

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Longxu Dou
Yan Gao
Mingyang Pan
Dingzirui Wang
Wanxiang Che
Dechen Zhan
Jian-Guang Lou

Text-to-SQL semantic parsing is an important NLP task, which facilitates the interaction between users and the database. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL semantic parsing dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MultiSpider we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under various settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVe (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Uncovering and Quantifying Social Biases in Code Generation

Yan Liu
Xiaokang Chen
Yan Gao
Zhe Su
Fengji Zhang
Daoguang Zan
Jian-Guang Lou
Pin-Yu Chen

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias.

PDF Details

NeurIPS Conference 2022 Conference Paper

LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

Xinyu Pi
Wanjun Zhong
Yan Gao
Nan Duan
Jian-Guang Lou

We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models. Upon automatic identification of logical reasoning phenomena in massive text corpus via detection heuristics, we train language models to predict the masked-out logical statements. Inspired by the facilitation effect of reflective thinking in human learning, we analogically simulate the learning-thinking process with an adversarial Generator-Verifier architecture to assist logic learning. LogiGAN implements a novel sequential GAN approach that (a) circumvents the non-differentiable challenge of the sequential GAN by leveraging the Generator as a sentence-level generative likelihood scorer with a learning objective of reaching scoring consensus with the Verifier; (b) is computationally feasible for large-scale pre-training with arbitrary target length. Both base and large size language models pre-trained with LogiGAN demonstrate obvious performance improvement on 12 datasets requiring general reasoning abilities, revealing the fundamental role of logic in broad reasoning, as well as the effectiveness of LogiGAN. Ablation studies on LogiGAN components reveal the relative orthogonality between linguistic and logic abilities and suggest that reflective thinking's facilitation effect might also generalize to machine learning.

PDF Details

IJCAI Conference 2021 Conference Paper

Keep the Structure: A Latent Shift-Reduce Parser for Semantic Parsing

Yuntao Li
Bei Chen
Qian Liu
Yan Gao
Jian-Guang Lou
Yan Zhang
Dongmei Zhang

Traditional end-to-end semantic parsing models treat a natural language utterance as a holonomic structure. However, hierarchical structures exist in natural languages, which also align with the hierarchical structures of logical forms. In this paper, we propose a latent shift-reduce parser, called LASP, which decomposes both natural language queries and logical form expressions according to their hierarchical structures and finds local alignment between them to enhance semantic parsing. LASP consists of a base parser and a shift-reduce splitter. The splitter dynamically separates an NL query into several spans. The base parser converts the relevant simple spans into logical forms, which are further combined to obtain the final logical form. We conducted empirical studies on two datasets across different domains and different types of logical forms. The results demonstrate that the proposed method significantly improves the performance of semantic parsing, especially on unseen scenarios.

PDF Details DOI

NeurIPS Conference 2021 Conference Paper

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

Jiyang Qi
Yan Gao
Yao Hu
Xinggang Wang
Xiaoyu Liu
Xiang Bai
Serge Belongie
Alan Yuille

Although deep learning methods have achieved advanced video object recognition performance in recent years, perceiving heavily occluded objects in a video is still a very challenging task. To promote the development of occlusion understanding, we collect a large-scale dataset called OVIS for video instance segmentation in the occluded scenario. OVIS consists of 296k high-quality instance masks and 901 occluded scenes. While our human vision systems can perceive those occluded objects by contextual reasoning and association, our experiments suggest that current video understanding systems cannot. On the OVIS dataset, all baseline methods encounter a significant performance degradation of about 80% in the heavily occluded object group, which demonstrates that there is still a long way to go in understanding obscured objects and videos in a complex real-world scenario. To facilitate the research on new paradigms for video understanding systems, we launched a challenge based on the OVIS dataset. The submitted top-performing algorithms have achieved much higher performance than our baselines. In this paper, we will introduce the OVIS dataset and further dissect it by analyzing the results of baselines and submitted methods. The OVIS dataset and challenge information can be found at http://songbai.site/ovis.

PDF Details

ICRA Conference 2021 Conference Paper

Thrust Enhancement of Wave-driven Unmanned Surface Vehicle by using Asymmetric Foil

Yan Gao
Lyucheng Xie
Tin Lun Lam

In wave-driven unmanned surface vehicles (WUSVs), oscillating foils are the most straightforward and widely used wave energy conversion mechanism. In this paper, a novel asymmetric foil is proposed, which improves the wave energy conversion efficiency to provide a more significant thrust in every wave cycle. We break down the movement of the foils in the wave and build the corresponding kinetic model to analyze their working effectiveness numerically. Through computational fluid dynamic (CFD) simulations, we determine the optimal values of critical parameters of the foils, which are suitable for a wide range of wave conditions. The thrust enhancement of the asymmetric foil is verified in both CFD simulations and hydrodynamic experiments, and the results show a similar enhancement trend. Compared with the traditional symmetric foil, our asymmetric foil can provide at least 13.75% more thrust to the WUSVs.

Details

NeurIPS Conference 2020 Conference Paper

Compositional Generalization by Learning Analytical Expressions

Qian Liu
Shengnan An
Jian-Guang Lou
Bei Chen
Zeqi Lin
Yan Gao
Bin Zhou
Nanning Zheng

Compositional generalization is a basic and essential intellective capability of human beings, which allows us to recombine known parts readily. However, existing neural network based models have been proven to be extremely deficient in such a capability. Inspired by work in cognition which argues compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions, to achieve compositional generalization. Our model consists of two cooperative neural modules, Composer and Solver, fitting well with the cognitive argument while being able to be trained in an end-to-end manner via a hierarchical reinforcement learning algorithm. Experiments on the well-known benchmark SCAN demonstrate that our model exhibits strong compositional generalization, solving all challenges addressed by previous works with 100% accuracy.

PDF Details

IJCAI Conference 2020 Conference Paper

RECPARSER: A Recursive Semantic Parsing Framework for Text-to-SQL Task

Yu Zeng
Yan Gao
Jiaqi Guo
Bei Chen
Qian Liu
Jian-Guang Lou
Fei Teng
Dongmei Zhang

Neural semantic parsers usually fail to parse long and complicated utterances into nested SQL queries, due to the large search space. In this paper, we propose a novel recursive semantic parsing framework called RECPARSER to generate the nested SQL query layer-by-layer. It decomposes the complicated nested SQL query generation problem into several progressive non-nested SQL query generation problems. Furthermore, we propose a novel Question Decomposer module to explicitly encourage RECPARSER to focus on different components of an utterance when predicting SQL queries of different layers. Experiments on the Spider dataset show that our approach is more effective compared to the previous works at predicting the nested SQL queries. In addition, we achieve an overall accuracy that is comparable with state-of-the-art approaches.

PDF Details DOI

EAAI Journal 2018 Journal Article

Multi-time slots real-time pricing strategy with power fluctuation caused by operating continuity of smart home appliances

Hongbo Zhu
Yan Gao
Yong Hou
Li Tao

Details DOI

ICRA Conference 2001 Conference Paper

Closed-Form Inverse Kinematics Solver for Reconfigurable Robots

I-Ming Chen 0001
Yan Gao

A closed-form inverse kinematics solver for non-redundant reconfigurable robots is developed based on the product-of-exponentials (POE) formula. Its novelty lies in the use of POE reduction techniques and subproblems to obtain inverse kinematics solutions of a large number of possible configurations in a systematic and convenient way. Three reduction techniques are introduced to simplify the POE equations. Eleven types of subproblems containing geometric solutions of those simplified equations are identified and solved. Based on the sequence and types of robot joints, the solved sub-problems can be re-used for inverse kinematics of different robot configurations. This solver can cope with closed-form inverse kinematics of all robots with DOFs of 4 or less, 90 percent of the 5-DOF robots and 50 percent of the 6-DOF robots, as well as frequently used industrial robots with both prismatic and revolute joints. The solver is implemented as a C++ software package and is demonstrated through an example.

Details

ICRA Conference 1999 Conference Paper

Gait Generation for Inchworm-Like Robot Locomotion Using Finite State Model

I-Ming Chen 0001
Song Huat Yeo
Yan Gao

The gait of a multisegment inchworm robot is a series of actuator actions that will change the shape of the robot to generate a net motion. In this article, we model the multisegment inchworm robot as a finite state automaton. Gait generation is posed as a search problem on the graph described by the automaton with prescribed state transitions. The state transitions are defined based on the kinematics of robot locomotion. The auxiliary actuator concept is introduced. Single-stride and multistride gait generations are discussed. Single-stride gaits exhibit fault-tolerant and real-time computation features that are necessary in actual applications. Both computer simulation and experimental hardware platform are developed for various aspects of the gait generation and planning.
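The framing of gait generation as graph search can be illustrated with a toy example. The sketch below is a minimal illustration under stated assumptions, not the paper's model: it assumes a hypothetical single-segment robot with a rear gripper, a front gripper, and one extensor, requires at least one gripper to stay attached, and only allows a length change while exactly one gripper is attached. The state layout, action names, and `find_gait` helper are invented for illustration, and breadth-first search stands in for whatever search procedure the authors use.

```python
from collections import deque

# State: (rear_gripped, front_gripped, extended, rear_position).
ACTIONS = ("toggle_rear", "toggle_front", "toggle_extensor")


def step(state, action):
    """Apply one actuator action; return the next state, or None if disallowed."""
    rear, front, ext, pos = state
    if action == "toggle_rear":
        rear ^= 1
    elif action == "toggle_front":
        front ^= 1
    else:
        # Length changes are only modelled with exactly one gripper attached.
        if rear + front != 1:
            return None
        if front:
            # Front anchored: extending pushes the rear back, contracting pulls it forward.
            pos += -1 if ext == 0 else 1
        ext ^= 1
    if rear + front == 0:
        return None  # the robot must keep at least one gripper attached
    return (rear, front, ext, pos)


def find_gait(start, goal, max_steps=12):
    """Breadth-first search for a shortest action sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        if len(path) == max_steps:
            continue
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None


if __name__ == "__main__":
    # One stride: same shape as the start, rear end advanced by one unit.
    print(find_gait((1, 1, 0, 0), (1, 1, 0, 1)))
```

In this toy model a single stride is the familiar inchworm cycle: release the front gripper, extend, re-grip the front, release the rear, contract, and re-grip the rear.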

Details