Arrow Research search

Author name cluster

Tao Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers


JBHI Journal 2026 Journal Article

Accurate Segmentation of Surgical Instruments via Spectral-Attentive Contextual Interaction Network

  • Jiaxin Mei
  • Yizhe Zhang
  • Xiangjian He
  • Tao Zhou

Surgical instrument segmentation is crucial for enhancing visual perception and enabling precise manipulation in robotic surgical systems. However, current segmentation models still face substantial challenges in accuracy and robustness due to complex background interference, diverse instrument morphologies, and low contrast between instruments and surrounding tissues in surgical environments. Despite significant advances in deep learning-based approaches, existing models still fall short in capturing the fine edges and global contextual relationships of instruments. To address these issues, we propose a Spectral-attentive Contextual Interaction Network (SCI-Net) for surgical instrument segmentation. Specifically, we present a Global Context Aggregation Module (GCAM) that integrates high-level features to produce a global map for coarse localization of the segmentation target. Then, a Spectral-enhanced Feature Module (SFM) is proposed to enhance feature expression through frequency-domain attention, transforming features from the spatial domain to the frequency domain. In addition, we design a Scale-aware Dilation Module (SDM) in the decoder that adaptively integrates the enhanced features through multi-scale dilated convolutions combined with a dynamic fusion mechanism, improving segmentation performance on the fine boundaries of instruments. We have extensively validated SCI-Net on multiple publicly available surgical instrument segmentation datasets, and the experimental results show that SCI-Net significantly outperforms other state-of-the-art segmentation methods.
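A minimal NumPy sketch of the generic idea behind frequency-domain feature reweighting (FFT to the frequency domain, apply a per-frequency weight, inverse FFT back). It is not SCI-Net's SFM: the function name `spectral_attention` and the amplitude-based sigmoid gate are assumptions for illustration, whereas a trained network would learn its weights.

```python
import numpy as np


def spectral_attention(feat, weight=None):
    """Reweight a (C, H, W) feature map in the frequency domain (illustrative sketch).

    weight: optional mask of shape (C, H, W // 2 + 1). If omitted, a hypothetical
    gate in (0, 1) is derived from the per-channel normalized amplitude spectrum.
    """
    # Spatial domain -> frequency domain (real 2-D FFT over the spatial axes).
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    if weight is None:
        amp = np.abs(spec)
        mean = amp.mean(axis=(-2, -1), keepdims=True)
        std = amp.std(axis=(-2, -1), keepdims=True) + 1e-6
        weight = 1.0 / (1.0 + np.exp(-(amp - mean) / std))  # sigmoid gate
    # Frequency domain -> spatial domain, restoring the original H x W size.
    return np.fft.irfft2(spec * weight, s=feat.shape[-2:], axes=(-2, -1))


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 6))
y = spectral_attention(x)
print(y.shape)  # (4, 8, 6)
```

With an all-ones weight the function returns its input unchanged, which is a convenient sanity check when experimenting with learned masks.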

AAAI Conference 2026 Conference Paper

Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation

  • Kaiwen Huang
  • Yizhe Zhang
  • Yi Zhou
  • Tianyang Xu
  • Tao Zhou

Semi-supervised medical image segmentation is an effective approach for scenarios with limited labeled data. Existing methods mainly rely on frameworks such as mean teacher and dual-stream consistency learning. These approaches often suffer from error accumulation and structural complexity, while also neglecting the interaction between labeled and unlabeled data streams. To overcome these challenges, we propose a Bidirectional Channel-selective Semantic Interaction (BCSI) framework for semi-supervised medical image segmentation. First, we propose a Semantic-Spatial Perturbation (SSP) mechanism, which perturbs the data using two strong augmentation operations and leverages unsupervised learning with pseudo-labels from weak augmentations. Additionally, we enforce consistency between the predictions from the two strong augmentations to further improve model stability and robustness. Second, to reduce noise during the interaction between labeled and unlabeled data, we propose a Channel-selective Router (CR) component, which dynamically selects the most relevant channels for information exchange. This mechanism ensures that only highly relevant features are activated, minimizing unnecessary interference. Finally, a Bidirectional Channel-wise Interaction (BCI) strategy is employed to supplement additional semantic information and enhance the representation of important channels. Experimental results on multiple 3D medical benchmark datasets demonstrate that the proposed method outperforms existing semi-supervised approaches.
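The weak-to-strong pseudo-labeling, the consistency between two strong views, and channel selection described above can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the BCSI implementation: the 0.95 confidence threshold, the squared-difference consistency term, and the top-k rule standing in for the Channel-selective Router are all assumptions.

```python
import numpy as np


def weak_to_strong_losses(p_weak, p_strong1, p_strong2, threshold=0.95):
    """p_*: (N, K) per-voxel class probabilities for one weak and two strong views.

    Returns (pseudo_label_loss, strong_consistency_loss).
    """
    labels = p_weak.argmax(axis=1)          # hard pseudo-labels from the weak view
    mask = p_weak.max(axis=1) >= threshold  # keep only confident voxels
    idx = np.arange(len(labels))
    eps = 1e-12
    # Cross-entropy of both strong views against the weak-view pseudo-labels.
    ce = -(np.log(p_strong1[idx, labels] + eps) + np.log(p_strong2[idx, labels] + eps)) / 2
    pseudo = float(ce[mask].mean()) if mask.any() else 0.0
    # Consistency between the two strong views (mean squared difference).
    consistency = float(np.mean((p_strong1 - p_strong2) ** 2))
    return pseudo, consistency


def select_top_k_channels(features, scores, k):
    """Zero out all but the k channels (axis 0) with the highest relevance scores."""
    keep = np.argsort(scores)[-k:]
    out = np.zeros_like(features)
    out[keep] = features[keep]
    return out
```

Only voxels whose weak-view confidence passes the threshold contribute to the pseudo-label term, which is one common way to limit error accumulation from noisy pseudo-labels.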

JBHI Journal 2026 Journal Article

Mining Global and Local Semantics From Unlabeled Spectra for Spectral Classification

  • Wei Luo
  • Haiming Yao
  • Ang Gao
  • Tao Zhou
  • Xue Wang

Non-destructive detection methods based on molecular vibrational spectroscopy are pivotal in fields such as analytical chemistry and medical diagnostics. Recent advances have integrated deep learning with vibrational spectroscopy, significantly enhancing spectral recognition accuracy. However, these methods often rely on large annotated spectral datasets, limiting their general applicability. To address this limitation, we propose a novel approach, Global and Local Semantics Mining (GLSM), which leverages self-supervised learning to capture the global and local semantic information of unlabeled spectra, obviating the need for extensive annotated data. We devise two proxy tasks: global semantic mining and local semantic mining. The global semantic mining task is based on the premise that different views of the same spectrum can be mutually transformed, enabling the model to capture domain-invariant features across various perspectives and thereby develop a global understanding of the spectral data. This, in turn, enhances the model’s robustness to variations in peak positions. Meanwhile, the local semantic mining task posits that noisy spectra can be reconstructed into noise-free spectra, thereby facilitating the extraction of local patterns and fine-grained details, such as subtle variations in peak intensities. By combining both self-supervised tasks, our model effectively captures the global and local semantic information of the spectrum. The pre-trained model can be fine-tuned with a limited amount of labeled homologous or heterologous spectral data for semi-supervised or transfer learning-based spectral classification. Extensive experiments on three datasets in semi-supervised and transfer learning-based spectral recognition tasks comprehensively validate the effectiveness of our GLSM method, demonstrating its significant potential for real-world spectral analysis applications.

IJCAI Conference 2025 Conference Paper

A Correlation Manifold Self-Attention Network for EEG Decoding

  • Chen Hu
  • Rui Wang
  • Xiaoning Song
  • Tao Zhou
  • Xiao-Jun Wu
  • Nicu Sebe
  • Ziheng Chen

Riemannian neural networks, which generalize the deep learning paradigm to non-Euclidean geometries, have garnered widespread attention across diverse applications in artificial intelligence. Among these, representative attention models have been studied on various non-Euclidean spaces to geometrically capture the spatiotemporal dependencies inherent in time series data, e.g., electroencephalography (EEG). Recent studies have highlighted the full-rank correlation matrix as an advantageous alternative to the covariance matrix for data representation, owing to its invariance to the scale of variables. Motivated by these advancements, we propose the Correlation Attention Network (CorAtt) tailored for full-rank correlation matrices and implement it under the permutation-invariant and computationally efficient Off-Log and Log-Scaled geometries, respectively. Extensive evaluations on three benchmark EEG datasets provide substantial evidence for the effectiveness of our introduced CorAtt. The code and supplementary material can be found at https://github.com/ChenHu-ML/CorAtt.
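The scale invariance mentioned above follows from normalizing a covariance matrix by its diagonal, $C = D^{-1/2}\Sigma D^{-1/2}$ with $D = \operatorname{diag}(\Sigma)$. A small NumPy check, assuming an EEG segment stored as a (channels, time) array; it does not implement CorAtt or the Off-Log and Log-Scaled geometries.

```python
import numpy as np


def correlation_matrix(x):
    """x: (channels, time) EEG segment -> (channels, channels) correlation matrix."""
    cov = np.cov(x)              # rows of x are treated as variables (channels)
    d = np.sqrt(np.diag(cov))    # per-channel standard deviations
    return cov / np.outer(d, d)  # D^{-1/2} Sigma D^{-1/2}


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 200))
scales = np.linspace(0.5, 20.0, 8)[:, None]  # arbitrary per-channel gains
print(np.allclose(correlation_matrix(x), correlation_matrix(scales * x)))  # True
```

Rescaling any channel (for example, a different electrode gain) changes the covariance matrix but leaves the correlation matrix, with its unit diagonal, unchanged.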

AAAI Conference 2025 Conference Paper

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

  • Zheng Hu
  • Zhe Li
  • Ziyun Jiao
  • Satoshi Nakagawa
  • Jiawen Deng
  • Shimin Cai
  • Tao Zhou
  • Fuji Ren

In recent years, knowledge graphs have been integrated into recommender systems as item-side auxiliary information, enhancing recommendation accuracy. However, constructing and integrating structural user-side knowledge remains a significant challenge due to the improper granularity and inherent scarcity of user-side features. Recent advancements in Large Language Models (LLMs) offer the potential to bridge this gap by leveraging their human behavior understanding and extensive real-world knowledge. Nevertheless, integrating LLM-generated information into recommender systems presents challenges, including the risk of noisy information and the need for additional knowledge transfer. In this paper, we propose an LLM-based user-side knowledge inference method alongside a carefully designed recommendation framework to address these challenges. Our approach employs LLMs to infer user interests based on historical behaviors, integrating this user-side information with item-side and collaborative data to construct a hybrid structure: the Collaborative Interest Knowledge Graph (CIKG). Furthermore, we propose a CIKG-based recommendation framework that includes a user interest reconstruction module and a cross-domain contrastive learning module to mitigate potential noise and facilitate knowledge transfer. We conduct extensive experiments on three real-world datasets to validate the effectiveness of our method. Our approach achieves state-of-the-art performance compared to competitive baselines, particularly for users with sparse interactions.
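The cross-domain contrastive learning module is not specified in detail here. As a rough, hypothetical illustration only, a standard InfoNCE loss between two views of the same users (for example, an interest-graph view and a collaborative view) looks like this; the cosine similarity and the temperature of 0.1 are assumptions.

```python
import numpy as np


def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss; row i of z1 and row i of z2 embed the same user in two views."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # cosine similarity via unit vectors
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))  # positives lie on the diagonal


z = np.eye(4)
aligned = info_nce(z, z)
mismatched = info_nce(z, np.roll(z, 1, axis=0))
print(aligned < mismatched)  # True
```

Each user's other-view embedding acts as the positive and all other users in the batch act as negatives, so well-aligned views yield a lower loss.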

ICML Conference 2025 Conference Paper

Compositional Condition Question Answering in Tabular Understanding

  • Jun-Peng Jiang
  • Tao Zhou
  • De-Chuan Zhan
  • Han-Jia Ye

Multimodal Large Language Models (MLLMs) for tabular understanding have made significant progress in tasks such as financial report analysis and public data tests. However, our comprehensive analysis shows that these models are still limited in certain simple scenarios, particularly when handling compositional conditions in QA. Further investigation reveals that the poor performance can be attributed to two main challenges: the visual encoder’s inability to accurately recognize the content of a row, and the model’s tendency to overlook conditions in the question. To address these, we introduce a new Compositional Condition Tabular Understanding method, called CoCoTab. Specifically, to capture the structural relationships within tables, we enhance the visual encoder with additional row and column patches. Moreover, we introduce conditional tokens between the visual patches and query embeddings, ensuring the model focuses on relevant parts of the table according to the conditions specified in the query. Additionally, we introduce the Massive Multimodal Tabular Understanding (MMTU) benchmark, which comprehensively assesses the full capabilities of MLLMs in tabular understanding. Our proposed method achieves state-of-the-art performance on both existing tabular understanding benchmarks and MMTU. Our code is available at https://github.com/LAMDA-Tabular/MMTU.

JMLR Journal 2025 Journal Article

Talent: A Tabular Analytics and Learning Toolbox

  • Si-Yang Liu
  • Hao-Run Cai
  • Qi-Le Zhou
  • Huai-Hong Yin
  • Tao Zhou
  • Jun-Peng Jiang
  • Han-Jia Ye

Tabular data is a prevalent source in machine learning. While classical methods have proven effective, deep learning methods for tabular data are emerging as flexible alternatives due to their capacity to uncover hidden patterns and capture complex interactions. Considering that deep tabular methods exhibit diverse design philosophies, including the ways they handle features, design learning objectives, and construct model architectures, we introduce Talent (Tabular Analytics and Learning Toolbox), a versatile toolbox for utilizing, analyzing, and comparing these methods. Talent includes over 35 deep tabular prediction methods, offering various encoding and normalization modules, all within a unified, easily extensible interface. We demonstrate its design, application, and performance evaluation in case studies. The code is available at https://github.com/LAMDA-Tabular/TALENT.

JBHI Journal 2024 Journal Article

A Dual-Branch Framework With Prior Knowledge for Precise Segmentation of Lung Nodules in Challenging CT Scans

  • Wujun Jiang
  • Lijia Zhi
  • Shaomin Zhang
  • Tao Zhou

Lung cancer is one of the deadliest cancers globally, and early diagnosis is crucial for patient survival. Pulmonary nodules are the main manifestation of early lung cancer and are usually assessed using CT scans. Nowadays, computer-aided diagnostic systems are widely used to assist physicians in disease diagnosis. The accurate segmentation of pulmonary nodules is affected by internal heterogeneity and external data factors. To overcome the segmentation challenges posed by subtle, mixed, adhesion-type, benign, and uncertain categories of nodules, a new mixed manual feature network that enhances sensitivity and accuracy is proposed. This method integrates feature information through a dual-branch network framework and a multi-dimensional fusion module. Trained and validated with multiple data sources of different data quality, our method demonstrates leading performance on the LUNA16, Multi-thickness Slice Image, LIDC, and UniToChest datasets, with Dice similarity coefficients of 86.89%, 75.72%, 84.12%, and 80.74%, respectively, surpassing most current methods for pulmonary nodule segmentation. Our method further improves the accuracy, reliability, and stability of lung nodule segmentation, even on challenging CT scans.
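The Dice similarity coefficient used to report these results is $2|P \cap G| / (|P| + |G|)$ for a predicted mask $P$ and a ground-truth mask $G$. A minimal sketch for binary masks of any shape; the small `eps` guarding against two empty masks is an assumption.

```python
import numpy as np


def dice_coefficient(pred, target, eps=1e-8):
    """Dice = 2|P ∩ G| / (|P| + |G|) for binary masks of any shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


print(round(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.6667
```

A score of 86.89% therefore means that, on average, the overlap between predicted and annotated nodule voxels is about 87% of their mean size.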

ICRA Conference 2024 Conference Paper

TPGP: Temporal-Parametric Optimization with Deep Grasp Prior for Dexterous Motion Planning

  • Haoming Li 0004
  • Qi Ye 0001
  • Yuchi Huo
  • Qingtao Liu
  • Shijian Jiang
  • Tao Zhou
  • Xiang Li
  • Yang Zhou

Grasping motion planning aims to find a feasible grasping trajectory in the configuration space given an input target grasp. While optimizing grasp motion with two- or three-fingered grippers has been well studied, natural grasp motion planning with a dexterous hand remains very challenging due to the high-dimensional working space. In this work, we propose a novel temporal-parametric grasp prior (TPGP) optimization method that reduces the difficulty of grasping trajectory optimization for the dexterous hand while maintaining the smooth and natural properties of the grasping motion. Specifically, we reformulate the discrete trajectory parameters as a temporal-based parameterization, in which a prior constraint provided by a hand poser network is introduced to ensure that the hand pose remains natural and reasonable throughout the trajectory. Finally, we present a joint target optimization strategy that enhances the target pose to yield more feasible trajectories. Extensive validations on two public datasets show that our method outperforms state-of-the-art methods on various grasp-motion metrics.

JBHI Journal 2023 Journal Article

A Structure-Guided Effective and Temporal-Lag Connectivity Network for Revealing Brain Disorder Mechanisms

  • Zhengwang Xia
  • Tao Zhou
  • Saqib Mamoon
  • Amani Alfakih
  • Jianfeng Lu

Brain networks provide important insights for the diagnosis of many brain disorders, and how to effectively model brain structure has become one of the core issues in brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationship (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity can provide the direction of information flow, which may provide additional information for the diagnosis of brain diseases. However, existing methods either ignore the fact that there is a temporal lag in information transmission across brain regions, or simply set the temporal-lag value between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) to simultaneously infer the causal relationships and the temporal-lag values between brain regions, which can be trained in an end-to-end manner. In addition, we introduce three mechanisms to better guide the modeling of brain networks. The evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.
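ETLN learns lags end to end. For intuition only, a classical baseline is to pick the lag that maximizes the correlation between one region's time series and a shifted copy of another's. The function below is a hypothetical sketch of that heuristic on synthetic data, not the ETLN method.

```python
import numpy as np


def estimate_lag(x, y, max_lag):
    """Return (lag, r) maximizing corr(x[t], y[t + lag]) for lag in 0..max_lag."""
    best_lag, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        r = np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, float(best_r)


rng = np.random.default_rng(0)
x = rng.standard_normal(500)  # synthetic signal of region A
y = np.roll(x, 3)             # region B repeats A's signal 3 samples later
print(estimate_lag(x, y, max_lag=10)[0])  # 3
```

Fixing a single lag for all region pairs, the limitation the abstract points out, corresponds to skipping this per-pair search entirely.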

JBHI Journal 2023 Journal Article

Flexible Fusion Network for Multi-Modal Brain Tumor Segmentation

  • Hengyi Yang
  • Tao Zhou
  • Yi Zhou
  • Yizhe Zhang
  • Huazhu Fu

Automated brain tumor segmentation is crucial for aiding brain disease diagnosis and evaluating disease progression. Currently, magnetic resonance imaging (MRI) is routinely adopted for brain tumor segmentation and can provide images of different modalities. It is critical to leverage multi-modal images to boost brain tumor segmentation performance. Existing works commonly concentrate on generating a shared representation by fusing multi-modal data, while few methods take modality-specific characteristics into account. Besides, how to efficiently fuse an arbitrary number of modalities is still a difficult problem. In this study, we present a flexible fusion network (termed F²Net) for multi-modal brain tumor segmentation, which can flexibly fuse an arbitrary number of modalities to explore complementary information while maintaining the specific characteristics of each modality. Our F²Net is based on the encoder-decoder structure, which utilizes two Transformer-based feature learning streams and a cross-modal shared learning network to extract individual and shared feature representations. To effectively integrate the knowledge from the multi-modality data, we propose a cross-modal feature-enhanced module (CFM) and a multi-modal collaboration module (MCM), which aim at fusing the multi-modal features into the shared learning network and incorporating the features from encoders into the shared decoder, respectively. Extensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our F²Net over other state-of-the-art segmentation methods.

AAAI Conference 2023 Conference Paper

Prompting Neural Machine Translation with Translation Memories

  • Abudurexiti Reheman
  • Tao Zhou
  • Yingfeng Luo
  • Di Yang
  • Tong Xiao
  • Jingbo Zhu

Improving machine translation (MT) systems with translation memories (TMs) is of great interest to practitioners in the MT community. However, previous approaches require either a significant update of the model architecture and/or additional training efforts to make the models well-behaved when TMs are taken as additional input. In this paper, we present a simple but effective method to introduce TMs into neural machine translation (NMT) systems. Specifically, we treat TMs as prompts to the NMT model at test time, but leave the training process unchanged. The result is a slight update of an existing NMT system, which can be implemented in a few hours by anyone who is familiar with NMT. Experimental results on several datasets demonstrate that our system significantly outperforms strong baselines.
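The abstract does not specify how TMs are formatted as prompts. One plausible sketch, assuming the closest TM entry is retrieved by string similarity (Python's `difflib.SequenceMatcher`) and concatenated with the source sentence using a hypothetical `<sep>` token, is:

```python
from difflib import SequenceMatcher


def best_tm_match(source, memory):
    """Return the (tm_source, tm_target) pair whose source side is most similar to `source`."""
    return max(memory, key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio())


def build_prompted_input(source, memory, sep=" <sep> "):
    """Prepend the best-matching TM pair to the source sentence (hypothetical format)."""
    tm_src, tm_tgt = best_tm_match(source, memory)
    return tm_src + sep + tm_tgt + sep + source


memory = [
    ("the cat sat", "le chat était assis"),
    ("stock prices rose", "les prix ont augmenté"),
]
print(build_prompted_input("the cat sat down", memory))
# the cat sat <sep> le chat était assis <sep> the cat sat down
```

Because only the test-time input string changes, an existing NMT model can consume it without retraining, which matches the "training unchanged" setting the abstract describes.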

JBHI Journal 2022 Journal Article

Guest Editorial Generative Adversarial Networks in Biomedical Image Computing

  • Huazhu Fu
  • Tao Zhou
  • Shuo Li
  • Alejandro F. Frangi

The papers in this special section focus on generative adversarial networks in biomedical image computing. The field of biomedical imaging has made great progress from Roentgen’s original discovery of the X-ray to current imaging tools, including Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Computed Tomography (CT), and Ultrasound (US). These non-invasive imaging technologies make it possible to assess the current condition of an organ or tissue and to monitor a patient over time for accurate and timely diagnosis and treatment. With the development of imaging technologies, advanced artificial intelligence algorithms for automated image analysis have shown the potential to change many aspects of clinical applications within the next decade. Meanwhile, these advanced technologies have also brought new issues and challenges. Thus, there has been a growing demand for biomedical image computing to be a component of clinical trials and device improvement. Generative adversarial networks (GANs) have attracted growing interest in the computer vision community due to their capability for data generation and translation. GAN-based models learn from a set of training data and generate new data with the same characteristics as the training data, and they have proven to be the state of the art for generating sharp and realistic images. More importantly, GANs have been rapidly applied to many traditional and novel applications in the medical domain, such as image reconstruction, segmentation, diagnosis, and synthesis. Despite the substantial progress of GANs in these areas, their application to medical image computing still faces challenges, and unsolved problems remain.

JBHI Journal 2022 Journal Article

Self-Supervised Multi-Modal Hybrid Fusion Network for Brain Tumor Segmentation

  • Feiyi Fang
  • Yazhou Yao
  • Tao Zhou
  • Guosen Xie
  • Jianfeng Lu

Accurate medical image segmentation of brain tumors is necessary for diagnosing, monitoring, and treating the disease. In recent years, with the gradual emergence of multi-sequence magnetic resonance imaging (MRI), multi-modal MRI diagnosis has played an increasingly important role in the early diagnosis of brain tumors by providing complementary information for a given lesion. Different MRI modalities vary significantly in context, as well as in coarse and fine information. Because manual identification of brain tumors is very complicated, it usually requires lengthy consultation among multiple experts. Automatic segmentation of brain tumors from MRI images can thus greatly reduce doctors' workload and free up more time for treating patients. In this paper, we propose a multi-modal brain tumor segmentation framework that adopts the hybrid fusion of modality-specific features using a self-supervised learning strategy. The algorithm is based on a fully convolutional neural network. First, we propose a multi-input architecture that learns independent features from multi-modal data and can be adapted to different numbers of multi-modal inputs. Compared with single-modal multi-channel networks, our model provides a better feature extractor for segmentation tasks, learning cross-modal information from multi-modal data. Second, we propose a new feature fusion scheme, named hybrid attentional fusion. This scheme enables the network to learn the hybrid representation of multiple features and capture the correlation information between them through an attention mechanism. Unlike popular methods such as feature map concatenation, this scheme focuses on the complementarity between multi-modal data, which can significantly improve the segmentation results of specific regions. Third, we propose a self-supervised learning strategy for brain tumor segmentation tasks. Our experimental results demonstrate the effectiveness of the proposed model against other state-of-the-art multi-modal medical segmentation methods.

IJCAI Conference 2021 Conference Paper

Context-aware Cross-level Fusion Network for Camouflaged Object Detection

  • Yujia Sun
  • Geng Chen
  • Tao Zhou
  • Yi Zhang
  • Nian Liu

Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings. In addition, the appearance of camouflaged objects varies significantly, e.g., in object size and shape, aggravating the difficulty of accurate COD. In this paper, we propose a novel Context-aware Cross-level Fusion Network (C2F-Net) to address the challenging COD task. Specifically, we propose an Attention-induced Cross-level Fusion Module (ACFM) to integrate multi-level features with informative attention coefficients. The fused features are then fed to the proposed Dual-branch Global Context Module (DGCM), which yields multi-scale feature representations for exploiting rich global context information. In C2F-Net, the two modules are applied to high-level features in a cascaded manner. Extensive experiments on three widely used benchmark datasets demonstrate that our C2F-Net is an effective COD model and remarkably outperforms state-of-the-art models. Our code is publicly available at: https://github.com/thograce/C2FNet.

IROS Conference 2019 Conference Paper

PPR-Net: Point-wise Pose Regression Network for Instance Segmentation and 6D Pose Estimation in Bin-picking Scenarios

  • Zhi-Kai Dong
  • Sicheng Liu
  • Tao Zhou
  • Hui Cheng
  • Long Zeng 0001
  • Xingyao Yu
  • Houde Liu

Accurate object 6D pose estimation is a core task for robot bin-picking applications, especially when objects are randomly stacked with heavy occlusion. To address this problem, this paper proposes a simple but novel Point-wise Pose Regression Network (PPR-Net). For each point in the point cloud, the network regresses a 6D pose of the object instance that the point belongs to. We argue that the regressed poses of points from the same object instance should be located closely in pose space. Thus, these points can be clustered into different instances and their corresponding objects’ 6D poses can be estimated simultaneously. In our experiments, PPR-Net outperforms the state-of-the-art approach by 15% - 41% in average precision when evaluated on the benchmark Siléane dataset. In addition, it also works well in real world robot bin-picking tasks.
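The idea of grouping points whose regressed poses lie close together in pose space can be illustrated with a simple greedy radius clustering over the per-point predictions. This sketch clusters only the translation part of each vote and uses a hand-picked radius; both are simplifying assumptions, and PPR-Net's actual clustering procedure may differ.

```python
import numpy as np


def cluster_pose_votes(votes, radius):
    """Greedy radius clustering of per-point pose votes (here: 3-D translations only).

    Returns per-point instance labels and the mean vote of each cluster.
    """
    votes = np.asarray(votes, dtype=float)
    labels = np.full(len(votes), -1, dtype=int)
    n_clusters = 0
    for i in range(len(votes)):
        if labels[i] != -1:
            continue  # already assigned to an instance
        close = np.linalg.norm(votes - votes[i], axis=1) <= radius
        labels[(labels == -1) & close] = n_clusters
        n_clusters += 1
    centers = np.array([votes[labels == c].mean(axis=0) for c in range(n_clusters)])
    return labels, centers


votes = [[0, 0, 0], [0.01, 0, 0], [0, 0.01, 0], [1, 1, 1], [1.01, 1, 1]]
labels, centers = cluster_pose_votes(votes, radius=0.1)
print(labels)  # [0 0 0 1 1]
```

Each cluster yields both an instance segmentation (its member points) and a pose estimate (its mean vote), which is why the two tasks can be solved simultaneously.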

TIST Journal 2019 Journal Article

Predicting Academic Performance for College Students

  • Huaxiu Yao
  • Defu Lian
  • Yi Cao
  • Yifan Wu
  • Tao Zhou

Detecting abnormal student behaviors in time and providing personalized intervention and guidance at an early stage is important in educational management. Academic performance prediction is an important building block for enabling this pre-intervention and guidance. Most previous studies are based on questionnaire surveys and self-reports, which suffer from small sample sizes and social desirability bias. In this article, we collect longitudinal behavioral data from the smart cards of 6,597 students and propose three major types of discriminative behavioral factors: diligence, orderliness, and sleep patterns. Empirical analysis demonstrates that these behavioral factors are strongly correlated with academic performance. Furthermore, motivated by social influence theory, we analyze the correlation between each student’s academic performance and that of his/her behaviorally similar students. Statistical tests indicate that this correlation is significant. Based on these factors, we further build a multi-task predictive framework based on a learning-to-rank algorithm for academic performance prediction. This framework captures inter-semester and inter-major correlations and integrates student similarity to predict students’ academic performance. Experiments on a large-scale real-world dataset show the effectiveness of our methods for predicting academic performance and of the proposed behavioral factors.

IJCAI Conference 2018 Conference Paper

Customer Sharing in Economic Networks with Costs

  • Bin Li
  • Dong Hao
  • Dengji Zhao
  • Tao Zhou

In an economic market, sellers, infomediaries and customers constitute an economic network. Each seller has her own customer group, and the seller's private customers are unobservable to other sellers. Therefore, a seller can only sell commodities among her own customers unless other sellers or infomediaries share her sale information with their customer groups. However, a seller is not incentivized to share others' sale information by default, which leads to inefficient resource allocation and limited revenue for the sale. To tackle this problem, we develop a novel mechanism called customer sharing mechanism (CSM) which incentivizes all sellers to share each other's sale information with their private customer groups. Furthermore, CSM also incentivizes all customers to truthfully participate in the sale. In the end, CSM not only allocates the commodities efficiently but also optimizes the seller's revenue.

AAAI Conference 2018 Conference Paper

Multi-Layer Multi-View Classification for Alzheimer’s Disease Diagnosis

  • Changqing Zhang
  • Ehsan Adeli
  • Tao Zhou
  • Xiaobo Chen
  • Dinggang Shen

In this paper, we propose a novel multi-view learning method for Alzheimer’s Disease (AD) diagnosis, using neuroimaging and genetics data. Generally, there are several major challenges associated with traditional classification methods on multi-source imaging and genetics data. First, the correlation between the extracted imaging features and class labels is generally complex, which often makes the traditional linear models ineffective. Second, medical data may be collected from different sources (i.e., multiple modalities of neuroimaging data, clinical scores or genetics measurements); therefore, how to effectively exploit the complementarity among multiple views is of great importance. In this paper, we propose a Multi-Layer Multi-View Classification (ML-MVC) approach, which regards the multi-view input as the first layer, and constructs a latent representation to explore the complex correlation between the features and class labels. This captures the high-order complementarity among different views, as we exploit the underlying information with a low-rank tensor regularization. Intrinsically, our formulation elegantly explores the nonlinear correlation together with complementarity among different views, and thus improves the accuracy of classification. Finally, the minimization problem is solved by the Alternating Direction Method of Multipliers (ADMM). Experimental results on Alzheimer’s Disease Neuroimaging Initiative (ADNI) data sets validate the effectiveness of our proposed method.

IJCAI Conference 2018 Conference Paper

Payoff Control in the Iterated Prisoner's Dilemma

  • Dong Hao
  • Kai Li
  • Tao Zhou

Repeated games have long been the touchstone model for agents’ long-run relationships. Previous results suggest that it is particularly difficult for a repeated game player to exert autocratic control over the payoffs, since they are jointly determined by all participants. This work discovers that the scale of a player’s capability to unilaterally influence the payoffs may have been much underestimated. Under the conventional iterated prisoner’s dilemma, we develop a general framework for controlling the feasible region where the players’ payoff pairs lie. A control strategy player is able to confine the payoff pairs in her objective region, as long as this region has feasible linear boundaries. With this framework, many well-known existing strategies can be categorized and various new strategies with nice properties can be further identified. We show that the control strategies perform well either in a tournament or against a human-like opponent.
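For background, the earlier zero-determinant strategies of Press and Dyson are the best-known case of one player unilaterally enforcing a linear relation between both players' long-run payoffs. The sketch below computes the stationary payoffs of two memory-one strategies and checks that the extortionate strategy p = (11/13, 1/2, 7/26, 0) enforces $s_X - P = 3(s_Y - P)$ under the standard payoffs (R, S, T, P) = (3, 0, 5, 1). It illustrates a single linear boundary only, not the paper's general payoff-control framework; the names `long_run_payoffs` and `EXTORT_P` are hypothetical.

```python
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0  # standard IPD payoffs
# States are ordered CC, CD, DC, DD from X's point of view (X's move first).
PAYOFF_X = np.array([R, S, T, P])
PAYOFF_Y = np.array([R, T, S, P])
EXTORT_P = np.array([11 / 13, 1 / 2, 7 / 26, 0.0])  # Press-Dyson extortion, chi = 3


def long_run_payoffs(p, q):
    """Stationary payoffs of memory-one strategies.

    p: X's cooperation probabilities after CC, CD, DC, DD (X's view).
    q: Y's cooperation probabilities after CC, CD, DC, DD (Y's own view).
    """
    q_in_x_view = np.array([q[0], q[2], q[1], q[3]])  # swap CD/DC for Y
    M = np.zeros((4, 4))
    for s in range(4):
        px, py = p[s], q_in_x_view[s]
        M[s] = [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
    # Solve v M = v with sum(v) = 1 by replacing one balance equation.
    A = M.T - np.eye(4)
    A[-1] = 1.0
    v = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
    return float(v @ PAYOFF_X), float(v @ PAYOFF_Y)


rng = np.random.default_rng(1)
q = rng.uniform(0.05, 0.95, size=4)  # an arbitrary memory-one opponent
sx, sy = long_run_payoffs(EXTORT_P, q)
print(np.isclose(sx - P, 3 * (sy - P)))  # True
```

Whatever memory-one strategy the opponent picks, the payoff pair stays on the same line, which is the kind of unilateral control the abstract generalizes to regions bounded by several linear constraints.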

AAAI Conference 2017 Conference Paper

Mechanism Design in Social Networks

  • Bin Li
  • Dong Hao
  • Dengji Zhao
  • Tao Zhou

This paper studies an auction design problem for a seller to sell a commodity in a social network, where each individual (the seller or a buyer) can only communicate with her neighbors. The challenge to the seller is to design a mechanism to incentivize the buyers, who are aware of the auction, to further propagate the information to their neighbors so that more buyers will participate in the auction and hence, the seller will be able to make a higher revenue. We propose a novel auction mechanism, called information diffusion mechanism (IDM), which incentivizes the buyers to not only truthfully report their valuations on the commodity to the seller, but also further propagate the auction information to all their neighbors. In comparison, the direct extension of the well-known Vickrey-Clarke-Groves (VCG) mechanism in social networks can also incentivize the information diffusion, but it will decrease the seller’s revenue or even lead to a deficit sometimes. The formalization of the problem has not yet been addressed in the literature of mechanism design and our solution is very significant in the presence of large-scale online social networks.
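For reference, the single-item VCG mechanism mentioned above reduces to the Vickrey (second-price) auction: the highest bidder wins and pays the second-highest bid. A minimal sketch; it does not model the social network, information diffusion, or IDM, and the function name is illustrative.

```python
def vickrey_auction(bids):
    """Single-item VCG (second-price) auction: returns (winner, price)."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price


print(vickrey_auction({"a": 10, "b": 7, "c": 3}))  # ('a', 7)
```

If a fourth buyer reached through diffusion bids 9, `vickrey_auction({"a": 10, "b": 7, "c": 3, "d": 9})` returns `("a", 9)`: extra participants can only keep or raise the price, which is why the seller wants buyers to propagate the auction information.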

AAAI Conference 2017 Short Paper

Natural Language Person Retrieval

  • Tao Zhou
  • Jie Yu

Following the recent progress in image classification and image captioning using deep learning, we developed a novel person retrieval system using natural language, which to our knowledge is the first of its kind. Our system employs a state-of-the-art deep learning based natural language object retrieval framework to detect and retrieve people in images. Quantitative experimental results show significant improvement over state-of-the-art methods for generic object retrieval. This line of research offers great advantages for searching large amounts of video surveillance footage and can also be utilized in other domains, such as human-robot interaction.