Author name cluster

Nan Du

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

AAAI Conference 2025 Conference Paper

Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs

  • Xiaqiang Tang
  • Jian Li
  • Nan Du
  • Sihong Xie

Despite the superior performance of large language models (LLMs) on many NLP tasks, they still face significant limitations in memorizing extensive world knowledge. Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation (RAG) framework, combined with Knowledge Graphs that encapsulate extensive factual data in a structured format, robustly enhances the reasoning capabilities of LLMs. However, deploying such systems in real-world scenarios presents challenges: the continuous evolution of non-stationary environments may lead to performance degradation, and user satisfaction requires a careful balance of performance and responsiveness. To address these challenges, we introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities under rich and evolving retrieval contexts in practice. Within this framework, each retrieval method is treated as a distinct "arm". The system utilizes real-time user feedback to adapt to dynamic environments by selecting the appropriate retrieval method based on input queries and the historical multi-objective performance of each arm. Extensive experiments conducted on two benchmark KGQA datasets demonstrate that our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.

AIIM Journal 2024 Journal Article

FIT-graph: A multi-grained evolutionary graph based framework for disease diagnosis

  • Zizhu Liu
  • Qing Cao
  • Nan Du
  • Huizhen Shu
  • Erheng Zhong
  • Nan Jiang
  • Qiaoran Chen
  • Ying Shen

Early assessment, with the help of machine learning methods, can aid clinicians in optimizing the diagnosis and treatment process, allowing patients to gain critical treatment time. Due to the advantages of effective information organization and interpretable reasoning, knowledge graph-based methods have become one of the most widely used machine learning approaches for this task. However, due to a lack of effective organization and use of multi-granularity and temporal information, current knowledge graph-based approaches struggle to fully and comprehensively exploit the information contained in medical records, restricting their capacity to make high-quality diagnoses. To address these challenges, we study disease diagnosis applications in depth and propose a novel disease diagnosis framework named FIT-Graph. With novel medical multi-grained evolutionary graphs, FIT-Graph efficiently organizes the extracted information from various granularities and time stages, maximizing the retention of valuable information for disease inference and ensuring the comprehensiveness and validity of the final disease inference. We compare FIT-Graph with the baseline on two real-world clinical datasets from cardiology and respiratory departments. The experimental results show that it outperforms the baseline model, improving baseline performance on the task by about 5% on multiple metrics.

NeurIPS Conference 2024 Conference Paper

Self-playing Adversarial Language Game Enhances LLM Reasoning

  • Pengyu Cheng
  • Yong Dai
  • Tianhao Hu
  • Han Xu
  • Zhisong Zhang
  • Lei Han
  • Nan Du
  • Xiaolong Li

We explore the potential of self-play training for large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players must have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by Self-Playing this Adversarial language Game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is available at https://github.com/Linear95/SPAG.

NeurIPS Conference 2023 Conference Paper

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

  • Tao Lei
  • Junwen Bai
  • Siddhartha Brahma
  • Joshua Ainslie
  • Kenton Lee
  • Yanqi Zhou
  • Nan Du
  • Vincent Zhao

We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.

NeurIPS Conference 2023 Conference Paper

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

  • Sang Michael Xie
  • Hieu Pham
  • Xuanyi Dong
  • Nan Du
  • Hanxiao Liu
  • Yifeng Lu
  • Percy S. Liang
  • Quoc V Le

The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to set the domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5% points over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks.

JMLR Journal 2023 Journal Article

PaLM: Scaling Language Modeling with Pathways

  • Aakanksha Chowdhery
  • Sharan Narang
  • Jacob Devlin
  • Maarten Bosma
  • Gaurav Mishra
  • Adam Roberts
  • Paul Barham
  • Hyung Won Chung

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM). We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned state-of-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies.

NeurIPS Conference 2022 Conference Paper

Mixture-of-Experts with Expert Choice Routing

  • Yanqi Zhou
  • Tao Lei
  • Hanxiao Liu
  • Nan Du
  • Yanping Huang
  • Vincent Zhao
  • Andrew M. Dai
  • Zhifeng Chen

Sparsely-activated Mixture-of-experts (MoE) models allow the number of parameters to greatly increase while keeping the amount of computation for a given token or a given sample unchanged. However, a poor expert routing strategy (e.g., one resulting in load imbalance) can cause certain experts to be under-trained, leading to an expert being under or over-specialized. Prior work allocates a fixed number of experts to each token using a top-k function regardless of the relative importance of different tokens. To address this, we propose a heterogeneous mixture-of-experts employing an expert choice method. Instead of letting tokens select the top-k experts, we have experts selecting the top-k tokens. As a result, each token can be routed to a variable number of experts and each expert can have a fixed bucket size. We systematically study pre-training speedups using the same computational resources of the Switch Transformer top-1 and GShard top-2 gating of prior work and find that our method improves training convergence time by more than 2×. For the same computational cost, our method demonstrates higher performance in fine-tuning 11 selected tasks in the GLUE and SuperGLUE benchmarks. For a smaller activation cost, our method outperforms the T5 dense model in 7 out of the 11 tasks.

AIIM Journal 2021 Journal Article

Practical fine-grained learning based anomaly classification for ECG image

  • Qing Cao
  • Nan Du
  • Li Yu
  • Ming Zuo
  • Jingsheng Lin
  • Nathan Liu
  • Erheng Zhong
  • Zizhu Liu

As a widely used vital sign within cardiology, Electrocardiography (ECG) provides the basis for assessing heart function and diagnosing cardiovascular diseases. Automated anomaly detection for ECG plays an important role in improving patient diagnosis efficiency and reducing healthcare costs. Practically, due to the limits of electronics support or the medical system setting, image is a more common format for large-scale ECG storage in most clinical institutions. To guarantee an automated ECG detection model's scalability and practicality in clinical applications, taking good advantage of ECG images is crucial. However, existing time digital-based discriminative models fail to learn from images effectively for two reasons. First of all, the signals recorded on images have much lower resolution and higher noise, which makes it impractical to extract precise ECG signals following existing techniques. Meanwhile, the differences between abnormal signals are usually subtle, and they may be overwhelmed by the noises in the images as well. Towards this end, we design a novel neural framework that can be directly applied to massive ECG images determining various types of cardiology abnormalities. It classifies fine-grained ECG images based on weakly supervised strategy, in which case only image-level labeling is required. By eliminating the need for part annotations, the proposed method can result in significant savings in annotation time and cost. The effectiveness of the method is demonstrated by experimental results on two real ECG datasets.

IJCAI Conference 2020 Conference Paper

Entity Synonym Discovery via Multipiece Bilateral Context Matching

  • Chenwei Zhang
  • Yaliang Li
  • Nan Du
  • Wei Fan
  • Philip S. Yu

Being able to automatically discover synonymous entities in an open-world setting benefits various tasks such as entity disambiguation or knowledge graph canonicalization. Existing works either only utilize entity features, or rely on structured annotations from a single piece of context where the entity is mentioned. To leverage diverse contexts where entities are mentioned, in this paper, we generalize the distributional hypothesis to a multi-context setting and propose a synonym discovery framework that detects entity synonyms from free-text corpora with considerations on effectiveness and robustness. As one of the key components in synonym discovery, we introduce a neural network model SynonymNet to determine whether or not two given entities are synonymous with each other. Instead of using entity features, SynonymNet makes use of multiple pieces of context in which the entity is mentioned, and compares the context-level similarity via a bilateral matching schema. Experimental results demonstrate that the proposed model is able to detect synonym sets that are not observed during training on both generic and domain-specific datasets: Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in terms of Area Under the Curve and 3.19% in terms of Mean Average Precision compared to the best baseline method.

NeurIPS Conference 2020 Conference Paper

Learning to Select Best Forecast Tasks for Clinical Outcome Prediction

  • Yuan Xue
  • Nan Du
  • Anne Mottram
  • Martin Seneviratne
  • Andrew M. Dai

The paradigm of "pretraining" from a set of relevant auxiliary tasks and then "finetuning" on a target task has been successfully applied in many different domains. However, when the auxiliary tasks are abundant, with complex relationships to the target task, using domain knowledge or searching over all possible pretraining setups are inefficient strategies. To address this challenge, we propose a method to automatically select from a large set of auxiliary tasks which yield a representation most useful to the target task. In particular, we develop an efficient algorithm that uses automatic auxiliary task selection within a nested-loop meta-learning process. We have applied this algorithm to the task of clinical outcome predictions in electronic medical records, learning from a large number of self-supervised tasks related to forecasting patient trajectories. Experiments on a real clinical dataset demonstrate the superior predictive performance of our method compared to direct supervised learning, naive pretraining and multitask learning, in particular in low-data scenarios when the primary task has very few examples. With detailed ablation analysis, we further show that the selection rules are interpretable and able to generalize to unseen target tasks with new data.

AAAI Conference 2020 Conference Paper

On the Generation of Medical Question-Answer Pairs

  • Sheng Shen
  • Yaliang Li
  • Nan Du
  • Xian Wu
  • Yusheng Xie
  • Shen Ge
  • Tao Yang
  • Kai Wang

Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In the light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.

AAAI Conference 2019 Conference Paper

Multi-Task Learning with Multi-View Attention for Answer Selection and Knowledge Base Question Answering

  • Yang Deng
  • Yuexiang Xie
  • Yaliang Li
  • Min Yang
  • Nan Du
  • Wei Fan
  • Kai Lei
  • Ying Shen

Answer selection and knowledge base question answering (KBQA) are two important tasks of question answering (QA) systems. Existing methods solve these two tasks separately, which requires a large amount of repetitive work and neglects the rich correlation information between tasks. In this paper, we tackle answer selection and KBQA tasks simultaneously via multi-task learning (MTL), motivated by the following observations. First, both answer selection and KBQA can be regarded as a ranking problem, with one at the text level and the other at the knowledge level. Second, these two tasks can benefit each other: answer selection can incorporate the external knowledge from a knowledge base (KB), while KBQA can be improved by learning contextual information from answer selection. To fulfill the goal of jointly learning these two tasks, we propose a novel multi-task learning scheme that utilizes multi-view attention learned from various perspectives to enable these tasks to interact with each other as well as learn more comprehensive sentence representations. The experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method, and the performance of answer selection and KBQA is improved. Also, the multi-view attention scheme is proved to be effective in assembling attentive information from different representational perspectives.

NeurIPS Conference 2018 Conference Paper

Learning Temporal Point Processes via Reinforcement Learning

  • Shuang Li
  • Shuai Xiao
  • Shixiang Zhu
  • Nan Du
  • Yao Xie
  • Le Song

Social goods, such as healthcare, smart city, and information networks, often produce ordered event data in continuous time. The generative processes of these event data can be very complex, requiring flexible models to capture their dynamics. Temporal point processes offer an elegant framework for modeling event data without discretizing the time. However, the existing maximum-likelihood-estimation (MLE) learning paradigm requires hand-crafting the intensity function beforehand and cannot directly monitor the goodness-of-fit of the estimated model in the process of training. To alleviate the risk of model-misspecification in MLE, we propose to generate samples from the generative model and monitor the quality of the samples in the process of training until the samples and the real data are indistinguishable. We take inspiration from reinforcement learning (RL) and treat the generation of each event as the action taken by a stochastic policy. We parameterize the policy as a flexible recurrent neural network and gradually improve the policy to mimic the observed event distribution. Since the reward function is unknown in this setting, we uncover an analytic and nonparametric form of the reward function using an inverse reinforcement learning formulation. This new RL framework allows us to derive an efficient policy gradient algorithm for learning flexible point process models, and we show that it performs well in both synthetic and real data.

JMLR Journal 2017 Journal Article

Scalable Influence Maximization for Multiple Products in Continuous-Time Diffusion Networks

  • Nan Du
  • Yingyu Liang
  • Maria-Florina Balcan
  • Manuel Gomez-Rodriguez
  • Hongyuan Zha
  • Le Song

A typical viral marketing model identifies influential users in a social network to maximize a single product adoption assuming unlimited user attention, campaign budgets, and time. In reality, multiple products need campaigns, users have limited attention, convincing users incurs costs, and advertisers have limited budgets and expect the adoptions to be maximized soon. Facing these user, monetary, and timing constraints, we formulate the problem as a submodular maximization task in a continuous-time diffusion model under the intersection of one matroid and multiple knapsack constraints. We propose a randomized algorithm estimating the user influence in a network ($|\mathcal{V}|$ nodes, $|\mathcal{E}|$ edges) to an accuracy of $\epsilon$ with $n=\mathcal{O}(1/\epsilon^2)$ randomizations and $\tilde{\mathcal{O}}(n|\mathcal{E}|+n|\mathcal{V}|)$ computations. By exploiting the influence estimation algorithm as a subroutine, we develop an adaptive threshold greedy algorithm achieving an approximation factor $k_a/(2+2 k)$ of the optimal when $k_a$ out of the $k$ knapsack constraints are active. Extensive experiments on networks of millions of nodes demonstrate that the proposed algorithms achieve the state-of-the-art in terms of effectiveness and scalability. (Partial results in the paper on influence estimation have been published in a conference paper: Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous time diffusion networks. In Advances in Neural Information Processing Systems 26, 2013.)

NeurIPS Conference 2016 Conference Paper

Coevolutionary Latent Feature Processes for Continuous-Time User-Item Interactions

  • Yichen Wang
  • Nan Du
  • Rakshit Trivedi
  • Le Song

Matching users to the right items at the right time is a fundamental task in recommendation systems. As users interact with different items over time, users' and items' features may evolve and co-evolve over time. Traditional models based on static latent features or discretizing time into epochs can become ineffective for capturing the fine-grained temporal dynamics in the user-item interactions. We propose a coevolutionary latent feature process model that accurately captures the coevolving nature of users' and items' features. To learn parameters, we design an efficient convex optimization algorithm with novel low-rank space-sharing constraints. Extensive experiments on diverse real-world datasets demonstrate significant improvements in user behavior prediction compared to state-of-the-art methods.

ICRA Conference 2016 Conference Paper

Robust control of automated manufacturing systems with assembly operations using Petri nets

  • Nan Du
  • Hesuan Hu
  • Yang Liu 0003

Automated manufacturing systems (AMSs) with complex topologies have become an increasingly active area of research. However, many studies of AMSs rest on a prerequisite: resources are arbitrarily assumed never to fail, which does not completely match reality. In this paper, we consider deadlock and blocking problems in AMSs with unreliable resources and assembly operations. Our objective is to develop a robust supervisory control policy that allocates resources so that, if any unreliable resource fails unexpectedly, parts requiring the failed resource do not block the production of parts not necessarily requiring it. Conventional methods are based on structure-oriented and monolithic specifications. In contrast, our supervisory policy looks ahead to anticipate whether currently available resources are sufficient in quantity to help a concerned token reach its desired destination, without knowing the global information. Our strategy is achieved in a local and dynamic manner. It can effectively respond to resource failures.

NeurIPS Conference 2015 Conference Paper

Time-Sensitive Recommendation From Recurrent User Activities

  • Nan Du
  • Yichen Wang
  • Niao He
  • Jimeng Sun
  • Le Song

By making personalized suggestions, a recommender system plays a crucial role in improving the engagement of users in modern web services. However, most recommendation algorithms do not explicitly take into account the temporal behavior and the recurrent activities of users. Two central but less explored questions are how to recommend the most desirable item at the right moment, and how to predict the next returning time of a user to a service. To address these questions, we propose a novel framework which connects self-exciting point processes and low-rank models to capture the recurrent temporal patterns in a large collection of user-item consumption pairs. We show that the parameters of the model can be estimated via a convex optimization, and furthermore, we develop an efficient algorithm that maintains an $O(1 / \epsilon)$ convergence rate and scales up to problems with millions of user-item pairs and thousands of millions of temporal events. Compared to other state-of-the-art methods on both synthetic and real datasets, our model achieves superb predictive performance in the two time-sensitive recommendation questions. Finally, we point out that our formulation can incorporate other extra context information of users, such as profile, textual and spatial features.

NeurIPS Conference 2014 Conference Paper

Learning Time-Varying Coverage Functions

  • Nan Du
  • Yingyu Liang
  • Maria-Florina Balcan
  • Le Song

Coverage functions are an important class of discrete functions that capture laws of diminishing returns. In this paper, we propose a new problem of learning time-varying coverage functions which arise naturally from applications in social network analysis, machine learning, and algorithmic game theory. We develop a novel parametrization of the time-varying coverage function by illustrating the connections with counting processes. We present an efficient algorithm to learn the parameters by maximum likelihood estimation, and provide a rigorous theoretic analysis of its sample complexity. Empirical experiments from information diffusion in social network analysis demonstrate that with few assumptions about the underlying diffusion process, our method performs significantly better than existing approaches on both synthetic and real world data.

NeurIPS Conference 2014 Conference Paper

Shaping Social Activity by Incentivizing Users

  • Mehrdad Farajtabar
  • Nan Du
  • Manuel Gomez Rodriguez
  • Isabel Valera
  • Hongyuan Zha
  • Le Song

Events in an online social network can be categorized roughly into endogenous events, where users just respond to the actions of their neighbors within the network, or exogenous events, where users take actions due to drives external to the network. How much external drive should be provided to each user, such that the network activity can be steered towards a target state? In this paper, we model social events using multivariate Hawkes processes, which can capture both endogenous and exogenous event intensities, and derive a time dependent linear relation between the intensity of exogenous events and the overall network activity. Exploiting this connection, we develop a convex optimization framework for determining the required level of external drive in order for the network to reach a desired activity level. We experimented with event data gathered from Twitter, and show that our method can steer the activity of the network more accurately than alternatives.

NeurIPS Conference 2013 Conference Paper

Scalable Influence Estimation in Continuous-Time Diffusion Networks

  • Nan Du
  • Le Song
  • Manuel Gomez Rodriguez
  • Hongyuan Zha

If a piece of information is released from a media site, can it spread, in 1 month, to a million web pages? This influence estimation problem is very challenging since both the time-sensitive nature of the problem and the issue of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with $|\mathcal{V}|$ nodes and $|\mathcal{E}|$ edges to an accuracy of $\epsilon$ using $n=O(1/\epsilon^2)$ randomizations and, up to logarithmic factors, $O(n|\mathcal{E}|+n|\mathcal{V}|)$ computations. When used as a subroutine in a greedy influence maximization algorithm, our proposed method is guaranteed to find a set of nodes with an influence of at least $(1 - 1/e)\operatorname{OPT} - 2\epsilon$, where $\operatorname{OPT}$ is the optimal value. Experiments on both synthetic and real-world data show that the proposed method can easily scale up to networks of millions of nodes while significantly improving over previous state-of-the-art methods in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.

NeurIPS Conference 2012 Conference Paper

Learning Networks of Heterogeneous Influence

  • Nan Du
  • Le Song
  • Ming Yuan
  • Alex Smola

Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting events in the future. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we attempt to address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence among different entities in a network is heterogeneous and cannot be described by a simple parametric model. Therefore, we propose a kernel-based method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the influence functions between networked entities.

NeurIPS Conference 2012 Conference Paper

Waveform Driven Plasticity in BiFeO3 Memristive Devices: Model and Implementation

  • Christian Mayr
  • Paul Stärke
  • Johannes Partzsch
  • Love Cederstroem
  • Rene Schüffny
  • Yao Shuai
  • Nan Du
  • Heidemarie Schmidt

Memristive devices have recently been proposed as efficient implementations of plastic synapses in neuromorphic systems. The plasticity in these memristive devices, i.e., their resistance change, is defined by the applied waveforms. This behavior resembles biological synapses, whose plasticity is also triggered by mechanisms that are determined by local waveforms. However, learning in memristive devices has so far been approached mostly on a pragmatic technological level. The focus seems to be on finding any waveform that achieves spike-timing-dependent plasticity (STDP), without regard to the biological veracity of said waveforms or to further important forms of plasticity. Bridging this gap, we make use of a plasticity model driven by neuron waveforms that explains a large number of experimental observations and adapt it to the characteristics of the recently introduced BiFeO$_3$ memristive material. Based on this approach, we show STDP for the first time for this material, with learning window replication superior to previous memristor-based STDP implementations. We also demonstrate in measurements that it is possible to overlay short and long term plasticity at a memristive device in the form of the well-known triplet plasticity. To the best of our knowledge, this is the first implementation of triplet plasticity on any physical memristive device.