Arrow Research: Search

Author name cluster

Luca Benini

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers (22)

TAAS Journal 2026 Journal Article

Modeling and Controlling Many-Core HPC Processors: An Alternative to PID and Moving Average Algorithms

  • Giovanni Bambini
  • Alessandro Ottaviano
  • Christian Conficoni
  • Andrea Tilli
  • Luca Benini
  • Andrea Bartolini

The race toward increased performance and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, compounded with the slowdown of technology-driven power reduction, makes power and thermal management increasingly relevant. Unfortunately, existing research lacks a detailed analysis and model of the thermal, power, and electrical coupling effects, and of how they must be jointly considered to dynamically control complex and heterogeneous Multi-Processor Systems-on-Chip (MPSoCs). To close this gap, we first provide a detailed thermal and power model targeting a modern High Performance Computing (HPC) MPSoC. We consider real-world coupling effects such as actuator non-idealities and the exponential relation between dissipated power, temperature, and voltage in a single processing element, and we analyze how these factors affect control-algorithm behavior and the challenges they pose. Based on this analysis, we propose a thermal-capping strategy inspired by fuzzy control theory to replace the state-of-the-art PID controller, together with an iterative root-finding method that optimally chooses the voltage shared by cores grouped in the same voltage domain. We evaluate the proposed controller with model-in-the-loop and hardware-in-the-loop co-simulations, reducing the maximum temperature overshoot by up to \(5\times\) compared with state-of-the-art methods while providing, on average, \(3.56\%\) faster application execution across all evaluation scenarios.
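
As a rough illustration of the root-finding step, the sketch below bisects a monotone power-budget function to pick one voltage for a whole domain. The power model and all constants (`core_power`, `c_eff`, `k0`, `k1`, `k2`) are invented stand-ins for illustration, not the paper's model.

```python
import math
from scipy.optimize import brentq

def core_power(v, temp_c, f_ghz, c_eff=0.9, k0=0.2, k1=0.02, k2=2.0):
    """Toy model: dynamic power ~ C_eff*f*V^2 plus leakage growing exponentially with T and V."""
    dynamic = c_eff * f_ghz * v ** 2
    leakage = k0 * math.exp(k1 * temp_c + k2 * (v - 0.5))
    return dynamic + leakage

def shared_voltage(cores, power_cap_w, v_min=0.5, v_max=1.2):
    """Largest shared voltage whose total domain power meets the cap (root finding)."""
    budget = lambda v: sum(core_power(v, t, f) for t, f in cores) - power_cap_w
    if budget(v_max) <= 0:
        return v_max          # cap never binds: run at maximum voltage
    if budget(v_min) >= 0:
        return v_min          # cap unreachable: pin to minimum voltage
    return brentq(budget, v_min, v_max)   # root of the monotone budget function

# Four cores (temperature in degC, frequency in GHz) sharing one voltage rail:
print(shared_voltage([(70, 2.0), (75, 2.2), (65, 1.8), (80, 2.4)], power_cap_w=9.0))
```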

NeurIPS Conference 2025 Conference Paper

CamSAM2: Segment Anything Accurately in Camouflaged Videos

  • Yuli Zhou
  • Yawei Li
  • Yuqian Fu
  • Luca Benini
  • Ender Konukoglu
  • Guolei Sun

Video camouflaged object segmentation (VCOS), which aims to segment camouflaged objects that seamlessly blend into their environment, is a fundamental vision task with various real-world applications. With the release of SAM2, video segmentation has witnessed significant progress. However, SAM2's capability of segmenting camouflaged videos is suboptimal, especially when given simple prompts such as points and boxes. To address the problem, we propose Camouflaged SAM2 (CamSAM2), which enhances SAM2's ability to handle camouflaged scenes without modifying SAM2's parameters. Specifically, we introduce a decamouflaged token to provide the flexibility of feature adjustment for VCOS. To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules, respectively. Object prototype generation (OPG) is introduced to abstract and memorize object prototypes with informative details using high-quality features from previous frames. Extensive experiments validate the effectiveness of our approach: while CamSAM2 adds only negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets, achieving 12.2 mDice gains with a click prompt on MoCA-Mask and 19.6 mDice gains with a mask prompt on SUN-SEG-Hard, with Hiera-T as the backbone. The code is available at https://github.com/zhoustan/CamSAM2.

NAI Journal 2025 Journal Article

Factorizers for distributed sparse block codes

  • Michael Hersche
  • Aleksandar Terzić
  • Geethan Karunaratne
  • Jovin Langenegger
  • Angéline Pouget
  • Giovanni Cherubini
  • Luca Benini
  • Abu Sebastian

Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge, however, is to disentangle, or factorize, the distributed representation of data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when SBC vectors are noisy due to perceptual uncertainty and the approximations made by modern neural networks that generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible, and hence generalized, form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an ℓ∞-based similarity metric. Its random sampling mechanism, in combination with the search in superposition, allows us to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Second, the proposed factorizer maintains high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with \(\sqrt[F]{C}\) fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. With this integration, the convolutional layers can generate a noisy product vector that our factorizer can still decode, and the decoded factors can have different interpretations depending on the downstream task. We demonstrate the feasibility of our method on four deep CNN architectures over the CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations is notably reduced compared to the FCL.
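
The iterative decoding loop can be pictured with a toy resonator-style factorizer. Note the simplifications: it uses dense bipolar codes rather than the paper's sparse block codes, and a ReLU-style threshold in place of the exact activation; codebook sizes are arbitrary.

```python
import numpy as np
rng = np.random.default_rng(0)

D, F, M = 1024, 3, 20                      # vector width, factors, codebook size
books = [rng.choice([-1, 1], size=(M, D)) for _ in range(F)]
truth = [rng.integers(M) for _ in range(F)]
query = np.prod([books[f][truth[f]] for f in range(F)], axis=0)   # bound product

est = [np.where(books[f].sum(0) >= 0, 1, -1) for f in range(F)]   # init: superposition of all codes
for _ in range(100):                                              # usually converges much earlier
    for f in range(F):
        others = np.prod([est[g] for g in range(F) if g != f], axis=0)
        sims = books[f] @ (query * others)   # unbind the other factors, then compare
        sims[sims < 0] = 0                   # threshold-style nonlinear activation
        est[f] = np.sign(books[f].T @ sims)  # similarity-weighted re-bundling
        est[f][est[f] == 0] = 1
print([int(np.argmax(books[f] @ est[f])) for f in range(F)], "vs", truth)
```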

ICML Conference 2025 Conference Paper

IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

  • Hang Guo 0002
  • Yawei Li 0001
  • Tao Dai 0001
  • Shu-Tao Xia
  • Luca Benini

Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune quantized weights using Low-rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient: additional post-training quantization (PTQ) of the tuned weights is needed at deployment, which causes a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters so that inference efficiency is built in during tuning. Specifically, IntLoRA keeps the pre-trained weights quantized during training, enabling fine-tuning on consumer-level GPUs. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.
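
A toy numpy view of the core idea, under the assumption of simple symmetric uniform quantization: the low-rank update is rounded onto the same integer grid as the frozen base weight, so the merge stays integer-typed and needs no post-training quantization. This is schematic, not the paper's exact arithmetic.

```python
import numpy as np
rng = np.random.default_rng(0)

def quantize(w, bits=8):
    """Symmetric uniform quantization to signed integers plus a scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int32), scale

W = rng.normal(size=(64, 64)).astype(np.float32)
W_q, s = quantize(W)                             # frozen quantized base weight

# Snap the learned low-rank update A @ B onto the base weight's integer grid:
A = rng.normal(scale=0.02, size=(64, 4)).astype(np.float32)
B = rng.normal(scale=0.02, size=(4, 64)).astype(np.float32)
delta_int = np.round((A @ B) / s).astype(np.int32)

merged_q = np.clip(W_q + delta_int, -127, 127)   # merge stays in the int8 range
x = rng.normal(size=64).astype(np.float32)
y = (merged_q * s) @ x                           # dequantize only at the output
```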

NeurIPS Conference 2025 Conference Paper

LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis

  • Berkay Döner
  • Thorir Mar Ingolfsson
  • Luca Benini
  • Yawei Li

Electroencephalography (EEG) offers a non-invasive lens into human brain activity, but building large-scale models is hampered by topological heterogeneity: each public corpus defines its own electrode layout, limiting generalization. We introduce LUNA (Latent Unified Network Architecture), a self-supervised foundation model that reconciles disparate electrode geometries while scaling linearly, not quadratically, with channel count. LUNA compresses multi-channel EEG into a fixed-size, topology-agnostic latent space via learned queries and cross-attention. Downstream transformer blocks then operate exclusively on this latent representation using patch-wise temporal self-attention, decoupling computation from electrode count. Pre-trained on TUEG and Siena (>21,000 h of raw EEG across diverse montages) using a masked-patch reconstruction objective, LUNA transfers effectively to four downstream tasks: abnormality detection, artifact rejection, slowing classification, and emotion recognition. It demonstrates highly competitive performance across several benchmarks, achieving state-of-the-art results on TUAR and TUSL, e.g., 0.921 AUROC on TUAR, while reducing FLOPs by 300× and trimming GPU memory use by up to 10×. Critically, these gains are consistent across all evaluated electrode configurations. Code is available at https://github.com/pulp-bio/biofoundation
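
The learned-query cross-attention that makes the latent size independent of the electrode count can be sketched in a few lines of PyTorch; layer sizes and the class name `LatentChannelEncoder` are illustrative, not from the released code.

```python
import torch
import torch.nn as nn

class LatentChannelEncoder(nn.Module):
    """Compress a variable number of channel tokens into a fixed latent set
    via learned queries + cross-attention (a sketch of the LUNA idea)."""
    def __init__(self, d_model=128, n_latents=16, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, chan_tokens):                 # (batch, n_channels, d_model)
        b = chan_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        latent, _ = self.attn(q, chan_tokens, chan_tokens)
        return latent                               # (batch, n_latents, d_model), fixed size

enc = LatentChannelEncoder()
for n_ch in (19, 64, 128):                          # different electrode montages
    out = enc(torch.randn(2, n_ch, 128))
    assert out.shape == (2, 16, 128)                # latent size independent of channels
```

Because the query set is fixed, the cross-attention cost grows linearly with the number of channels, which is what gives the topology-agnostic latent its quoted scaling behavior.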

ECAI Conference 2025 Conference Paper

One Shot vs. Iterative: Rethinking Pruning Strategies for Model Compression

  • Mikolaj Janusz
  • Tomasz Wojnar
  • Yawei Li 0001
  • Luca Benini
  • Kamil Adamczewski

Pruning is a core technique for compressing neural networks to improve computational efficiency. This process is typically approached in two ways: one-shot pruning, which involves a single pass of training and pruning, and iterative pruning, where pruning is performed over multiple cycles for potentially finer network refinement. Although iterative pruning has historically seen broader adoption, this preference is often assumed rather than rigorously tested. Our study presents one of the first systematic and comprehensive comparisons of these methods, providing rigorous definitions, benchmarking both across structured and unstructured settings, and applying different pruning criteria and modalities. We find that each method has specific advantages: one-shot pruning proves more effective at lower pruning ratios, while iterative pruning performs better at higher ratios. Building on these findings, we advocate for patience-based pruning and introduce a hybrid approach that can outperform traditional methods in certain scenarios, providing valuable insights for practitioners selecting a pruning strategy tailored to their goals and constraints. Source code is available at https://github.com/janumiko/pruning-benchmark.
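
For concreteness, here is what the two strategies look like as toy unstructured magnitude pruning; the geometric sparsity schedule and the `finetune` placeholder are assumptions for illustration.

```python
import numpy as np

def one_shot_prune(w, ratio):
    """Zero out the `ratio` smallest-magnitude weights in a single pass."""
    k = int(ratio * w.size)
    cutoff = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) >= cutoff, w, 0.0)

def iterative_prune(w, ratio, steps=5, finetune=lambda w: w):
    """Reach the same final sparsity over several prune -> finetune cycles."""
    for s in range(1, steps + 1):
        target = 1 - (1 - ratio) ** (s / steps)    # geometric sparsity schedule
        w = finetune(one_shot_prune(w, target))    # `finetune` stands in for training
    return w

w = np.random.default_rng(0).normal(size=(256, 256))
print((iterative_prune(w, 0.9) == 0).mean())       # ~0.9 sparsity either way
```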

NeurIPS Conference 2025 Conference Paper

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

  • Yanlong Chen
  • Mattia Orlandi
  • Pierangelo Rapa
  • Simone Benatti
  • Luca Benini
  • Yawei Li

Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, posing significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aimed at capturing multi-scale time-frequency features across various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for the analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impacts on wearable health monitoring, clinical diagnostics, and broader biomedical applications. Code and data are available at: github.com/ForeverBlue816/PhysioWave
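
A generic multi-scale wavelet front-end in the spirit of the paper can be written with PyWavelets; the wavelet family (`db4`), the level count, and the toy signal are arbitrary choices, not the paper's configuration.

```python
import numpy as np
import pywt   # PyWavelets

def multiscale_tokens(sig, wavelet="db4", levels=4):
    """Decompose a 1-D physiological signal into per-scale coefficient bands;
    each band could feed a separate branch or embedding."""
    coeffs = pywt.wavedec(sig, wavelet, level=levels)
    return {f"scale_{i}": c for i, c in enumerate(coeffs)}   # coarse -> fine

t = np.linspace(0, 2, 1000)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)  # toy trace
for name, band in multiscale_tokens(ecg_like).items():
    print(name, band.shape)   # progressively finer time resolution per band
```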

ICML Conference 2025 Conference Paper

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

  • Wei Huang 0042
  • Haotong Qin
  • Yangdong Liu
  • Yawei Li 0001
  • Qinshuo Liu
  • Xianglong Liu 0001
  • Luca Benini
  • Michele Magno

Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise with high accuracy. Our approach leverages the observation that important weights follow a structured distribution and introduces two key components: 1) Salience-Determined Bit Allocation adaptively assigns bit-widths to groups within each layer based on their salience; and 2) Salience-Weighted Quantizer Calibration optimizes quantizer parameters by incorporating element-level salience, retaining essential information. With its structured group-wise partitioning, SliM-LLM provides a hardware-friendly solution that matches the efficiency of uniform quantization methods while significantly improving accuracy. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths. For example, a 2-bit quantized LLaMA-7B model reduces memory usage by nearly 6× compared to the floating-point baseline, decreases perplexity by 48% compared to state-of-the-art gradient-free PTQ methods, and maintains GPU inference speed. Additionally, the extended version, SliM-LLM+, which incorporates gradient-based quantization, further reduces perplexity by 35.1%. Our code is available at https://github.com/Aaronhuang-778/SliM-LLM.
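
A toy version of salience-determined bit allocation: groups are ranked by salience, the top quartile is promoted and the bottom quartile demoted so the layer average stays at the target. The quartile rule and bit choices are illustrative assumptions, not the paper's allocation rule.

```python
import numpy as np

def allocate_bits(salience, avg_bits=2, choices=(1, 2, 3)):
    """Give high-salience weight groups more bits and low-salience groups fewer,
    keeping the layer average at `avg_bits` (toy allocation)."""
    order = np.argsort(-salience)            # most salient groups first
    bits = np.full(len(salience), avg_bits)
    k = len(salience) // 4                   # promote/demote top/bottom quartile
    bits[order[:k]] = max(choices)
    bits[order[-k:]] = min(choices)
    return bits                              # +1/-1 swaps balance, mean stays avg_bits

salience = np.abs(np.random.default_rng(0).normal(size=16))  # e.g. salience scores per group
print(allocate_bits(salience))
```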

ICRA Conference 2024 Conference Paper

Fully Onboard Low-Power Localization with Semantic Sensor Fusion on a Nano-UAV using Floor Plans

  • Nicky Zimmerman
  • Hanna Müller
  • Michele Magno
  • Luca Benini

Nano-sized unmanned aerial vehicles (UAVs) are well suited for indoor applications and for operating in close proximity to humans. To enable autonomy, a nano-UAV must be able to self-localize in its operating environment, a particularly challenging task given the limited sensing and compute resources on board. This work presents an online, onboard approach for localization in floor plans annotated with semantic information. Unlike sensor-based maps, floor plans are readily available and do not increase the cost and time of deployment. To overcome the difficulty of localizing in sparse maps, the proposed approach fuses geometric information from miniaturized time-of-flight sensors with semantic cues. The semantic information is extracted from images by deploying a state-of-the-art object detection model on a high-performance multi-core microcontroller onboard the drone, consuming only 2.5 mJ per frame and executing in 38 ms. In our evaluation, we globally localize in a real-world office environment, achieving a 90% success rate. We also release an open-source implementation of our work.
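
The geometric-plus-semantic fusion can be caricatured with a 1-D particle filter: range measurements weight particles against the floor plan, and a detected semantic landmark (a door here) sharpens the belief further. Everything below (map, noise levels) is made up for illustration and far simpler than the paper's 2-D localization.

```python
import numpy as np
rng = np.random.default_rng(0)

doors = np.array([3.0, 7.5])                  # semantically annotated map landmarks
particles = rng.uniform(0, 10, size=500)      # belief over a 10 m corridor

def update(range_to_far_wall, saw_door):
    """Weight particles by the range measurement, then by the semantic cue."""
    global particles
    w = np.exp(-0.5 * ((10.0 - particles) - range_to_far_wall) ** 2 / 0.2 ** 2)
    if saw_door:                              # semantic cue: must be near some door
        w *= np.exp(-0.5 * np.min(np.abs(particles[:, None] - doors), 1) ** 2 / 0.5 ** 2)
    w /= w.sum()
    particles = rng.choice(particles, size=500, p=w) + rng.normal(0, 0.05, 500)

update(range_to_far_wall=7.0, saw_door=True)  # true pose ~3.0 m, next to a door
print(particles.mean(), particles.std())
```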

IROS Conference 2023 Conference Paper

A Relative Infrastructure-less Localization Algorithm for Decentralized and Autonomous Swarm Formation

  • Dominik Schindler
  • Vlad Niculescu
  • Tommaso Polonelli
  • Daniele Palossi
  • Luca Benini
  • Michele Magno

Decentralized and autonomous control of Unmanned Aerial Vehicle (UAV) swarms is a key enabler for cooperative systems and infrastructure-less formation flights. However, UAVs often lack reliable heading-angle measurements, especially in indoor scenarios, space, and GNSS-denied environments, posing an additional observability challenge for range-based relative localization. We tackle this problem by proposing a novel solution that enhances classical tag-and-anchor trilateration. The proposed solution relies on ultra-wideband range measurements and addresses relative pose estimation between pairs of UAVs under relative motion. Furthermore, it does not require any explicit motion pattern or initialization procedure, and it leverages an approximate maximum-likelihood algorithm to recursively solve the relative localization problem with constant computational complexity. The method has been implemented and demonstrated through field experiments, in which a swarm of nano-UAVs positioned themselves with respect to a leader in a nearly static formation with an average error of 38.5 cm and a convergence time of 25 s. The achieved formation accuracy is comparable to that of state-of-the-art EKF-based leader-follower methods.
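
To see why relative motion makes range-only localization observable, here is a toy batch least-squares version; the paper instead uses a recursive approximate maximum-likelihood estimator with constant complexity, and the geometry and noise levels below are invented.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
p0_true = np.array([3.0, -2.0])                # leader's initial relative position
rel_disp = np.cumsum(rng.normal(0, 0.3, size=(30, 2)), axis=0)  # known relative motion
ranges = np.linalg.norm(p0_true + rel_disp, axis=1) + rng.normal(0, 0.05, 30)

# Each range constrains ||p0 + displacement_t||; motion in two directions
# makes p0 uniquely identifiable without any heading measurement.
residual = lambda p0: np.linalg.norm(p0 + rel_disp, axis=1) - ranges
est = least_squares(residual, x0=np.array([1.0, 1.0])).x
print(est, "vs true", p0_true)
```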

ICRA Conference 2023 Conference Paper

Deep Neural Network Architecture Search for Accurate Visual Pose Estimation aboard Nano-UAVs

  • Elia Cereda
  • Luca Crupi
  • Matteo Risso
  • Alessio Burrello
  • Luca Benini
  • Alessandro Giusti
  • Daniele Jahier Pagliari
  • Daniele Palossi

Miniaturized autonomous unmanned aerial vehicles (UAVs) are an emerging and trending topic. With a form factor no bigger than the palm of a hand, they can reach spots otherwise inaccessible to bigger robots and safely operate in human surroundings. The simple electronics aboard such robots (sub-100 mW) make them particularly cheap and attractive, but pose significant challenges for enabling sophisticated onboard intelligence. In this work, we leverage a novel neural architecture search (NAS) technique to automatically identify several Pareto-optimal convolutional neural networks (CNNs) for a visual pose estimation task. Our work demonstrates how real-life, field-tested robotics applications can concretely leverage NAS technologies to automatically and efficiently optimize CNNs for the specific hardware constraints of small UAVs. We deploy several NAS-optimized CNNs and run them in closed loop aboard a 27-g Crazyflie nano-UAV equipped with a parallel ultra-low-power System-on-Chip. Our results improve on the state of the art by reducing the in-field control error by 32% while achieving a real-time onboard inference rate of ~10 Hz at 10 mW and ~50 Hz at 90 mW.

IROS Conference 2023 Conference Paper

LocalViT: Analyzing Locality in Vision Transformers

  • Yawei Li 0001
  • Kai Zhang 0008
  • Jiezhang Cao
  • Radu Timofte
  • Michele Magno
  • Luca Benini
  • Luc Van Gool

The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between token embeddings can be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. In this paper, the locality mechanism is systematically investigated via carefully designed controlled experiments. We add locality to vision transformers by modifying the feed-forward network; this seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) is available for incorporating locality mechanisms, and proper choices lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to vision transformers with different architecture designs, which shows the generality of the locality concept. For ImageNet2012 classification, the locality-enhanced transformers outperform the baselines Swin-T [1], DeiT-T [2], and PVT-T [3] by 1.0%, 2.6%, and 3.1%, respectively, with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT.
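
A sketch of one way to inject locality into a transformer feed-forward network, using a depth-wise convolution over the token grid in the spirit of inverted residual blocks; dimensions and the class name `LocalFFN` are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class LocalFFN(nn.Module):
    """FFN with a depth-wise convolution between the two projections, so tokens
    also exchange information with their spatial neighbours."""
    def __init__(self, dim=192, hidden=768, h=14, w=14):
        super().__init__()
        self.h, self.w = h, w
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x):                                   # x: (batch, h*w tokens, dim)
        b, n, _ = x.shape
        x = self.act(self.fc1(x))
        x = x.transpose(1, 2).reshape(b, -1, self.h, self.w)  # tokens -> 2-D grid
        x = self.act(self.dw(x))                               # local mixing
        x = x.flatten(2).transpose(1, 2)                       # grid -> tokens
        return self.fc2(x)

out = LocalFFN()(torch.randn(2, 196, 192))
assert out.shape == (2, 196, 192)
```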

NeurIPS Conference 2023 Conference Paper

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

  • Nicolas Menet
  • Michael Hersche
  • Geethan Karunaratne
  • Luca Benini
  • Abu Sebastian
  • Abbas Rahimi

With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by an instantaneous on-demand switching between a set of accuracy-throughput operating points, yet within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves $\approx 2$–$4\times$ speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle $2$–$4$ inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the Long Range Arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
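
The variable-binding arithmetic behind computation in superposition can be shown in a few lines: keys bind the inputs, the sum carries all of them, and unbinding recovers one item with bounded crosstalk. In MIMONets the network processes the superposed vector before unbinding; this toy skips that step, and all sizes are arbitrary.

```python
import numpy as np
rng = np.random.default_rng(0)

D, N = 2048, 3                                  # feature width, items in superposition
keys = rng.choice([-1.0, 1.0], size=(N, D))     # fixed bipolar binding keys

xs = [rng.normal(size=D) for _ in range(N)]
s = sum(k * x for k, x in zip(keys, xs))        # bind each input, then superpose

x1_hat = keys[1] * s                            # unbind item 1; the others become noise
print(np.corrcoef(x1_hat, xs[1])[0, 1])         # ~0.58 for N=3: signal survives crosstalk
```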

JBHI Journal 2021 Journal Article

An Ensemble of Hyperdimensional Classifiers: Hardware-Friendly Short-Latency Seizure Detection With Automatic iEEG Electrode Selection

  • Alessio Burrello
  • Simone Benatti
  • Kaspar Schindler
  • Luca Benini
  • Abbas Rahimi

We propose a new algorithm for detecting epileptic seizures. Our algorithm first extracts three features, namely mean amplitude, line length, and local binary patterns, which are fed to an ensemble of classifiers based on hyperdimensional (HD) computing. These features are embedded into prototype vectors representing the ictal (during seizures) and interictal (between seizures) brain states. The vectors can be computed at different spatial scales, ranging from a single electrode up to many electrodes. This flexibility allows our algorithm to identify the electrodes that discriminate best between ictal and interictal brain states. We assess our algorithm on the SWEC-ETHZ iEEG dataset, which includes 99 short-time iEEG seizures recorded with 36 to 100 electrodes from 16 drug-resistant epilepsy patients. Using k-fold cross-validation and all electrodes, our algorithm surpasses state-of-the-art algorithms, yielding significantly shorter latency (8.81 s vs. 11.57 s) in seizure onset detection, higher specificity (97.31% vs. 94.84%), and higher accuracy (96.85% vs. 95.42%). We can further reduce the latency of our algorithm to 3.74 s by allowing a slightly higher percentage of false alarms (2% specificity loss). Using only the top 10% of the electrodes ranked by our algorithm, we still maintain superior latency, sensitivity, and specificity compared to the other algorithms using all electrodes. We finally demonstrate the suitability of our algorithm for deployment on low-cost embedded hardware platforms, thanks to its robustness to noise/artifacts affecting the signal, its low computational complexity, and its small memory footprint on a RISC-V microcontroller.
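
A toy of the HD-computing pipeline: quantized feature values map to random bipolar hypervectors, labelled windows are bundled into ictal/interictal prototypes, and classification is a dot-product comparison. The feature extraction and all sizes are simplified stand-ins for the paper's pipeline.

```python
import numpy as np
rng = np.random.default_rng(0)

D, FEATS, LEVELS = 10_000, 3, 8     # hypervector width; 3 features, 8 quantization levels
item = {(f, l): rng.choice([-1, 1], size=D) for f in range(FEATS) for l in range(LEVELS)}

def encode(feats):                   # feats: 3 normalized feature values in [0, 1)
    q = np.clip((feats * LEVELS).astype(int), 0, LEVELS - 1)
    hv = sum(item[(f, q[f])] for f in range(FEATS))   # bundle feature-level vectors
    return np.where(hv >= 0, 1, -1)

# Prototypes = bundled encodings of labelled windows; classify by similarity.
ictal = np.where(sum(encode(rng.random(FEATS) * 0.5 + 0.5) for _ in range(50)) >= 0, 1, -1)
inter = np.where(sum(encode(rng.random(FEATS) * 0.5) for _ in range(50)) >= 0, 1, -1)
test = encode(np.array([0.8, 0.7, 0.9]))
print("ictal" if test @ ictal > test @ inter else "interictal")
```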

EAAI Journal 2019 Journal Article

A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems

  • Andrea Borghesi
  • Andrea Bartolini
  • Michele Lombardi
  • Michela Milano
  • Luca Benini

High Performance Computing (HPC) systems are complex machines with heterogeneous components that can break or malfunction. Automated anomaly detection in these systems is a challenging and critical task, as HPC systems are expected to work 24/7. Most current state-of-the-art methods for this problem are machine learning techniques or statistical models that rely on a supervised approach: the detection mechanism is trained to recognize a fixed number of different states (i.e., normal and anomalous conditions). In this paper, a novel semi-supervised approach for anomaly detection in supercomputers is proposed, based on a type of neural network called an autoencoder. The approach learns the normal state of the supercomputer nodes and, after the training phase, can be used to discern anomalous conditions from normal behavior; in doing so, it relies only on data characterizing the normal state of the system. This differs from supervised methods, which require data sets with many examples of anomalous states that are in general very rare and/or hard to obtain. The approach was tested on a real-life HPC system equipped with a monitoring infrastructure capable of generating large amounts of data describing the system state. The proposed approach clearly outperforms the best current techniques for semi-supervised anomaly detection, with an increase in detection accuracy of around 12%. Two different implementations are discussed: one where each supercomputer node has a specific model and one with a single, generalized model for all nodes, in order to explore the trade-off between accuracy and ease of deployment.
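
The semi-supervised recipe is compact enough to sketch end to end: train an autoencoder on normal telemetry only, then flag inputs whose reconstruction error exceeds a threshold derived from normal data. Layer sizes, training length, and the 3-sigma threshold are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))  # tiny bottleneck AE
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

normal = torch.randn(4096, 32)                   # stand-in for healthy-node metrics
for _ in range(200):                             # train on *normal* data only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(normal), normal)
    loss.backward(); opt.step()

with torch.no_grad():
    err = ((ae(normal) - normal) ** 2).mean(1)
    tau = err.mean() + 3 * err.std()             # threshold set from normal errors alone
    sample = torch.randn(1, 32) * 3              # off-distribution sample
    flagged = ((ae(sample) - sample) ** 2).mean() > tau
print(bool(flagged))                             # anomalous input exceeds the threshold
```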

NeurIPS Conference 2019 Conference Paper

Constrained deep neural network architecture search for IoT devices accounting for hardware calibration

  • Florian Scheidegger
  • Luca Benini
  • Costas Bekas
  • A. Cristiano I. Malossi

Deep neural networks achieve outstanding results on challenging image classification tasks. However, the design of network topologies is a complex task, and the research community is conducting ongoing efforts to discover top-accuracy topologies, either manually or by employing expensive architecture searches. We propose a unique narrow-space architecture search that focuses on delivering low-cost, rapidly executing networks that respect the strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms. Our approach provides solutions with classification latencies below 10 ms running on a low-cost device with 1 GB of RAM and a peak performance of 5.6 GFLOPS. The narrow-space search of floating-point models improves the CIFAR10 accuracy of an established IoT model from 70.64% to 74.87% within the same memory constraints. We further improve the accuracy to 82.07% by including 16-bit half types and obtain the highest accuracy of 83.45% by extending the search with model-optimized IEEE 754 reduced types. To the best of our knowledge, this is the first empirical demonstration of more than 3000 trained models that run with reduced precision and push the Pareto-optimal front by a wide margin. Within a given memory constraint, accuracy is improved by more than 7 percentage points for half types and by more than 1 percentage point for the best individual model format.

NeurIPS Conference 2017 Conference Paper

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

  • Eirikur Agustsson
  • Fabian Mentzer
  • Michael Tschannen
  • Lukas Cavigelli
  • Radu Timofte
  • Luca Benini
  • Luc Van Gool

We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both.
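
The soft-to-hard relaxation is easy to reproduce in miniature: a temperature-scaled softmax over distances to quantization centers that is annealed toward hard nearest-center assignment while staying differentiable. The center grid and annealing schedule below are arbitrary illustrations.

```python
import torch

def soft_quantize(x, centers, sigma):
    """Soft assignment of each value to quantization centers; as sigma grows,
    the softmax approaches hard nearest-center assignment."""
    d = (x.unsqueeze(-1) - centers) ** 2            # squared distance to each center
    w = torch.softmax(-sigma * d, dim=-1)
    return (w * centers).sum(-1)                    # differentiable surrogate of rounding

centers = torch.linspace(-1, 1, 5)
x = torch.randn(4, requires_grad=True)
for sigma in (1.0, 10.0, 1000.0):                   # annealing schedule
    y = soft_quantize(x, centers, sigma)
    print(sigma, y.detach().numpy())                # values snap toward the nearest center
y.sum().backward()                                   # gradients flow through the relaxation
```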

ECAI Conference 2016 Conference Paper

DARDIS: Distributed And Randomized DIspatching and Scheduling

  • Thomas Bridi
  • Michele Lombardi 0001
  • Andrea Bartolini
  • Luca Benini
  • Michela Milano

Scheduling and dispatching are critical enabling technologies in supercomputing and grid computing. In these contexts, scalability is an issue: we have to allocate and schedule up to tens of thousands of tasks on tens of thousands of resources. This problem scale is out of reach for complete and centralized scheduling approaches.

AIJ Journal 2014 Journal Article

CROSS cyclic resource-constrained scheduling solver

  • Alessio Bonfietti
  • Michele Lombardi
  • Luca Benini
  • Michela Milano

Cyclic scheduling problems consist in ordering a set of activities, executed indefinitely over time in a periodic fashion, subject to precedence and resource constraints. This class of problems has many applications in manufacturing, embedded systems and compiler design, and production and chemical systems. This paper proposes a Constraint Programming approach to cyclic scheduling problems based on modular arithmetic: in particular, we introduce a modular precedence constraint and a global cumulative constraint along with their filtering algorithms. We discuss two possible formulations. The first (referred to as CROSS) models a pure cyclic scheduling problem and makes use of both our novel constraints. The second (referred to as CROSS*) introduces a restrictive assumption to enable the use of classical resource constraints, but may incur a loss of solution quality. Many traditional approaches to cyclic scheduling operate by fixing the period value and then solving a linear problem in a generate-and-test fashion. Conversely, our technique is based on a non-linear model and tackles the problem as a whole: the period value is inferred from the scheduling decisions. Our approach has been tested on a number of non-trivial synthetic instances and on a set of realistic industrial instances, and it proved effective in finding high-quality solutions in a very short amount of time.

AAAI Conference 2012 Conference Paper

Optimization and Controlled Systems: A Case Study on Thermal Aware Workload Dispatching

  • Andrea Bartolini
  • Michele Lombardi
  • Michela Milano
  • Luca Benini

Although successfully employed on many industrial problems, Combinatorial Optimization still has limited applicability in several real-world domains, often due to modeling difficulties. This is typically the case for systems under the control of an on-line policy: even when the policy itself is well known, capturing its effect on the system in a declarative model is often impossible by conventional means. Such a difficulty is at the root of the classical, sharp separation between off-line and on-line approaches. In this paper, we investigate a general method to model controlled systems, based on the integration of Machine Learning and Constraint Programming (CP). Specifically, we use an Artificial Neural Network (ANN) to learn the behavior of a controlled system (a multicore CPU with thermal controllers) and plug it into a CP model by means of Neuron Constraints. The method obtains significantly better results compared to an approach with no ANN guidance. Neuron Constraints were first introduced in (Bartolini et al. 2011b) as a means of modeling complex systems; providing evidence of their applicability to controlled systems is a significant step forward, broadening the application field of combinatorial methods and opening up opportunities for hybrid off-line/on-line optimization.