Arrow Research: Search

Author name cluster

Luca Benini

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers (22)

TAAS Journal 2026 Journal Article

Modeling and Controlling Many-Core HPC Processors: An Alternative to PID and Moving Average Algorithms

  • Giovanni Bambini
  • Alessandro Ottaviano
  • Christian Conficoni
  • Andrea Tilli
  • Luca Benini
  • Andrea Bartolini

The race toward increased performance and computing power has led to chips with heterogeneous and complex designs, integrating an ever-growing number of cores on the same monolithic chip or chiplet silicon die. Higher integration density, compounded with the slowdown of technology-driven power reduction, makes power and thermal management increasingly relevant. Unfortunately, existing research lacks a detailed analysis and model of the thermal, power, and electrical coupling effects, and of how they must be jointly considered to dynamically control complex and heterogeneous Multi-Processor Systems-on-Chip (MPSoCs). To close this gap, we first provide a detailed thermal and power model targeting a modern High Performance Computing (HPC) MPSoC. We consider real-world coupling effects such as actuator non-idealities and the exponential relation between dissipated power, temperature, and voltage in a single processing element, and we analyze how these factors affect control-algorithm behavior and the challenges they pose. Based on this analysis, we propose a thermal-capping strategy inspired by fuzzy control theory to replace the state-of-the-art PID controller, together with an iterative root-finding method that optimally chooses the voltage shared by cores grouped in the same voltage domain. We evaluate the proposed controller with model-in-the-loop and hardware-in-the-loop co-simulations, reducing the maximum temperature overshoot by up to \(5\times\) compared with state-of-the-art methods while providing, on average, \(3.56\%\) faster application execution across all evaluation scenarios.
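
As a rough illustration of the root-finding step, the sketch below bisects a monotone power-budget function to pick one voltage for a whole domain. The power model and all constants (`core_power`, `c_eff`, `k0`, `k1`, `k2`) are invented stand-ins for illustration, not the paper's model.

```python
import math
from scipy.optimize import brentq

def core_power(v, temp_c, f_ghz, c_eff=0.9, k0=0.2, k1=0.02, k2=2.0):
    """Toy model: dynamic power ~ C_eff*f*V^2 plus leakage growing exponentially with T and V."""
    dynamic = c_eff * f_ghz * v ** 2
    leakage = k0 * math.exp(k1 * temp_c + k2 * (v - 0.5))
    return dynamic + leakage

def shared_voltage(cores, power_cap_w, v_min=0.5, v_max=1.2):
    """Largest shared voltage whose total domain power meets the cap (root finding)."""
    budget = lambda v: sum(core_power(v, t, f) for t, f in cores) - power_cap_w
    if budget(v_max) <= 0:
        return v_max          # cap never binds: run at maximum voltage
    if budget(v_min) >= 0:
        return v_min          # cap unreachable: pin to minimum voltage
    return brentq(budget, v_min, v_max)   # root of the monotone budget function

# Four cores (temperature in degC, frequency in GHz) sharing one voltage rail:
print(shared_voltage([(70, 2.0), (75, 2.2), (65, 1.8), (80, 2.4)], power_cap_w=9.0))
```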

NeurIPS Conference 2025 Conference Paper

CamSAM2: Segment Anything Accurately in Camouflaged Videos

  • Yuli Zhou
  • Yawei Li
  • Yuqian Fu
  • Luca Benini
  • Ender Konukoglu
  • Guolei Sun

Video camouflaged object segmentation (VCOS), which aims to segment camouflaged objects that seamlessly blend into their environment, is a fundamental vision task with various real-world applications. With the release of SAM2, video segmentation has witnessed significant progress. However, SAM2's capability of segmenting camouflaged videos is suboptimal, especially when given simple prompts such as points and boxes. To address the problem, we propose Camouflaged SAM2 (CamSAM2), which enhances SAM2's ability to handle camouflaged scenes without modifying SAM2's parameters. Specifically, we introduce a decamouflaged token to provide the flexibility of feature adjustment for VCOS. To make full use of fine-grained and high-resolution features from the current frame and previous frames, we propose implicit object-aware fusion (IOF) and explicit object-aware fusion (EOF) modules, respectively. Object prototype generation (OPG) is introduced to abstract and memorize object prototypes with informative details using high-quality features from previous frames. Extensive experiments validate the effectiveness of our approach: while CamSAM2 adds only negligible learnable parameters to SAM2, it substantially outperforms SAM2 on three VCOS datasets, achieving 12.2 mDice gains with a click prompt on MoCA-Mask and 19.6 mDice gains with a mask prompt on SUN-SEG-Hard, with Hiera-T as the backbone. The code is available at https://github.com/zhoustan/CamSAM2.

NAI Journal 2025 Journal Article

Factorizers for distributed sparse block codes

  • Michael Hersche
  • Aleksandar Terzić
  • Geethan Karunaratne
  • Jovin Langenegger
  • Angéline Pouget
  • Giovanni Cherubini
  • Luca Benini
  • Abu Sebastian

Distributed sparse block codes (SBCs) exhibit compact representations for encoding and manipulating symbolic data structures using fixed-width vectors. One major challenge, however, is to disentangle, or factorize, the distributed representation of data structures into their constituent elements without having to search through all possible combinations. This factorization becomes more challenging when SBC vectors are noisy due to perceptual uncertainty and the approximations made by modern neural networks that generate the query vectors. To address these challenges, we first propose a fast and highly accurate method for factorizing a more flexible, and hence generalized, form of SBCs, dubbed GSBCs. Our iterative factorizer introduces a threshold-based nonlinear activation, conditional random sampling, and an ℓ∞-based similarity metric. Its random sampling mechanism, in combination with the search in superposition, allows us to analytically determine the expected number of decoding iterations, which matches the empirical observations up to the GSBC's bundling capacity. Second, the proposed factorizer maintains high accuracy when queried by noisy product vectors generated using deep convolutional neural networks (CNNs). This facilitates its application in replacing the large fully connected layer (FCL) in CNNs, whereby C trainable class vectors, or attribute combinations, can be implicitly represented by our factorizer having F-factor codebooks, each with \(\sqrt[F]{C}\) fixed codevectors. We provide a methodology to flexibly integrate our factorizer in the classification layer of CNNs with a novel loss function. With this integration, the convolutional layers can generate a noisy product vector that our factorizer can still decode, and the decoded factors can have different interpretations depending on the downstream task. We demonstrate the feasibility of our method on four deep CNN architectures over the CIFAR-100, ImageNet-1K, and RAVEN datasets. In all use cases, the number of parameters and operations is notably reduced compared to the FCL.
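
The iterative decoding loop can be pictured with a toy resonator-style factorizer. Note the simplifications: it uses dense bipolar codes rather than the paper's sparse block codes, and a ReLU-style threshold in place of the exact activation; codebook sizes are arbitrary.

```python
import numpy as np
rng = np.random.default_rng(0)

D, F, M = 1024, 3, 20                      # vector width, factors, codebook size
books = [rng.choice([-1, 1], size=(M, D)) for _ in range(F)]
truth = [rng.integers(M) for _ in range(F)]
query = np.prod([books[f][truth[f]] for f in range(F)], axis=0)   # bound product

est = [np.where(books[f].sum(0) >= 0, 1, -1) for f in range(F)]   # init: superposition of all codes
for _ in range(100):                                              # usually converges much earlier
    for f in range(F):
        others = np.prod([est[g] for g in range(F) if g != f], axis=0)
        sims = books[f] @ (query * others)   # unbind the other factors, then compare
        sims[sims < 0] = 0                   # threshold-style nonlinear activation
        est[f] = np.sign(books[f].T @ sims)  # similarity-weighted re-bundling
        est[f][est[f] == 0] = 1
print([int(np.argmax(books[f] @ est[f])) for f in range(F)], "vs", truth)
```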

ICML Conference 2025 Conference Paper

IntLoRA: Integral Low-rank Adaptation of Quantized Diffusion Models

  • Hang Guo 0002
  • Yawei Li 0001
  • Tao Dai 0001
  • Shu-Tao Xia
  • Luca Benini

Fine-tuning pre-trained diffusion models under limited budgets has achieved great success. In particular, recent advances that directly fine-tune quantized weights using Low-rank Adaptation (LoRA) further reduce training costs. Despite this progress, we point out that existing adaptation recipes are not inference-efficient: additional post-training quantization (PTQ) of the tuned weights is needed at deployment, which causes a noticeable performance drop when the bit-width is low. Based on this observation, we introduce IntLoRA, which adapts quantized diffusion models with integer-type low-rank parameters so that inference efficiency is built in during tuning. Specifically, IntLoRA keeps the pre-trained weights quantized during training, enabling fine-tuning on consumer-level GPUs. During inference, IntLoRA weights can be seamlessly merged into the pre-trained weights to directly obtain quantized downstream weights without PTQ. Extensive experiments show that IntLoRA achieves significant speedups in both training and inference without losing performance.
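
A toy numpy view of the core idea, under the assumption of simple symmetric uniform quantization: the low-rank update is rounded onto the same integer grid as the frozen base weight, so the merge stays integer-typed and needs no post-training quantization. This is schematic, not the paper's exact arithmetic.

```python
import numpy as np
rng = np.random.default_rng(0)

def quantize(w, bits=8):
    """Symmetric uniform quantization to signed integers plus a scale."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int32), scale

W = rng.normal(size=(64, 64)).astype(np.float32)
W_q, s = quantize(W)                             # frozen quantized base weight

# Snap the learned low-rank update A @ B onto the base weight's integer grid:
A = rng.normal(scale=0.02, size=(64, 4)).astype(np.float32)
B = rng.normal(scale=0.02, size=(4, 64)).astype(np.float32)
delta_int = np.round((A @ B) / s).astype(np.int32)

merged_q = np.clip(W_q + delta_int, -127, 127)   # merge stays in the int8 range
x = rng.normal(size=64).astype(np.float32)
y = (merged_q * s) @ x                           # dequantize only at the output
```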

NeurIPS Conference 2025 Conference Paper

LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis

  • Berkay Döner
  • Thorir Mar Ingolfsson
  • Luca Benini
  • Yawei Li

Electroencephalography (EEG) offers a non-invasive lens into human brain activity, but building large-scale models is hampered by topological heterogeneity: each public corpus defines its own electrode layout, limiting generalization. We introduce LUNA (Latent Unified Network Architecture), a self-supervised foundation model that reconciles disparate electrode geometries while scaling linearly, not quadratically, with channel count. LUNA compresses multi-channel EEG into a fixed-size, topology-agnostic latent space via learned queries and cross-attention. Downstream transformer blocks then operate exclusively on this latent representation using patch-wise temporal self-attention, decoupling computation from electrode count. Pre-trained on TUEG and Siena (>21,000 h of raw EEG across diverse montages) using a masked-patch reconstruction objective, LUNA transfers effectively to four downstream tasks: abnormality detection, artifact rejection, slowing classification, and emotion recognition. It demonstrates highly competitive performance across several benchmarks, achieving state-of-the-art results on TUAR and TUSL, e.g., 0.921 AUROC on TUAR, while reducing FLOPs by 300× and trimming GPU memory use by up to 10×. Critically, these gains are consistent across all evaluated electrode configurations. Code is available at https://github.com/pulp-bio/biofoundation
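
The learned-query cross-attention that makes the latent size independent of the electrode count can be sketched in a few lines of PyTorch; layer sizes and the class name `LatentChannelEncoder` are illustrative, not from the released code.

```python
import torch
import torch.nn as nn

class LatentChannelEncoder(nn.Module):
    """Compress a variable number of channel tokens into a fixed latent set
    via learned queries + cross-attention (a sketch of the LUNA idea)."""
    def __init__(self, d_model=128, n_latents=16, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_latents, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, chan_tokens):                 # (batch, n_channels, d_model)
        b = chan_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        latent, _ = self.attn(q, chan_tokens, chan_tokens)
        return latent                               # (batch, n_latents, d_model), fixed size

enc = LatentChannelEncoder()
for n_ch in (19, 64, 128):                          # different electrode montages
    out = enc(torch.randn(2, n_ch, 128))
    assert out.shape == (2, 16, 128)                # latent size independent of channels
```

Because the query set is fixed, the cross-attention cost grows linearly with the number of channels, which is what gives the topology-agnostic latent its quoted scaling behavior.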

ECAI Conference 2025 Conference Paper

One Shot vs. Iterative: Rethinking Pruning Strategies for Model Compression

  • Mikolaj Janusz
  • Tomasz Wojnar
  • Yawei Li 0001
  • Luca Benini
  • Kamil Adamczewski

Pruning is a core technique for compressing neural networks to improve computational efficiency. This process is typically approached in two ways: one-shot pruning, which involves a single pass of training and pruning, and iterative pruning, where pruning is performed over multiple cycles for potentially finer network refinement. Although iterative pruning has historically seen broader adoption, this preference is often assumed rather than rigorously tested. Our study presents one of the first systematic and comprehensive comparisons of these methods, providing rigorous definitions, benchmarking both across structured and unstructured settings, and applying different pruning criteria and modalities. We find that each method has specific advantages: one-shot pruning proves more effective at lower pruning ratios, while iterative pruning performs better at higher ratios. Building on these findings, we advocate for patience-based pruning and introduce a hybrid approach that can outperform traditional methods in certain scenarios, providing valuable insights for practitioners selecting a pruning strategy tailored to their goals and constraints. Source code is available at https://github.com/janumiko/pruning-benchmark.
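
For concreteness, here is what the two strategies look like as toy unstructured magnitude pruning; the geometric sparsity schedule and the `finetune` placeholder are assumptions for illustration.

```python
import numpy as np

def one_shot_prune(w, ratio):
    """Zero out the `ratio` smallest-magnitude weights in a single pass."""
    k = int(ratio * w.size)
    cutoff = np.partition(np.abs(w).ravel(), k)[k]
    return np.where(np.abs(w) >= cutoff, w, 0.0)

def iterative_prune(w, ratio, steps=5, finetune=lambda w: w):
    """Reach the same final sparsity over several prune -> finetune cycles."""
    for s in range(1, steps + 1):
        target = 1 - (1 - ratio) ** (s / steps)    # geometric sparsity schedule
        w = finetune(one_shot_prune(w, target))    # `finetune` stands in for training
    return w

w = np.random.default_rng(0).normal(size=(256, 256))
print((iterative_prune(w, 0.9) == 0).mean())       # ~0.9 sparsity either way
```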

NeurIPS Conference 2025 Conference Paper

PhysioWave: A Multi-Scale Wavelet-Transformer for Physiological Signal Representation

  • Yanlong Chen
  • Mattia Orlandi
  • Pierangelo Rapa
  • Simone Benatti
  • Luca Benini
  • Yawei Li

Physiological signals are often corrupted by motion artifacts, baseline drift, and other low-SNR disturbances, posing significant challenges for analysis. Additionally, these signals exhibit strong non-stationarity, with sharp peaks and abrupt changes that evolve continuously, making them difficult to represent using traditional time-domain or filtering methods. To address these issues, a novel wavelet-based approach for physiological signal analysis is presented, aimed at capturing multi-scale time-frequency features across various physiological signals. Leveraging this technique, two large-scale pretrained models specific to EMG and ECG are introduced for the first time, achieving superior performance and setting new baselines in downstream tasks. Additionally, a unified multi-modal framework is constructed by integrating a pretrained EEG model, where each modality is guided through its dedicated branch and fused via learnable weighted fusion. This design effectively addresses challenges such as low signal-to-noise ratio, high inter-subject variability, and device mismatch, outperforming existing methods on multi-modal tasks. The proposed wavelet-based architecture lays a solid foundation for the analysis of diverse physiological signals, while the multi-modal design points to next-generation physiological signal processing with potential impacts on wearable health monitoring, clinical diagnostics, and broader biomedical applications. Code and data are available at: github.com/ForeverBlue816/PhysioWave
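
A generic multi-scale wavelet front-end in the spirit of the paper can be written with PyWavelets; the wavelet family (`db4`), the level count, and the toy signal are arbitrary choices, not the paper's configuration.

```python
import numpy as np
import pywt   # PyWavelets

def multiscale_tokens(sig, wavelet="db4", levels=4):
    """Decompose a 1-D physiological signal into per-scale coefficient bands;
    each band could feed a separate branch or embedding."""
    coeffs = pywt.wavedec(sig, wavelet, level=levels)
    return {f"scale_{i}": c for i, c in enumerate(coeffs)}   # coarse -> fine

t = np.linspace(0, 2, 1000)
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(t.size)  # toy trace
for name, band in multiscale_tokens(ecg_like).items():
    print(name, band.shape)   # progressively finer time resolution per band
```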

ICML Conference 2025 Conference Paper

SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

  • Wei Huang 0042
  • Haotong Qin
  • Yangdong Liu
  • Yawei Li 0001
  • Qinshuo Liu
  • Xianglong Liu 0001
  • Luca Benini
  • Michele Magno

Post-training quantization (PTQ) is an effective technique for compressing large language models (LLMs). However, while uniform-precision quantization is computationally efficient, it often compromises model performance. To address this, we propose SliM-LLM, a salience-driven mixed-precision quantization framework that allocates bit-widths group-wise with high accuracy. Our approach leverages the observation that important weights follow a structured distribution and introduces two key components: 1) Salience-Determined Bit Allocation adaptively assigns bit-widths to groups within each layer based on their salience; and 2) Salience-Weighted Quantizer Calibration optimizes quantizer parameters by incorporating element-level salience, retaining essential information. With its structured group-wise partitioning, SliM-LLM provides a hardware-friendly solution that matches the efficiency of uniform quantization methods while significantly improving accuracy. Experiments show that SliM-LLM achieves superior performance across various LLMs at low bit-widths. For example, a 2-bit quantized LLaMA-7B model reduces memory usage by nearly 6× compared to the floating-point baseline, decreases perplexity by 48% compared to state-of-the-art gradient-free PTQ methods, and maintains GPU inference speed. Additionally, the extended version, SliM-LLM+, which incorporates gradient-based quantization, further reduces perplexity by 35.1%. Our code is available at https://github.com/Aaronhuang-778/SliM-LLM.
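
A toy version of salience-determined bit allocation: groups are ranked by salience, the top quartile is promoted and the bottom quartile demoted so the layer average stays at the target. The quartile rule and bit choices are illustrative assumptions, not the paper's allocation rule.

```python
import numpy as np

def allocate_bits(salience, avg_bits=2, choices=(1, 2, 3)):
    """Give high-salience weight groups more bits and low-salience groups fewer,
    keeping the layer average at `avg_bits` (toy allocation)."""
    order = np.argsort(-salience)            # most salient groups first
    bits = np.full(len(salience), avg_bits)
    k = len(salience) // 4                   # promote/demote top/bottom quartile
    bits[order[:k]] = max(choices)
    bits[order[-k:]] = min(choices)
    return bits                              # +1/-1 swaps balance, mean stays avg_bits

salience = np.abs(np.random.default_rng(0).normal(size=16))  # e.g. salience scores per group
print(allocate_bits(salience))
```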

ICRA Conference 2024 Conference Paper

Fully Onboard Low-Power Localization with Semantic Sensor Fusion on a Nano-UAV using Floor Plans

  • Nicky Zimmerman
  • Hanna Müller
  • Michele Magno
  • Luca Benini

Nano-sized unmanned aerial vehicles (UAVs) are well suited for indoor applications and for operating in close proximity to humans. To enable autonomy, a nano-UAV must be able to self-localize in its operating environment, a particularly challenging task given the limited sensing and compute resources on board. This work presents an online, onboard approach for localization in floor plans annotated with semantic information. Unlike sensor-based maps, floor plans are readily available and do not increase the cost and time of deployment. To overcome the difficulty of localizing in sparse maps, the proposed approach fuses geometric information from miniaturized time-of-flight sensors with semantic cues. The semantic information is extracted from images by deploying a state-of-the-art object detection model on a high-performance multi-core microcontroller onboard the drone, consuming only 2.5 mJ per frame and executing in 38 ms. In our evaluation, we globally localize in a real-world office environment, achieving a 90% success rate. We also release an open-source implementation of our work.
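
The geometric-plus-semantic fusion can be caricatured with a 1-D particle filter: range measurements weight particles against the floor plan, and a detected semantic landmark (a door here) sharpens the belief further. Everything below (map, noise levels) is made up for illustration and far simpler than the paper's 2-D localization.

```python
import numpy as np
rng = np.random.default_rng(0)

doors = np.array([3.0, 7.5])                  # semantically annotated map landmarks
particles = rng.uniform(0, 10, size=500)      # belief over a 10 m corridor

def update(range_to_far_wall, saw_door):
    """Weight particles by the range measurement, then by the semantic cue."""
    global particles
    w = np.exp(-0.5 * ((10.0 - particles) - range_to_far_wall) ** 2 / 0.2 ** 2)
    if saw_door:                              # semantic cue: must be near some door
        w *= np.exp(-0.5 * np.min(np.abs(particles[:, None] - doors), 1) ** 2 / 0.5 ** 2)
    w /= w.sum()
    particles = rng.choice(particles, size=500, p=w) + rng.normal(0, 0.05, 500)

update(range_to_far_wall=7.0, saw_door=True)  # true pose ~3.0 m, next to a door
print(particles.mean(), particles.std())
```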

IROS Conference 2023 Conference Paper

A Relative Infrastructure-less Localization Algorithm for Decentralized and Autonomous Swarm Formation

  • Dominik Schindler
  • Vlad Niculescu
  • Tommaso Polonelli
  • Daniele Palossi
  • Luca Benini
  • Michele Magno

Decentralized and autonomous control of Unmanned Aerial Vehicle (UAV) swarms is a key enabler for cooperative systems and infrastructure-less formation flights. However, UAVs often lack reliable heading-angle measurements, especially in indoor scenarios, space, and GNSS-denied environments, posing an additional observability challenge for range-based relative localization. We tackle this problem by proposing a novel solution that enhances classical tag-and-anchor trilateration. The proposed solution relies on ultra-wideband range measurements and addresses relative pose estimation between pairs of UAVs under relative motion. Furthermore, it does not require any explicit motion pattern or initialization procedure, and it leverages an approximate maximum-likelihood algorithm to recursively solve the relative localization problem with constant computational complexity. The method has been implemented and demonstrated through field experiments, in which a swarm of nano-UAVs positioned themselves with respect to a leader in a nearly static formation with an average error of 38.5 cm and a convergence time of 25 s. The achieved formation accuracy is comparable to that of state-of-the-art EKF-based leader-follower methods.
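
To see why relative motion makes range-only localization observable, here is a toy batch least-squares version; the paper instead uses a recursive approximate maximum-likelihood estimator with constant complexity, and the geometry and noise levels below are invented.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
p0_true = np.array([3.0, -2.0])                # leader's initial relative position
rel_disp = np.cumsum(rng.normal(0, 0.3, size=(30, 2)), axis=0)  # known relative motion
ranges = np.linalg.norm(p0_true + rel_disp, axis=1) + rng.normal(0, 0.05, 30)

# Each range constrains ||p0 + displacement_t||; motion in two directions
# makes p0 uniquely identifiable without any heading measurement.
residual = lambda p0: np.linalg.norm(p0 + rel_disp, axis=1) - ranges
est = least_squares(residual, x0=np.array([1.0, 1.0])).x
print(est, "vs true", p0_true)
```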

ICRA Conference 2023 Conference Paper

Deep Neural Network Architecture Search for Accurate Visual Pose Estimation aboard Nano-UAVs

  • Elia Cereda
  • Luca Crupi
  • Matteo Risso
  • Alessio Burrello
  • Luca Benini
  • Alessandro Giusti
  • Daniele Jahier Pagliari
  • Daniele Palossi

Miniaturized autonomous unmanned aerial vehicles (UAVs) are an emerging and trending topic. With a form factor no bigger than the palm of a hand, they can reach spots otherwise inaccessible to bigger robots and safely operate in human surroundings. The simple electronics aboard such robots (sub-100 mW) make them particularly cheap and attractive, but pose significant challenges for enabling sophisticated onboard intelligence. In this work, we leverage a novel neural architecture search (NAS) technique to automatically identify several Pareto-optimal convolutional neural networks (CNNs) for a visual pose estimation task. Our work demonstrates how real-life, field-tested robotics applications can concretely leverage NAS technologies to automatically and efficiently optimize CNNs for the specific hardware constraints of small UAVs. We deploy several NAS-optimized CNNs and run them in closed loop aboard a 27-g Crazyflie nano-UAV equipped with a parallel ultra-low-power System-on-Chip. Our results improve on the state of the art by reducing the in-field control error by 32% while achieving a real-time onboard inference rate of ~10 Hz at 10 mW and ~50 Hz at 90 mW.

IROS Conference 2023 Conference Paper

LocalViT: Analyzing Locality in Vision Transformers

  • Yawei Li 0001
  • Kai Zhang 0008
  • Jiezhang Cao
  • Radu Timofte
  • Michele Magno
  • Luca Benini
  • Luc Van Gool

The aim of this paper is to study the influence of locality mechanisms in vision transformers. Transformers originated from machine translation and are particularly good at modelling long-range dependencies within a long sequence. Although the global interaction between token embeddings can be well modelled by the self-attention mechanism of transformers, what is lacking is a locality mechanism for information exchange within a local region. In this paper, the locality mechanism is systematically investigated via carefully designed controlled experiments. We add locality to vision transformers by modifying the feed-forward network; this seemingly simple solution is inspired by the comparison between feed-forward networks and inverted residual blocks. The importance of locality mechanisms is validated in two ways: 1) a wide range of design choices (activation function, layer placement, expansion ratio) is available for incorporating locality mechanisms, and proper choices lead to a performance gain over the baseline; and 2) the same locality mechanism is successfully applied to vision transformers with different architecture designs, which shows the generality of the locality concept. For ImageNet2012 classification, the locality-enhanced transformers outperform the baselines Swin-T [1], DeiT-T [2], and PVT-T [3] by 1.0%, 2.6%, and 3.1%, respectively, with a negligible increase in the number of parameters and computational effort. Code is available at https://github.com/ofsoundof/LocalViT.
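
A sketch of one way to inject locality into a transformer feed-forward network, using a depth-wise convolution over the token grid in the spirit of inverted residual blocks; dimensions and the class name `LocalFFN` are illustrative, not the paper's exact block.

```python
import torch
import torch.nn as nn

class LocalFFN(nn.Module):
    """FFN with a depth-wise convolution between the two projections, so tokens
    also exchange information with their spatial neighbours."""
    def __init__(self, dim=192, hidden=768, h=14, w=14):
        super().__init__()
        self.h, self.w = h, w
        self.fc1 = nn.Linear(dim, hidden)
        self.dw = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.act = nn.GELU()

    def forward(self, x):                                   # x: (batch, h*w tokens, dim)
        b, n, _ = x.shape
        x = self.act(self.fc1(x))
        x = x.transpose(1, 2).reshape(b, -1, self.h, self.w)  # tokens -> 2-D grid
        x = self.act(self.dw(x))                               # local mixing
        x = x.flatten(2).transpose(1, 2)                       # grid -> tokens
        return self.fc2(x)

out = LocalFFN()(torch.randn(2, 196, 192))
assert out.shape == (2, 196, 192)
```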

NeurIPS Conference 2023 Conference Paper

MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

  • Nicolas Menet
  • Michael Hersche
  • Geethan Karunaratne
  • Luca Benini
  • Abu Sebastian
  • Abbas Rahimi

With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by an instantaneous on-demand switching between a set of accuracy-throughput operating points, yet within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves $\approx 2$–$4\times$ speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle $2$–$4$ inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the Long Range Arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
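
The variable-binding arithmetic behind computation in superposition can be shown in a few lines: keys bind the inputs, the sum carries all of them, and unbinding recovers one item with bounded crosstalk. In MIMONets the network processes the superposed vector before unbinding; this toy skips that step, and all sizes are arbitrary.

```python
import numpy as np
rng = np.random.default_rng(0)

D, N = 2048, 3                                  # feature width, items in superposition
keys = rng.choice([-1.0, 1.0], size=(N, D))     # fixed bipolar binding keys

xs = [rng.normal(size=D) for _ in range(N)]
s = sum(k * x for k, x in zip(keys, xs))        # bind each input, then superpose

x1_hat = keys[1] * s                            # unbind item 1; the others become noise
print(np.corrcoef(x1_hat, xs[1])[0, 1])         # ~0.58 for N=3: signal survives crosstalk
```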

JBHI Journal 2021 Journal Article

An Ensemble of Hyperdimensional Classifiers: Hardware-Friendly Short-Latency Seizure Detection With Automatic iEEG Electrode Selection

  • Alessio Burrello
  • Simone Benatti
  • Kaspar Schindler
  • Luca Benini
  • Abbas Rahimi

We propose a new algorithm for detecting epileptic seizures. Our algorithm first extracts three features, namely mean amplitude, line length, and local binary patterns, which are fed to an ensemble of classifiers based on hyperdimensional (HD) computing. These features are embedded into prototype vectors representing the ictal (during seizures) and interictal (between seizures) brain states. The vectors can be computed at different spatial scales, ranging from a single electrode up to many electrodes. This flexibility allows our algorithm to identify the electrodes that discriminate best between ictal and interictal brain states. We assess our algorithm on the SWEC-ETHZ iEEG dataset, which includes 99 short-time iEEG seizures recorded with 36 to 100 electrodes from 16 drug-resistant epilepsy patients. Using k-fold cross-validation and all electrodes, our algorithm surpasses state-of-the-art algorithms, yielding significantly shorter latency (8.81 s vs. 11.57 s) in seizure onset detection, higher specificity (97.31% vs. 94.84%), and higher accuracy (96.85% vs. 95.42%). We can further reduce the latency of our algorithm to 3.74 s by allowing a slightly higher percentage of false alarms (2% specificity loss). Using only the top 10% of the electrodes ranked by our algorithm, we still maintain superior latency, sensitivity, and specificity compared to the other algorithms using all electrodes. We finally demonstrate the suitability of our algorithm for deployment on low-cost embedded hardware platforms, thanks to its robustness to noise/artifacts affecting the signal, its low computational complexity, and its small memory footprint on a RISC-V microcontroller.
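
A toy of the HD-computing pipeline: quantized feature values map to random bipolar hypervectors, labelled windows are bundled into ictal/interictal prototypes, and classification is a dot-product comparison. The feature extraction and all sizes are simplified stand-ins for the paper's pipeline.

```python
import numpy as np
rng = np.random.default_rng(0)

D, FEATS, LEVELS = 10_000, 3, 8     # hypervector width; 3 features, 8 quantization levels
item = {(f, l): rng.choice([-1, 1], size=D) for f in range(FEATS) for l in range(LEVELS)}

def encode(feats):                   # feats: 3 normalized feature values in [0, 1)
    q = np.clip((feats * LEVELS).astype(int), 0, LEVELS - 1)
    hv = sum(item[(f, q[f])] for f in range(FEATS))   # bundle feature-level vectors
    return np.where(hv >= 0, 1, -1)

# Prototypes = bundled encodings of labelled windows; classify by similarity.
ictal = np.where(sum(encode(rng.random(FEATS) * 0.5 + 0.5) for _ in range(50)) >= 0, 1, -1)
inter = np.where(sum(encode(rng.random(FEATS) * 0.5) for _ in range(50)) >= 0, 1, -1)
test = encode(np.array([0.8, 0.7, 0.9]))
print("ictal" if test @ ictal > test @ inter else "interictal")
```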

EAAI Journal 2019 Journal Article

A semisupervised autoencoder-based approach for anomaly detection in high performance computing systems

  • Andrea Borghesi
  • Andrea Bartolini
  • Michele Lombardi
  • Michela Milano
  • Luca Benini

High Performance Computing (HPC) systems are complex machines with heterogeneous components that can break or malfunction. Automated anomaly detection in these systems is a challenging and critical task, as HPC systems are expected to work 24/7. Most current state-of-the-art methods for this problem are machine learning techniques or statistical models that rely on a supervised approach: the detection mechanism is trained to recognize a fixed number of different states (i.e., normal and anomalous conditions). In this paper, a novel semi-supervised approach for anomaly detection in supercomputers is proposed, based on a type of neural network called an autoencoder. The approach learns the normal state of the supercomputer nodes and, after the training phase, can be used to discern anomalous conditions from normal behavior; in doing so, it relies only on data characterizing the normal state of the system. This differs from supervised methods, which require data sets with many examples of anomalous states that are in general very rare and/or hard to obtain. The approach was tested on a real-life HPC system equipped with a monitoring infrastructure capable of generating large amounts of data describing the system state. The proposed approach clearly outperforms the best current techniques for semi-supervised anomaly detection, with an increase in detection accuracy of around 12%. Two different implementations are discussed: one where each supercomputer node has a specific model and one with a single, generalized model for all nodes, in order to explore the trade-off between accuracy and ease of deployment.
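
The semi-supervised recipe is compact enough to sketch end to end: train an autoencoder on normal telemetry only, then flag inputs whose reconstruction error exceeds a threshold derived from normal data. Layer sizes, training length, and the 3-sigma threshold are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(32, 8), nn.ReLU(), nn.Linear(8, 32))  # tiny bottleneck AE
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

normal = torch.randn(4096, 32)                   # stand-in for healthy-node metrics
for _ in range(200):                             # train on *normal* data only
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(normal), normal)
    loss.backward(); opt.step()

with torch.no_grad():
    err = ((ae(normal) - normal) ** 2).mean(1)
    tau = err.mean() + 3 * err.std()             # threshold set from normal errors alone
    sample = torch.randn(1, 32) * 3              # off-distribution sample
    flagged = ((ae(sample) - sample) ** 2).mean() > tau
print(bool(flagged))                             # anomalous input exceeds the threshold
```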

NeurIPS Conference 2019 Conference Paper

Constrained deep neural network architecture search for IoT devices accounting for hardware calibration

  • Florian Scheidegger
  • Luca Benini
  • Costas Bekas
  • A. Cristiano I. Malossi

Deep neural networks achieve outstanding results on challenging image classification tasks. However, the design of network topologies is a complex task, and the research community is conducting ongoing efforts to discover top-accuracy topologies, either manually or by employing expensive architecture searches. We propose a unique narrow-space architecture search that focuses on delivering low-cost, rapidly executing networks that respect the strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms. Our approach provides solutions with classification latencies below 10 ms running on a low-cost device with 1 GB of RAM and a peak performance of 5.6 GFLOPS. The narrow-space search of floating-point models improves the CIFAR10 accuracy of an established IoT model from 70.64% to 74.87% within the same memory constraints. We further improve the accuracy to 82.07% by including 16-bit half types and obtain the highest accuracy of 83.45% by extending the search with model-optimized IEEE 754 reduced types. To the best of our knowledge, this is the first empirical demonstration of more than 3000 trained models that run with reduced precision and push the Pareto-optimal front by a wide margin. Within a given memory constraint, accuracy is improved by more than 7 percentage points for half types and by more than 1 percentage point for the best individual model format.

NeurIPS Conference 2017 Conference Paper

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

  • Eirikur Agustsson
  • Fabian Mentzer
  • Michael Tschannen
  • Lukas Cavigelli
  • Radu Timofte
  • Luca Benini
  • Luc Van Gool

We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts throughout training. We showcase this method for two challenging applications: Image compression and neural network compression. While these tasks have typically been approached with different methods, our soft-to-hard quantization approach gives results competitive with the state-of-the-art for both.
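
The soft-to-hard relaxation is easy to reproduce in miniature: a temperature-scaled softmax over distances to quantization centers that is annealed toward hard nearest-center assignment while staying differentiable. The center grid and annealing schedule below are arbitrary illustrations.

```python
import torch

def soft_quantize(x, centers, sigma):
    """Soft assignment of each value to quantization centers; as sigma grows,
    the softmax approaches hard nearest-center assignment."""
    d = (x.unsqueeze(-1) - centers) ** 2            # squared distance to each center
    w = torch.softmax(-sigma * d, dim=-1)
    return (w * centers).sum(-1)                    # differentiable surrogate of rounding

centers = torch.linspace(-1, 1, 5)
x = torch.randn(4, requires_grad=True)
for sigma in (1.0, 10.0, 1000.0):                   # annealing schedule
    y = soft_quantize(x, centers, sigma)
    print(sigma, y.detach().numpy())                # values snap toward the nearest center
y.sum().backward()                                   # gradients flow through the relaxation
```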

ECAI Conference 2016 Conference Paper

DARDIS: Distributed And Randomized DIspatching and Scheduling

  • Thomas Bridi
  • Michele Lombardi 0001
  • Andrea Bartolini
  • Luca Benini
  • Michela Milano

Scheduling and dispatching are critical enabling technologies in supercomputing and grid computing. In these contexts, scalability is an issue: we have to allocate and schedule up to tens of thousands of tasks on tens of thousands of resources. This problem scale is out of reach for complete and centralized scheduling approaches.

AIJ Journal 2014 Journal Article

CROSS cyclic resource-constrained scheduling solver

  • Alessio Bonfietti
  • Michele Lombardi
  • Luca Benini
  • Michela Milano

Cyclic scheduling problems consist in ordering a set of activities, executed indefinitely over time in a periodic fashion, subject to precedence and resource constraints. This class of problems has many applications in manufacturing, embedded systems and compiler design, and production and chemical systems. This paper proposes a Constraint Programming approach to cyclic scheduling problems based on modular arithmetic: in particular, we introduce a modular precedence constraint and a global cumulative constraint along with their filtering algorithms. We discuss two possible formulations. The first (referred to as CROSS) models a pure cyclic scheduling problem and makes use of both our novel constraints. The second (referred to as CROSS*) introduces a restrictive assumption to enable the use of classical resource constraints, but may incur a loss of solution quality. Many traditional approaches to cyclic scheduling operate by fixing the period value and then solving a linear problem in a generate-and-test fashion. Conversely, our technique is based on a non-linear model and tackles the problem as a whole: the period value is inferred from the scheduling decisions. Our approach has been tested on a number of non-trivial synthetic instances and on a set of realistic industrial instances, and it proved effective in finding high-quality solutions in a very short amount of time.

AAAI Conference 2012 Conference Paper

Optimization and Controlled Systems: A Case Study on Thermal Aware Workload Dispatching

  • Andrea Bartolini
  • Michele Lombardi
  • Michela Milano
  • Luca Benini

Although successfully employed on many industrial problems, Combinatorial Optimization still has limited applicability in several real-world domains, often due to modeling difficulties. This is typically the case for systems under the control of an on-line policy: even when the policy itself is well known, capturing its effect on the system in a declarative model is often impossible by conventional means. Such a difficulty is at the root of the classical, sharp separation between off-line and on-line approaches. In this paper, we investigate a general method to model controlled systems, based on the integration of Machine Learning and Constraint Programming (CP). Specifically, we use an Artificial Neural Network (ANN) to learn the behavior of a controlled system (a multicore CPU with thermal controllers) and plug it into a CP model by means of Neuron Constraints. The method obtains significantly better results compared to an approach with no ANN guidance. Neuron Constraints were first introduced in (Bartolini et al. 2011b) as a means of modeling complex systems; providing evidence of their applicability to controlled systems is a significant step forward, broadening the application field of combinatorial methods and opening up opportunities for hybrid off-line/on-line optimization.