Bringing Foundation Models to the Edge with Efficient Deployment Strategies
Lucas Moreira Ferreira, Mariana Silva, Thiago Costa, Adelaide Mattia de Rocha, Rafael Valladares de Almeida, Camila I. de Oliveira, João Pedro Nunes, Gulnaz Rati
- Year: 2025
- Citations: 2
- Access: Open access
Abstract
Foundation Models (FMs), large-scale deep learning models pretrained on massive and diverse datasets, have rapidly emerged as the cornerstone of modern artificial intelligence, demonstrating state-of-the-art performance across a multitude of tasks in natural language processing, computer vision, speech recognition, and multimodal learning. These models, characterized by their vast parameter counts and intricate architectures, offer exceptional generalization capabilities, few-shot learning, and transferability. However, their enormous computational and memory requirements have traditionally restricted their deployment to centralized, cloud-based infrastructures equipped with high-end GPUs or TPUs. As the demand for real-time, low-latency, privacy-preserving AI continues to grow across applications such as autonomous vehicles, mobile health, robotics, and smart environments, there is a pressing need to bring the power of FMs to edge devices, which operate under stringent resource constraints such as limited memory, compute power, and energy budgets. This review presents a comprehensive and in-depth exploration of the landscape of Foundation Model deployment on edge hardware platforms, including Raspberry Pi, FPGAs, microcontrollers, and other embedded systems. We examine the motivations for edge deployment, including the desire to minimize latency, reduce reliance on unreliable network connections, protect sensitive data through on-device inference, and enable autonomous operation in decentralized environments. We analyze the major technical challenges posed by this paradigm shift, ranging from hardware limitations and thermal constraints to the incompatibility of standard FM architectures with edge-level inference.
The review explores current state-of-the-art solutions, including model compression techniques such as quantization, pruning, knowledge distillation, and low-rank approximation, as well as advances in compiler toolchains and hardware accelerators that enable efficient inference of large models on constrained devices. We also examine the roles of reconfigurable logic and custom silicon in bridging the compute gap, focusing in particular on the potential of FPGAs to offer adaptable, low-latency execution environments tailored to specific model structures. We then survey the software ecosystems and runtime environments designed to facilitate FM deployment on edge devices, assessing their capabilities and limitations in supporting diverse workloads and ensuring portability. In addition to technical considerations, the review addresses critical concerns related to data privacy, model security, federated learning, and on-device adaptation, which are essential for trustworthy and user-aligned AI applications. The review concludes with a discussion of open challenges and emerging research directions that promise to shape the future of edge-AI systems powered by Foundation Models. These include the design of edge-native model architectures, lifelong and federated learning under resource constraints, explainability and robustness in edge deployments, and the development of scalable toolchains that unify hardware-software co-design. Through a thorough synthesis of existing literature, architectural insights, and deployment case studies, this review aims to serve as a foundational resource for researchers, engineers, and practitioners seeking to advance the state of the art in bringing the capabilities of Foundation Models to the edge.
By charting a roadmap that bridges the gap between model expressiveness and hardware feasibility, we hope to catalyze innovation in building intelligent, responsive, and sustainable AI systems that operate efficiently at the network’s edge.