At UCL RoMA Lab, we scale Vision-Language-Action (VLA) foundation models for robotics, turning multimodal perception into purposeful action. Our work advances embodied AI by tackling generalization across diverse sensors, computational efficiency on real-world hardware, and seamless human–robot interaction. Our mission is to enable autonomous systems that operate reliably in complex, dynamic environments.