Helix: AI Model Powers Dexterous Humanoid Robots for Industrial Tasks
Helix is a generalist Vision-Language-Action (VLA) AI model engineered to enhance the capabilities of humanoid robots in complex, unstructured industrial environments like warehouses, logistics centers, and manufacturing floors. Helix integrates advanced perception, language comprehension, and learned motor control into a unified system, enabling robots to perform dexterous tasks with notable speed, adaptability, and reduced programming overhead.
Helix provides full upper-body control: it is the first VLA to output high-rate, continuous control of the entire humanoid upper body, including the wrists, torso, head, and individual fingers. It is also the first VLA to operate simultaneously on two robots, enabling them to work together on tasks involving items they have never seen before.
Figure's humanoid robots fitted with Helix AI can pick up just about any small household item, including thousands of items they have never encountered before, by following natural language prompts. Because one neural network learns all behaviors—picking and placing items, using drawers and refrigerators, and cross-robot interaction—the robots can complete new tasks without task-specific fine-tuning.
Helix is the first VLA to run entirely onboard, on embedded low-power GPUs, making it immediately ready for commercial deployment.
Traditional robot control often forces a choice between fast, task-specific policies and slower, more general large AI models. Helix addresses this through a novel dual-system architecture:
- System 2 (S2)—High-Level Task Understanding: An onboard Vision-Language Model (VLM), pretrained on broad internet data, operates at 7 to 9 Hz. It interprets scenes and natural language commands (e.g., "Pick up the gasket," "Place the part in the bin") to understand task requirements and object context, enabling generalization to items and situations not seen during training.
- System 1 (S1)—High-Speed Reactive Control: A visuomotor policy translates S2's high-level intent into precise, continuous robot actions at 200 Hz. This rapid control loop lets the robot react dynamically to its environment and execute movements smoothly.
This decoupled design allows S2 to handle complex reasoning while S1 ensures fluid, real-time motion execution, optimizing performance for demanding industrial applications.
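The decoupled design can be pictured as two loops sharing a latent intent: a slow S2 loop that publishes it, and a fast S1 loop that reads the latest value on every control tick. The sketch below is purely illustrative—the class and function names are assumptions, not Figure's API—but the loop rates (~8 Hz and 200 Hz) come from the article.

```python
import threading
import time

class SharedLatent:
    """Latest high-level intent published by S2, read by S1 each tick."""
    def __init__(self):
        self._lock = threading.Lock()
        self._latent = [0.0] * 512  # placeholder intent embedding

    def write(self, latent):
        with self._lock:
            self._latent = latent

    def read(self):
        with self._lock:
            return list(self._latent)

def s2_loop(shared, steps, hz=8):
    # Slow loop: stands in for the VLM turning scene + language into intent.
    for t in range(steps):
        shared.write([float(t)] * 512)  # stand-in for VLM output
        time.sleep(1.0 / hz)

def s1_loop(shared, steps, hz=200):
    # Fast loop: stands in for the visuomotor policy consuming the latest
    # intent and emitting one action per 5 ms tick.
    actions = []
    for _ in range(steps):
        latent = shared.read()
        actions.append(latent[0])  # stand-in for a full upper-body command
        time.sleep(1.0 / hz)
    return actions
```

Because S1 always reads whatever intent is newest, it never blocks on S2: the fast loop keeps emitting smooth actions even while the slow loop is still reasoning about the next step.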
High-DoF, high-speed control outputs continuous commands at 200 Hz for a 35-degree-of-freedom (DoF) humanoid upper body, coordinating wrist, finger, torso, and head movements for complex manipulation tasks.
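One way to picture a single 200 Hz command is as a structured vector over the joint groups the article names. The per-group joint counts below are assumptions made for the sketch—the article states only the 35-DoF total—but they show how head, torso, wrist, and finger targets could be packed into one flat action vector per tick.

```python
from dataclasses import dataclass, field

@dataclass
class UpperBodyCommand:
    """Hypothetical 35-DoF upper-body target for one 5 ms control tick.

    Joint counts per group are illustrative, not Figure's actual layout.
    """
    head: list = field(default_factory=lambda: [0.0] * 2)      # assumed 2 DoF
    torso: list = field(default_factory=lambda: [0.0] * 3)     # assumed 3 DoF
    wrists: list = field(default_factory=lambda: [0.0] * 6)    # assumed 3 per wrist
    fingers: list = field(default_factory=lambda: [0.0] * 24)  # assumed 12 per hand

    def flatten(self):
        # Concatenate groups into the single 35-value vector sent at 200 Hz.
        return self.head + self.torso + self.wrists + self.fingers
```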
Zero-shot generalization enables robots to interact with novel objects based solely on natural language commands, without task-specific training or demonstrations. This significantly reduces setup time and allows handling of diverse parts or products encountered in dynamic environments.
Natural language programming simplifies task assignment through plain language instructions, reducing reliance on complex coding or robotic specialists. A single set of neural network weights governs diverse behaviors (picking, placing, interaction), eliminating the need for per-task fine-tuning or separate models, simplifying deployment and maintenance.
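The "single set of weights" idea can be sketched as one policy callable handling every instruction, with no per-task branching into separate fine-tuned models. Everything below is a toy stand-in—the function names and the keyword-matching "policy" are hypothetical, not how Helix works internally—but it illustrates the deployment simplification: tasks are assigned by changing the instruction string, not the model.

```python
def run_task(policy, instruction, observation):
    """Send a plain-language instruction to a single generalist policy.

    There is no task-specific dispatch: the same policy callable (same
    weights) handles picking, placing, and articulated-object use.
    """
    return policy(instruction, observation)

def toy_policy(instruction, observation):
    # Toy stand-in for a learned policy: infers a behavior from keywords.
    verbs = {"pick": "grasp", "place": "release", "open": "articulate"}
    for verb, action in verbs.items():
        if verb in instruction.lower():
            return {"action": action, "obs": observation}
    return {"action": "idle", "obs": observation}
```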
Full onboard processing runs entirely on low-power embedded GPUs located on the robot, allowing standalone operation on the factory or warehouse floor without external compute infrastructure or cloud connectivity.
Multirobot collaboration supports coordinated tasks between multiple robots running identical Helix models, directed via natural language, suitable for assembly line or logistics operations.
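Directing two robots that run identical weights might look like the following sketch. The names and message shapes are hypothetical; the article states only that collaborating robots run the same Helix model and are directed via natural language.

```python
def shared_policy(instruction, observation):
    # Stand-in for the Helix policy; both robots call this same function,
    # i.e., the same weights, each from its own viewpoint.
    return f"{observation}: acting on '{instruction}'"

def coordinate(instruction, observations, policy=shared_policy):
    """Issue one language instruction; each robot acts from its own view."""
    return [policy(instruction, obs) for obs in observations]
```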
Data-efficient training achieves robust performance from a relatively small dataset (~500 hours of teleoperation), potentially accelerating deployment compared to data-heavy AI approaches.
Figure
Sunnyvale, CA
figure.ai