Figure has demonstrated the first fruit of its collaboration with OpenAI to enhance the capabilities of humanoid robots. In a video released today, the Figure 01 bot is seen conversing in real time.
The pace of development at Figure is nothing short of extraordinary. Entrepreneur Brett Adcock only emerged from stealth last year, after gathering key players from Boston Dynamics, Tesla, Google DeepMind and Archer Aviation to "create the world's first commercially viable general purpose humanoid robot."
By October, the Figure 01 was already up on its feet and performing basic autonomous tasks. By the turn of the year, the robot had watch-and-learn capabilities, and was ready to enter the workforce at BMW by mid-January.
We got to see it on the warehouse floor last month, just before Figure announced a successful Series B funding round along with a collaboration agreement with OpenAI "to develop next generation AI models for humanoid robots." Now we get a taste of what that means.
Adcock confirmed in an X post that Figure 01's integrated cameras send data to a large vision-language model trained by OpenAI, while Figure's own neural networks also "take images in at 10 Hz through cameras on the robot." OpenAI also handles the understanding of spoken words, and all of this incoming information is translated into "fast, low level, dexterous robot actions" by Figure's neural net.
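To make that division of labor concrete, here is a minimal sketch of how such a pipeline might loop. Neither Figure nor OpenAI has published its actual code or APIs, so every name here (`vlm_plan`, `transcribe`, `policy`, the 200 Hz policy rate, and so on) is an illustrative assumption; only the 10 Hz camera rate comes from Adcock's post.

```python
import time

CAMERA_HZ = 10    # from Adcock's post: images come in at 10 Hz
POLICY_HZ = 200   # assumption: low-level control typically runs much faster


def control_loop(camera, microphone, vlm_plan, transcribe, policy, robot):
    """Illustrative perception-to-action loop (not Figure's actual code).

    - camera frames and transcribed speech go to a vision-language model,
      which returns a high-level plan
    - a separate neural-network policy turns the current plan plus
      proprioception into fast, low-level joint commands
    """
    plan = None
    last_frame_time = 0.0
    while True:
        now = time.monotonic()
        # Sample vision and any new speech at the slower VLM rate
        if now - last_frame_time >= 1.0 / CAMERA_HZ:
            frame = camera.read()
            utterance = transcribe(microphone.read())          # speech understanding (OpenAI side)
            plan = vlm_plan(frame, utterance, previous=plan)   # vision-language reasoning (OpenAI side)
            last_frame_time = now
        # The low-level policy runs every tick, regardless of the VLM
        action = policy(plan, robot.proprioception())          # dexterous actions (Figure side)
        robot.apply(action)
        time.sleep(1.0 / POLICY_HZ)
```

The key design idea the sketch tries to capture is the two-speed split: slow, language-grounded reasoning over images and speech, feeding a much faster low-level controller that keeps the robot's hands moving smoothly between plan updates.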
He confirmed that the robot was not teleoperated during the demo, and that the video was filmed at actual speed. All up, a remarkable achievement for a partnership that's less than two weeks old – "our goal is to train a world model to operate humanoid robots at the billion-unit level," said Adcock. At this rate, we won't have to wait long.