December 2023
tl;dr: A pipeline for using ChatGPT on robotics tasks via prompt engineering, with the LLM writing high-level code for execution. Similar to CaP (Code as Policies).
Overall impression
Robotics systems, unlike text-only apps, require deep understanding of real-world physics, environmental context, and the ability to perform physical actions.
An LLM's out-of-the-box understanding of basic concepts (control, camera geometry, physical form factors) makes it an excellent choice for building generalizable and user-friendly robotics pipelines.
PromptCraft replaces a specialized engineer-in-the-loop with a user-on-the-loop. --> Polishing the interaction between the user and the robot, or automating as much of it as possible, is the key to real-world application (productization).
PromptCraft is NOT a fully automated process. It needs a human on the loop to monitor and intervene in case the LLM generates unexpected behavior, especially for safety-critical applications.
PromptCraft does not use a VLM, only an LLM.
Key ideas
- Pipeline to construct a ChatGPT-based robotics app
- Define a high-level robot function library.
- Prompt with the objectives and the allowed functions.
- The user stays on the loop to evaluate the output.
- Deploy onto the robot.
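The first two steps of the pipeline above can be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the function names (`take_off`, `fly_to`, etc.), their descriptions, the `RobotFunction` class, and the prompt wording are all hypothetical and chosen only to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotFunction:
    """One entry of the high-level function library (hypothetical format)."""
    signature: str
    description: str


# Hypothetical library for a drone; a real one would wrap the robot's own API.
FUNCTION_LIBRARY = [
    RobotFunction("take_off()", "Take off and hover."),
    RobotFunction("land()", "Land the drone."),
    RobotFunction("get_position(object_name)", "Return the [x, y, z] position of a named object."),
    RobotFunction("fly_to(position)", "Fly the drone to an [x, y, z] position."),
]


def build_prompt(objective, functions):
    """Assemble a prompt that states the task and lists the allowed functions."""
    lines = [
        "You control a robot. Use only the functions listed below.",
        "Do not invent other robot functions; ask a question if the task is ambiguous.",
        "",
        "Available functions:",
    ]
    lines += [f"- {fn.signature}: {fn.description}" for fn in functions]
    lines += ["", f"Task: {objective}", "Respond with Python code."]
    return "\n".join(lines)


prompt = build_prompt("Fly to the turbine and land next to it.", FUNCTION_LIBRARY)
print(prompt)
```

The resulting string would be sent to the LLM; the user then reviews the returned code before anything is deployed.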
Technical details
- Creating a high-level function library and listing its functions in the prompt is a key concept that unlocks the ability to build robotics apps with ChatGPT. This avoids unbounded text-based answers and API under-specification.
- The capability to write new functions confers flexibility and robustness to LLMs.
- The dialog/conversation ability of ChatGPT is a surprisingly effective vehicle for interactive behavior correction.
- The use of simulators can be particularly useful for evaluating the model's performance before deployment in the real world. --> Simulation (Habitat, AirSim, etc.) is the right vehicle for evaluating closed-loop high-level task planning.
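One way a user on the loop might be assisted in checking that generated code stays within the listed function library is a static check before running it in simulation. The sketch below is an assumption of this note, not something the paper describes: the allowed function names and the `undefined_calls` helper are hypothetical. It permits helper functions that the generated code defines itself, in line with the point above about writing new functions.

```python
import ast

# Hypothetical names from the high-level function library listed in the prompt.
ALLOWED_FUNCTIONS = {"take_off", "land", "fly_to", "get_position"}


def undefined_calls(code, allowed):
    """Return names called in `code` that are neither allowed nor defined in it.

    Only plain-name calls such as `foo(...)` are inspected; method calls such as
    `obj.foo(...)` are ignored. Builtins (e.g. `print`) are flagged unless they
    are added to `allowed`.
    """
    tree = ast.parse(code)
    # Functions the generated code defines for itself are acceptable.
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    called = {
        n.func.id
        for n in ast.walk(tree)
        if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
    }
    return called - set(allowed) - defined


generated = '''
def inspect(name):
    fly_to(get_position(name))

take_off()
inspect("turbine")
spin_rotor(3)
land()
'''

violations = undefined_calls(generated, ALLOWED_FUNCTIONS)
print(sorted(violations))
```

A non-empty result could be fed back to the LLM in the same conversation as a correction request. A check like this complements, and does not replace, simulator runs and human review.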
Notes
- Applications of LLMs to robotics include visual-language navigation, language-based human-robot interaction, and visual-language manipulation control (PerAct, CLIPort by Dieter Fox).