July 2023
tl;dr: Code gen for robotic control policies. LMPs (language model generated programs) can represent robot policies, including reactive policies and waypoint-based policies. This is one way to “ground” an LLM in the real world.
Overall impression
CaP grounds the LLM in the real world through generated code. VoxPoser provides another, more generalized way to generate control: CaP still relies on manually designed motion primitives, while VoxPoser circumvents this via automatic value map generation.
CaP is a first step for LLMs to move beyond executing a fixed sequence of skills. Code-writing LLMs can orchestrate planning, policy logic, and control. CaP can interpret language commands to generate LMPs that represent reactive low-level policies (e.g., PD or impedance controllers) and waypoint-based policies (for vision-based pick-and-place, or trajectory-based control).
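For intuition, below is a minimal sketch of what generated LMPs of both flavors might look like. The grounding calls (`get_obj_pos`, `put_first_on_second`) are written in the spirit of the paper's tabletop prompts, but their exact signatures here are assumptions, and the stubs exist only so the sketch runs standalone.

```python
import numpy as np

# Stub grounding APIs so the sketch runs standalone; a real system binds these
# to an open-vocabulary detector (e.g., ViLD) and a scripted pick-and-place primitive.
def get_obj_pos(name):
    return np.array([0.40, 0.10])  # placeholder detection result, in meters

def put_first_on_second(obj_name, target_xy):
    print(f"pick {obj_name}, place at {target_xy}")

# 1) Waypoint-based LMP: "put the red block a bit to the left of the bowl"
def put_red_block_left_of_bowl():
    bowl_pos = get_obj_pos('bowl')               # perception output
    target = bowl_pos + np.array([-0.10, 0.0])   # "a bit to the left": 10 cm in -x (assumed convention)
    put_first_on_second('red block', target)     # parameterized control primitive

# 2) Reactive LMP: one tick of a PD law on end-effector position
def pd_step(curr_pos, goal_pos, curr_vel, kp=2.0, kd=0.4):
    return kp * (goal_pos - curr_pos) - kd * curr_vel

put_red_block_left_of_bowl()
```

The point is that the same code-generation machinery covers both one-shot waypoint computation and a control law that is stepped inside a high-frequency loop.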
CaP alleviates the need to collect data and train a fixed set of predefined skills or language-conditioned policies, but it still relies on low-level control APIs.
CaP generalizes at a specific layer in the robot stack: it interprets natural-language instructions, processes perception outputs, and then parameterizes low-dimensional inputs to control primitives. This fits into systems with factorized perception and control.
Despite the great progress, how to form a closed data loop to continuously improve CaP is unclear. CaP is also limited to a handful of named primitive parameters exposed by the control APIs.
The concurrent work is ProgPrompt; a comparison can be found on its project page.
Key ideas
- CaP can parameterize and write parts of low-level control primitives (“move a bit to the right”, etc.).
- LMPs can:
    - react to perceptual inputs
    - parameterize control primitive APIs
    - be directly compiled and executed on a robot
- Hierarchical code-gen (recursively defining undefined functions) is central to CaP; it lets CaP self-architect a dynamic codebase (see the sketch after this list).
    - Parse the code generated by an LMP and locate undefined functions.
    - Call another LMP specialized in function generation to create those functions.
    - This allows the high-level LMPs to follow good abstraction practices.
- Perception and control APIs
    - LMPs parameterize and call low-level control APIs.
    - Perception and control APIs ground the LMPs to a real-world robot system.
    - Improvements to perception and control APIs will improve LMP-based policies.
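To make the hierarchical code-gen idea above concrete, here is a minimal sketch. The LLM call is abstracted behind `llm_generate_function`, a hypothetical stand-in rather than the paper's actual interface: parse the generated code, collect function names that are neither defined locally nor in the known scope, and recursively generate them.

```python
import ast

def undefined_functions(code, known_scope):
    """Names called in `code` but neither defined in it nor in known_scope
    (known_scope should hold builtins plus first-party perception/control APIs)."""
    tree = ast.parse(code)
    defined = {node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)}
    called = {node.func.id for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}
    return called - defined - set(known_scope)

def hierarchical_codegen(code, known_scope, llm_generate_function, max_depth=5):
    """Recursively ask a function-generating LMP to fill in undefined functions."""
    if max_depth == 0:
        return code
    for name in sorted(undefined_functions(code, known_scope)):
        body = llm_generate_function(name)  # specialized function-generation LMP
        body = hierarchical_codegen(body, known_scope, llm_generate_function, max_depth - 1)
        code = body + "\n\n" + code         # prepend so definitions precede their use
        known_scope = set(known_scope) | {name}
    return code
```

This is what lets a high-level LMP stay abstract: it can freely call helpers that do not exist yet, and the recursion back-fills them.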
Technical details
- HW:
    - UR5e robotic arm.
    - Intel RealSense D435 camera.
    - Robotiq 2F-85 gripper (same as the one used in 1X Eve?)
- For learned language-conditioned policies, incorporating each new skill (and mode of grounding) requires additional data and retraining; CaP sidesteps this.
- Third-party vs. first-party APIs
    - Third-party APIs appear in the training data (e.g., NumPy).
    - First-party libraries not found in the training data can still be used if their functions have meaningful names and are provided in hints.
- Open-vocabulary object detectors: MDETR and ViLD.
- Prompt engineering (see the prompt sketch after this list)
    - Make sure prompts contain code that has no bugs.
    - Explicit import statements do improve reliability.
    - Prompts do not need to be long, but they need to be relevant and specific.
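To make these prompt-engineering points concrete, here is an illustrative shape for such a prompt; the module names and hint format are assumptions, not the paper's exact prompts.

```python
# Illustrative LMP prompt: explicit imports up front, hints naming first-party
# APIs, and short, relevant, bug-free few-shot examples.
PROMPT = '''
import numpy as np
from perception_api import get_obj_pos      # assumed first-party module
from control_api import put_first_on_second

# Hint: get_obj_pos(name) -> np.ndarray([x, y]); put_first_on_second(obj, target).

# put the blue block on the yellow bowl.
put_first_on_second('blue block', 'yellow bowl')

# move the sky-colored block next to the plate.
target = get_obj_pos('plate') + np.array([0.10, 0.0])
put_first_on_second('blue block', target)
'''.strip()

# The user instruction is appended as a new comment and the LLM continues the code.
query = PROMPT + "\n\n# stack the green block on the red block.\n"
```

The import lines bias the model toward the named libraries' idioms, and the few-shot examples keep the generated code close to the available APIs.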
Notes
- Questions and notes on how to improve/revise the current work