Large Language Models as General Pattern Machines
Authors: Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
Abstract: We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences – from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to richer spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics – from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
What, Why and How
Here is a summary of the key points from the paper:
What:
- The paper shows that large language models (LLMs) like GPT-3 have an inherent capability for pattern recognition and completion when prompted with examples, without any additional training.
- This capability applies not just to linguistic patterns, but to more abstract symbolic and numeric patterns as well.
- The authors demonstrate the application of this capability in three areas: (1) sequence transformation (generalizing transformations between input-output sequence pairs); (2) sequence completion (extrapolating periodic functions from partial sequences); (3) sequence improvement (iteratively improving sequences, such as robot trajectories, according to a reward).
Why:
- Understanding and characterizing the innate pattern reasoning abilities of LLMs is important for determining their potential as generalizable models for robotics and other domains beyond language itself.
- The ability to manipulate and complete arbitrary symbolic patterns could allow LLMs to operate at various levels of abstraction for robot perception, planning, and control.
- Zero-shot in-context learning could enable a pretrained LLM to be adapted to new tasks without requiring task-specific training data.
How:
- The authors prompt the LLM with a few input-output examples that implicitly specify the pattern or transformation rule (see the sketch after this list).
- They test sequence transformation with the Abstraction and Reasoning Corpus (ARC), procedurally generated examples, and robotic manipulation tasks.
- For sequence completion, they show LLMs can extrapolate periodic functions and robot motions from partial demonstrations.
- Sequence improvement is shown by iteratively generating higher-reward trajectories on grid navigation and CartPole tasks.
- Throughout, representations use numeric and arbitrary tokens to evaluate the generality of pattern manipulation.
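As a concrete illustration of this prompting style, here is a minimal sketch that assembles a few-shot prompt from input-output sequence pairs. The arrow-based layout and the llm_complete call are illustrative assumptions, not the paper's exact format:

# Minimal sketch: build a few-shot prompt from input-output pairs.
# The "->" layout and llm_complete() are hypothetical; the paper's
# prompts encode sequences directly as token lists.
def build_prompt(examples, query_input):
    lines = []
    for x, y in examples:
        lines.append(f"{', '.join(map(str, x))} -> {', '.join(map(str, y))}")
    lines.append(f"{', '.join(map(str, query_input))} ->")
    return "\n".join(lines)

examples = [([1, 2, 3], [3, 2, 1]), ([4, 5, 6], [6, 5, 4])]  # rule: reverse
prompt = build_prompt(examples, [7, 8, 9])
# completion = llm_complete(prompt)  # expected completion: "9, 8, 7"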
In summary, the key insight is that LLMs exhibit surprisingly general capabilities for pattern recognition and manipulation when prompted with examples, which could be useful for robotics applications and policy improvement with limited or no training data. The paper aims to characterize and demonstrate these innate skills.
Main Contributions
Here are the main contributions of this paper:
- Identifies and characterizes the inherent pattern recognition and manipulation capabilities of large language models (LLMs) when prompted with examples, without any task-specific training.
- Demonstrates that these capabilities extend beyond linguistic patterns to more abstract symbolic and numeric patterns.
- Evaluates LLMs on the Abstraction and Reasoning Corpus (ARC) benchmark and shows they can solve a non-trivial number of reasoning problems compared to prior specialized methods.
- Introduces a new procedurally generated benchmark based on probabilistic context-free grammars to measure LLM pattern manipulation skills.
- Shows that LLMs exhibit a degree of token invariance: they can still recognize patterns when symbols are randomly remapped to other tokens (see the sketch after this list).
- Applies LLM sequence completion to extrapolate periodic functions and robot motions from partial demonstrations.
- Demonstrates iterative LLM-based sequence improvement for trajectory optimization tasks using in-context reinforcement learning.
- Shows that LLM agents can learn stabilizing controllers for CartPole and optimize grid navigation trajectories within 50 episodes.
- Provides an analysis of the limitations and practical challenges of using LLMs for robotic control.
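To make the token-invariance claim concrete, here is a small sketch of the kind of remapping involved; the toy vocabulary stands in for sampling from the LLM's actual vocabulary, as done in the paper:

import random

# Remap each distinct symbol in a pattern to a randomly chosen token.
# The toy vocabulary is an illustrative stand-in for the LLM vocabulary.
vocabulary = ["apple", "river", "%", "blue", "742", "zyx"]

def remap(sequences):
    symbols = sorted({s for seq in sequences for s in seq})
    mapping = dict(zip(symbols, random.sample(vocabulary, len(symbols))))
    return [[mapping[s] for s in seq] for seq in sequences]

pattern = [[0, 1, 0], [1, 0, 1]]
print(remap(pattern))  # same structure, arbitrary tokens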
In summary, the key contributions are empirically evaluating the innate pattern reasoning skills of LLMs on symbolic reasoning benchmarks and simple robotics tasks, proposing new benchmarks, and demonstrating the potential application of these skills for robotic control and policy improvement with limited data.
Method Section
Here is a summary of the methods section from the paper:
The paper evaluates large language models (LLMs) as general pattern machines by assessing their capabilities in three categories:
Sequence Transformation:
- Input the LLM with examples of input-output sequence pairs that demonstrate a transformation rule.
- Evaluate on the Abstraction and Reasoning Corpus (ARC) by flattening the 2D grid patterns into sequences and prompting with a few example input-output grids.
- Also test on procedurally generated sequence transformations based on probabilistic context-free grammars.
Sequence Completion:
- Input the LLM with a partial sequence corresponding to discrete samples from a function.
- Assess if the LLM can extrapolate and complete periodic functions like sinusoids.
- Apply to robotics by encoding partial demonstrations as series of states and prompting the LLM to continue the motion.
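A minimal sketch of this encoding, assuming integer-rounded samples and an 80/20 prompt/held-out split (both illustrative choices):

import math

# Discretize a sine wave into integer tokens; prompt with the first 80%
# and hold out the rest as ground truth for the extrapolation.
samples = [round(100 * math.sin(2 * math.pi * t / 20)) for t in range(60)]
split = int(0.8 * len(samples))
prompt = ", ".join(map(str, samples[:split]))
held_out = samples[split:]
# completion = llm_complete(prompt)  # hypothetical LLM call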
Sequence Improvement:
- Input the LLM with a set of trajectories, some labeled with returns.
- Prompt the LLM to generate new higher-reward trajectories by conditioning on previous ones.
- Test trajectory optimization on grid navigation and discovering stabilizing controllers for CartPole.
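One plausible prompt layout for this setting, sketched under the assumption that trajectories are sorted by return and prefixed with their rewards (in the spirit of the paper's reward-conditioned prompting):

# Sort trajectories so higher-reward examples appear last, then condition
# the next generation on a higher target reward. The textual layout is
# an illustrative assumption.
trajectories = [(12, [3, 3, 2, 1]), (25, [3, 2, 2, 2]), (18, [1, 2, 3, 3])]
trajectories.sort(key=lambda rt: rt[0])

lines = [f"reward {r}: {' '.join(map(str, traj))}" for r, traj in trajectories]
target = max(r for r, _ in trajectories) + 1
lines.append(f"reward {target}:")
prompt = "\n".join(lines)
# new_trajectory = llm_complete(prompt)  # hypothetical LLM call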
The representations use both numeric tokens and arbitrary symbols to evaluate how generally the models can manipulate patterns. The prompts provide a few examples that implicitly specify the task, with no additional training. The paper analyzes results across various model sizes and datasets.
In summary, the core method is assessing the zero-shot in-context learning capabilities of LLMs for pattern reasoning across three paradigms (transformation, completion, and improvement), with applications to symbolic reasoning benchmarks and simple robotics environments.
High-Level Pseudo Code
Here is the high-level pseudo code for the methods in the paper:
# Sequence Transformation
examples = [(input_1, output_1), ..., (input_N, output_N)]
prompt = concatenate(examples, query_input)
query_output = LLM(prompt)  # LLM predicts the output for the query input

# Sequence Completion
partial_sequence = [s_1, ..., s_K]  # partial samples from a function
predictions = LLM(partial_sequence)  # LLM extrapolates the pattern
completed_sequence = partial_sequence + predictions

# Sequence Improvement
trajectories = [(reward_1, traj_1), ..., (reward_N, traj_N)]
for i in range(num_iterations):
    prompt = concatenate(trajectories, condition)
    new_traj = LLM(prompt)
    execute(new_traj)
    new_reward = get_reward(new_traj)
    trajectories.append((new_reward, new_traj))
The key steps are:
- Sequence Transformation: provide input-output examples and a query input; prompt the LLM to predict the query output.
- Sequence Completion: provide a partial sequence; prompt the LLM to complete it.
- Sequence Improvement: initialize a set of trajectories; iteratively prompt the LLM to generate a new, higher-reward trajectory, then execute it and update the trajectory set.
For each category, the method is to prompt the LLM with examples that define the task, then use it to infer outputs or improved sequences.
Detailed Pseudo Code
Here is a more detailed pseudo code outline to implement the key experiments in the paper:
# Sequence Transformation
# ARC
arc_data = load_arc_dataset()
num_correct = 0
for problem in arc_data:
    # Each problem provides a few (input, output) grid pairs as
    # demonstrations, plus a held-out query input and its true output.
    examples, query_input, true_output = problem
    token_grids = []
    for input_grid, output_grid in examples:
        token_grids.append(tokenize_grid(input_grid))
        token_grids.append(tokenize_grid(output_grid))
    prompt = concatenate(token_grids + [tokenize_grid(query_input)])
    prediction = LLM(prompt)
    if prediction == tokenize_grid(true_output):
        num_correct += 1
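A possible implementation of the tokenize_grid helper used above; the comma-separated, one-row-per-line layout is an assumption, since any consistent flattening works:

# Flatten a 2D grid of color indices into a token string, one row per line.
def tokenize_grid(grid):
    return "\n".join(", ".join(str(cell) for cell in row) for row in grid)

grid = [[0, 0, 7],
        [0, 7, 0],
        [7, 0, 0]]
print(tokenize_grid(grid))
# 0, 0, 7
# 0, 7, 0
# 7, 0, 0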
# PCFG
for k in [1, 2, 4, ...]:  # sequence lengths (number of tokens)
    for w in [0, 1, 3, ...]:  # number of grammar rules
        grammar = define_pcfg(w)
        transformation = sample_transform(grammar)  # fixed per task
        inputs, outputs = [], []
        for i in range(num_examples):
            input_seq = sample_sequence(k)
            output_seq = apply(transformation, input_seq)
            inputs.append(input_seq)
            outputs.append(output_seq)
        # Hold out the final output as the query target
        prompt = interleave(inputs, outputs[:-1])
        prediction = LLM(prompt)
        if prediction == outputs[-1]:
            num_correct += 1
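For intuition, sample_transform can be thought of as composing w primitive sequence operations; the toy rules below (and the simplified signature taking only the rule count w) are illustrative stand-ins, not the paper's actual grammar:

import random

# Toy stand-ins for PCFG-derived transformations: compose w primitive ops.
RULES = [
    lambda s: s[::-1],                         # reverse
    lambda s: s[1:] + s[:1],                   # rotate left by one
    lambda s: [x for x in s for _ in (0, 1)],  # duplicate each token
]

def sample_transform(w):
    ops = random.choices(RULES, k=w)
    def transform(seq):
        for op in ops:
            seq = op(seq)
        return seq
    return transform

transform = sample_transform(2)
print(transform([1, 2, 3, 4]))  # e.g., [2, 2, 3, 3, 4, 4, 1, 1]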
# Sequence Completion
partial = sample_partial_trajectory()  # first fraction of a full trajectory
predicted = LLM(partial)
# Score the completion by DTW distance to the ground-truth trajectory
distance = dtw(partial + predicted, full_trajectory)
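A self-contained dynamic time warping distance that could serve as the dtw call above, assuming scalar-valued samples (a textbook formulation, not necessarily the paper's implementation):

# Classic O(len(a) * len(b)) dynamic time warping distance between two
# scalar sequences, using absolute difference as the local cost.
def dtw(a, b):
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

print(dtw([0, 1, 2, 3], [0, 1, 1, 2, 3]))  # 0.0 (sequences align exactly)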
# Sequence Improvement
trajectories = []
for i in range(N):  # seed the buffer with initial trajectories
    trajectory = sample_trajectory()
    reward = get_reward(trajectory)
    trajectories.append((reward, trajectory))
for i in range(iterations):
    prompt = concatenate(trajectories, query)
    new_trajectory = LLM(prompt)
    reward = execute_and_get_reward(new_trajectory)
    trajectories.append((reward, new_trajectory))
The key aspects are:
- Tokenize grids/sequences consistently
- Interleave input-output examples in the prompt
- For sequence completion, measure distance to ground truth
- For sequence improvement, iteratively execute and update trajectories
Additional details like maximum context length, action tokenization, and trajectory buffer management would be needed for a full implementation.
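As one example of buffer management, a simple policy is to keep only the top-k trajectories by reward so the serialized prompt stays within the context window; the cap of k=10 below is an arbitrary illustrative choice:

# Keep only the top-k trajectories by reward so the serialized prompt
# fits within the LLM's context window; k=10 is an arbitrary cap.
def trim_buffer(trajectories, k=10):
    return sorted(trajectories, key=lambda rt: rt[0], reverse=True)[:k]

trajectories = trim_buffer(trajectories)  # e.g., call after each append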