Hey there, fellow experimenter with cognitive architectures here. Love your Alfred experiment; I've been experimenting with many similar things, though more focused on the memory side than the tool-calling aspect for now. One thing I've noticed is that the orchestrator you mention really benefits from being a > GPT-4-level model. There's a lot of subtlety you want captured there, and it's usually not many tokens, so it's worth using the best available model at the critical points in the system (like orchestration). You can also build a sort of fractal system where any module can spawn an entire new agent if a given task turns out to be too hard on its own. And if you turn this into a full-on assistant, I'd recommend progressive summarisation over just using retrieval for long-term memory as well :) Anyway, great post!
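To make the progressive summarisation point a bit more concrete, here's a rough sketch of the kind of thing I mean (not from the Alfred post; the `MemoryBuffer` class, thresholds, and model name are just placeholders I made up, and it assumes the standard OpenAI Python client pointed at whatever endpoint you use):

```python
# Rough sketch of progressive summarisation for long-term memory.
# Assumes the official openai Python client; model name and size limits are placeholders.
from openai import OpenAI

client = OpenAI()

class MemoryBuffer:
    def __init__(self, max_chars: int = 8000):
        self.max_chars = max_chars
        self.entries: list[str] = []   # recent memories, kept verbatim
        self.summary: str = ""         # older memories, progressively compressed

    def add(self, entry: str) -> None:
        self.entries.append(entry)
        # When the verbatim buffer grows too large, fold the oldest half
        # into the running summary instead of dropping it or leaning only on retrieval.
        if sum(len(e) for e in self.entries) > self.max_chars:
            half = len(self.entries) // 2
            old, self.entries = self.entries[:half], self.entries[half:]
            self.summary = self._summarise(self.summary, old)

    def _summarise(self, previous_summary: str, new_entries: list[str]) -> str:
        prompt = (
            "Update this running summary of an assistant's long-term memory.\n"
            f"Current summary:\n{previous_summary}\n\n"
            "New events to fold in:\n" + "\n".join(new_entries)
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def context(self) -> str:
        # What the orchestrator actually sees: compressed past plus verbatim recent entries.
        return self.summary + "\n" + "\n".join(self.entries)
```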
wow thank you Mark! I agree, although I find you can achieve similar results with budget forcing on a GPT-4o-level model if you assign it a generous token budget for thinking.
The only benefit then is being able to use smaller models that run faster. I'm trying to figure out whether a smaller Llama model with some iteration of budget forcing can scale output quality, because if it can, Groq's roughly 10x tokens-per-second inference at GPT-4o prices for smaller Llama models would make it VERY competitive.
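For anyone curious what I mean by budget forcing, here's a minimal sketch of the loop I'm playing with: keep nudging the model to continue reasoning until it has spent a minimum thinking budget, then ask for the final answer. This is just my own illustration, not anything from the post; it assumes Groq's OpenAI-compatible endpoint, and the model name and "Wait" continuation cue are placeholders.

```python
# Minimal budget-forcing sketch: keep the model reasoning until it has spent
# a minimum token budget, by appending a continuation cue when it stops early.
# Assumes Groq's OpenAI-compatible endpoint; model name and cue are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

MODEL = "llama-3.1-8b-instant"  # placeholder small Llama model

def budget_forced_answer(question: str, min_thinking_tokens: int = 1024, max_rounds: int = 8) -> str:
    messages = [
        {"role": "system", "content": "Reason step by step before giving a final answer."},
        {"role": "user", "content": question},
    ]
    spent = 0
    for _ in range(max_rounds):
        if spent >= min_thinking_tokens:
            break
        resp = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            max_tokens=min_thinking_tokens - spent,
        )
        messages.append({"role": "assistant", "content": resp.choices[0].message.content})
        spent += resp.usage.completion_tokens
        if spent < min_thinking_tokens:
            # The model stopped before using its budget: force more reasoning.
            messages.append({"role": "user", "content": "Wait, reconsider and keep reasoning."})
    # One final call to produce the answer from the accumulated reasoning.
    messages.append({"role": "user", "content": "Now give your final answer."})
    final = client.chat.completions.create(model=MODEL, messages=messages)
    return final.choices[0].message.content
```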