Who needs prompt engineers anyway?
For clearly structured problems, AI-written prompts beat human prompt engineers in both speed and cost.
These days, I spend most of my time working on the AI games creator at FRVR.ai. I strongly believe the future of digital work is AI-first, where humans’ role is to assist and support the AIs performing the actual work. As a result, I design our solutions to be primarily used by AI agents rather than humans.¹
Consequently, an increasing amount of my time has been spent writing and debugging AI prompts - often the most time-consuming part of solving any task. This naturally raises the question: why not make the AI write the AI prompts?
This is especially true for AI tasks with well-defined inputs and outputs, because you can automatically validate whether a prompt’s output meets expectations. As an example: to access our games creator beta, your creator application must first be approved - this approval is done by AI and then signed off by a human (me).
However, being human, my approval criteria are complex. I might approve a user with no experience who shows a strong willingness to learn, while denying a more experienced user for displaying undesirable attributes, such as having a swear word in their username.
Writing an AI prompt that integrates all these complex criteria has been incredibly difficult, bordering on impossible. But by developing and making use of an AI prompt generator, which recursively writes and tests prompts against known test data, creating such a prompt now takes ~5 minutes and costs less than $4 in AI processing!
I am not proud to admit it, but building the first version of this tool and generating a superior prompt took less time than I had previously spent tweaking the original creator-approval prompt.
This system doesn’t just streamline workflows - it changes the way we think about automation. Wherever we have problems with well-defined inputs and outputs, we can now automatically generate high-quality AI prompts to solve them. We’re currently exploring applying this method to other areas of our organization, such as customer support, email, legal, and finance.
The simple setup
The system is fully generalized: its input is a prompt to test and a set of objects to validate against their expected outcomes. In our case, the test cases are formatted as a JSON object:
[ { "data": { "why_do_you_want_access": "I try all new AI tools", "what_is_your_dev_experiance": "I have 2 years", "what_is_your_genai_experiance": "I have 1 year", "username": "Benjaminsen" }, "result": "approve" }, ... ]
The optimization loop then runs as follows:

1. The current prompt is tested against each known test case.
2. The current prompt, along with the test results, is used as input for a prompt generator, which modifies and rewrites the original prompt with the goal of producing a new prompt that is more likely to return the expected output.
3. The new prompt is tested against each known test case.
4. If the new prompt has a better correctness rate than the current prompt, it is promoted to the current prompt.
5. Steps 2-4 are repeated until 100% of test cases return the expected output; a minimal sketch of this loop follows below.
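Putting the steps together, the whole loop fits in a few lines. This is a sketch under the same assumptions as the harness above; `GENERATOR_PROMPT` is a stand-in for the meta-prompt discussed in the observations below, not our actual one.

```python
GENERATOR_MODEL = "gpt-4o"  # assumption: an expensive model writes the prompts,
                            # while run_prompt() above uses a cheap one

GENERATOR_PROMPT = (
    "You improve classification prompts. Given the current prompt and the test "
    "cases it fails on, write a new, generalized prompt that fixes the failures."
)

def generate_prompt(current: str, failures: list[dict]) -> str:
    """Ask the generator model to rewrite the current prompt given its failures."""
    response = client.chat.completions.create(
        model=GENERATOR_MODEL,
        messages=[
            {"role": "system", "content": GENERATOR_PROMPT},
            {"role": "user", "content": json.dumps({"prompt": current, "failures": failures})},
        ],
    )
    return response.choices[0].message.content

def optimize(current: str, test_cases: list[dict], max_rounds: int = 20) -> str:
    """Generate, test, and promote prompts until every test case passes (or we give up)."""
    best_score = evaluate_prompt(current, test_cases)
    for _ in range(max_rounds):
        if best_score == 1.0:  # 100% of test cases return the expected output
            break
        failures = [case for case in test_cases
                    if run_prompt(current, case["data"]) != case["result"]]
        candidate = generate_prompt(current, failures)
        score = evaluate_prompt(candidate, test_cases)
        if score > best_score:  # promote only strictly better prompts
            current, best_score = candidate, score
    return current
```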
It’s worth noting that the system does not need any priming, such as an initial explanation of our goal; the goal is inferred automatically from the data provided.
Interestingly, this is very close to how AI training and fine-tuning work, except that we are operating fully in user space. It’s also orders of magnitude cheaper, and it allows us to upgrade to faster, cheaper, or better AI models as they become available.
Considering costs, you could also imagine using other, cheaper filtering methods, such as Bayesian filtering, to solve the same tasks. However, in testing we found that the prompt-based method requires orders of magnitude less input data to reach the same success rate.²
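For comparison, the classical alternative would look something like the sketch below - a naive Bayes text classifier built with scikit-learn, which is an illustrative assumption rather than the exact filter we benchmarked (the file name is hypothetical).

```python
import json

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

with open("test_cases.json") as f:  # hypothetical file holding the JSON shown above
    cases = json.load(f)

# Flatten each application into a text document; keep the expected verdicts.
texts = [json.dumps(case["data"]) for case in cases]
labels = [case["result"] for case in cases]

# A classic "Bayesian filter": word counts fed into naive Bayes.
baseline = make_pipeline(CountVectorizer(), MultinomialNB())
baseline.fit(texts, labels)
print(baseline.score(texts, labels))  # training accuracy; real use needs far more data
```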
Human-assisted AI
While the majority of the work is now done by the AI, it still needs human input and guidance. In our example:
- What data is passed into the system is determined by a developer.
- The initial data set was created by manually reviewing creator applications.
- The data set is expanded and the prompt regenerated whenever a new instance of incorrect output is found, as shown in the sketch below.
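In code, that last step can be as small as appending the mislabeled case and re-running the optimizer - again a sketch using the hypothetical helpers from above.

```python
def add_case_and_regenerate(prompt: str, test_cases: list[dict],
                            data: dict, expected: str) -> str:
    """Record a newly observed failure, then re-optimize the prompt against it."""
    test_cases.append({"data": data, "result": expected})
    return optimize(prompt, test_cases)
```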
Observations and learnings while developing this tool
- Care has to be taken that the generated prompt is generalized, and not just a bullet-point list of specific values that should map to specific outputs.
- The hardest part of building a prompt generator was writing the prompt that generates the next prompt. 🤦‍♂️
- When the loop fails to converge, it’s more likely that the dataset contains inconsistent input and/or output data than that the AI is unable to write a single prompt capable of producing the expected outputs.
- The AI does not care about nicely formatted input data; it’s cheaper and faster to give the AI JSON without nice formatting.
- In most of our use cases it’s possible to use an expensive, slow AI model to generate the prompts and a cheaper, faster AI model to solve the given task.
1. This doesn’t mean we don’t create human interfaces; it just means the AI is the top priority, with human UX as a secondary focus for advanced users.
2. A similar recursive approach could probably be used to let an AI tweak a Bayesian filter rather than an AI prompt.