In this post we'll explore AI Dungeon Universes, a new feature coming out in the next few weeks. We'll talk about what a universe does, how we made it, and how we make sure it works.
Universes Are Vibes
A universe doesn't have any concrete world info, like locations or characters. Instead, a universe is more like a vibe. It influences the literary style of the AI, as well as the types of events or characters you might encounter. Right now, we only have two universes, normal and horror (a.k.a. H.P. Lovecraft), but we intend to add more. You'll be able to set the universe per adventure, at the top of the world info page for that adventure.
To give you a sense of what the universe setting does, here's a prompt and two different completions for that prompt, one for each universe setting:
From the highlighted phrases, you can tell that the two completions are a little different. We might say the H.P. Lovecraft completion is a little darker, or creepier. Obviously it's difficult to quantify how "Lovecraftian" the model is (though we've been thinking about how to do this), but it seems to have a qualitative effect.
Making AI Dungeon (And Universes)
In the beginning, there was GPT-3, which was basically really good auto-complete. It could complete all sorts of things: reviews, news articles, poetry, etc. However, Nick, mysterious and wise, didn't want GPT-3 to complete poetry. He wanted it to complete adventures, sort of like a dungeon master.
So he decided to teach GPT-3 what an adventure looks like. He did this by scraping chooseyourstory.com, an interactive fiction website where reader choices influence the outcome of the story.
He wrote a bot that played many of these scenarios, compiling the results into text files, like this one:
Then, he presented GPT-3 with hundreds of these stories, asking it to mimic them (aka fine-tuning). This created a copy of GPT-3 that understood the "adventure format" (second person, alternating actions and results) and could finish an adventure if you gave it the beginning. Thus AI Dungeon was born.
The Cthulhu Model
Eventually we got bored and asked, why not have different versions of AI Dungeon? Maybe we could make a sci-fi version, or a fantasy wizard version? In the end, we settled on horror, collecting a bunch of stories from H.P. Lovecraft and hiring freelancers to convert them into "adventure format." We followed the same fine-tuning process and created yet another copy of GPT-3, and thus the Cthulhu model was born.
One Model To Rule Them All
Why not keep going? Money, in short. Each copy of GPT-3 we run costs a lot of money per month. Trust me when I say even 5-10 universes would be totally unsustainable.
We decided to try a different approach, inspired by a machine learning paper called CTRL. In it, the authors showed that a single language model (their own, not GPT-3) can learn multiple modes which you can easily turn on and off. We wanted to do the same thing with GPT-3.
The method is dead simple: for each distinct mode you want GPT-3 to have, you come up with a tag. And then you put that tag at the top of all the data you want GPT-3 to associate with the tag. In our case, we only have two tags: Choose Your Story and H.P. Lovecraft.
This lets us train just one model for all the universes we have, and will let us add even more universes without making costly copies of GPT-3.
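To make the tagging step concrete, here is a minimal sketch of how a tagged training set could be assembled. The exact tag format, the `TAGS` mapping, and the function names are assumptions made for illustration; the post only says that a tag goes at the top of each piece of data.

```python
# Hypothetical tag strings. The post names the two tags but not their exact format.
TAGS = {"normal": "Choose Your Story", "horror": "H.P. Lovecraft"}


def tag_example(universe: str, story_text: str) -> str:
    """Put the universe's tag on its own line at the top of one training story."""
    return f"{TAGS[universe]}\n{story_text}"


def build_training_set(stories_by_universe: dict) -> list:
    """Flatten {universe: [story, ...]} into a single list of tagged examples."""
    return [
        tag_example(universe, story)
        for universe, stories in stories_by_universe.items()
        for story in stories
    ]
```

Because every example carries its own tag, all the stories can go into one fine-tuning run instead of one run per universe.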
To change the universe, all we have to do is prepend the appropriate tag to the story context. When GPT-3 sees "H.P. Lovecraft" at the top of the story, it should say, "oh this is the same tag you used when you showed me all those horror stories, so I'll make this adventure read like those other ones."
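At generation time, switching universes amounts to putting a different tag at the top of the same story context. The sketch below assumes the tag sits on its own first line and uses a made-up helper name, `with_universe`; the real model input format isn't described in this post.

```python
# Hypothetical mapping from the universe setting to the tag seen during fine-tuning.
UNIVERSE_TAGS = {"normal": "Choose Your Story", "horror": "H.P. Lovecraft"}


def with_universe(story_context: str, universe: str) -> str:
    """Return the model input: the universe's tag, then the story context."""
    if universe not in UNIVERSE_TAGS:
        raise ValueError(f"unknown universe: {universe!r}")
    return f"{UNIVERSE_TAGS[universe]}\n{story_context}"


story = "You are standing in a dim library."
normal_input = with_universe(story, "normal")  # same story, normal tag
horror_input = with_universe(story, "horror")  # same story, horror tag
```

The story text is identical in both inputs. Only the first line differs, and that line is what the model learned to associate with each style.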
Generally, we have a three-step process for rolling out a new version of AI Dungeon.
- First, we manually inspect some adventures, looking at completions to make sure the AI isn't total garbage.
- Next, we release the feature to the alpha testers (we have around 20) and ask for feedback.
- If there are no major issues, we A/B test the model (randomly giving 50% of users the new model) and look at the distribution of the "adventure feedback" survey shown below.
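The A/B step in the list above can be sketched roughly as follows. This is not our production code: the hash-based assignment, the experiment name, the numeric rating scale, and the use of total variation distance to compare survey distributions are all assumptions chosen to keep the example small.

```python
import hashlib
from collections import Counter


def assign_arm(user_id: str, experiment: str = "universes") -> str:
    """Deterministically put about half of users on the new model."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "new" if digest[0] % 2 == 0 else "current"


def rating_distribution(ratings: list) -> dict:
    """Turn a list of survey ratings into {rating: share of responses}."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {rating: counts[rating] / total for rating in sorted(counts)}


def total_variation(p: dict, q: dict) -> float:
    """Distance between two distributions: 0.0 is identical, 1.0 is disjoint."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Hashing the user ID means a user always lands in the same arm without storing the assignment. A distance near zero between the two arms' rating distributions is one way to read "indistinguishable", though a real analysis would also want a significance test.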
We took all these steps with Universes. I, personally, reviewed around 15 adventure completions (though I admit they were only a few actions each). Alpha testers seemed to like it, or at least they couldn't tell the difference. And then we ran the A/B test (Dragon only), starting on Thursday 3/18 and ending Friday 3/26. According to the surveys, the two model versions are indistinguishable.
However, some users on Discord have complained that the "Choose Your Story" tag is corrupting model output. It's hard to say if this is fact or psychology. If the effect is real, it may have been caused by the tag taking up user space in the context (which also caused the tag to show up in Last Model Input). However, that was fixed on Saturday, 3/20. We're still not sure which types of stories are being impacted. If you have an adventure you think does poorly with the tag, feel free to hit me up on Discord.
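To see why a tag can cost story space at all, here is a toy illustration. It assumes a fixed context budget and counts whitespace-separated words as "tokens", which is not how the real tokenizer works; `fit_context` is a made-up name, and this is not a description of the actual fix.

```python
def fit_context(tag: str, story: str, max_tokens: int) -> str:
    """Keep the tag, then as much of the most recent story as still fits."""
    tag_tokens = tag.split()
    story_tokens = story.split()
    room = max_tokens - len(tag_tokens)  # budget left for the story itself
    if room < 0:
        raise ValueError("tag alone exceeds the context budget")
    # Slice from the end so the oldest story text is dropped first.
    kept = story_tokens[-room:] if room > 0 else []
    return tag + "\n" + " ".join(kept)
```

With a budget of 5 and the two-word tag "H.P. Lovecraft", only the last 3 words of the story survive. Every token spent on the tag is a token the story doesn't get, unless the tag is given its own space outside the user's share.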
The team agrees that we need better metrics and better ways of assessing model quality. This is definitely an area of improvement that we'll actively work on. Once we're confident that the tags are not a total disaster, we'll replace the current Dragon model with the Universe-enabled one, and hopefully push many more universes in the coming months. Time will tell, and as always, we look forward to your feedback as we work to improve AI Dungeon for everyone.