2025/05/21

The Full Context on Using Credits for Context

After every release, we pay close attention to the feedback you all share with us to identify the things that you've enjoyed, things you want more of, and ways that we can make your experience better. Based on the comments and feedback you’ve shared with us, the Saga release has been one of our most successful releases to date. We introduced some exciting new models for free and paid tiers that have been very popular and well-received.

One frequently heard request is to add the ability to use Credits to extend the context length for Deepseek. The short answer is "not yet." But there's a longer answer that we'd like to share with you. We've formalized some principles about when we'll offer the option to use Credits to extend context and want to share them so you know what to expect for future models.

A Brief History of Credits for Context

The primary reason we haven't had a clear answer about which models offer Credits for context is that this pattern and use case evolved organically over the last couple of years. Until now, we'd never formalized any of this into a clear strategy that we could communicate to you. Let's walk through a little bit of that history so you can see how we got to where we are today.

See Mode

It all started with images. Image models, especially good ones, were orders of magnitude more expensive than the language models we were offering at the time. There wasn't a way that we could affordably let players create images without throttling their use somehow. Initially, we charged Energy for images. Energy was an old AI Dungeon system that limited the use of our AI language models for free and subscribed players. Eventually, we transitioned to Image Credits, a resource that was shared with AI Art Studio, an experience that was part of our old Voyage platform.

Then, when we launched Stable Diffusion as an image model, we renamed Image Credits to Credits with the expectation that they could be used for more than just images. They would become a “hard currency” for the platform, as compared to Scales, which are a “soft currency.” We also had plans to use Credits as part of the Voyage experience, but those plans never panned out. We also started giving players Credits instead of Scales as part of their subscription.

However, image generation wasn't as popular as we expected, and we ended up shutting down the Voyage platform. Consequently, some players felt like Credits were a throwaway benefit that didn't have any meaningful use for them in AI Dungeon.

GPT-4 Turbo—Our first “Ultra” model

Then, some big new AI models came onto the scene. The first one that we really paid attention to was GPT-4 Turbo.

There was strong player demand for the model, and even though it moralized and refused, our alpha testers raved about its superior logic and storytelling ability. It felt like magic. This was also in the very early days of our AI Renaissance series of releases, and we didn’t have the same availability of open-source or commercial models that we have today.

It was, truthfully, the most compelling SOTA model we could offer at the time, but it was so expensive that there was no way we could offer it for unlimited play like we did with other models.

The answer to this dilemma was staring us right in the face. What if players could use Credits (which many players weren’t using) to pay for GPT-4 Turbo calls? Or to extend the context length of calls made to that model? We consulted with our alpha testers, built the features needed to support that, and launched. Here’s what that looked like at the time:

  • Legend—Credits only, 250 tokens per Credit per call
  • Mythic—1k context unlimited, 250 tokens per Credit per call after that

Critically, there was NO sustainable way for us to launch this model if we didn't adopt the Credits-for-context approach. GPT-4 Turbo (and, later, GPT-4o) had its fans, so this was a successful decision.

We decided to label GPT-4 Turbo (and similar models) as Ultra Tier models and limit access to the Legend and Mythic tiers only.

Context Inspector and Doubling Context

Around the same time as GPT-4 Turbo, we launched the context inspector. The primary reason we introduced this feature was that the #1 complaint about AI Dungeon at the time was the AI forgetting details like players' names, genders, and other important identifying details. Players unfamiliar with AI limitations attributed this to our system or the AI models being bad. Because of the context inspector, for the first time, many of our players realized that the limitations of their AI experience weren't actually the model being stupid. They learned that the model literally didn't have visibility into their story because of a small context window.

The context inspector made this clear, and the reframing dramatically improved the overall satisfaction players had with AI Dungeon, which was great!

Since then, we’ve seen the demand for more context grow stronger and stronger. As players realized that more context improved their experience, suddenly, we weren't just hearing requests for better models but also for models with more context, which was new for us.

Eventually, we doubled the context for all tiers! Players loved it, but the hunger for more context is an insatiable beast. We know all of you would love more context for your favorite models, so we try to provide as much context as we can at each tier.

Naturally, this means that as we launch new models, there will be demand for using Credits to extend context lengths. Since players now understand that more context = more cost, this is a reasonable request.

Ultra fades into the Shadows

Then came Pegasus 3 70b. It was too expensive to offer to ALL subscribers, but it wasn't so expensive that we had to charge Credits for it. In other words, it didn't meet the criteria for an Ultra model, but it was also too expensive to offer as a regular "Premium" model. Our nice, neat model classification system fell apart just weeks after we shared it with players.

It didn't matter that our classification system had been invalidated. We abandoned the labels in favor of player value. Players were excited to have this model, and we were excited to offer it to them. Providing the option to use Credits to extend context made sense given the model's expense relative to normal Premium models.

Wizard came along and really shook things up. Like Ultra models, it was only available to Legend/Mythic subscribers, but we ended up offering it unlimited with 2k context for Legend—a first for “Ultra” type models. We enabled subscribers to spend Credits to get more Wizard context.

The demand for Wizard caught us by surprise. We saw a surge in Credit use that we'd never seen before. We didn't see this level of spend with GPT-4 Turbo or GPT-4o, largely because of the refusals and the moralizing. For the first time, we saw players consistently spending far more than their subscription limits, with some players spending over $1,000 USD/mo on Credits that they'd use to extend the context of Wizard.

The higher Credit spend showed us that there was player demand for plans with high context limits on expensive, powerful models. It led to the creation of the Shadow Tiers, which are more expensive plans used by many of our players today who want high context limits for the most powerful models we offer.

Credits vs Subscriptions

Our team deeply believes that AI Dungeon is best experienced with unlimited access to AI models. As an immersive experience, you shouldn't have to think about running out of AI actions or Credits. We want you to play without worrying about a limited resource.

Said differently, using Credits to extend context actually disincentivizes you from playing.

We’ve moved away from limited play patterns several times. We used to have Energy, which had a cool-down period for using models, both free and premium. Then, we made Premium models unlimited so that subscribers could play without worrying about running out of actions. Then, we implemented ads on our free tier. Although ads were NOT well received, the one benefit they brought was that, for the first time, our free tier could have unlimited AI actions. Finally, we took ads away, bringing an unlimited experience to ALL players, free and subscribed alike.

Although we understand why using Credits for context is so appealing, in our view, it isn't the ideal play pattern. We would much rather offer models in a way that players don't need to worry about running out of a limited resource.

Today’s Reality

Despite our preference for unlimited play over Credit spending, there are two compelling reasons to continue to offer players the option to spend Credits to extend their context length:

  1. There continues to be a strong demand for it
  2. We don't currently have another use for Credits that the majority of players are excited about

Both of those facts will likely change over time. For example, we expect to introduce additional ways to use Credits at some point in the future. We also expect the demand for using Credits to extend context will decrease after the launch of Heroes because Heroes uses a different architecture with multiple AI calls. The idea of extending context with Credits isn't as relevant there, and we will likely have different patterns governing usage on Heroes.

However, today, these reasons are still very much true for AI Dungeon, so we want to continue to offer Credits for context to players.

Our New Formal Strategy

Here are the principles that we will use to determine which models we will enable Credit spending to extend context:

  • (New) Players using Dynamic Large will be able to use Credits to extend context. This includes our Adventure tier, which hasn’t been able to take advantage of this benefit before.
  • (Continue) Players will be able to use Credits for any current models that offer it. No changes are being made to current models.
  • (Continue) Players will be able to use Credits to extend context for high-cost models based on demand, availability, and feasibility.
  • (Change) For future models, Credits will ONLY be used to extend context. We will no longer offer the option for Credit-only actions.
  • (Change) We will only offer Credits to extend context once we’ve verified that providers are able to handle production traffic with calls at high contexts. This means we may initially release models without the Credit option, but open them up later once we have more confidence in our ability to support them.

Deepseek

As we stated earlier, we will not be offering Credits for context with Deepseek yet.

Candidly, things have been a little bumpy with Deepseek. Deepseek offers some of the highest context limits we’ve ever had on AI Dungeon, and our Shadow tier members have been putting that context to good use. It hasn’t all been rainbows and sunshine. We’ve discovered we need to make some improvements to our systems to better handle this high-context traffic.

Once those changes are complete, if the load on our systems is acceptable, we will consider offering Credits for context on Deepseek.

Constant Change

We appreciate all of you who asked questions about Deepseek and other models that you would like to use Credits for. The AI industry is evolving so quickly, and we're trying to figure things out along the way.

Each year, the AI model landscape has impacted which models we offer, how we offer them, and what players expect. Two years ago, we had one free model and two premium models. Adventurers got 1k of context, Champions (called Hero at the time) got 2k, and Legend got a “whopping” 4k. Mythic and Shadow tiers didn’t exist. A lot has changed in a short time; you now get to enjoy an abundance of great models at higher context lengths.

Our goal is to remain attentive to your feedback and adjust to make sure that we are giving you the most value. We hope these adjustments will help you have an even better AI Dungeon experience.


2025/05/06

Your Saga Begins: Meet Your New AI Models

Your Saga begins now—AI Dungeon’s most narrative-driven release yet has arrived. Meet Muse, an emotionally intelligent storyteller built on Mistral NeMo 12B and available to all players for free. It weaves character-rich tales with surprising depth and coherence, perfect for slow-burn relationships or slice-of-life scenes that hit just right.

Looking for high-stakes drama? Harbinger brings sharper tension and real consequences, crafting adventures where each decision could be your last.

And at the top of the tower stands DeepSeek V3 Chat (March 2025)—an elite, state-of-the-art model with 671B total parameters, delivering prose so natural it blurs the line between human and machine.

Three storytellers. Infinite adventures. Where will your Saga take you?


2025/05/02

Synthetic Data, Preference Optimization, and Reward Models

At Latitude, we want to create the best AI-powered roleplaying experiences for you, and the primary way we do that is through our AI models. Our process for bringing updated AI models to the platform has changed over time but always includes several key elements: identifying the best base models for storytelling, constructing high-quality datasets for finetuning, and evaluating model behavior using carefully designed metrics and player feedback.

Over the past few months, we have researched several ways to take our AI development to the next level. After rigorous experimentation, we are happy to share our findings and how we will be applying these new methods to many of our upcoming AI models.

Emotional Range in Synthetic Data

One of the most powerful attributes of language models is their ability to be finetuned for a specific task using a task-specific dataset. It is increasingly common to build these datasets using synthetic data generated by another AI model as opposed to strictly human-written text (Abdin et al., 2024). This comes with the advantage of being highly scalable and adaptable. However, the end model often inherits the biases and quirks of the source model used to generate the synthetic data.

One common effect we observe when using state-of-the-art language models to create synthetic roleplaying datasets is a strong positivity bias. Since most state-of-the-art language models are aligned to be helpful and harmless (Bai et al., 2022), this naturally affects the generated stories in our datasets. In practice, source models are aligned to values that don't work well for writing interesting fiction.

For example, when doing simple sentiment analysis on synthetic data from several state-of-the-art language models, we observe a strong positivity bias. The figure below plots the distribution of sentiment across synthetic datasets from several sources. The median of each distribution is indicated as a dashed line, while the first and third quartiles are indicated with dotted lines. Note how each data source varies in the average and range of sentiment. This is one reason why we typically generate synthetic data from multiple sources: it increases diversity.
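For readers curious what this kind of analysis looks like in practice, here's a minimal sketch. It uses NLTK's VADER analyzer as a stand-in sentiment scorer and a made-up mapping of sources to generated passages; it illustrates the idea rather than our exact tooling.

```python
import numpy as np
from nltk.sentiment import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")

analyzer = SentimentIntensityAnalyzer()

def sentiment_summary(passages):
    """Score each passage in [-1, 1] and summarize the distribution."""
    scores = np.array([analyzer.polarity_scores(p)["compound"] for p in passages])
    q1, median, q3 = np.percentile(scores, [25, 50, 75])
    return {"q1": q1, "median": median, "q3": q3}

# Hypothetical mapping of source model -> generated passages.
synthetic_sources = {
    "source_model_a": ["The heroes rejoiced as the kingdom was saved.", "Everyone forgave each other."],
    "source_model_b": ["The village burned, and no one came to help.", "She smiled at the stranger."],
}

for name, passages in synthetic_sources.items():
    print(name, sentiment_summary(passages))
```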

In addition to diversifying the sources of our synthetic data, we find that carefully constructed prompts can partially realign synthetic data sources, decreasing positivity bias and increasing the emotional range of responses. To achieve the results shown in the plot below, we use prompts such as the following (a minimal sketch of how such directives might be applied follows the list):

  • Antagonistic characters should remain consistently difficult, cold, unpleasant or manipulative
  • Many roleplays should end with tension, conflict, or ambiguity unresolved
  • Do not create feel-good endings that contradict established character traits
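Here's a rough sketch of how directives like these might be injected when generating synthetic data through a chat-style API. The `chat_completion` call and model name are placeholders, not our actual pipeline.

```python
# Steering a source model away from positivity bias by prepending tone
# directives to the system prompt of a chat-style generation API.
STEERING_DIRECTIVES = [
    "Antagonistic characters should remain consistently difficult, cold, unpleasant or manipulative.",
    "Many roleplays should end with tension, conflict, or ambiguity unresolved.",
    "Do not create feel-good endings that contradict established character traits.",
]

def build_messages(scenario: str) -> list:
    """Assemble chat messages with the steering directives in the system prompt."""
    system_prompt = (
        "You are generating roleplay fiction for a training dataset.\n"
        + "\n".join(f"- {d}" for d in STEERING_DIRECTIVES)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": scenario},
    ]

# `chat_completion` stands in for whichever provider API generates the data:
# sample = chat_completion(model="source-model", messages=build_messages("A knight confronts a corrupt lord."))
```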

Reducing Cliches with DPO

Just as human authorship can be identified by repeated linguistic features and phrases, language models tend to prefer certain phrases to the point of them becoming cliches. In the past, we have worked to mitigate AI cliches by diversifying data sources and cleaning known cliches from datasets. While these methods help, they do not completely solve the problem.

A popular finetuning strategy called direct preference optimization (DPO) (Rafailov et al., 2023) has proven successful at encouraging and discouraging specific model behaviors by leveraging preference data. Each preference in a DPO dataset includes both a good and a bad example of an output. Whereas traditional supervised finetuning (SFT) optimizes a language model to increase the probability of every data point in a dataset, DPO simultaneously increases the probability of good outputs and decreases the probability of bad outputs relative to each other.
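For the curious, here's what the DPO objective from Rafailov et al. (2023) looks like in PyTorch. The inputs are summed per-token log-probabilities of the chosen and rejected completions under the policy being trained and under a frozen reference model; this is the textbook loss, not our production training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO: raise the probability of chosen outputs and lower the probability
    of rejected outputs, relative to a frozen reference model."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between the chosen and rejected completions.
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```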

Utilizing a dataset by Durbin (2023), we apply DPO to decrease model cliches by pairing human-written chapters of popular novels (good outputs) with chapters rewritten by an AI (bad outputs). We find doing so both increases player preference (as measured by side-by-side output comparisons) and decreases the frequency of cliches in model outputs. In the plots below, win rate measures the percentage of times players preferred output from our unreleased model over currently released models, and cliche rate measures the percentage of model outputs that contain common cliches.
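As an illustration of the cliche-rate metric, here's a minimal sketch; the phrase list is a made-up example rather than our actual cliche inventory.

```python
# Hypothetical cliche-rate metric: the fraction of model outputs containing
# at least one phrase from a known cliche list.
KNOWN_CLICHES = [
    "a shiver ran down",
    "little did they know",
    "barely above a whisper",
]

def cliche_rate(outputs: list) -> float:
    hits = sum(
        any(cliche in output.lower() for cliche in KNOWN_CLICHES)
        for output in outputs
    )
    return hits / len(outputs) if outputs else 0.0
```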

Modeling Player Preferences

Many AI Dungeon players opt in to our “Improve the AI” feature, which allows us to log anonymized story data. This includes a wealth of information about player behavior and preference, expressed implicitly through interactions with AI Dungeon. One of the clearest signals of player preference is the retry button, which lets a player see an alternative output for their last turn. By pairing the initial model output that caused the player to click retry with the final output the player uses to continue the story, we can create powerful player preference datasets.
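Here's a rough sketch of how retry pairs can be turned into preference data. The turn schema below is hypothetical and heavily simplified; real logs carry more context and are anonymized.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Turn:
    prompt: str              # story context sent to the model
    output: str              # model continuation shown to the player
    was_retried: bool        # the player clicked retry on this output
    retry_of: Optional[int]  # index of the output this one replaced, if any

def build_retry_preferences(turns: List[Turn]) -> List[dict]:
    """Pair each retried output (rejected) with the output the player kept (chosen)."""
    pairs = []
    for turn in turns:
        if turn.retry_of is not None and not turn.was_retried:
            rejected = turns[turn.retry_of]
            pairs.append({
                "prompt": rejected.prompt,
                "chosen": turn.output,
                "rejected": rejected.output,
            })
    return pairs
```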

Unfortunately, our initial experiments into using retry preferences for DPO finetuning were unsuccessful. It seemed that the retry data was too noisy and inconsistent to finetune with DPO. More importantly, performance was poor unless we limited ourselves to only gathering preference data that was created by the model we were attempting to optimize. This was very limiting as it made optimization difficult for models still in development.

Enter reward models. Originally popularized for language models by reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022), reward models train on preference datasets similar to those used for DPO. However, reward models are trained to output a single number that represents output quality. Interestingly, reward models tend to generalize better than SFT and DPO. This means that we can train a single reward model on retry data from all of the models in AI Dungeon and end up with the ability to measure output quality for our models that are still in development.
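The standard pairwise objective for training such a reward model (per Ouyang et al., 2022) is short enough to show here; the scalar scores would come from a value head on top of a language model, which we omit for brevity.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Pairwise reward-model loss: push the scalar score of the preferred
    output above the score of the rejected one."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```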

By using a reward model to label good vs. bad outputs from models in development, we construct a new preference dataset that we can use for additional DPO finetuning. According to win rates from side-by-side output comparisons (along with additional internal testing), this final stage of reward DPO has a significant positive impact on the AI model.
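Conceptually, that reward-DPO stage looks something like the sketch below: sample candidate outputs, score them with the reward model, and keep the best and worst as a preference pair. The `sample_fn` and `reward_fn` callables are placeholders, not our actual training code.

```python
def build_reward_labeled_pairs(prompts, sample_fn, reward_fn, n_samples=4):
    """Sample candidates per prompt, score them with a reward model, and keep
    the highest- and lowest-scored as a (chosen, rejected) preference pair."""
    pairs = []
    for prompt in prompts:
        candidates = [sample_fn(prompt) for _ in range(n_samples)]
        scored = sorted(candidates, key=lambda c: reward_fn(prompt, c))
        pairs.append({
            "prompt": prompt,
            "chosen": scored[-1],   # highest reward
            "rejected": scored[0],  # lowest reward
        })
    return pairs
```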

Conclusion

Our research journey has led to significant advances in AI roleplaying experiences that will directly translate to more compelling, diverse, and satisfying gameplay for you, our players. We look forward to sharing more research insights with you in the future as we continue to push for better experiences. You can expect to see new models trained with the approaches outlined here very soon.

References

Abdin, Marah, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree et al. "Phi-3 technical report: A highly capable language model locally on your phone." arXiv preprint arXiv:2404.14219 (2024).

Bai, Yuntao, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain et al. "Training a helpful and harmless assistant with reinforcement learning from human feedback." arXiv preprint arXiv:2204.05862 (2022).

Durbin, Jon. "Gutenberg-dpo-v0.1." Hugging Face, 2023.

Ouyang, Long, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang et al. "Training language models to follow instructions with human feedback." Advances in neural information processing systems 35 (2022): 27730-27744.

Rafailov, Rafael, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, and Chelsea Finn. "Direct Preference Optimization: Your Language Model Is Secretly a Reward Model." In Proceedings of the 37th International Conference on Neural Information Processing Systems, 53728-53741. Red Hook, NY: Curran Associates Inc., 2023.
