Your Data is Now Forever Baked into Their Model

“Trying to remove training data once it has been baked into a large language model is like trying to remove sugar once it has been baked into a cake. You’ll have to trash the cake and start over.” - Cassie Kozyrkov

What happens when your enterprise data trains someone else's model?

Data extraction is an afterthought (or not a thought at all) for many, but its effects matter - especially as companies weigh “owning” versus “renting” models: using public models, shared vendor models, or building AI in-house. The build-vs.-buy debate has a new twist with AI because once a model is trained on a data set, there’s no removing it. The model simply doesn’t forget.

Should enterprises even care that models they don’t own cannot unlearn their data? It comes down to the value they assign to their proprietary data and their intelligence capital. If a company believes its data gives it a competitive edge, then this knowledge transfer should give it pause.

With large language models (LLMs), there’s no point in building your own model unless you’ve got spare billions to burn and/or plan to rent your LLM to others, extracting value via payments or data (or both). Tuning public (or closed) models internally to your specific business needs is valuable as long as you can ensure your data isn’t leaking out.

On the other hand, machine learning (ML) decision or prediction models are currently much more accessible in terms of cost and resources. Here, the business decision is: Do you believe your data gives you a competitive advantage? If so, should you share that advantage with others?

For example, say an enterprise has proprietary data that could be used to accurately calculate the probability of customer delinquency. That data is an edge - insights competitors may lack. If our enterprise believes building an in-house model isn’t worth the effort, it can choose to put its data into a vendor’s shared model to get knowledge from its own data set - renting the model. However, once its data enters that shared model, it enriches the third party long after the relationship ends. Our enterprise cannot “extract” or take back the insights derived from its data.
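To make the “keep it in-house” option concrete, here is a minimal sketch of training a delinquency-probability model entirely inside your own environment, so the data never enriches a shared vendor model. The column names, feature set, and choice of scikit-learn logistic regression are illustrative assumptions, not a prescription for how any particular enterprise builds such a model.

```python
# Hypothetical sketch: an in-house delinquency model trained on proprietary data.
# Nothing here leaves your infrastructure, so no shared model learns from it.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for proprietary customer data (illustrative values and columns only).
df = pd.DataFrame({
    "days_past_due":   [0, 12, 45, 3, 60, 0, 22, 75, 5, 30],
    "utilization_pct": [10, 55, 90, 20, 95, 15, 70, 99, 25, 80],
    "tenure_months":   [48, 12, 6, 36, 3, 60, 18, 2, 40, 9],
    "delinquent":      [0, 0, 1, 0, 1, 0, 1, 1, 0, 1],
})

X, y = df.drop(columns="delinquent"), df["delinquent"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train and evaluate locally; the resulting insights stay yours.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("Hold-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

The point of the sketch is the boundary, not the algorithm: whether the model is a simple logistic regression or something far richer, keeping training inside your own walls is what prevents the one-way knowledge transfer described above.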

When you plug your data into a shared model, you’re not just getting insights - you’re giving them away.

If your business has unique data or knowledge, you may not want to share those ingredients with others. This is especially true for ML models, which are approaching “just add eggs and water” ease of use. Make no mistake: the future will belong to enterprises that train on proprietary or massively large data pools. Since building your own LLM is likely off the table, what AI components should enterprises own to protect their recipe for success? After all, when it comes to your proprietary data, you cannot un-bake this AI cake.

Ready to Build AI That Delivers?

Savvi helps payment providers, banks, and fintechs launch secure, autoscaling AI apps and agents—without massive teams or infrastructure headaches.