A foundation model is a large AI model trained on massive, diverse datasets that can be adapted to a wide range of downstream tasks. The term was coined by researchers at Stanford in 2021 to describe a shift in AI development: rather than training a separate model for each specific task, organizations build on top of a single powerful base model and specialize it through prompting, fine-tuning, or tool integration.
The most widely used foundation models are large language models like Claude, GPT-4, Gemini, and Llama, though the concept also applies to models for images, audio, and video. What makes them "foundation" models is that they were not trained for any single purpose but for general capability, which makes them useful as a starting point for building almost any AI application.
For organizations building AI products, the choice of foundation model is a significant architectural decision. Key factors include capability on the tasks that matter most to the product, pricing per token, latency, context window size, safety and alignment properties, data handling and privacy agreements, and the provider's roadmap and reliability. Most enterprise deployments use foundation models via API rather than training or hosting their own, which avoids the upfront cost and operational complexity of training and serving models in-house.
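Two of the factors above, pricing per token and context window size, lend themselves to quick arithmetic. The sketch below is illustrative only: the model names, prices, and context limits are made-up placeholders rather than any provider's real figures, and it assumes that input and output tokens are billed separately and that both count against the context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    """Hypothetical description of one candidate model (all values are placeholders)."""
    name: str
    input_price_per_mtok: float   # USD per million input tokens (assumed)
    output_price_per_mtok: float  # USD per million output tokens (assumed)
    context_window: int           # max tokens per request, input plus output (assumed)


def monthly_cost(model, requests_per_month, input_tokens, output_tokens):
    """Estimate monthly spend in USD for a fixed per-request token profile."""
    tokens_cost = (
        input_tokens * model.input_price_per_mtok
        + output_tokens * model.output_price_per_mtok
    )
    return requests_per_month * tokens_cost / 1_000_000


def fits_context(model, input_tokens, output_tokens):
    """Check whether one request fits inside the model's context window."""
    return input_tokens + output_tokens <= model.context_window


# Placeholder candidates; substitute real figures from each provider's price list.
CANDIDATES = [
    ModelOption("model-a", 3.0, 15.0, 200_000),
    ModelOption("model-b", 0.5, 1.5, 32_000),
]

for option in CANDIDATES:
    cost = monthly_cost(option, requests_per_month=100_000,
                        input_tokens=2_000, output_tokens=500)
    ok = fits_context(option, input_tokens=2_000, output_tokens=500)
    print(f"{option.name}: ${cost:,.2f}/month, fits context: {ok}")
```

A calculation like this only covers cost and fit. Capability, latency, and safety properties still have to be measured on the product's own tasks.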
Fine-tuning a foundation model (training it further on domain-specific data) is sometimes appropriate when prompting alone cannot achieve the required performance. But fine-tuning is more expensive, requires curated training data, and creates a model that must be separately maintained as the base model evolves. Most organizations can achieve strong results through prompt engineering and retrieval, and should exhaust both before concluding that fine-tuning is necessary.
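To make the prompting-and-retrieval route concrete, here is a minimal sketch of retrieving relevant passages and placing them in a prompt. It is a toy under stated assumptions: the passages and the helper names (`retrieve`, `build_prompt`) are hypothetical, relevance is scored by simple word overlap where a production system would more likely use embeddings or a search index, and the call to the model itself is left out because it depends on the provider.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Toy relevance score: number of word occurrences shared by query and passage."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query, passages, k=2):
    """Return up to k passages with a non-zero score, most relevant first."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:k] if score(query, p) > 0]


def build_prompt(query, context):
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Hypothetical domain documents standing in for a company knowledge base.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]

question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(question, DOCS))
print(prompt)  # this string would be sent to the foundation model's API
```

The domain knowledge lives in the documents rather than in the model's weights, so updating it means editing `DOCS`, not retraining, which is one reason this approach is usually tried before fine-tuning.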