Announcement_10
DeepSeek-V2, a 236B-parameter Mixture-of-Experts (MoE) language model, has been released. Compared with DeepSeek 67B, it delivers stronger performance while cutting training costs by 42.5% and reducing the KV cache by 93.3%. It is available for both research and commercial use.
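For readers who want to try it, here is a minimal sketch of loading the model with Hugging Face transformers. It assumes the weights are published under the repo id "deepseek-ai/DeepSeek-V2"; note that a 236B MoE checkpoint requires a multi-GPU node even in bfloat16.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-V2"  # assumed Hugging Face repo id

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory versus fp32
        device_map="auto",           # shard the model across available GPUs
        trust_remote_code=True,      # the repo ships custom MoE modeling code
    )

    inputs = tokenizer(
        "Explain Mixture-of-Experts in one sentence.",
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

This is the standard transformers workflow, not DeepSeek-specific tooling; consult the model card for the exact repo id, license terms, and recommended inference settings.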