Topic: Is no one discussing DeepSeek, which has gone viral these past few days? -- 俺本懒人
DeepSeek's own introduction:
V3: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models.
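As an aside, the routing idea behind an MoE layer such as DeepSeekMoE can be sketched in a few lines. This is a toy illustration only: the dimensions, the top_k choice, and the plain softmax gate are my own simplifications, not DeepSeek-V3's actual architecture (which also adds MLA and an auxiliary-loss-free load-balancing strategy):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy Mixture-of-Experts layer: a router scores all experts per
    token, but only the top_k experts actually run, so only a small
    fraction of total parameters is active for any one token."""
    logits = x @ gate_w                            # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        w = np.exp(logits[t, top[t]])              # softmax over selected experts only
        w /= w.sum()
        for weight, e in zip(w, top[t]):
            out[t] += weight * experts[e](x[t])    # unselected experts never execute
    return out

# Toy setup: 8 small linear "experts", 4 tokens of dimension 16.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda v, W=rng.normal(size=(d, d)) / d: v @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
x = rng.normal(size=(4, d))
print(moe_forward(x, gate_w, experts).shape)       # (4, 16)
```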
R1 is built on top of V3, with the same 671B total parameters and 37B activated for each token. For comparison, Meta's Llama 3 also trained on about 15 trillion tokens, essentially on par with DeepSeek, yet its largest resulting model is 405B total parameters (alongside 8B and 70B variants), and because these are dense models, every parameter is active for each token.
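The dense-versus-MoE contrast in active parameters is simple arithmetic (figures taken from the post above; the ratios themselves are my own back-of-envelope illustration):

```python
# Figures from the post: DeepSeek V3/R1 is MoE, Llama 3's largest model is dense.
deepseek_total, deepseek_active = 671e9, 37e9
llama3_total = 405e9                     # dense: all parameters active per token

print(f"DeepSeek active share per token: {deepseek_active / deepseek_total:.1%}")  # ~5.5%
print(f"Llama 3 405B active share:       {llama3_total / llama3_total:.0%}")       # 100%
print(f"Active params, DeepSeek vs Llama: {deepseek_active / llama3_total:.1%}")   # ~9.1%
```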
DeepSeek's two main breakthroughs:
1. Reduced training cost and shortened training time
2. Exposing the AI's reasoning process (see the sketch after this list)
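On point 2: DeepSeek-R1 is widely reported to emit its chain of thought wrapped in <think> tags before the final answer. A minimal sketch of separating the two, assuming that tag convention (the exact output format is an assumption on my part, not something stated in this thread):

```python
import re

def split_reasoning(text: str):
    """Separate a <think>...</think> reasoning trace from the final answer."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()                  # no visible reasoning block
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer

sample = "<think>The user asks 2+2; adding gives 4.</think>The answer is 4."
thought, answer = split_reasoning(sample)
print("reasoning:", thought)                       # The user asks 2+2; adding gives 4.
print("answer:", answer)                           # The answer is 4.
```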
- Related replies
🙂 Just like generative models: creativity only generates, judgment is left to the user. Formal logic is a big problem (15, nobodyknowsI, 4216 chars, 2025-01-30 12:42:24)
🙂 A very distinctive perspective; it gave me food for thought (8, 唐家山, 1177 chars, 2025-01-30 21:30:42)
🙂 DeepSeek still can't reach the level of logic; it remains at the level of natural language (8, nobodyknowsI, 3141 chars, 2025-01-31 03:39:02)
🙂 DeepSeek's models are hardly small
🙂 Similar performance at roughly half the size still counts as small. On Alibaba's Tongyi 2.5, which disgusted me (2, nobodyknowsI, 1759 chars, 2025-02-01 03:01:45)
🙂 What level must AI reach before it can output things humans don't already know (1, 贼不走空, 207 chars, 2025-01-31 04:49:08)
🙂 This is the textbook establishment cleric (1, 胡辣汤, 764 chars, 2025-02-01 07:32:30)
🙂 Do you have any questions you want to ask an AI? (贼不走空, 1137 chars, 2025-02-01 08:56:54)