DeepSeek Launches NSA for Ultra-Fast Long-Context Training and Inference

2025-02-18 16:37:45

ChainCatcher news: according to Jin10, DeepSeek has launched NSA (Native Sparse Attention).

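DeepSeek says NSA is a hardware-aligned, natively trainable sparse attention mechanism for ultra-fast long-context training and inference. With a design optimized for modern hardware, NSA speeds up inference and lowers pretraining costs without compromising performance.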

On general benchmarks, long-context tasks, and instruction-based reasoning, it performs on par with or better than full-attention models.
