Seed 1.6 uses a sparse Mixture-of-Experts (MoE) architecture with 23B active parameters drawn from 230B total, building on MoE design principles from ByteDance's Seed 1.5 research. For each token, only the most relevant experts activate during inference, which keeps compute costs closer to those of a 23B dense model than a 230B one.
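To make the routing idea concrete, here is a minimal sketch of top-k expert gating, the standard mechanism behind sparse MoE layers. It is illustrative only, not Seed 1.6's actual router; all names, shapes, and the choice of k are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token embedding through its top-k experts.

    x       : (d,) token embedding
    gate_w  : (d, n_experts) router weight matrix
    experts : list of callables, each mapping (d,) -> (d,)
    k       : experts activated per token (the sparse part)
    """
    logits = x @ gate_w                    # router score for every expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only k experts execute; the rest stay idle, which is what keeps
    # per-token compute near a small dense model despite the large total.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy usage: 4 tiny linear "experts", 2 active per token.
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n)]
gate_w = rng.normal(size=(d, n))
print(moe_forward(rng.normal(size=d), gate_w, experts))
```

The same principle scales to Seed 1.6's proportions: with roughly a tenth of the parameters active per token, per-token inference cost tracks the active count rather than the total.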
Seed 1.6 offers three reasoning modes. FullCoT enables extended chain-of-thought for demanding analytical problems. NoCoT produces direct completions without a visible reasoning trace, suited to latency-sensitive applications. AdaCoT, the adaptive mode, decides whether reasoning tokens are warranted based on inferred question difficulty, saving compute on simple queries while engaging deeper reflection when a problem demands it. A parallel decoding enhancement also generates additional thinking tokens without retraining, yielding an eight-point improvement on the BeyondAIME benchmark.
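A hypothetical sketch of how a caller might select among these modes per request, assuming an OpenAI-compatible chat endpoint: the base URL, the mode mapping, and the `thinking` field are assumptions made for illustration, not confirmed API surface, so check the BytePlus documentation linked below for the real parameter names.

```python
# Hypothetical mode-selection sketch; endpoint and field names are assumed.
from openai import OpenAI

client = OpenAI(
    base_url="https://ark.ap-southeast.bytepluses.com/api/v3",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

# Assumed mapping of the three documented modes onto a request-level switch.
MODES = {"fullcot": "enabled", "nocot": "disabled", "adacot": "auto"}

def ask(prompt: str, mode: str = "adacot") -> str:
    resp = client.chat.completions.create(
        model="seed-1-6",  # model ID taken from the console URL below
        messages=[{"role": "user", "content": prompt}],
        extra_body={"thinking": {"type": MODES[mode]}},  # hypothetical field
    )
    return resp.choices[0].message.content

# AdaCoT-style call: the model decides whether thinking tokens are worth it.
print(ask("What is 17 * 24?"))
```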
The 256K-token context window supports long-document processing and multimodal input. On China's 2025 Gaokao, Seed 1.6 scored 683/750 on the humanities track (ranked first) and 648/750 on the science track (ranked second); the science score rose to 676/750 when images were supplied at higher resolution. On India's JEE Advanced entrance exam, the model placed in the top 10, with 100% accuracy on the math section across all five sampling rounds.
See https://console.byteplus.com/ark/region:ark+ap-southeast-1/model/detail?Id=seed-1-6 for the full write-up, tables, and methodology.