China's AI Chips: Huawei Ascend vs NVIDIA in a 30B RMB Market
China's AI chip market reached approximately 30 billion RMB in 2025, driven by massive demand for training and inference hardware from China's generative AI boom. Huawei's Ascend 910C became the primary domestic alternative to NVIDIA's A100/H100, with Chinese tech giants including Baidu, Alibaba, Tencent, and ByteDance all deploying Ascend clusters. US export controls on advanced AI chips forced Chinese companies to stockpile NVIDIA GPUs while accelerating domestic alternatives. CAMBRICON's MLU370 and Moore Threads' MTT S4000 provide inference solutions, though training capability still lags NVIDIA by 2-3 years. China's AI chip self-sufficiency rate reached approximately 25% for inference but only 5% for large-scale training.
TL;DR
China AI chip market 30B RMB. Huawei Ascend 910C is the main NVIDIA alternative. Domestic self-sufficiency: 25% inference, 5% training. Chinese tech giants all deploying Ascend clusters. US sanctions accelerated domestic development but created training gap.
Key Insights
Huawei Ascend 910C
Huawei's Ascend 910C offers the closest domestic performance to NVIDIA A100 for AI training workloads. Chinese cloud providers (Huawei Cloud, Baidu AI Cloud) deployed over 100,000 Ascend 910C chips in 2025. Performance for large language model training reaches approximately 80% of A100 efficiency with software optimization. Huawei's CANN software stack matured significantly, supporting PyTorch and MindSpore frameworks.
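Ascend's PyTorch support is delivered as a plugin, `torch_npu`, rather than being built into upstream PyTorch. A minimal device-selection sketch, assuming only that `torch_npu` is importable on a properly configured Ascend host (with a CPU fallback elsewhere), might look like:

```python
import importlib.util

def pick_device() -> str:
    """Return an Ascend NPU device string when Huawei's torch_npu
    plugin is installed, otherwise fall back to CPU.

    "npu:0" follows torch_npu's device-naming convention; on machines
    without the Ascend CANN toolkit the plugin is absent, so code that
    branches on this stays portable.
    """
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu:0"
    return "cpu"

print(pick_device())
```

The same pattern is how much of the "adapter" support described above works in practice: the model code stays ordinary PyTorch, and only the device string changes.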
US Sanctions Impact
US export controls have barred NVIDIA's A100/H100, and later the L40S, from China since October 2022. Chinese companies stockpiled approximately 100,000 high-end NVIDIA GPUs before and after the sanctions took effect. NVIDIA created China-specific H20 and L20 chips that comply with the export rules at reduced performance. Paradoxically, the sanctions accelerated Chinese domestic chip development by creating guaranteed demand.
CAMBRICON MLU Series
CAMBRICON's MLU370 achieved mass deployment in Chinese data centers for AI inference workloads, particularly for Baidu's ERNIE and other LLMs. The company shipped over 500,000 AI accelerator chips cumulatively. CAMBRICON's next-generation MLU590 targets training-level performance but faces software ecosystem challenges compared to NVIDIA CUDA.
Inference Market Growth
China achieved approximately 25% self-sufficiency in AI inference chips, with domestic options from Huawei (Ascend 310), CAMBRICON (MLU), and startup Enflame (S20). Inference chips are less demanding than training chips, making domestic substitution more feasible. Baidu reported running 70% of inference workloads for its ERNIE Bot service on domestic chips.
Side-by-Side Comparison
| AI Chip | Company | Process Node | FP16 Performance | Application | Availability |
|---|---|---|---|---|---|
| Ascend 910C | Huawei | 7nm (SMIC) | ~310 TFLOPS | Training | Mass production |
| Ascend 310P | Huawei | 12nm | ~70 TFLOPS | Inference | Mass production |
| MLU370 | CAMBRICON | 12nm | ~256 TFLOPS | Training/Inference | Mass production |
| MLU590 | CAMBRICON | 7nm | ~400 TFLOPS | Training | Sampling |
| MTT S4000 | Moore Threads | 7nm | ~200 TFLOPS | Inference | Limited |
| BI-V150 | Biren Tech | 7nm | ~300 TFLOPS | Training | Limited |
| Enflame S50 | Enflame | 7nm | ~280 TFLOPS | Training | Sampling |
| NVIDIA H20 | NVIDIA | 4nm | ~148 TFLOPS | Training (limited) | Available (China-only) |
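To put the peak numbers in context, a quick back-of-envelope script can normalize each chip against the NVIDIA A100, whose published dense FP16 tensor-core peak is ~312 TFLOPS (the chip figures below are the approximate values from the table):

```python
# Approximate peak FP16 throughput in TFLOPS, from the table above.
A100_TFLOPS = 312  # NVIDIA A100 dense FP16 tensor-core peak (published spec)

chips = {
    "Ascend 910C": 310,
    "MLU590": 400,
    "BI-V150": 300,
    "Enflame S50": 280,
    "MTT S4000": 200,
    "NVIDIA H20": 148,
}

for name, tflops in sorted(chips.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {tflops:4d} TFLOPS = {tflops / A100_TFLOPS:5.0%} of A100")
```

Note that peak TFLOPS is not delivered training throughput: the article's ~80% efficiency figure for the 910C reflects software maturity and interconnect, not just silicon, which is why a chip with near-identical peak numbers can still train more slowly in practice.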
Frequently Asked Questions
Can Huawei Ascend chips fully replace NVIDIA GPUs for AI training?

Not yet, though the gap is narrowing for specific workloads:

- Performance gap: the Ascend 910C achieves approximately 80% of NVIDIA A100 performance on optimized workloads, but trails the H100 by a wider margin of approximately 50%.
- Software ecosystem: this remains the biggest challenge. NVIDIA's CUDA has a 15+ year developer-ecosystem head start, while Huawei's CANN and MindSpore are relatively new with a smaller developer community.
- Framework support: Ascend supports PyTorch and TensorFlow through adapters, with some performance overhead; native MindSpore achieves the best performance but requires code migration.
- Cluster scaling: Huawei has demonstrated 10,000+ chip clusters comparable to NVIDIA's DGX SuperPOD for Baidu's ERNIE training, but debugging and monitoring tools lag behind NVIDIA's.
- Practical deployment: major Chinese companies have shifted 40-70% of new AI workloads to Ascend, while critical training runs still rely on stockpiled NVIDIA GPUs.

Most industry estimates suggest Huawei will reach rough training parity by 2027-2028.
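The practical cost of the efficiency gap can be sketched with the standard compute estimate for dense transformer training, FLOPs ≈ 6·N·D (N parameters, D training tokens). The scenario below is purely illustrative: model size, token count, cluster size, and the 45% utilization baseline are assumptions, not measured values from the article; only the ~80% relative-efficiency factor comes from the text above.

```python
def training_days(params: float, tokens: float, n_chips: int,
                  peak_tflops: float, utilization: float) -> float:
    """Rough wall-clock estimate for a dense-transformer training run.

    Uses the common FLOPs ~= 6 * params * tokens approximation;
    `utilization` is the assumed fraction of peak throughput achieved.
    """
    total_flops = 6 * params * tokens
    flops_per_sec = n_chips * peak_tflops * 1e12 * utilization
    return total_flops / flops_per_sec / 86_400  # seconds -> days

# Illustrative scenario (assumed): 70B params, 2T tokens, 8,192 chips.
a100 = training_days(70e9, 2e12, 8192, 312, 0.45)
ascend = training_days(70e9, 2e12, 8192, 310, 0.45 * 0.8)  # ~80% of A100 efficiency
print(f"A100 cluster:   {a100:5.1f} days")
print(f"Ascend cluster: {ascend:5.1f} days")
```

Since peak TFLOPS are nearly identical, the entire schedule difference comes from the utilization factor, which is why the article's software-ecosystem points dominate the hardware comparison.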