# CANN Second-Order PID Batch Evaluation Benchmark
## PidSopdtBatchRolloutScore Benchmark Report

Project: mat-chem-sim-pred, which focuses on two core industrial scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry, driven jointly by mechanistic models and data, to advance the application of AI for Science in materials and chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred

This document records the measured CPU/NPU behavior of `PidSopdtBatchRolloutScore`.

### Environment

- NPU host: `node202`
- Device: Ascend910B3, device id `0`
- CANN: `/usr/local/Ascend/ascend-toolkit/latest`
- CPU baseline: benchmark program, multi-thread mode
- Build: `-DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3 -DRUN_MODE=npu`

### Method

The `benchmark_pid_sopdt_batch_rollout_score_aclnn` program builds an in-process multi-thread CPU reference (`ComputeRange`, which implements the same second-order recurrence `y[k+1] = a1*y[k] + a2*y[k-1] + b*u[k-delay]`), runs the NPU operator on the same inputs, and reports `max_abs_err`, `max_quality_rel_err` and `best_idx_diff_count`. The pass conditions are:

- `npu_zero_score_count == 0`;
- per-candidate scores match the CPU reference to float32 precision;
- any `best_idx` differences are near-ties (the chosen candidate's metric rel-err stays small).

The NPU state update is emitted in the same summation order as the CPU reference, `(a1*y + a2*y_prev) + b*u`, so that the long-horizon second-order recurrence stays float-aligned with it.

### Correctness

The SOPDT kernel differs from the verified FOPDT kernel only by one additional history state, `y[k-1]`, and one additional coefficient. The candidate-axis SIMD width does not change the numerics, because each tile is computed independently.

Measured on node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, npu_zero_score_count=0:

| candidates | max_abs_err | max_quality_rel_err | best_idx_diff_count |
|---:|---:|---:|---:|
| 1024 | 2.4e-3 | 4.7e-5 | 0 |
| 4096 | 2.0 | 1.06e-2 | 6 |
| 16384 | 1.0 | 9.2e-3 | 14 |

At 1024 candidates the NPU output is essentially exact (rel-err 4.7e-5). At 4096 candidates the grid samples the same Kp/Ki/Kd range four times more densely, so near the optimum many adjacent candidates have nearly equal scores, and float rounding flips the arg-min for a few loops (`best_idx_diff_count` = 6, growing to 14 at 16384 as the grid gets denser).
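The `best_idx` pass condition can be made concrete with a small check: for each loop, take the arg-min of the CPU and NPU score vectors, and when they disagree, re-score the NPU's pick with the CPU reference and measure its relative gap to the CPU optimum. This is a minimal sketch, assuming lower scores are better (as the arg-min wording suggests); the function names, the gap definition and the `rel_tol` threshold are hypothetical and are not taken from the benchmark program.

```python
def argmin(scores):
    """Index of the smallest score; exact ties resolve to the lowest index."""
    return min(range(len(scores)), key=lambda i: scores[i])


def check_best_idx(cpu_loops, npu_loops, rel_tol=1e-2):
    """Classify per-loop arg-min disagreements between CPU and NPU scores.

    cpu_loops, npu_loops: one list of candidate scores per loop (lower is better).
    Returns (best_idx_diff_count, max_rel_gap, all_near_ties), where rel_gap is
    the CPU-reference score of the NPU's pick relative to the CPU optimum.
    rel_tol is a hypothetical threshold, not the benchmark's actual value.
    """
    diff_count = 0
    max_rel_gap = 0.0
    for cpu, npu in zip(cpu_loops, npu_loops):
        cpu_best = argmin(cpu)
        npu_best = argmin(npu)
        if cpu_best == npu_best:
            continue
        diff_count += 1
        # Re-score the NPU's choice with the CPU reference: a near-tie means
        # the NPU picked a candidate almost as good as the true optimum.
        denom = max(abs(cpu[cpu_best]), 1e-12)
        rel_gap = (cpu[npu_best] - cpu[cpu_best]) / denom
        max_rel_gap = max(max_rel_gap, rel_gap)
    return diff_count, max_rel_gap, max_rel_gap <= rel_tol


if __name__ == "__main__":
    cpu = [[1.0, 1.0004, 2.0], [3.0, 1.0, 5.0]]
    npu = [[1.0005, 1.0003, 2.0], [3.0, 1.0, 5.0]]
    print(check_best_idx(cpu, npu))
```

In this usage example the first loop's arg-mins differ (index 0 on the CPU, index 1 on the NPU), but the NPU's pick is only about 4e-4 worse under the CPU reference, so it counts toward `best_idx_diff_count` while still passing as a near-tie. A genuine trajectory error would instead show up as a large gap.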
The arg-min flips are a near-tie effect, not a trajectory error. They persist at short horizons (e.g. sim_steps=128 gives best_idx_diff=6 and max_quality_rel_err=5.3e-3), and the second-order oscillatory dynamics make SOPDT somewhat more rounding-sensitive than the first-order reference (FOPDT shows max_quality_rel_err=4.5e-3 and best_idx_diff=1 at 4096 candidates). The max_abs_err values of 1-2 again come from the discrete settling-time metric differing by 1-2 samples.

### Measured timing

node202 / Ascend910B3, B=128, sim_steps=1024, candidate_tile=C, CPU 64-thread parallel reference.

| candidates | CPU parallel (ms) | NPU kernel (ms) | NPU kernel speedup vs CPU |
|---:|---:|---:|---:|
| 1024 | 42.8 | 8.49 | 5.04x |
| 4096 | 143.8 | 28.4 | 5.07x |
| 16384 | 556.8 | 107.5 | 5.18x |

Against a 192-thread CPU reference the speedup is 4.1-4.4x (the wider CPU thread pool narrows the gap).

### Notes

The kernel reuses the FOPDT wide-lane (`kLane=768`) and fused inner-loop optimizations unchanged. The only algorithmic difference is the second-order recurrence, which adds about 2 vector ops per timestep and one extra state vector. The extra vector is placed in the previously unused scratch block, so `kLane=768` still fits the 192 KB UB (Unified Buffer) budget.
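The second-order recurrence and its fixed summation order can be illustrated with a scalar CPU sketch that keeps one lane per PID candidate and rounds every intermediate to float32. It is a minimal sketch under stated assumptions: the discrete PID law (`u = kp*e + ki*sum(e) + kd*(e - e_prev)`), the sum-of-absolute-error score, the unit setpoint and all function names are hypothetical stand-ins, since this report does not specify the kernel's controller form or scoring metric. What does follow the report is the plant update `(a1*y + a2*y_prev) + b*u[k-delay]` and the single extra state vector `y_prev` relative to FOPDT.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sopdt_pid_scores(gains, a1, a2, b, delay, steps, setpoint=1.0):
    """Closed-loop SOPDT rollout, one independent lane per PID candidate.

    gains: list of (kp, ki, kd) tuples, one per candidate lane.
    Plant (from the report): y[k+1] = (a1*y[k] + a2*y[k-1]) + b*u[k-delay],
    with u[j] = 0 for j < 0.
    Hypothetical parts: the PID law below and the score (sum of |error|).
    Every intermediate is rounded to float32 in the grouping written here.
    """
    n = len(gains)
    y = [0.0] * n            # y[k] for every lane
    y_prev = [0.0] * n       # y[k-1]: the one extra state vector vs. FOPDT
    integ = [0.0] * n        # running sum of the error (integral term)
    e_prev = [0.0] * n       # previous error (derivative term)
    u_hist = [[] for _ in range(n)]  # control history, needed for the delay
    score = [0.0] * n
    for k in range(steps):
        for i, (kp, ki, kd) in enumerate(gains):
            e = f32(setpoint - y[i])
            integ[i] = f32(integ[i] + e)
            de = f32(e - e_prev[i])
            u = f32(f32(f32(kp * e) + f32(ki * integ[i])) + f32(kd * de))
            e_prev[i] = e
            u_hist[i].append(u)
            u_del = u_hist[i][k - delay] if k >= delay else 0.0
            # Same summation order as the CPU reference: (a1*y + a2*y_prev) + b*u.
            y_next = f32(f32(f32(a1 * y[i]) + f32(a2 * y_prev[i])) + f32(b * u_del))
            y_prev[i], y[i] = y[i], y_next   # shift the two-sample history
            score[i] = f32(score[i] + abs(e))
    return score


if __name__ == "__main__":
    grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.8, 0.05, 0.1)]
    print(sopdt_pid_scores(grid, a1=0.5, a2=0.25, b=0.5, delay=1, steps=50))
```

Because each lane reads only its own state, a candidate's score does not depend on how many lanes are processed together, which is the property behind the statement that the candidate-axis SIMD width does not change the numerics. Regrouping the update, for example as `a1*y + (a2*y_prev + b*u)`, can change the float32 result in the last bits, and over a long horizon such differences can accumulate; this is why the sketch pins the grouping explicitly.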