CANN/mat-chem-sim-pred: SOPDT Batch PID Candidate Scoring Algorithm

# PidSopdtBatchRolloutScore Algorithm

[Free download link] mat-chem-sim-pred: an industry-oriented project focused on two core scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry driven jointly by mechanistic models and data, and promotes the deep application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred

## Purpose

This operator evaluates many PID candidates for many SOPDT loops during the tuning stage and returns the best candidate for each loop.

The target workload is:

    batch loops x candidate set x rollout time steps

## Model

The plant model is a discretized SOPDT (second-order plus dead time) process:

    y[k+1] = a1 * y[k] + a2 * y[k-1] + b * u[k-delay]

Versus the FOPDT recurrence `y[k+1] = a*y[k] + b*u[k-delay]`, SOPDT keeps one extra output-history state `y[k-1]` and a second coefficient `a2`, which lets it represent overdamped, critically damped, and underdamped second-order responses. For a plant built from two stable real lags with poles `p1`, `p2` and steady-state gain `K`: `a1 = p1 + p2`, `a2 = -p1*p2`, and `b = K*(1-p1)*(1-p2)`. Everything else (PID law, scoring, candidate-axis SIMD, delay ring, tiling) is identical to `PidFopdtBatchRolloutScore`.

The PID law is:

    e[k] = sp - y[k]
    integral += e[k] * dt
    derivative = (e[k] - e[k-1]) / dt
    u[k] = clamp(Kp * e[k] + Ki * integral + Kd * derivative, -10, 10)

## Score

For each candidate, the rollout accumulates:

- `IAE`
- `ISE`
- `overshoot`
- `settling_time`
- `control_energy`

The optimization target is:

    score = IAE + overshoot_weight * overshoot + settling_weight * settling_time + control_weight * control_energy

The operator returns the candidate with the minimum `score`.

## NPU Execution Strategy

The current implementation uses a two-stage tiled structure:

- the host splits the candidate axis into tiles
- a local kernel evaluates one tile for all assigned loops and writes partial best results
- a final kernel reduces all tile-local best results into one best result per loop

This structure was chosen because the earlier single-launch `(loop, tile)` task mapping showed unstable coverage on `node202`.
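The model, PID law, score, and two-stage reduction described above can be sketched as a scalar CPU reference. This is a minimal illustration, not the operator's code: the function names, the default `sp`, `dt`, `steps`, weights, and the 2% settling band are all assumptions, and the source does not define how `overshoot` and `settling_time` are measured, so the definitions below (peak excursion above the setpoint, last time outside the band) are placeholders.

```python
from collections import deque


def rollout_score(a1, a2, b, delay, kp, ki, kd, sp=1.0, dt=0.1, steps=200,
                  w_os=1.0, w_st=0.01, w_u=0.001, band=0.02):
    """Score one PID candidate on one SOPDT loop; lower is better.

    All default values are illustrative assumptions.
    """
    y = y_prev = integral = e_prev = 0.0
    # Delay ring holding u[k-delay] .. u[k]; starts filled with zeros.
    ring = deque([0.0] * (delay + 1), maxlen=delay + 1)
    iae = ise = energy = overshoot = settling_time = 0.0
    for k in range(steps):
        # PID law with the control clamp to [-10, 10].
        e = sp - y
        integral += e * dt
        derivative = (e - e_prev) / dt
        u = max(-10.0, min(10.0, kp * e + ki * integral + kd * derivative))
        ring.append(u)                      # ring[0] is now u[k-delay]
        # SOPDT state update: y[k+1] = a1*y[k] + a2*y[k-1] + b*u[k-delay].
        y_prev, y = y, a1 * y + a2 * y_prev + b * ring[0]
        e_prev = e
        iae += abs(e) * dt
        ise += e * e * dt                   # tracked, but not part of the score
        energy += u * u * dt
        overshoot = max(overshoot, y - sp)  # assumed: peak excursion above sp
        if abs(sp - y) > band * abs(sp):    # assumed: last time outside the band
            settling_time = (k + 1) * dt
    return iae + w_os * overshoot + w_st * settling_time + w_u * energy


def best_candidate_tiled(plant, candidates, tile=4):
    """Two-stage search: tile-local best first, then a final reduction."""
    partial = []
    for start in range(0, len(candidates), tile):
        # Stage 1: score one tile and keep its best (score, global index).
        scored = [(rollout_score(*plant, *c), start + i)
                  for i, c in enumerate(candidates[start:start + tile])]
        partial.append(min(scored))
    # Stage 2: reduce the tile-local bests into one result for this loop.
    return min(partial)


# Plant from two stable real lags, using the coefficient formulas above.
p1, p2, gain = 0.9, 0.8, 1.0
plant = (p1 + p2, -p1 * p2, gain * (1 - p1) * (1 - p2), 3)  # a1, a2, b, delay
candidates = [(kp, ki, kd) for kp in (0.5, 1.0, 2.0)
              for ki in (0.1, 0.5) for kd in (0.0, 0.05)]
best_score, best_index = best_candidate_tiled(plant, candidates, tile=5)
print(best_index, candidates[best_index], round(best_score, 4))
```

Because each tile reports a `(score, global index)` pair, the final reduction picks the same winner regardless of how the candidate axis is split, which is the property the tiled structure relies on.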
The current host-per-tile launch plus conservative loop-range partitioning restores correctness.

## Kernel difference from FOPDT

The SOPDT kernel adds one state vector `y_prev` (`y[k-1]`, placed in the previously unused scratch block 11, so the UB budget and `kLane = 768` are unchanged) and reads two coefficients (`a1`, `a2`) instead of one. The state update becomes:

    y_new = a1*y + a2*y_prev + b*u[k-delay]
    y_prev = y
    y = y_new

This costs about 2 extra vector ops per timestep (one `Muls` + one `Add`) versus FOPDT; the delay ring, reduction, and scoring are unchanged.

## Vectorization

The rollout time dimension is a serial recurrence (`y[k+1]` depends on `y[k]`) and cannot be turned into GEMM-style dense math without dropping the per-step nonlinearities (control clamp) and the nonlinear score functionals (IAE/ISE/overshoot/settling), so the kernel keeps the exact step-by-step recurrence.

The parallelism instead lives on the candidate axis: every timestep applies the same chain of vector ops to all candidates at once. Because the recurrence is serial, that chain of dependent vector ops cannot be pipelined across timesteps, so with a narrow lane the inner loop is bound by per-instruction issue/latency rather than by compute throughput. The kernel therefore evaluates the candidate axis with a wide SIMD lane (`kLane = 768`): more candidates per vector instruction means fewer instructions for the same work, which amortises the fixed instruction latency and makes the loop throughput-bound. `kLane = 768` is the largest lane that keeps the 8 state vectors + scratch + the 32-slot delay ring (delay spec 0..31) + I/O queues within the 192 KB UB budget.
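The candidate-axis layout can be illustrated with a pure-Python emulation in which each list stands in for one lane-wide vector and each list comprehension stands in for one vector instruction. This is only a sketch under stated assumptions: `rollout_lane` and `rollout_all` are hypothetical names, only IAE is accumulated to keep it short, and the default `sp`, `dt`, and `steps` are illustrative; it does not use or model the NPU kernel API.

```python
def rollout_lane(a1, a2, b, delay, kp, ki, kd, sp=1.0, dt=0.1, steps=100):
    """Roll out one lane; kp/ki/kd hold one entry per candidate in the lane."""
    n = len(kp)
    y = [0.0] * n
    y_prev = [0.0] * n
    integ = [0.0] * n
    e_prev = [0.0] * n
    iae = [0.0] * n
    # Delay ring of lane-wide vectors: slot j % (delay + 1) holds u[j].
    ring = [[0.0] * n for _ in range(delay + 1)]
    for k in range(steps):  # serial in time: step k+1 needs y from step k
        # Each comprehension below is one "vector op" over the whole lane.
        e = [sp - yi for yi in y]
        integ = [s + ei * dt for s, ei in zip(integ, e)]
        deriv = [(ei - pi) / dt for ei, pi in zip(e, e_prev)]
        u = [max(-10.0, min(10.0, p * ei + i * si + d * di))
             for p, i, d, ei, si, di in zip(kp, ki, kd, e, integ, deriv)]
        ring[k % (delay + 1)] = u                # overwrite the oldest slot
        u_del = ring[(k + 1) % (delay + 1)]      # slot written at step k-delay
        y_new = [a1 * yi + a2 * yp + b * ud
                 for yi, yp, ud in zip(y, y_prev, u_del)]
        y_prev, y, e_prev = y, y_new, e
        iae = [s + abs(ei) * dt for s, ei in zip(iae, e)]
    return iae


def rollout_all(plant, kp, ki, kd, lane):
    """Process the candidate axis in chunks of `lane` candidates."""
    out = []
    for s in range(0, len(kp), lane):
        out += rollout_lane(*plant, kp[s:s + lane], ki[s:s + lane],
                            kd[s:s + lane])
    return out


plant = (1.7, -0.72, 0.02, 3)  # a1, a2, b, delay (illustrative values)
kp = [0.5 + 0.25 * i for i in range(10)]
ki = [0.2] * 10
kd = [0.0] * 10
narrow = rollout_all(plant, kp, ki, kd, lane=2)
wide = rollout_all(plant, kp, ki, kd, lane=768)
print(narrow == wide)
```

In this emulation the per-candidate arithmetic does not depend on the lane width, so the two runs compare equal; what changes is the number of per-timestep "instructions", which drops from five lanes' worth to one.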
Widening the lane is a pure layout change and leaves the output bit-identical.

## Engineering Conclusion

This operator is valuable as:

- an independent PID tuning operator sample
- a correctness-verified NPU exploration artifact (NPU output matches the CPU reference, quality rel-err 1e-3)
- a single-card rollout that reuses the FOPDT wide-lane plus fused inner-loop optimizations

The inner loop was also reduced from ~37 to ~32 vector ops per timestep by reusing the response error as the next step's error and by folding the non-feedback metric accumulators (IAE/ISE/control energy) into fused multiply-accumulates; this is bit-identical to the original. The remaining single-card headroom is a cheaper settling reduction; multi-card data parallelism scales the absolute time further but is a hardware lever, not a single-card algorithmic speedup.

Author's statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
