Chapter Summary

Table CPU_FE_OPT summarizes the CPU front-end optimizations.

| Transform | How transformed? | Why helps? | Works best for | Done by |
|---|---|---|---|---|
| Basic block placement | maintain fall through hot code | not taken branches are cheaper; better cache utilization | any code, especially with a lot of branches | compiler |
| Basic block alignment | shift the hot code using NOPs | better cache utilization | hot loops | compiler |
| Function splitting | split cold blocks of code and place them in separate functions | better cache utilization | functions with complex CFG when there are big blocks of cold code between hot parts | compiler |
| Function reorder | group hot functions together | better cache utilization | many small hot functions | linker |

Table: Summary of CPU front-end optimizations.
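The basic block placement and function splitting rows can also be approximated by hand in source code when no profile is available. The following is a minimal sketch, assuming GCC or Clang: `__builtin_expect` and the `cold`/`noinline` attributes are compiler extensions, and the names `sumNonNegative` and `reportBadInput` are hypothetical, chosen only for illustration. The compiler is still free to lay out the code differently, so treat this as a hint rather than a guarantee.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical error handler, used only for illustration.
// `cold` hints that the function is rarely executed, so GCC/Clang may
// optimize it for size and place it apart from hot code.
// `noinline` keeps its body out of the caller, which is a manual
// form of function splitting.
__attribute__((cold, noinline))
static int reportBadInput(int value) {
  std::fprintf(stderr, "unexpected negative value: %d\n", value);
  return -1;
}

// Hypothetical hot function: sums the array, bailing out on a negative entry.
int sumNonNegative(const int* data, int n) {
  assert(n >= 0);
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    // Mark the error branch as unlikely so the compiler can keep the
    // hot path (the addition) as the fall-through case.
    if (__builtin_expect(data[i] < 0, 0)) {
      return reportBadInput(data[i]);
    }
    sum += data[i];
  }
  return sum;
}
```

Calling `sumNonNegative` on an array without negative entries stays entirely on the hot path, while the first negative entry diverts to the out-of-line handler. Hints like these are easy to get wrong or to leave stale as the code evolves, which is one reason profile-driven tools are usually preferred.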

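  • Code layout improvements are often underestimated and overlooked. CPU front-end performance issues such as I-cache and ITLB misses account for a large share of wasted cycles, especially in applications with a large codebase. But even small and medium-sized applications can benefit from optimizing their machine code layout.
  • If you can come up with a set of typical use cases for the application, the best option is usually to improve code layout with LTO, PGO, BOLT, and similar tools. For large applications, this is the only practical option.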
