Post

gcc的optimize flags

简单的记录一下gcc的优化选项,以及一些细节。

正常情况下,能选择开/关的编译器优化,只有有符号的哪些

你可以通过

1
2
3
gcc -c -Q -O3 --help=optimizers > /tmp/O3-opts
gcc -c -Q -O2 --help=optimizers > /tmp/O2-opts
diff /tmp/O2-opts /tmp/O3-opts | grep enabled

看哪些优化被开启了,O2和O3的区别。

要注意一点,debug的编译尽量使用-Og或者使用-O1 or -O0, 不要让inline进入到你的debug编译,这样的坏处是断点的时候会出现很奇怪的跳转,代码对不准,具体分析问题可能要看汇编了

默认优化-O0

O0是默认的优化选项,理论上是不进行任何优化,但是在查阅资料之后发现也有一些优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
  -faggressive-loop-optimizations       [enabled]
  -fallocation-dce                      [enabled]
  -fasynchronous-unwind-tables          [enabled]
  -fauto-inc-dec                        [enabled]
  -fbit-tests                           [enabled]
  -fdce                                 [enabled]
  -fearly-inlining                      [enabled]
  -ffp-int-builtin-inexact              [enabled]
  -ffunction-cse                        [enabled]
  -fgcse-lm                             [enabled]
  -finline-atomics                      [enabled]
  -fipa-stack-alignment                 [enabled]
  -fipa-strict-aliasing                 [enabled]
  -fira-hoist-pressure                  [enabled]
  -fira-share-save-slots                [enabled]
  -fira-share-spill-slots               [enabled]
  -fivopts                              [enabled]
  -fjump-tables                         [enabled]
  -flifetime-dse                        [enabled]
  -fmath-errno                          [enabled]
  -fpeephole                            [enabled]
  -fplt                                 [enabled]
  -fprintf-return-value                 [enabled]
  -freg-struct-return                   [enabled]
  -fsched-critical-path-heuristic       [enabled]
  -fsched-dep-count-heuristic           [enabled]
  -fsched-group-heuristic               [enabled]
  -fsched-interblock                    [enabled]
  -fsched-last-insn-heuristic           [enabled]
  -fsched-rank-heuristic                [enabled]
  -fsched-spec                          [enabled]
  -fsched-spec-insn-heuristic           [enabled]
  -fsched-stalled-insns-dep             [enabled]
  -fschedule-fusion                     [enabled]
  -fsemantic-interposition              [enabled]
  -fshort-enums                         [enabled]
  -fshrink-wrap-separate                [enabled]
  -fsigned-zeros                        [enabled]
  -fsplit-ivs-in-unroller               [enabled]
  -fssa-backprop                        [enabled]
  -fstdarg-opt                          [enabled]
  -ftrapping-math                       [enabled]
  -ftree-forwprop                       [enabled]
  -ftree-loop-im                        [enabled]
  -ftree-loop-ivcanon                   [enabled]
  -ftree-loop-optimize                  [enabled]
  -ftree-phiprop                        [enabled]
  -ftree-reassoc                        [enabled]
  -ftree-scev-cprop                     [enabled]
  -funreachable-traps                   [enabled]
  -funwind-tables                       [enabled]

-O1优化

简单的看下-O1的描述

1
Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function. With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.

实际进行下面的优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
  -faggressive-loop-optimizations       [enabled]
  -fallocation-dce                      [enabled]
  -fasynchronous-unwind-tables          [enabled]
  -fauto-inc-dec                        [enabled]
  -fbit-tests                           [enabled]
  -fbranch-count-reg                    [enabled]
  -fcombine-stack-adjustments           [enabled]
  -fcompare-elim                        [enabled]
  -fcprop-registers                     [enabled]
  -fdce                                 [enabled]
  -fdefer-pop                           [enabled]
  -fdse                                 [enabled]
  -fearly-inlining                      [enabled]
  -fforward-propagate                   [enabled]
  -ffp-int-builtin-inexact              [enabled]
  -ffunction-cse                        [enabled]
  -fgcse-lm                             [enabled]
  -fguess-branch-probability            [enabled]
  -fif-conversion                       [enabled]
  -fif-conversion2                      [enabled]
  -finline                              [enabled]
  -finline-atomics                      [enabled]
  -finline-functions-called-once        [enabled]
  -fipa-modref                          [enabled]
  -fipa-profile                         [enabled]
  -fipa-pure-const                      [enabled]
  -fipa-reference                       [enabled]
  -fipa-reference-addressable           [enabled]
  -fipa-stack-alignment                 [enabled]
  -fipa-strict-aliasing                 [enabled]
  -fira-hoist-pressure                  [enabled]
  -fira-share-save-slots                [enabled]
  -fira-share-spill-slots               [enabled]
  -fivopts                              [enabled]
  -fjump-tables                         [enabled]
  -flifetime-dse                        [enabled]
  -fmath-errno                          [enabled]
  -fmove-loop-invariants                [enabled]
  -fmove-loop-stores                    [enabled]
  -fomit-frame-pointer                  [enabled]
  -fpeephole                            [enabled]
  -fplt                                 [enabled]
  -fprintf-return-value                 [enabled]
  -freg-struct-return                   [enabled]
  -freorder-blocks                      [enabled]
  -fsched-critical-path-heuristic       [enabled]
  -fsched-dep-count-heuristic           [enabled]
  -fsched-group-heuristic               [enabled]
  -fsched-interblock                    [enabled]
  -fsched-last-insn-heuristic           [enabled]
  -fsched-rank-heuristic                [enabled]
  -fsched-spec                          [enabled]
  -fsched-spec-insn-heuristic           [enabled]
  -fsched-stalled-insns-dep             [enabled]
  -fschedule-fusion                     [enabled]
  -fsemantic-interposition              [enabled]
  -fshort-enums                         [enabled]
  -fshrink-wrap                         [enabled]
  -fshrink-wrap-separate                [enabled]
  -fsigned-zeros                        [enabled]
  -fsplit-ivs-in-unroller               [enabled]
  -fsplit-wide-types                    [enabled]
  -fssa-backprop                        [enabled]
  -fssa-phiopt                          [enabled]
  -fstdarg-opt                          [enabled]
  -fthread-jumps                        [enabled]
  -ftoplevel-reorder                    [enabled]
  -ftrapping-math                       [enabled]
  -ftree-bit-ccp                        [enabled]
  -ftree-builtin-call-dce               [enabled]
  -ftree-ccp                            [enabled]
  -ftree-ch                             [enabled]
  -ftree-coalesce-vars                  [enabled]
  -ftree-copy-prop                      [enabled]
  -ftree-dce                            [enabled]
  -ftree-dominator-opts                 [enabled]
  -ftree-dse                            [enabled]
  -ftree-forwprop                       [enabled]
  -ftree-fre                            [enabled]
  -ftree-loop-im                        [enabled]
  -ftree-loop-ivcanon                   [enabled]
  -ftree-loop-optimize                  [enabled]
  -ftree-phiprop                        [enabled]
  -ftree-pta                            [enabled]
  -ftree-reassoc                        [enabled]
  -ftree-scev-cprop                     [enabled]
  -ftree-sink                           [enabled]
  -ftree-slsr                           [enabled]
  -ftree-sra                            [enabled]
  -ftree-ter                            [enabled]
  -funwind-tables                       [enabled]

-O2优化

O2相较于O1,他的描述显得激进了一点

1
Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to -O, this option increases both compilation time and the performance of the generated code.

在O1的基础上, O2还做了下面的这些优化

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
<   -falign-functions                           [enabled]
<   -falign-jumps                               [enabled]
<   -falign-labels                              [enabled]
<   -falign-loops                               [enabled]
<   -fcaller-saves                              [enabled]
<   -fcode-hoisting                             [enabled]
<   -fcrossjumping                              [enabled]
<   -fcse-follow-jumps                          [enabled]
<   -fdevirtualize                              [enabled]
<   -fdevirtualize-speculatively                [enabled]
<   -fexpensive-optimizations                   [enabled]
<   -fgcse                                      [enabled]
<   -fhoist-adjacent-loads                      [enabled]
<   -findirect-inlining                         [enabled]
<   -finline-functions                          [enabled]
<   -finline-small-functions                    [enabled]
<   -fipa-bit-cp                                [enabled]
<   -fipa-cp                                    [enabled]
<   -fipa-icf                                   [enabled]
<   -fipa-icf-functions                         [enabled]
<   -fipa-icf-variables                         [enabled]
<   -fipa-ra                                    [enabled]
<   -fipa-sra                                   [enabled]
<   -fipa-vrp                                   [enabled]
<   -fisolate-erroneous-paths-dereference       [enabled]
<   -flra-remat                                 [enabled]
<   -foptimize-sibling-calls                    [enabled]
<   -foptimize-strlen                           [enabled]
<   -fpartial-inlining                          [enabled]
<   -fpeephole2                                 [enabled]
<   -free                                       [enabled]
<   -freorder-blocks-and-partition              [enabled]
<   -freorder-functions                         [enabled]
<   -frerun-cse-after-loop                      [enabled]
<   -fschedule-insns2                           [enabled]
<   -fstore-merging                             [enabled]
<   -fstrict-aliasing                           [enabled]
<   -ftree-loop-distribute-patterns             [enabled]
<   -ftree-loop-vectorize                       [enabled]
<   -ftree-pre                                  [enabled]
<   -ftree-slp-vectorize                        [enabled]
<   -ftree-switch-conversion                    [enabled]
<   -ftree-tail-merge                           [enabled]
<   -ftree-vrp                                  [enabled]
<   -funroll-loops                              [enabled]

-O3优化

再额外扩充一下

1
2
3
4
5
6
7
8
9
10
11
12
13
>   -fgcse-after-reload                         [enabled]
>   -fipa-cp-clone                              [enabled]
>   -floop-interchange                          [enabled]
>   -floop-unroll-and-jam                       [enabled]
>   -fpeel-loops                                [enabled]
>   -fpredictive-commoning                      [enabled]
>   -fsplit-loops                               [enabled]
>   -fsplit-paths                               [enabled]
>   -ftree-loop-distribution                    [enabled]
>   -ftree-partial-pre                          [enabled]
>   -funroll-completely-grow-size               [enabled]
>   -funswitch-loops                            [enabled]
>   -fversion-loops-for-strides                 [enabled]

剩下的哪些优化选项就自己后面再看了

REF

  1. Options That Control Optimization
  2. Options Controlling the Kind of Output
This post is licensed under CC BY 4.0 by the author.

Trending Tags