Floating point computing power
Publish: 2021-03-25 11:38:34
1. When we calculate pi on different computers, one machine's result may be more precise than the other's. Or, in a shooting game, when a bullet chips a wall, the same scene may look stiff and artificial on one computer yet vivid, almost lifelike, on another.
Both effects come from the floating-point unit built into the CPU. Floating-point capability is an important index of a CPU's multimedia and 3D-graphics performance. The Pentium 4, for instance, has only two floating-point execution units, and one of them also has to handle FADD instructions at the same time.
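The precision difference described above can be reproduced in ordinary Python. The sketch below is my own illustration (the helper `f32` is not from the original answer): it accumulates 0.1 one hundred thousand times, once in the machine's native double precision and once rounded to IEEE 754 single precision after every add. The single-precision error is visibly larger:

```python
import struct

def f32(x):
    # Round x to the nearest IEEE 754 single-precision value
    # by packing it into 4 bytes and unpacking it again.
    return struct.unpack('f', struct.pack('f', x))[0]

N = 100_000
total64 = 0.0
total32 = 0.0
for _ in range(N):
    total64 += 0.1                      # native double precision
    total32 = f32(total32 + f32(0.1))   # simulated single precision

err64 = abs(total64 - 10000.0)
err32 = abs(total32 - 10000.0)
print(err64, err32)
```

Both sums drift away from the exact value 10000, but the single-precision drift is several orders of magnitude larger, which is the same effect that makes one machine's pi "less accurate" than another's.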
3. Both are arithmetic operations.
In the beginning there was no floating-point unit in the CPU, only an integer unit. Such machines handled integer arithmetic easily, but floating-point operations had to be computed by a program (in software); the hardware could not handle them directly. Alternatively, you could buy a floating-point coprocessor and plug it into the host as a peripheral to supplement the CPU.
Floating-point arithmetic was left out of the CPU at that time due to price and technology. With the development of technology, chips have become cheaper and cheaper, and both the floating-point and integer units are now integrated into the CPU; their computing speed keeps rising and their data-processing power keeps growing.
I only heard this from a teacher in class, and I hope someone with real in-depth understanding can add to it.
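The software emulation mentioned above can be sketched in Python. The hypothetical `soft_float_mul` below multiplies two floats using only integer arithmetic on their mantissas and exponents, roughly what an emulation routine had to do when no FPU was present (a simplified sketch of my own: no NaN, infinity, or overflow handling):

```python
import math

def soft_float_mul(a, b):
    """Multiply two floats using only integer arithmetic on
    mantissa and exponent (a simplified software-FPU sketch)."""
    ma, ea = math.frexp(a)   # a = ma * 2**ea, with 0.5 <= |ma| < 1
    mb, eb = math.frexp(b)
    # Scale each 53-bit mantissa up to an exact integer.
    ia = int(ma * (1 << 53))
    ib = int(mb * (1 << 53))
    prod = ia * ib           # exact integer product (up to 106 bits)
    # Shift back down and reattach the combined exponent.
    return math.ldexp(prod, ea + eb - 106)

print(soft_float_mul(3.5, 2.0))   # 7.0
```

Every step is integer work plus exponent bookkeeping; doing this for every multiplication is why software floating point was so much slower than a hardware FPU.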
4. I don't know whether you can open these two foreign web pages; they are two pictures I found through Google:
http://www.crunchgear.com/2008/02/25/gpu-programming-now-on-osx/
http://www.tacc.utexas.edu/research/users/features/dragon.php
A GPU has strong computing power mainly because most of its circuitry consists of arithmetic units. Adders and multipliers are relatively small circuits; even a large number of such units does not occupy much chip area. And because the other parts of a GPU take up little area, it can also devote more registers and caches to storing data. A CPU, on the other hand, is slower at this kind of work because a large share of its units handle other program constructs, such as branches and loops, and because a CPU needs a degree of flexibility, its arithmetic-logic units are also much more complex. In short, to speed up branch instructions, many CPU components perform branch prediction and correct and restore ALU results when a prediction misses. All of this greatly increases the complexity of the device.
In addition, current CPU design is also learning from the GPU, adding floating-point units that compute in parallel without so much control structure. Intel's SSE instruction set, for example, can perform four floating-point operations at the same time, and many registers were added.
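The four-at-a-time SSE idea can be illustrated with a toy Python model. The function below is my own sketch (named `addps` after the real SSE instruction, but it is not actual SSE): one "instruction" applies the same add to four single-precision lanes at once:

```python
import struct

def addps(xs, ys):
    """Packed single-precision add: one call models one SSE-style
    instruction acting on 4 lanes at once (a Python sketch only)."""
    assert len(xs) == len(ys) == 4
    out = []
    for x, y in zip(xs, ys):
        # Round each lane's result to IEEE 754 single precision,
        # as the 32-bit lanes of a 128-bit SSE register would.
        out.append(struct.unpack('f', struct.pack('f', x + y))[0])
    return out

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]
print(addps(a, b))   # [1.5, 2.5, 3.5, 4.5]
```

On real hardware the four lane additions happen simultaneously in one instruction, which is where the throughput gain comes from; the Python loop only models the data layout, not the parallelism.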
5. The LinX utility can be used to test floating-point computing power.
After LinX is opened, select the problem size, memory usage, and number of runs.
Recommended settings:
dual core — problem size: 4000, runs: 1-2
quad core — problem size: 8000, runs: 3
eight core — problem size: 10000, runs: 3
Floating-point unit of measure: one GFLOPS (gigaflops) equals one billion (10^9) floating-point operations per second.
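The GFLOPS unit above can be made concrete with a crude measurement sketch in Python (my own illustration; pure Python runs orders of magnitude below hardware peak, so this only demonstrates the unit, not a real benchmark like LinX):

```python
import time

def measure_gflops(n=1_000_000):
    """Crude FLOPS estimate: time n multiply-add iterations and
    convert to GFLOPS (1 GFLOPS = 1e9 floating-point ops/second)."""
    x = 1.0000001
    acc = 0.0
    t0 = time.perf_counter()
    for _ in range(n):
        acc = acc + x * x      # 2 floating-point operations per pass
    dt = time.perf_counter() - t0
    flops = 2 * n / dt         # total ops divided by elapsed seconds
    return flops / 1e9         # express the rate in GFLOPS

print(f"{measure_gflops():.4f} GFLOPS")
```

Interpreter overhead dominates here, so expect a tiny fraction of the tens of GFLOPS that LinX reports for the same CPU; the point is only how the number is defined.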
6. First of all, "the speed difference mainly comes from the difference in architecture" is a superficial explanation. Yes, the architectures differ, but is that difference just the current choice of the various manufacturers, or does it have deeper causes? Can a CPU simply add cores? Why doesn't a GPU need a cache?
First, can a CPU drop its cache the way a GPU does? No. Two key factors let the GPU get by without one: the special nature of its data (highly aligned, processed as a stream, not matching the locality assumption, rarely written back) and a high-speed bus. The bus problem is only the CPU being constrained by slower data-bus standards, which could in principle change. The data problem is very hard to solve in principle: because a CPU must be general-purpose, it cannot restrict the kind of data it processes. That is also why a GPU can never replace a CPU.
Secondly, can a CPU add many cores? No. First, the cache takes up die area. Second, to keep caches coherent, each core must become more complex. Moreover, to make better use of the cache and to handle unaligned data that needs a lot of write-back, a CPU needs complex optimizations (branch prediction, out-of-order execution, plus some vector instructions and long pipelines imitating the GPU). A CPU core is therefore far more complex than a GPU core, and it costs more (not that lithography itself is expensive, but the complexity reduces the yield, so the final cost is higher). So a CPU cannot add cores the way a GPU does.
As for control capability, GPUs are currently worse than CPUs, but that is not a fundamental limitation. Control flow such as recursion, however, does not suit highly aligned, stream-processed data, which is essentially a data problem.
7. The main reason for the GPU's strong computing power is that most of its circuitry consists of arithmetic units. Adders and multipliers are relatively small circuits, so even many such units do not occupy much chip area, and because the rest of the GPU is small, more registers and caches are left for storing data. A CPU is slower at this because a large share of its units handle other program constructs, such as branches and loops, and because it must stay flexible, its arithmetic-logic units are far more complex; many CPU components exist just to predict branches and to correct and restore ALU results after a misprediction, which greatly increases the complexity of the device.
In addition, current CPU design is also learning from the GPU, adding parallel floating-point units without so much control structure; Intel's SSE instruction set, for example, can perform four floating-point operations at the same time, with many registers added. And if you want to learn GPU computing, you can download the CUDA SDK, which comes with very detailed documentation.
8. Floating-point numbers can be loosely understood as decimals.
Some teachers will tell you that a floating-point number is stored in memory simply as a base and an exponent.
That description is essentially wrong: the real in-memory format of floating-point numbers is quite complicated, with seven cases in total (normal numbers, denormals, positive and negative zero, positive and negative infinity, and NaN).
There are too many exponent details to type out here.
You can refer to the standard IEEE 754, formally:
IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE 754.
This is the standard document for floating-point numbers. It specifies in detail how single- and double-precision numbers are stored. Once you understand it, you can work out the value range of floating-point numbers, why some values cannot be represented, and why precision problems arise, though the calculations are rather tedious.
Hope this helps.
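The IEEE 754 layout mentioned above is easy to inspect from Python's standard library. The helper `bits32` below is my own sketch: it splits a single-precision number into its sign, exponent, and fraction fields and shows some of the special cases:

```python
import struct

def bits32(x):
    """Return the (sign, exponent, fraction) fields of the
    IEEE 754 single-precision encoding of x."""
    (n,) = struct.unpack('>I', struct.pack('>f', x))
    sign = n >> 31             # 1 bit
    exponent = (n >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = n & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

print(bits32(1.0))              # (0, 127, 0): 1.0 * 2**(127-127)
print(bits32(-2.0))             # (1, 128, 0)
print(bits32(float('inf')))     # (0, 255, 0): exponent all ones, fraction 0
print(bits32(float('nan'))[1])  # 255: NaN also has exponent all ones
print(bits32(1e-45))            # (0, 0, 1): smallest denormal
```

The all-ones and all-zeros exponent patterns are reserved, which is exactly why the format has the multiple special cases mentioned above rather than just "base and exponent".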
9. The i3-380M is similar to a Core 2 Duo E8400, about 25 GFLOPS.
The Qualcomm Snapdragon 820 scores about 1732 single-core and 4970 multi-core; its multi-core capability is at the level of an i3-4000M or i5-4210U, better than the i3-380M. The floating-point capability of the Intel Core i5-4210U is about 43.4 GFLOPS.
A GPU is designed for computation, so its floating-point capability is several to dozens of times that of a CPU. For example, the i3-380M's integrated graphics is about half as capable as the HD Graphics 2000, around 30 GFLOPS, while the Snapdragon 820's Adreno 530 GPU reaches about 544 GFLOPS.
10. Today the gap in floating-point computing power between ARM and x86 processors is still huge, determined not only by architecture but also by their different application directions.