Complexity Analysis (Part 1): How to Analyze the Execution Efficiency and Resource Consumption of Algorithms?

Why is complexity analysis needed?

You may be a little confused: if I just run the code once, I can get the algorithm's execution time and memory usage through statistics and monitoring. Why do we still need to analyze time and space complexity? Can such analysis possibly be more accurate than the data I get from an actual run?

First of all, I can say with certainty that this method of evaluating algorithm execution efficiency is correct. Many data structure and algorithm books even give it a name: the post-hoc statistics method. However, this method has significant limitations.

1. Test results are very dependent on the test environment

Differences in hardware between test environments have a great impact on test results. For example, take the same piece of code and run it on an Intel Core i9 processor and an Intel Core i3 processor: needless to say, the i9 executes it much faster. Likewise, code a may run faster than code b on one machine, yet on another machine the results may be completely reversed.

2. Test results are greatly affected by the scale of the data

We will talk about sorting algorithms later, but let's take them as an example now. For the same sorting algorithm, if the order of the data to be sorted differs, the execution time will differ greatly. In the extreme case where the data is already sorted, the algorithm needs to do almost nothing, and the execution time will be very short. In addition, if the test data set is too small, the results may not truly reflect the algorithm's performance. For small-scale data, for example, insertion sort may actually be faster than quick sort!
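
To make the post-hoc statistics method concrete, here is a minimal timing sketch of my own (not from the original article); `Arrays.sort` here is just a stand-in for whatever code you want to measure:

```java
import java.util.Arrays;
import java.util.Random;

public class PostStatistics {
    public static void main(String[] args) {
        int[] data = new Random().ints(100_000).toArray();

        long start = System.nanoTime();   // wall-clock measurement starts here
        Arrays.sort(data);                // the code under test
        long elapsed = System.nanoTime() - start;

        // The printed number depends on this machine, this JVM, and this
        // particular input -- exactly the limitations described above.
        System.out.println("elapsed: " + elapsed / 1_000_000.0 + " ms");
    }
}
```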

Therefore, we need a method that can roughly estimate an algorithm's execution efficiency without running it on specific test data. This is the time and space complexity analysis method we are going to talk about today.

Big O complexity notation

The execution efficiency of an algorithm is, roughly speaking, the execution time of its code. But how can we estimate the execution time of a piece of code "with the naked eye", without actually running it?

Here is a very simple piece of code that computes the cumulative sum of 1, 2, 3, ..., n. Let's estimate its execution time together.

```java
int cal(int n) {
    int sum = 0;
    int i = 1;
    for (; i <= n; ++i) {
        sum = sum + i;
    }
    return sum;
}
```

From the perspective of the CPU, each line of this code performs a similar operation: read data, operate on it, write data. Although the actual number of CPU operations and the execution time differ from line to line, we only need a rough estimate here, so we can assume that every line takes the same amount of time, call it unit_time. Based on this assumption, what is the total execution time of this code?

The 2nd and 3rd lines each take 1 unit_time, while the 4th and 5th lines each run n times, requiring 2n * unit_time, so the total execution time of this code is (2n + 2) * unit_time. From this we can see that the execution time T(n) of a piece of code is proportional to the number of times each of its lines is executed.
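
To make the counting argument tangible, here is a sketch of my own (not from the original) that instruments cal() with a counter incremented once per "line execution" under this rough model; it indeed returns 2n + 2:

```java
// A sketch: one tick per executed "line" under the article's model,
// so the returned count equals 2n + 2.
long countOps(int n) {
    long ops = 0;
    int sum = 0;  ops++;        // line 2 of cal(): runs once
    int i = 1;    ops++;        // line 3: runs once
    for (; i <= n; ++i) {
        ops++;                  // line 4: counted n times
        sum = sum + i; ops++;   // line 5: runs n times
    }
    return ops;                 // 2 + 2n ticks in total
}
```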

According to this analysis, let's look at this code again.

```java
int cal(int n) {
    int sum = 0;
    int i = 1;
    int j = 1;
    for (; i <= n; ++i) {
        j = 1;
        for (; j <= n; ++j) {
            sum = sum + i * j;
        }
    }
    return sum;
}
```

We still assume that the execution time of each statement is unit_time. What is the total execution time T(n) of this code?

The 2nd, 3rd, and 4th lines each take 1 unit_time. The 5th and 6th lines run n times each, requiring 2n * unit_time, and the 7th and 8th lines run n² times each, requiring 2n² * unit_time. Therefore, the total execution time of the whole code is T(n) = (2n² + 2n + 3) * unit_time.

Although we don't know the specific value of unit_time, from the derivation of these two examples we can extract a very important rule: the execution time T(n) of a piece of code is directly proportional to the total number of times its lines are executed.

We can summarize this rule into a formula. Attention, the Big O is coming:

T(n) = O(f(n))

Let me explain this formula in detail. T(n), as we have discussed, represents the code's execution time; n represents the size of the data; and f(n) represents the total number of times all lines of code are executed. Since this is a formula, we write it abstractly as f(n). The O in the formula means that the execution time T(n) is proportional to the expression f(n).

So T(n) = O(2n + 2) in the first example, and T(n) = O(2n² + 2n + 3) in the second. This is Big O time complexity notation. Big O time complexity does not express the actual execution time of the code, but the trend of the execution time as the data size grows. Hence it is also called asymptotic time complexity, or time complexity for short.

When n is large — think 10,000 or 100,000 — the low-order terms, constants, and coefficients in the formula do not affect the growth trend, so they can all be ignored. We only need to keep the term of the largest magnitude. Using Big O notation, the time complexities of the two pieces of code above can be written as T(n) = O(n) and T(n) = O(n²).
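
As a quick sanity check (my own sketch, not part of the original), the ratio of T(n) = 2n² + 2n + 3 to n² approaches a constant as n grows, which is exactly why only the highest-order term matters:

```java
public class GrowthTrend {
    public static void main(String[] args) {
        for (int n = 10; n <= 1_000_000; n *= 10) {
            double t = 2.0 * n * n + 2.0 * n + 3;   // T(n) = 2n^2 + 2n + 3
            // The ratio approaches the constant 2, so T(n) grows like n^2:
            // the low-order terms and constants stop mattering.
            System.out.printf("n = %8d   T(n)/n^2 = %.6f%n", n, t / ((double) n * n));
        }
    }
}
```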

Time complexity analysis

The origin and notation of Big O time complexity were introduced above. Now let's look at how to analyze the time complexity of a piece of code. I have three practical methods to share with you.

1. Only focus on the code with the most loop executions

As I just said, Big O complexity notation only reflects a trend of change. We usually ignore the constants, low-order terms, and coefficients in the formula and only record the largest order of magnitude. Therefore, when analyzing the time complexity of an algorithm or a piece of code, we only need to pay attention to the part that is executed the most times. The order of magnitude of that core code's execution count is the time complexity of the whole code. To help you understand, I'll use the earlier example.

```java
int cal(int n) {
    int sum = 0;
    int i = 1;
    for (; i <= n; ++i) {
        sum = sum + i;
    }
    return sum;
}
```

The 2nd and 3rd lines are constant-time and independent of the size of n, so they have no effect on the complexity. The code executed the most is the loop on lines 4 and 5, so that is the part to focus on. As mentioned earlier, these two lines run n times, so the total time complexity is O(n).

2. The rule of addition: the total complexity is equal to the complexity of the code with the largest magnitude

Here is another piece of code. Try to analyze it first, and then read on to see whether your analysis matches mine.

```java
int cal(int n) {
    int sum_1 = 0;
    int p = 1;
    for (; p < 100; ++p) {
        sum_1 = sum_1 + p;
    }

    int sum_2 = 0;
    int q = 1;
    for (; q < n; ++q) {
        sum_2 = sum_2 + q;
    }

    int sum_3 = 0;
    int i = 1;
    int j = 1;
    for (; i <= n; ++i) {
        j = 1;
        for (; j <= n; ++j) {
            sum_3 = sum_3 + i * j;
        }
    }

    return sum_1 + sum_2 + sum_3;
}
```

This code has three parts, computing sum_1, sum_2, and sum_3. We can analyze the time complexity of each part separately and then take the largest order of magnitude as the complexity of the whole code.

What is the time complexity of the first part? Its loop runs 100 times, which is independent of n, so it takes constant execution time.

Here I want to emphasize again: even if this loop ran 10,000 or 100,000 times, as long as the count is a known constant unrelated to n, it is still constant execution time. Yes, a large constant can greatly affect the actual running time, but recall the concept of time complexity: it describes the trend of an algorithm's execution efficiency as the data size grows. So no matter how large the constant is, we can ignore it, because it has no influence on the growth trend.

What about the second and third parts? The answers are O(n) and O(n²); you should be able to work them out easily, so I won't belabor the point.

Combining the complexities of these three parts, we take the largest order of magnitude: the time complexity of the whole code is O(n²). In other words, the total time complexity equals the time complexity of the piece of code with the largest magnitude. We can abstract this rule into a formula:

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) + T2(n) = max(O(f(n)), O(g(n))) = O(max(f(n), g(n))).

3. The rule of multiplication: the complexity of the nested code is equal to the product of the complexity of the code inside and outside the nest

We just covered the addition rule of complexity analysis; there is also a multiplication rule. By analogy, you can probably guess what the formula looks like.

If T1(n) = O(f(n)) and T2(n) = O(g(n)), then T(n) = T1(n) * T2(n) = O(f(n)) * O(g(n)) = O(f(n) * g(n)).

In other words, if T1(n) = O(n) and T2(n) = O(n²), then T1(n) * T2(n) = O(n³). In concrete code, you can think of the multiplication rule as describing nested loops. Let me give an example.

```java
int cal(int n) {
    int ret = 0;
    int i = 1;
    for (; i < n; ++i) {
        ret = ret + f(i);
    }
    return ret;
}

int f(int n) {
    int sum = 0;
    int i = 1;
    for (; i < n; ++i) {
        sum = sum + i;
    }
    return sum;
}
```

Looking at the cal() function alone, and assuming for a moment that f() were an ordinary O(1) operation, the time complexity of lines 4 to 6 would be T1(n) = O(n). But f() is not a simple operation: its time complexity is T2(n) = O(n). So the time complexity of the whole cal() function is T(n) = T1(n) * T2(n) = O(n * n) = O(n²).

Those are the three complexity analysis techniques. You don't have to memorize them deliberately; the key to complexity analysis is practice. Work through enough cases and the analysis will become second nature.

Analysis of Several Common Time Complexity Examples

Although code varies widely, the common complexity measures are few. I have summarized them here, and they cover almost all the complexity measures of code you will come across in the future. From low order to high order, they are: constant O(1), logarithmic O(logn), linear O(n), linear-logarithmic O(nlogn), polynomial orders such as O(n²), O(n³), ..., O(nᵏ), exponential O(2ⁿ), and factorial O(n!).

The complexity measures just listed can be roughly divided into two categories: polynomial magnitudes and non-polynomial magnitudes. There are only two non-polynomial magnitudes: O(2ⁿ) and O(n!).

We call algorithm problems whose time complexity is of non-polynomial order NP (Non-Deterministic Polynomial) problems.

As the data size n grows larger and larger, the execution time of non-polynomial algorithms increases dramatically, so the time needed to solve such problems grows without bound. Algorithms with non-polynomial time complexity are therefore very inefficient. For that reason, I won't dwell on NP time complexity; we will mainly look at several common polynomial time complexities.

1. O(1)

First, be clear about one concept: O(1) is just the notation for constant-order time complexity; it does not mean that only one line of code is executed. For example, the following code has 3 lines, yet its time complexity is O(1), not O(3).

```java
int i = 8;
int j = 6;
int sum = i + j;
```

To summarize: as long as the execution time of a piece of code does not grow with n, its time complexity is written as O(1). In other words, in general, as long as an algorithm contains no loops or recursion, its time complexity is O(1) even if it has thousands of lines of code.
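
For instance (a sketch of my own, not from the original), the following function touches only a fixed number of elements no matter how large the array is, so its running time does not grow with the input size:

```java
// A sketch: the work done is independent of a.length, so this is O(1).
int first(int[] a) {
    return a[0];   // one array read, regardless of how large the array is
}
```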

2. O(logn), O(nlogn)

Logarithmic time complexity is very common, and it is also the most difficult kind to analyze. Let me illustrate with an example.

```java
i = 1;
while (i <= n) {
    i = i * 2;
}
```

According to the analysis method we covered earlier, the third line of code is executed the most. So as long as we can work out how many times that line runs, we know the time complexity of the whole code.

From the code we can see that the value of i starts at 1 and is multiplied by 2 on each iteration; the loop ends once i exceeds n. Remember geometric progressions from high school? The values taken by i form a geometric sequence. Listed one by one, they look like this:

2⁰, 2¹, 2², ..., 2ᵏ, ..., 2ˣ

So as long as we know the value of x, we know how many times this line executes. Solving 2ˣ = n — also high-school material, so I won't elaborate — gives x = log₂n. Therefore, the time complexity of this code is O(log₂n).
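
If you want to check this empirically, here is a small sketch of my own (not from the original) that counts the loop iterations and compares them with log₂n:

```java
public class LogCount {
    public static void main(String[] args) {
        for (int n : new int[]{16, 1_000, 1_000_000}) {
            int count = 0;
            long i = 1;
            while (i <= n) {    // the doubling loop from the example above
                i = i * 2;
                count++;
            }
            // count grows like log2(n), matching x = log2(n) in the derivation
            System.out.printf("n = %9d  iterations = %2d  log2(n) = %.2f%n",
                    n, count, Math.log(n) / Math.log(2));
        }
    }
}
```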

Now let me change the code a bit. What is the time complexity of this version?

```java
i = 1;
while (i <= n) {
    i = i * 3;
}
```

Following the same reasoning, it is easy to see that the time complexity of this code is O(log₃n).

In fact, whether the base is 2, 3, or 10, we can write the time complexity of every logarithmic order as O(logn). Why?

We know that logarithms of different bases are interconvertible: log₃n equals log₃2 * log₂n, so O(log₃n) = O(C * log₂n), where C = log₃2 is a constant. Based on the theory above — coefficients can be ignored in Big O notation, i.e., O(C * f(n)) = O(f(n)) — we have O(log₂n) = O(log₃n). Therefore, in logarithmic time complexity we ignore the "base" of the logarithm and write it uniformly as O(logn).

If you understood the O(logn) discussion above, then O(nlogn) is easy. Remember the multiplication rule we just covered? If a piece of code has time complexity O(logn) and we execute it n times in a loop, the time complexity is O(nlogn). Moreover, O(nlogn) is a very common algorithmic time complexity: merge sort and quick sort, for example, are both O(nlogn).
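
Putting the multiplication rule to work, here is a sketch of my own (not the original's merge sort): the outer loop runs n times and the inner loop is exactly the O(logn) doubling loop from before, so the total is O(nlogn):

```java
// A sketch: n executions of an O(log n) inner loop => O(n log n) total.
long nLogN(int n) {
    long ops = 0;
    for (int k = 0; k < n; ++k) {   // outer loop: runs n times
        long i = 1;
        while (i <= n) {            // inner loop: about log2(n) times
            i = i * 2;
            ops++;
        }
    }
    return ops;                     // roughly n * log2(n)
}
```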

3. O(m+n), O(m*n)

Now let's talk about a time complexity different from everything above: code whose complexity is determined by the sizes of two pieces of data. As usual, look at the code first!

```java
int cal(int m, int n) {
    int sum_1 = 0;
    int i = 1;
    for (; i < m; ++i) {
        sum_1 = sum_1 + i;
    }

    int sum_2 = 0;
    int j = 1;
    for (; j < n; ++j) {
        sum_2 = sum_2 + j;
    }

    return sum_1 + sum_2;
}
```

As the code shows, m and n represent two independent data sizes. We cannot know in advance which of m and n is larger, so when expressing the complexity we cannot simply apply the addition rule and drop one of them. The time complexity of this code is therefore O(m+n).

For this situation the original addition rule no longer applies, and we need to generalize it: T1(m) + T2(n) = O(f(m) + g(n)). The multiplication rule, however, remains valid: T1(m) * T2(n) = O(f(m) * g(n)).
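
The O(m*n) case mentioned in this section's heading can be seen in a nested loop over both sizes. The original shows only the O(m+n) example, so here is a sketch of my own:

```java
// A sketch: the loop body runs m * n times => time complexity O(m*n).
int calProduct(int m, int n) {
    int sum = 0;
    for (int i = 1; i < m; ++i) {
        for (int j = 1; j < n; ++j) {
            sum = sum + i * j;
        }
    }
    return sum;
}
```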

Space complexity analysis

We spent a long time above on Big O notation and time complexity analysis. Once you understand that material, space complexity analysis is very simple to learn.

As I mentioned earlier, the full name of time complexity is asymptotic time complexity, which describes how an algorithm's execution time grows with the data size. By analogy, the full name of space complexity is asymptotic space complexity, which describes how an algorithm's storage usage grows with the data size.

Let me show you a specific example.

```java
void print(int n) {
    int i = 0;
    int[] a = new int[n];
    for (; i < n; ++i) {
        a[i] = i * i;
    }
    for (i = n - 1; i >= 0; --i) {
        System.out.println(a[i]);
    }
}
```

As with time complexity analysis, we see that line 2 allocates space for the variable i, but that is constant-order and unrelated to the data size n, so we can ignore it. Line 3 allocates an int array of size n. Apart from that, the code uses no space that grows with the data, so the space complexity of the whole function is O(n).

The common space complexities are O(1), O(n), and O(n²); logarithmic ones such as O(logn) and O(nlogn) are rarely encountered. Moreover, space complexity analysis is much simpler than time complexity analysis, so for space complexity, mastering this much is enough.
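
For completeness, here is what O(n²) space looks like (a sketch of my own, not from the original): allocating a two-dimensional array whose size grows with the square of n:

```java
// A sketch: allocates an n-by-n array, so the space complexity is O(n^2).
int[][] makeMatrix(int n) {
    int[][] matrix = new int[n][n];
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            matrix[i][j] = i * j;
        }
    }
    return matrix;
}
```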

Content summary

Complexity, also called asymptotic complexity, includes time complexity and space complexity. It is used to analyze how an algorithm's execution efficiency grows with the data size; roughly speaking, the higher the complexity, the lower the execution efficiency. The common complexities are few; from low order to high order they are: O(1), O(logn), O(n), O(nlogn), O(n²).

Disclaimer: This article is reprinted, not original