## Big O notation

When discussing running time in Big O notation, log refers to log2.

Big O notation is a special notation that points out how fast the algorithm is. For example, suppose the list contains n elements. Simple search needs to check every element, so it needs to perform n operations. Using Big O notation, this running time is O(n). What about seconds? No-Big O notation does not refer to speed in seconds. Big O notation allows you to compare operands, and it points out how fast the algorithm runs.

The operand refers to the number of times the memory is operated! More in-depth, it is the number of operations on the register!

Big O notation points out the running time in the worst case.

In addition to the running time in the worst case, it is important to also consider the running time in the average case.

Some common big O runtimes

Below is a list of 5 big O runtimes that you will often encounter in order from fast to slow.

• O(log n), also called logarithmic time, such algorithms include binary search.
• O(n), also called linear time, such algorithms include simple search.
• O(n * log n), such algorithms include the quick sort that will be introduced in Chapter 4-a faster sorting algorithm.
• O(n*n), such algorithms include the selection sort that will be introduced in Chapter 2-a slower sorting algorithm.
• O(n!), such an algorithm includes the solution to the traveling salesman problem that will be introduced next-a very slow algorithm.

n! represents the factorial of n. n!=1 2 3 ... n.

The following lists the time required to draw the grid using these algorithms in order from fast to slow:

The time in the above chart is calculated based on performing 10 operations per second. These data are not accurate. They are provided here just to give you a general idea of the difference in running time. In fact, the computer performs far more than 10 operations per second.

### Binary search algorithm

# Binary search, the array is ordered
def  binary_search ( list , item ):
low = 0
high = len ( list ) -1
while low <= high:
mid = int ((low + high)/2 )
guess = list [mid]
if guess == item:
return mid
if guess> item:
high = mid- 1
else :
low = mid + 1
return  None

my_list = [ 3 , 5 , 2 , 7 , 34 , 26 ]

Print (binary_search (my_list, . 7 ))
copying the code

### summary

• The speed of the algorithm does not refer to time, but the growth rate of the operand.
• When talking about the speed of an algorithm, we are talking about how fast its running time will increase as the input increases.
• The running time of the algorithm is expressed in Big O notation.
• O(log n) is faster than O(n). When there are more elements to search, the former is faster than the latter.

The characteristics of the array:

• All elements in the array must be of the same type (all int, double, etc.).
• The elements in the array are all connected in memory.
• The array supports sequential access and random access.
• The reading speed of the array is very fast.

The characteristics of the linked list:

• Each element of the linked list stores the address of the next element, so that a series of random memory addresses are stringed together.
• Adding an element to the linked list is easy: just put it in memory and store its address in the previous element.
• The linked list only supports sequential access.
• The insertion and deletion of the linked list is fast.

Running time of common array and linked list operations

insertO(n)O(1)
deleteO(n)O(1)

## Recursion

If you use loops, the performance of the program may be higher; if you use recursion, the program may be easier to understand. How to choose depends on what is more important to you. Recursion just makes the solution clearer, and has no performance advantage.

### Baseline conditions and recursive conditions

Because recursive functions call themselves, it is easy to make mistakes when writing such functions, which can lead to infinite loops. So when writing a recursive function, you must tell it when to stop recursion.

Because of this, every recursive function has two parts: the baseline condition and the recursive condition . The recursive condition refers to the function calling itself, while the baseline condition refers to the function no longer calling itself, thus avoiding an infinite loop.

### Call stack

The computer uses a stack called the call stack internally . Let's take a look at how the computer uses the call stack.

def  greet ( name ):
print  "hello, " + name + "!"
greet2(name)
print  "getting ready to say bye..."
bye()

def  greet2 ( name ):
print  "how are you, " + name + "?"

DEF  BYE ():
Print  "the ok BYE"
Copy the code

In Python, print is a function, but for simplicity, it is assumed that it is not a function. You can just assume the same.

Now, the memory block at the top of the stack belongs to the function greet, which means you have returned to the function greet. When you call the function greet2, the function greet is only partially executed. This is an important concept in this section: **When another function is called, the current function is paused and in an unfinished state. The values of all variables of this function are still in memory. **After executing the function greet2, you return to the function greet, and continue from where you left off: first print getting ready to say bye..., and then call the function bye.

### summary

• Recursion refers to calling your own function.
• Every recursive function has two conditions: the baseline condition and the recursive condition.
• There are two operations on the stack: push and pop.
• All function calls enter the call stack.
• The call stack may be very long, which will take up a lot of memory.

## Quick sort

### Divide and conquer

Divide and conquer

(divide and conquer, D&C)
, A well-known recursive problem solving method.

The process of using D&C to solve problems includes two steps:

• Find the baseline conditions, which must be as simple as possible.
• Keep breaking down the problem (or downsizing) until the baseline conditions are met.

When writing recursive functions involving arrays, the baseline condition is usually that the array is empty or contains only one element. When you are in trouble, check if the baseline conditions are like this.

### Quick sort

def  quicksort ( array ):
if  len (array) < 2 :
# Baseline condition: an array that is empty or contains only one element is "ordered"
return array
else :
# recursive condition
pivot = array[ 0 ]
# by all less than Sub-array composed of elements of the benchmark value
less = [i for i in array[ 1 :] if i <= pivot]
# Sub-array composed of all elements greater than the benchmark value
greater = [i for i in array[ 1 :] if i> pivot]
return quicksort(less) + [pivot] + quicksort(greater)

Print (quicksort ([ 10 , . 5 , 2 , . 3 ]))
Copy the code

### summary

• When implementing quick sort, randomly select the element used as the reference value. The average running time of quicksort is O(n log n).

• In the worst case of quicksort, the running time of the algorithm is O(n) * O(n) = O($n^2$ ).

• Constants in big O notation are sometimes important, which is why quick sort is faster than merge sort.

• When comparing simple search and binary search, constants are almost irrelevant, because when the list is very long, O(log n) is much faster than O(n).

## Hash table (hash table)

Hash table (Hash table, also called hash table) is a data structure that is directly accessed based on the key value . In other words, it accesses the record by mapping the key code value to a location in the table to speed up the search. This mapping function is called a hash function, and the array storing records is called a hash table.

Hash is to find a mapping relationship between data content and data storage address.

When using the hash table hashtable(key, value) to query, it is to use the hash function to convert the key code key to the corresponding array subscript, and locate the space to obtain the value. In this way, you can make full use of The positioning performance of the array is used for data positioning.

The query speed of Hash Table is very fast, almost the time complexity of **O(1)**.

### application

• Use the hash table for search, for example: DNS resolution
• Prevent duplication
• Use hash table as cache

### Hash collision

Hash collision: A situation where the hash function maps two different keys to the same index.

Hash conflicts are inevitable. If conflicts are encountered, the most commonly used solutions are open addressing and chain addressing .

The open addressing method is to find the next free address along the original hash address and insert it when a conflict is encountered.

But there is also a problem that if there is not enough space, he cannot handle conflicts and cannot insert data, so it needs to fill factor (space/insert data)>=1.

If there is a conflict in the principle of the chain address method, he will create a new space at the original address, and then insert it into the space in the form of a linked list node.

### performance

In the average case, the time for the hash table to perform various operations is O(1). O(1) is called constant time. You haven't seen constant time before. It doesn't mean immediately, but it means that no matter how big the hash table is, the time required is the same.

In the worst case, the running time of all operations on the hash table is O(n)-linear time, which is really slow. Let's compare hash tables with arrays and linked lists.

ArrayLinked listAverage hash tableWorst case hash table
FindO(1)O(n)O(1)O(n)
insertO(n)O(1)O(1)O(n)
deleteO(n)O(1)O(1)O(n)

On average, hash table lookup (getting the value at a given index) is as fast as an array, and insertion and deletion are as fast as a linked list, so it has the advantages of both! But in the worst case, The various operations of the hash table are very slow. Therefore, it is important to avoid the worst-case scenario when using a hash table. For this, conflicts need to be avoided. To avoid conflicts, you need to have:

• Low fill factor
• Filling factor: the number of elements in the hash table/the total number of positions
• A filling factor greater than 1 means that the number of elements exceeds the number of positions in the array.
• Once the fill factor starts to increase, you need to add positions to the hash table. This is called resizing and usually doubles the array.
• A good rule of thumb is: once the filling factor is greater than 0.7, adjust the length of the hash table.
• Good hash function
• The ideal situation is: the hash function maps the keys evenly to different positions in the hash table.
• If the linked list stored in the hash table is very long, the speed of the hash table will drop sharply. However, if the hash function used is good, these linked lists will not be very long!

Breadth first search is a search algorithm for graphs that can help answer two types of questions.

• Is there a problem: starting from node A, is there a path to node B?

• Shortest path problem: Starting from node A, which path is the shortest to node B?

### Figure Introduction

The graph is used to simulate a group of connections. The graph is composed of nodes and edges. A node may be directly connected to many nodes, and these nodes are called neighbors.

The shortest route: Suppose you live in San Francisco and you want to go from the Twin Peaks to the Golden Gate Bridge. You want to go by bus and want to have the least number of transfers. The available buses are as follows

This constitutes a graph, where locations are nodes in the graph, and connections indicate whether there is a bus relationship between different nodes. Of course, this is just an example of a graph. A graph can represent the relationship between different nodes, and nodes can have multiple meanings.

You want to take the shortest path and take the least time to reach the Jinshan Bridge. According to this diagram, you need to take at least three steps. This is the shortest path, as shown below:

#### Whether there is a problem

From node A, find whether there is B in all the neighbors of node A. If there is no B in the neighbors of A's neighbors, the search continues until B is found or the traversal is completed.

Take the example of finding a mango seller:

Suppose you run a mango farm and you need to find a mango seller in order to sell mangoes to him. For this, you can find it in your friends. 1. create a friend form and check each person on the list in turn to see if he is a mango seller? If there are no sellers among your friends, check whether the friends of each friend in the list have sellers. In this way, you can know whether there are sellers in your network, so that you can solve the problem of existence.

#### Shortest path problem

In the above example, if your friend is defined as a first-degree relationship, and your friend's friend is defined as a second-degree relationship, depending on this, the first-degree relationship is more intimate and shorter than the second-degree relationship. In order to find the shortest path, generally start from the first-degree relationship. If you can't find what you want in the first-degree relationship, then start to find the second-degree relationship. How to achieve this order in programming is done by using queues.

### Programming realization

#### Queue-control sequence

Queue is a first-in-first-out data structure, which supports two operations of enqueue and dequeue, and first-in-first-out.

#### Graph implementation-hash table

The hash table stores the neighbor relationship between nodes according to the key value. For the picture of the mango dealer, the code is as follows:

graph = dict()
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["alice"] = ["peggy"]
graph["claire"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []
Copy code

#### Looking for a mango seller

import collections

def  getGraph ():
graph = dict ()
graph[ "you" ] = [ "alice" , "bob" , "claire" ]
graph[ "bob" ] = [ "anuj" , "peggy" ]
graph[ "alice" ] = [ "peggy" ]
graph[ "claire" ] = [ "thom" , " jonny " ]
graph[ "anuj" ] = []
graph[ "peggy" ] = []
graph[ "thom" ] = []
graph[ "jonny" ] = []
return graph

def  person_is_seller ( name ):
return name[- 1 ] == 'm'

def  search ( name ):
graph = getGraph()
search_queue = collections.deque()
search_queue += graph[name]
searched = []
while search_queue: # As long as the queue is not empty
person = search_queue.popleft() # Take out the first person
if  not person in searched: # Make sure this person has not been checked yet, skip
if person_is_seller(person) has been checked : # Check if this person
is a seller print (person + "is a mango seller!" ) # is
return  True
else :
search_queue += graph[person] # No, add this person s friends to the queue
searched.append(person) # Add this person to the checked list
print ( "none is a mango seller" ) # If you get here, you are in the queue No one is a mango seller
return  False

search( 'you' )
Copy code

### operation hours

The running time of breadth first search is O (number of people + number of edges), which is usually written as

O(V + E)
, Where V is the number of vertices and E is the number of edges.

### summary

• Breadth first search solves the problem of whether there is a path from A to B. If so, breadth first search will find the shortest path.
• To find the shortest path problem, you can establish a graph relationship and use the breadth first search algorithm to solve it
• Breadth-first search, using the structure of the queue, first traverse from the neighbors of the starting node, first search whether a node meets the requirements, if it meets the requirements, then end the search, if not, the node will be popped out of the queue and the node's neighbors will be added The queue finally completes the traversal or finds the node that meets the requirements.

## Dijkstra's Algorithm

Dijkstra's algorithm solves the problem of the shortest path from one vertex to the rest of the vertices on a weighted directed graph. The algorithm has a restriction that the weights of all edges must be non-negative .

### Use Dijkstra's algorithm

Take the following figure as an example:

The four steps of the algorithm:

1. Find the "cheapest node", the node that can be reached in the shortest time, first find out
2. The cost of updating the neighbors of the node
3. Repeat this process until you have done this for each node in the graph
4. Calculate the final path

The first step: Find the cheapest node, assuming that the end point takes infinite time, and node B is the nearest-2 minutes.

Step 2: Calculate the time required to travel to each of its neighbors via node B.

Find a shorter path to node A.

For the neighbors of node B, if a shorter path to it is found, its cost is updated. Here, you found:

A shorter path to node A (time shortened from 6 minutes to 5 minutes); a shorter path to the destination (time shortened from infinity to 7 minutes).

Step 3: Repeat!

Repeat the first step: find the node that can be reached in the shortest time. You performed the second step on node B. Except for node B, the node that can be reached in the shortest time is node A.

Repeat the second step: update the cost of all neighbors of node A.

You find that the time to the end is 6 minutes! You ran Dijkstra's algorithm on each node (no need to do this for the end point). Now, you know: it takes 2 minutes to get to node B; 5 minutes to get to node A; and 6 minutes to get to the destination.

Breadth-first search is used to find the shortest path between two points. At that time, "shortest path" means the least number of segments . In Dijkstra's algorithm, you assign a number or weight to each segment, so Dijkstra's algorithm finds the path with the smallest total weight .

Comparison chart:

### the term

The Dijkstra algorithm is used for graphs where each edge has associated numbers. These numbers are called weights .

A graph with weights is called a weighted graph, and a graph without weights is called an unweighted graph.

To calculate the shortest path in an unweighted graph, breadth-first search can be used . To calculate the shortest path in the weighted graph, Dijkstra's algorithm can be used .

For graphs with possible loops, the path around the loop cannot be the shortest path. In an undirected graph, each edge is a loop. Dijkstra's algorithm is only suitable for directed acyclic graphs .

### Code

#Hash table of the entire graph
graph = {}
graph[ "start" ] = {}
graph[ "start" ][ "a" ] = 6
graph[ "start" ][ "b" ] = 2
graph[ "a" ] = {}
graph[ "a" ][ "fin" ] = 1
graph[ "b" ] = {}
graph[ "b" ][ "a" ] = 3
graph[ "b" ][ "fin" ] = 5
graph[ "fin" ] = {}

infinity = float ( "inf" )
costs = {}
costs[ "a" ] = 6
costs[ "b" ] = 2
costs[ "fin" ] = infinity

#Hash table is a parent-child node
parents = {}
parents[ "a" ] = "start"
parents[ "b" ] = "start"
parents[ "fin" ] = None

processed = []

def  find_lowest_cost_node ( costs ):
lowest_cost = float ( "inf" )
lowest_cost_node = None
for node in costs:                                     # Traverse all nodes
cost = costs[node]
if cost <lowest_cost and node not  in processed:     # If the cost of the current node is lower and has not been processed
lowest_cost = cost                                 # Treat it as the lowest cost node
lowest_cost_node = node
return lowest_cost_node

node = find_lowest_cost_node(costs)        # Find the node with the least cost among the unprocessed nodes
while node is  not  None :                    # End after all nodes are processed
cost = costs[node]
neighbors = graph[node]
for n in neighbors.keys():             # Traverse all neighbors of the current node
new_cost = cost + neighbors[n]
if costs[n]> new_cost:              # If you go to the neighbor through the current node to be closer to the neighbor
costs[n] = new_cost                # Update the cost of the neighbor
parents[n] = node                  # At the same time, set the parent node of the neighbor as the current node
processed. append(node)                 # Mark the current node as processed
node = find_lowest_cost_node(costs)    # Find the node to be processed next, and loop

Print (Costs)
Print (Parents)
copying the code

### summary

• Breadth first search is used to find the shortest path in a non-weighted graph.
• The Dijkstra algorithm is used to find the shortest path in the weighted graph.
• The Dijkstra algorithm works only when the weight is positive.
• If the graph contains negatively weighted edges, use the Bellman-Ford algorithm.

## Greedy Algorithm

Greedy algorithm (greedy algorithm) refers to the best or optimal (that is, the most advantageous) choice is taken in each step of the choice when solving the problem, so as to hope that the result is the best or optimal algorithm.

Greedy algorithms always make decisions that seem to be optimal at the current moment, that is, expect to lead to the global optimal solution of the problem through local optimal decisions.

The results obtained by the greedy algorithm are often not the optimal results (sometimes the optimal solution), but they are relatively close to (close to) the optimal solution.

• Greedy algorithm does not have a fixed algorithm solution framework . The key of the algorithm is the choice of greedy strategy. Different strategies are selected according to different problems.

• It must be noted that the choice of strategy must have no aftereffect, that is, the choice of a certain state will not affect the previous state, but is only related to the current state. Therefore, the greedy strategy adopted must be carefully analyzed whether it satisfies no aftermath Effectiveness.

For example, the shortest path problem (breadth first, Dijkstra) introduced above belongs to the greedy algorithm, but in the choice of the problem strategy, the optimal solution can be obtained.

### Set cover problem

Suppose you have a radio show that you want listeners in all 50 states to listen to. To do this, you need to decide which radio stations to broadcast on. You need to pay for broadcasting on every radio station, so you try to broadcast on as few radio stations as possible. The list of existing broadcasting stations is as follows:

Each broadcasting station covers a specific area, and the coverage areas of different broadcasting stations may overlap.

How to find the smallest collection of radio stations covering all 50 states in the United States? It sounds easy, but it's actually very difficult. The specific method is as follows.

1. List every possible set of broadcast stations, this is called a power set. The possible subsets are$2^n$ .
2. Among these sets, the smallest set covering 50 states in the United States is selected.

The problem is that it takes a long time to calculate each possible subset of broadcast stations. Since there are 2n possible sets, the running time is O($2^n$ ). If there are not many broadcasting stations, there are only 5~10, this is feasible. But if there are many broadcasting stations, what will be the result? As the number of broadcasting stations increases, the time required will increase sharply.

### Approximate Algorithm

** Greedy algorithm can resolve the crisis! ** Use the following greedy algorithm to get a very close solution.

1. Select a radio station that covers the most uncovered states. Even if this radio station covers some states already covered, it doesn't matter.
2. Repeat the first step until all states are covered.

This is an approximate algorithm. When it takes too long to obtain an accurate solution, an approximate algorithm can be used. The criteria for judging the pros and cons of approximate algorithms are as follows:

• How fast

• How close the approximate solution is to the optimal solution

In the above example, the running time of the greedy algorithm is O($n^2$ ), where n is the number of broadcasting stations.

#### Code

# All states that need to be covered
states_needed = set ([ "mt" , "wa" , "or" , "id" , "nv" , "ut" , "ca" , "az" ])
# Alternative broadcast List of stations
stations = {}
stations[ "1" ] = set ([ "id" , "nv" , "ut" ])
stations[ "2" ] = set ([ "wa" , "id" , "mt" ])
stations[ "3" ] = set ([ "or" , "nv" , "ca" ])
stations[ "4" ] = set ([ "nv" , "ut" ])
stations[ "5" ] = set ([ "ca" , "az" ])
# finally selected radio station list
final_stations = set ()

while states_needed:
best_station = None  # Radio stations that currently cover the most uncovered states
states_covered = set () # All uncovered states covered by this radio station
for station, states in stations.items():
covered = states_needed & states # A series of states not covered by the current broadcast station
if  len (covered)> len (states_covered): # Check whether the broadcast station covers more states than best_station.
best_station = station   # Yes, set best_station as the current broadcast station
states_covered = covered
states_needed -= states_covered # Update states_needed to remove the covered states

Print (final_stations)
copying the code

### NP complete problem

The simple definition of NP-complete problem is that it is notoriously difficult to solve, such as traveling salesman problem and set covering problem.

#### How to identify NP-complete problems

• The algorithm runs very fast when there are fewer elements, but as the number of elements increases, the speed becomes very slow.
• Problems involving "all combinations" are usually NP-complete.
• The problem cannot be divided into small problems, all possible situations must be considered. This may be an NP-complete problem.
• If the problem involves a sequence (such as the city sequence in the traveling salesman problem) and is difficult to solve, it may be an NP-complete problem.
• If the problem involves a collection (such as a collection of broadcast stations) and is difficult to solve, it may be an NP-complete problem.
• If the problem can be converted to a set covering problem or a traveling salesman problem, then it must be an NP-complete problem.

### summary

• The greedy algorithm looks for a local optimal solution in an attempt to obtain the global optimal solution in this way.
• For the NP-complete problem, no quick solution has been found yet.
• When facing NP-complete problems, the best approach is to use approximate algorithms.
• Greedy algorithm is easy to implement and fast, and it is a good approximation algorithm.

## Dynamic programming

Dynamic programming is a method of decomposing the original big problem into relatively simple sub-problems, first solving the sub-problems, and then gradually solving the big problems.

Dynamic programming is often applicable to problems with overlapping subproblems and optimal substructure properties .

### Basic idea

For a given problem, first find the solution of its sub-problems, and then merge the solutions of the sub-problems to get the solution of the original problem. Usually many sub-problems are very similar. For this reason, the dynamic programming method attempts to solve each sub-problem only once, thereby reducing the amount of calculation: once the solution of a given stator problem has been calculated, it is memorized and stored so that the same sub-problem is needed next time Directly check the table when solving.

### Divide and conquer and dynamic programming

Common point : Both require the original problem to have the optimal substructure property. Both divide and conquer the original problem into several smaller sub-problems (small enough to be easily solved). Then merge the solutions of the sub-problems. , Forming a solution to the original problem.

difference:

• The divide-and-conquer method treats the decomposed sub-problems as independent of each other and does it through recursion.
• Dynamic programming understands the decomposed sub-problems as being related to each other, there are overlapping parts, which need to be memorized, and it is usually done by iteration.

### Problem characteristics

Optimal substructure : When the optimal solution of a problem includes the optimal solution of its subproblems, the problem is said to have the property of optimal substructure.

Overlapping sub-problems : When solving the problem from top to bottom using the recursive algorithm, the sub-problems generated each time are not always new problems, and some sub-problems are repeatedly calculated many times. The dynamic programming algorithm takes advantage of the overlapping nature of this sub-problem. It solves each sub-problem only once, and then saves its solution in a table, and uses the solutions of these sub-problems as much as possible in the future.

Evernote-Comic-What is dynamic programming

### summary

• Dynamic programming can help you find the optimal solution under given constraints.
• When the problem can be decomposed into independent and discrete sub-problems, dynamic programming can be used to solve
• Every dynamic programming solution involves grids
• The value in the cell is usually the value you want to optimize.
• Each cell is a sub-problem, so you should consider how to divide the problem into sub-problems, which will help you find the coordinate axis of the grid.

## k-nearest neighbor algorithm

The K-Nearest Neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. The idea of this method is: if most of the k most similar (ie, the nearest neighbors in the feature space) samples of a sample in the feature space belong to a certain category, the sample also belongs to this category.

A recommendation system can be created through KNN!

### summary

• KNN is used for classification and regression, and the nearest neighbors need to be considered.
• Classification is grouping.
• Regression is the prediction result (such as numbers).
• Feature extraction means converting items (such as fruits or users) into a series of comparable numbers.
• Whether the right features can be selected is critical to the success or failure of the KNN algorithm.