Big O notation
When discussing running time in Big O notation, log refers to log2.
Big O notation is a special notation that tells you how fast an algorithm is. For example, suppose a list contains n elements. Simple search needs to check every element, so it performs n operations; in Big O notation, this running time is O(n). What about seconds? No, Big O notation does not measure speed in seconds. It lets you compare the number of operations algorithms perform, and that is what "how fast an algorithm runs" means here.
Here, "operations" means the elementary steps the algorithm executes, such as reads and writes of memory or registers.
Big O notation points out the running time in the worst case.
In addition to the running time in the worst case, it is important to also consider the running time in the average case.
Some common big O runtimes
Below are five Big O runtimes you will encounter often, ordered from fastest to slowest:
- O(log n), also called logarithmic time; such algorithms include binary search.
- O(n), also called linear time; such algorithms include simple search.
- O(n * log n); such algorithms include quicksort (introduced in Chapter 4), a faster sorting algorithm.
- O($n^2$); such algorithms include selection sort (introduced in Chapter 2), a slower sorting algorithm.
- O(n!); an example is the solution to the traveling salesman problem (introduced next), a very slow algorithm.
n! is the factorial of n: n! = 1 × 2 × 3 × ... × n.
The following lists the time required to draw the grid using these algorithms, ordered from fast to slow:
The times in the chart above assume the computer performs 10 operations per second. These numbers are not accurate; they are provided only to give you a general sense of how the running times differ. In reality, computers perform far more than 10 operations per second.
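As a rough sketch of how these runtimes diverge, here is the same assumption of 10 operations per second applied to a 16-element input (the numbers are illustrative, not real benchmarks):

```python
import math

OPS_PER_SECOND = 10  # deliberately slow, matching the chart's assumption

def estimated_seconds(operations):
    return operations / OPS_PER_SECOND

n = 16  # number of elements
print(estimated_seconds(math.log2(n)))      # O(log n)   -> 0.4 seconds
print(estimated_seconds(n))                 # O(n)       -> 1.6 seconds
print(estimated_seconds(n * math.log2(n)))  # O(n log n) -> 6.4 seconds
print(estimated_seconds(n ** 2))            # O(n^2)     -> 25.6 seconds
```

Even at n = 16 the gap is visible; as n grows, the slower runtimes pull away dramatically.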
Binary search algorithm
# Binary search; the array must be sorted
def binary_search(lst, item):
    low = 0
    high = len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        guess = lst[mid]
        if guess == item:
            return mid
        if guess > item:
            high = mid - 1
        else:
            low = mid + 1
    return None

my_list = [2, 3, 5, 7, 26, 34]  # binary search requires a sorted list
print(binary_search(my_list, 7))  # 3
summary
- The speed of an algorithm does not refer to time, but to the growth rate of the number of operations.
- When talking about the speed of an algorithm, we are talking about how fast its running time grows as the input grows.
- The running time of an algorithm is expressed in Big O notation.
- O(log n) is faster than O(n), and the more elements there are to search, the bigger the advantage.
Array and linked list
The characteristics of arrays:
- All elements in an array must be of the same type (all int, all double, etc.).
- The elements of an array are stored contiguously in memory.
- Arrays support both sequential access and random access.
- Reading from an array is very fast.
The characteristics of linked lists:
- Each element of a linked list stores the address of the next element, stringing together a series of arbitrary memory addresses.
- Adding an element to a linked list is easy: just put it anywhere in memory and store its address in the previous element.
- Linked lists support only sequential access.
- Insertion and deletion in a linked list are fast.
Running time of common array and linked list operations
|        | Array | Linked list |
|--------|-------|-------------|
| Read   | O(1)  | O(n)        |
| Insert | O(n)  | O(1)        |
| Delete | O(n)  | O(1)        |
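A minimal sketch of the access difference (the `Node` class here is my illustration, not from the original text): reading index 2 from an array is one step, while a linked list must walk two links to get there.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

# Array (Python list): random access, one step
array = [10, 20, 30]
print(array[2])  # 30

# Linked list: sequential access, walk node by node
head = Node(10, Node(20, Node(30)))
node = head
for _ in range(2):   # follow 2 links to reach "index" 2
    node = node.next
print(node.value)  # 30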
Recursion
If you use loops, the program's performance may be higher; if you use recursion, the program may be easier to understand. Which to choose depends on what matters more to you. Recursion only makes the solution clearer; it has no performance advantage.
Base case and recursive case
Because recursive functions call themselves, it is easy to make mistakes when writing them, which can lead to infinite loops. So when writing a recursive function, you must tell it when to stop recursing.
For this reason, every recursive function has two parts: the base case and the recursive case. The recursive case is when the function calls itself; the base case is when the function stops calling itself, avoiding an infinite loop.
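The two parts can be seen in a simple countdown (a standard illustration; the function name is mine):

```python
def countdown(i):
    print(i)
    if i <= 0:           # base case: stop recursing
        return
    countdown(i - 1)     # recursive case: call yourself with a smaller input

countdown(3)  # prints 3, 2, 1, 0
```

Without the `if i <= 0` check, the function would call itself forever.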
Call stack
The computer internally uses a stack called the call stack. Let's take a look at how the computer uses it.
def greet(name):
    print("hello, " + name + "!")
    greet2(name)
    print("getting ready to say bye...")
    bye()

def greet2(name):
    print("how are you, " + name + "?")

def bye():
    print("ok bye!")
In Python, print is itself a function, but for simplicity, assume here that it is not, so we can focus on greet, greet2, and bye.
When greet calls greet2, the memory block for greet2 sits at the top of the stack while greet is only partially executed. This is an important concept in this section: **when another function is called, the current function is paused in an unfinished state, and the values of all its variables remain in memory.** After greet2 finishes executing, you return to greet and continue from where you left off: first print getting ready to say bye..., then call the function bye.
summary
- Recursion means a function calling itself.
- Every recursive function has two parts: the base case and the recursive case.
- There are two operations on a stack: push and pop.
- All function calls go onto the call stack.
- The call stack can get very long, which takes up a lot of memory.
Quick sort
Divide and conquer
Solving a problem with D&C involves two steps:
- Find the base case, which should be as simple as possible.
- Keep breaking the problem down (shrinking its size) until it reaches the base case.
When writing a recursive function that involves an array, the base case is usually an empty array or an array with one element. When you are stuck, check whether your base case looks like this.
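These two steps can be sketched with a recursive sum (the name `rec_sum` is mine): the base case is the empty array, and each recursive call shrinks the array by one element.

```python
def rec_sum(arr):
    if not arr:                       # base case: an empty array sums to 0
        return 0
    return arr[0] + rec_sum(arr[1:])  # recursive case: shrink the array

print(rec_sum([1, 2, 3, 4]))  # 10
```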
Quick sort
def quicksort(array):
    if len(array) < 2:
        # Base case: an array that is empty or has one element is already "sorted"
        return array
    else:
        # Recursive case
        pivot = array[0]
        # Sub-array of all elements less than or equal to the pivot
        less = [i for i in array[1:] if i <= pivot]
        # Sub-array of all elements greater than the pivot
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)

print(quicksort([10, 5, 2, 3]))  # [2, 3, 5, 10]
summary
- When implementing quicksort, pick the pivot element at random. The average running time of quicksort is O(n log n).
- In the worst case, quicksort's running time is O(n) * O(n) = O($n^2$).
- The constants hidden in Big O notation sometimes matter, which is why quicksort is faster than merge sort in practice.
- When comparing simple search and binary search, the constants hardly matter, because when the list is long, O(log n) is much faster than O(n).
Hash tables
A hash table is a data structure accessed directly by key. It speeds up lookups by mapping a key to a location in a table: the mapping function is called the hash function, and the array storing the records is the hash table.
Hashing means finding a mapping between the content of the data and the address where the data is stored.
When you query a hash table with hashtable(key, value), the hash function converts the key into an array index, locating the slot that holds the value. In this way, a hash table exploits an array's fast positioning to locate data.
Hash table lookups are very fast: almost **O(1)** time complexity.
Applications
- Lookups, for example DNS resolution
- Preventing duplicates
- Caching
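In Python, the built-in dict is a hash table; a minimal sketch of the three uses above (the table contents are made up for illustration):

```python
# Lookup: a toy DNS-style table
dns = {"example.com": "93.184.216.34"}
print(dns["example.com"])

# Preventing duplicates: an O(1) membership check before inserting
voted = {}
def check_voter(name):
    if name in voted:
        return "kick them out!"
    voted[name] = True
    return "let them vote!"

print(check_voter("tom"))  # let them vote!
print(check_voter("tom"))  # kick them out!

# Caching: remember results that were already computed
cache = {}
def get_page(url):
    if url not in cache:
        cache[url] = "rendered " + url  # pretend this is expensive
    return cache[url]
```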
Hash collision
Hash collision: when the hash function maps two different keys to the same index.
Hash collisions are unavoidable. When a collision occurs, the two most common resolutions are open addressing and chaining.
Open addressing
With open addressing, when a collision occurs you probe forward from the original hash address until you find the next free slot, and insert there.
The catch is that if no free slot is left, the collision cannot be resolved and the data cannot be inserted, so the load factor (number of stored entries divided by the number of slots) can never exceed 1.
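A minimal open-addressing (linear probing) sketch, with a fixed capacity and no resizing; the class and method names are mine:

```python
class OpenAddressTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot holds a (key, value) pair

    def put(self, key, value):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):  # probe at most every slot
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % len(self.slots)  # collision: try the next slot
        raise RuntimeError("table is full (load factor reached 1)")

    def get(self, key):
        i = hash(key) % len(self.slots)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:   # hit an empty slot: key is absent
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None
```

Once every slot is taken, `put` fails, which is exactly the load-factor limitation described above.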
Chaining
With chaining, when a collision occurs a new node is created and appended, as a linked-list node, to the list stored at the original address.
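A minimal chaining sketch, using Python lists as the per-slot chains (the class name is mine):

```python
class ChainedTable:
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]  # one chain per slot

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for pair in bucket:
            if pair[0] == key:       # key already present: update it
                pair[1] = value
                return
        bucket.append([key, value])  # collision or new key: append to the chain

    def get(self, key):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for k, v in bucket:          # walk the chain at this slot
            if k == key:
                return v
        return None
```

Unlike open addressing, chaining keeps working past a load factor of 1; the chains just get longer and slower.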
Performance
In the average case, a hash table performs all of these operations in O(1) time. O(1) is called constant time. Constant time does not mean instantaneous; it means the time required is the same no matter how big the hash table is.
In the worst case, all hash table operations take O(n) time, linear time, which is really slow. Let's compare hash tables with arrays and linked lists.
|        | Array | Linked list | Hash table (average) | Hash table (worst) |
|--------|-------|-------------|----------------------|--------------------|
| Find   | O(1)  | O(n)        | O(1)                 | O(n)               |
| Insert | O(n)  | O(1)        | O(1)                 | O(n)               |
| Delete | O(n)  | O(1)        | O(1)                 | O(n)               |
On average, hash table lookup (getting the value for a given key) is as fast as an array, and insertion and deletion are as fast as a linked list, so it has the advantages of both! But in the worst case, all hash table operations are very slow. Therefore, it is important to avoid the worst case when using a hash table, which means avoiding collisions. To avoid collisions, you need:
- A low load factor
  - Load factor: the number of elements in the hash table divided by the total number of slots.
  - A load factor greater than 1 means there are more elements than slots in the array.
  - Once the load factor starts to climb, you need to add slots to the hash table. This is called resizing, and it usually doubles the array.
  - A good rule of thumb: once the load factor exceeds 0.7, resize the hash table.
- A good hash function
  - Ideally, the hash function maps keys evenly across the positions of the hash table.
  - If the linked lists stored in the hash table get long, the hash table slows down sharply. With a good hash function, these lists never get very long!
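The resizing rule of thumb can be sketched as follows (function names are mine; real dict implementations handle this internally):

```python
def needs_resize(num_elements, num_slots, threshold=0.7):
    # load factor = elements / slots; resize once it passes the threshold
    return num_elements / num_slots > threshold

def resized_capacity(num_slots):
    # resizing usually doubles the slot count, then re-inserts every element
    return num_slots * 2

print(needs_resize(5, 10))   # load factor 0.5 -> False
print(needs_resize(8, 10))   # load factor 0.8 -> True
print(resized_capacity(10))  # 20
```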
Breadth-first search
Breadth-first search is a search algorithm for graphs that can help answer two kinds of questions:
- The existence question: starting from node A, is there a path to node B?
- The shortest-path question: starting from node A, which path to node B is shortest?
Introduction to graphs
A graph models a set of connections. It is made up of nodes and edges. A node may be directly connected to many other nodes; those nodes are called its neighbors.
The shortest route: suppose you live in San Francisco and want to go from Twin Peaks to the Golden Gate Bridge by bus, with the fewest transfers. The available buses are as follows.
This forms a graph: locations are nodes, and edges indicate whether a bus runs between two nodes. Of course, this is just one example; a graph can represent relationships between nodes of many kinds.
You want the shortest path, reaching the Golden Gate Bridge in the fewest steps. According to this graph, you need at least three steps. That is the shortest path, as shown below:
Breadth-first search
The existence question
Starting from node A, check whether B is among A's neighbors. If not, search the neighbors of A's neighbors, and continue until B is found or the whole graph has been traversed.
Take the example of finding a mango seller:
Suppose you run a mango farm and need to find a mango seller to sell your mangoes to. You can search among your friends: first, make a list of your friends and check each person on it in turn. Is this person a mango seller? If no one on the list is a seller, check whether the friends of each friend on the list include a seller. This way you learn whether there is a seller anywhere in your network, solving the existence question.
The shortest-path question
In the example above, call your friends first-degree connections and your friends' friends second-degree connections; a first-degree connection is closer than a second-degree one. To find the shortest path, you search the first-degree connections first, and only if none of them matches do you move on to the second-degree connections. In code, this ordering is enforced with a queue.
Implementation
Queues control the search order
A queue is a first-in, first-out (FIFO) data structure that supports two operations: enqueue and dequeue.
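In Python, `collections.deque` provides an efficient queue:

```python
from collections import deque

queue = deque()
queue.append("alice")   # enqueue
queue.append("bob")     # enqueue
print(queue.popleft())  # dequeue: alice went in first, so alice comes out first
```

`append` and `popleft` are both O(1), which is why `deque` is preferred over a plain list for queues.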
Implementing the graph with a hash table
The hash table stores each node's neighbors under the node's key. For the mango-seller graph, the code is as follows:
graph = dict()
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["alice"] = ["peggy"]
graph["claire"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []
Looking for a mango seller
import collections

# Breadth-first search
def getGraph():
    graph = dict()
    graph["you"] = ["alice", "bob", "claire"]
    graph["bob"] = ["anuj", "peggy"]
    graph["alice"] = ["peggy"]
    graph["claire"] = ["thom", "jonny"]
    graph["anuj"] = []
    graph["peggy"] = []
    graph["thom"] = []
    graph["jonny"] = []
    return graph

def person_is_seller(name):
    return name[-1] == 'm'  # toy rule: sellers' names end with 'm'

def search(name):
    graph = getGraph()
    search_queue = collections.deque()
    search_queue += graph[name]
    searched = []                        # people already checked
    while search_queue:                  # as long as the queue is not empty
        person = search_queue.popleft()  # take out the first person
        if person not in searched:       # only check people not checked before
            if person_is_seller(person): # check whether this person is a seller
                print(person + " is a mango seller!")
                return True
            else:
                search_queue += graph[person]  # not a seller: enqueue their friends
                searched.append(person)        # mark this person as checked
    print("none is a mango seller")      # queue exhausted: nobody is a seller
    return False

search("you")
Running time
The running time of breadth-first search is O(number of people + number of edges), usually written as O(V + E), where V is the number of vertices and E the number of edges.
summary
- Breadth-first search answers whether there is a path from A to B; if there is, it finds the shortest one.
- A shortest-path problem can be solved by modeling it as a graph and running breadth-first search.
- Breadth-first search uses a queue: starting from the start node's neighbors, it checks each dequeued node against the goal. If the node matches, the search ends; otherwise the node's neighbors are enqueued, until the goal is found or the whole graph has been traversed.
Dijkstra's Algorithm
Dijkstra's algorithm solves the single-source shortest-path problem on a weighted directed graph: the shortest path from one vertex to all the others. The algorithm has one restriction: all edge weights must be non-negative.
Use Dijkstra's algorithm
Take the following figure as an example:
The algorithm has four steps:
- Find the "cheapest" node, the node reachable in the least time.
- Update the costs of that node's neighbors.
- Repeat until this has been done for every node in the graph.
- Compute the final path.
Step 1: Find the cheapest node. Reaching the end point is assumed to take infinite time for now; node B is the nearest, at 2 minutes.
Step 2: Compute the time needed to travel to each of node B's neighbors via node B.
You found a shorter path to node A.
For each neighbor of node B, if a shorter path to it is found, its cost is updated. Here, you found:
a shorter path to node A (time cut from 6 minutes to 5), and a shorter path to the end point (time cut from infinity to 7 minutes).
Step 3: Repeat!
Repeat step 1: find the node reachable in the least time. You have already processed node B; apart from it, the node reachable soonest is node A.
Repeat step 2: update the costs of all of node A's neighbors.
You find that the time to the end point is now 6 minutes! You have now run Dijkstra's algorithm on every node (there is no need to process the end point). You know: it takes 2 minutes to reach node B, 5 minutes to reach node A, and 6 minutes to reach the end point.
Breadth-first search also finds the shortest path between two points, but there "shortest" means the fewest segments. In Dijkstra's algorithm, each segment has a number, a weight, attached, so Dijkstra's algorithm finds the path with the smallest total weight.
Comparison chart:
Terminology
Dijkstra's algorithm is used on graphs where every edge has a number attached. These numbers are called weights.
A graph with weights is a weighted graph; a graph without weights is an unweighted graph.
To compute the shortest path in an unweighted graph, use breadth-first search. To compute the shortest path in a weighted graph, use Dijkstra's algorithm.
A path that goes around a cycle can never be the shortest path, and in an undirected graph each edge effectively forms a small cycle. Strictly speaking, Dijkstra's algorithm handles cycles fine as long as all edge weights are non-negative; what it cannot handle is negative-weight edges.
Code
# Hash table of the whole graph
graph = {}
graph["start"] = {}
graph["start"]["a"] = 6
graph["start"]["b"] = 2
graph["a"] = {}
graph["a"]["fin"] = 1
graph["b"] = {}
graph["b"]["a"] = 3
graph["b"]["fin"] = 5
graph["fin"] = {}

# Hash table of costs
infinity = float("inf")
costs = {}
costs["a"] = 6
costs["b"] = 2
costs["fin"] = infinity

# Hash table of parents
parents = {}
parents["a"] = "start"
parents["b"] = "start"
parents["fin"] = None

processed = []

def find_lowest_cost_node(costs):
    lowest_cost = float("inf")
    lowest_cost_node = None
    for node in costs:  # traverse all nodes
        cost = costs[node]
        if cost < lowest_cost and node not in processed:  # cheaper and not yet processed
            lowest_cost = cost            # treat it as the new lowest-cost node
            lowest_cost_node = node
    return lowest_cost_node

node = find_lowest_cost_node(costs)  # find the cheapest unprocessed node
while node is not None:              # loop ends once every node is processed
    cost = costs[node]
    neighbors = graph[node]
    for n in neighbors.keys():       # traverse all neighbors of the current node
        new_cost = cost + neighbors[n]
        if costs[n] > new_cost:      # going via the current node is cheaper
            costs[n] = new_cost      # update the neighbor's cost
            parents[n] = node        # and set its parent to the current node
    processed.append(node)           # mark the current node as processed
    node = find_lowest_cost_node(costs)  # find the next node to process

print(costs)
print(parents)
summary
- Breadth-first search finds the shortest path in an unweighted graph.
- Dijkstra's algorithm finds the shortest path in a weighted graph.
- Dijkstra's algorithm works only when all weights are non-negative.
- If the graph contains negative-weight edges, use the Bellman-Ford algorithm instead.
Greedy Algorithm
A greedy algorithm makes the best, or locally optimal, choice at each step when solving a problem, hoping that this sequence of choices produces the best or optimal overall result.
Greedy algorithms always make the decision that looks best at the current moment; that is, they hope to reach the globally optimal solution through locally optimal decisions.
The result produced by a greedy algorithm is often not optimal (though sometimes it is), but it is usually close to the optimal solution.
A greedy algorithm has no fixed solution framework. The key is choosing the greedy strategy, and different problems call for different strategies.
Note that the chosen strategy must be free of after-effects: a choice made in some state must depend only on the current state and must not affect earlier states. You must therefore analyze carefully whether the greedy strategy you adopt has this property.
For example, the shortest-path algorithms introduced above (breadth-first search, Dijkstra) are greedy, yet for those problems the greedy strategy yields the optimal solution.
Set cover problem
Suppose you have a radio show and you want listeners in all 50 US states to hear it. To do this, you need to decide which radio stations to broadcast on. Broadcasting on each station costs money, so you want to use as few stations as possible. The list of available stations is as follows:
Each station covers a particular region, and the coverage areas of different stations may overlap.
How do you find the smallest set of stations covering all 50 states? It sounds easy, but it is actually very hard. The exact method is as follows:
- List every possible subset of stations; this is called the power set. There are $2^n$ possible subsets.
- From these, pick the smallest subset that covers all 50 states.
The problem is that computing every possible subset of stations takes a long time. Since there are $2^n$ possible subsets, the running time is O($2^n$). If there are only a few stations, say 5 to 10, this is feasible. But what if there are many? The time required grows explosively as the number of stations increases.
Approximation algorithms
**A greedy algorithm resolves the crisis!** The following greedy algorithm yields a solution very close to optimal:
- Pick the station that covers the most states not yet covered. It doesn't matter if the station also covers some states that are already covered.
- Repeat until every state is covered.
This is an approximation algorithm. When computing the exact solution takes too long, an approximation algorithm can be used. Approximation algorithms are judged by:
- how fast they are;
- how close the approximate solution is to the optimal one.
In this example, the greedy algorithm runs in O($n^2$) time, where n is the number of stations.
Code
# All the states that need to be covered
states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])

# The list of candidate stations
stations = {}
stations["1"] = set(["id", "nv", "ut"])
stations["2"] = set(["wa", "id", "mt"])
stations["3"] = set(["or", "nv", "ca"])
stations["4"] = set(["nv", "ut"])
stations["5"] = set(["ca", "az"])

# The final list of chosen stations
final_stations = set()

while states_needed:
    best_station = None      # the station covering the most uncovered states so far
    states_covered = set()   # the uncovered states that station covers
    for station, states in stations.items():
        covered = states_needed & states  # uncovered states this station covers
        if len(covered) > len(states_covered):  # does it beat best_station?
            best_station = station              # yes: make it the new best_station
            states_covered = covered
    states_needed -= states_covered    # remove the newly covered states
    final_stations.add(best_station)   # add best_station to the final list

print(final_stations)
NP-complete problems
A simple definition of an NP-complete problem: a problem notorious for being hard to solve, such as the traveling salesman problem and the set covering problem.
How to recognize NP-complete problems
- The algorithm runs fast with few elements, but slows down dramatically as the number of elements grows.
- Problems involving "all combinations" are usually NP-complete.
- The problem cannot be broken into subproblems; every possible case must be considered. This may indicate an NP-complete problem.
- If the problem involves a sequence (such as the city sequence in the traveling salesman problem) and is hard to solve, it may be NP-complete.
- If the problem involves a set (such as a set of radio stations) and is hard to solve, it may be NP-complete.
- If the problem can be restated as the set covering problem or the traveling salesman problem, it is definitely NP-complete.
summary
- A greedy algorithm seeks locally optimal solutions in an attempt to reach the globally optimal one.
- No fast exact solution for NP-complete problems has been found.
- When facing an NP-complete problem, the best approach is to use an approximation algorithm.
- Greedy algorithms are easy to implement and fast, making them good approximation algorithms.
Dynamic programming
Dynamic programming decomposes the original big problem into simpler subproblems, solves the subproblems first, and builds up the solution to the big problem step by step.
Dynamic programming typically applies to problems with overlapping subproblems and the optimal substructure property.
Basic idea
For a given problem, first solve its subproblems, then merge the subproblem solutions into a solution of the original problem. Many subproblems are often very similar, so dynamic programming tries to solve each subproblem only once, reducing computation: once a given subproblem has been solved, its solution is memoized, and the next time the same subproblem is needed, the stored answer is simply looked up.
Divide and conquer and dynamic programming
Common point : Both require the original problem to have the optimal substructure property. Both divide and conquer the original problem into several smaller subproblems (small enough to be easily solved). Then merge the solutions of the subproblems. , Forming a solution to the original problem.
difference:
 The divideandconquer method treats the decomposed subproblems as independent of each other and does it through recursion.
 Dynamic programming understands the decomposed subproblems as being related to each other, there are overlapping parts, which need to be memorized, and it is usually done by iteration.
Problem characteristics
Optimal substructure: a problem has the optimal substructure property when its optimal solution contains optimal solutions of its subproblems.
Overlapping subproblems: when a problem is solved top-down with a recursive algorithm, the subproblems generated are not always new; some are computed repeatedly many times. Dynamic programming exploits this overlap: it solves each subproblem only once, stores the solution in a table, and reuses these stored solutions wherever possible.
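The classic illustration of overlapping subproblems is the Fibonacci sequence; a memoized sketch:

```python
memo = {}  # table of already-solved subproblems

def fib(n):
    if n < 2:            # base case
        return n
    if n not in memo:    # solve each subproblem only once
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]       # afterwards, just look up the table

print(fib(30))  # 832040
```

Without the memo table, `fib(30)` would recompute the same subproblems over a million times; with it, each of the 30 subproblems is solved exactly once.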
summary
- Dynamic programming helps you find the optimal solution under given constraints.
- Dynamic programming applies when the problem can be decomposed into independent, discrete subproblems.
- Every dynamic programming solution involves a grid.
- The values in the cells are usually the quantity you want to optimize.
- Each cell is a subproblem, so think about how to divide the problem into subproblems; this helps you find the axes of the grid.
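The grid idea can be sketched with the classic knapsack problem: cell[i][w] holds the best value achievable using the first i items at capacity w, and each cell is filled from the cells above it.

```python
def knapsack(items, capacity):
    # items: list of (value, weight) pairs; capacity: total weight allowed
    # cell[i][w] = best total value using the first i items at capacity w
    cell = [[0] * (capacity + 1) for _ in range(len(items) + 1)]
    for i, (value, weight) in enumerate(items, start=1):
        for w in range(capacity + 1):
            cell[i][w] = cell[i - 1][w]          # option 1: skip item i
            if weight <= w:                      # option 2: take item i
                cell[i][w] = max(cell[i][w],
                                 cell[i - 1][w - weight] + value)
    return cell[len(items)][capacity]

# guitar ($1500, 1 lb), stereo ($3000, 4 lb), laptop ($2000, 3 lb), 4 lb bag
print(knapsack([(1500, 1), (3000, 4), (2000, 3)], 4))  # 3500
```

The answer 3500 comes from taking the guitar and the laptop, not the single most valuable item, which is exactly what filling the grid cell by cell discovers.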
K-nearest neighbors algorithm
The K-Nearest Neighbors (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. The idea: if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to some category, then the sample belongs to that category too.
A recommendation system can be built with KNN!
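A minimal KNN classification sketch (the fruit features and labels are made up for illustration):

```python
import math
from collections import Counter

def distance(a, b):
    # straight-line (Euclidean) distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(sample, training_data, k=3):
    # training_data: list of (features, label) pairs
    nearest = sorted(training_data,
                     key=lambda pair: distance(sample, pair[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]  # majority vote among the k nearest

# features: (size, redness) -- made-up numbers
fruits = [((9, 9), "grapefruit"), ((8, 8), "grapefruit"),
          ((3, 8), "orange"), ((2, 7), "orange")]
print(knn_classify((3, 7), fruits, k=3))  # orange
```

The same machinery does regression if you average the neighbors' numeric values instead of voting on labels.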
summary
- KNN is used for classification and regression, and requires considering the nearest neighbors.
- Classification means assigning to a group.
- Regression means predicting a value (such as a number).
- Feature extraction means converting items (such as fruits or users) into a list of comparable numbers.
- Choosing the right features is critical to the success or failure of KNN.