One of the load balancing algorithms-in the Golang way

Title picture: from "What Is Load Balancing? How Load Balancers Work"

Before the introduction:

Originally this was meant to be published for the Dragon Boat Festival, but in the past few days I badly wanted to rest and let my mind wander, so it slipped. Another reason is that I haven't had time to review the library itself and check whether I forgot to do anything.

But then I suddenly felt it was fine as it is, so I'm posting it now.

Introduction

Load balancing is widely used in orchestration systems, microservice gateways, Nginx/HAProxy clusters sitting in front of a site, and so on. It also appears, in different forms, in places you cannot see, or have never even imagined or noticed: read/write access to mechanical hard drives and disk arrays, pipeline dispatching across multi-core CPUs, and more.

In short, load balancing aims to optimize resource utilization, maximize throughput, minimize access latency, and prevent overload; it just takes many different forms.

Since the basic job of load balancing is to schedule incoming requests onto different computing units, when one unit fails the load balancer can select a unit that is still working and route around the failed one; when that unit recovers, the load balancer can dispatch requests to it again. This kind of transparent, imperceptible failover is therefore often mentioned in the same breath as load balancing. For example, the company has three dedicated lines, and it does not matter which one breaks: there is always a healthy line for us to reach the cloud for maintenance. Those few dedicated lines effectively form an egress bundle capable of failover.

On the Internet, the most basic service, DNS, has built-in load-balancing capability: as long as your domain name has multiple A records pointing to servers with specific IPs, those servers can receive incoming requests roughly evenly and provide service.

However, DNS has no real strength when it comes to failover, which limits its more advanced applications. That is why the front of a large website cluster is usually a front proxy made up of several nginx instances: even with DNS load balancing in place, another layer of redundant load balancers is still needed, precisely in order to hide the changes (servers going up and down) in the business server group.

Besides old-school software load-balancing tools such as nginx/HAProxy, companies with money to burn can also choose dedicated hardware load balancers.

When the view narrows down to the business server group, orchestration software such as K8s actively provides a variety of load-balancing forms for exposing the nodes and services in its private network.

In practice, K8s may be too expensive (extra servers and computing resources, extra operation and maintenance overhead, and so on), so you may run directly on bare cloud servers instead. Whether you roll your own service governance or use well-known frameworks such as micro or Kong, they also provide load-balancing and scheduling capabilities.

Setting those microservice architectures aside, if you use a consul cluster integrated with a private DNS server, this small all-in-one DNS cluster can also provide enhanced load-balancing capability, although it may require some architectural design and corresponding coding adaptation.

Everything said above points to one thesis: implementing load-balancing algorithms in software always has its value. When your system architecture reaches a certain scale and complexity, when scheduling capability is needed, when request latency must be spread out, when Map-Reduce-style divide and conquer is required. In short, you will often need a small load-balancing scheduler, whatever form it takes.

That's why this group of articles exists.

Although there are already countless articles, papers, and source codes about load balancing, this group of articles still exists, because those are not mine, and mine is mine. So,

Basic load balancing algorithms

From a general perspective, there are at least these basic load-balancing algorithms:

  1. Random
  2. Round-robin
  3. Least connections
  4. Hashing
  5. Weighted round-robin

Let's introduce them briefly one by one, and review them again at the end.

Preliminaries

I always prefer coding, pure coding. So first, here are some basic interface definitions:

```go
type Peer interface {
    String() string
}

type Factor interface {
    Factor() string
}

type BalancerLite interface {
    Next(factor Factor) (next Peer, c Constrainable)
}

type Balancer interface {
    BalancerLite
    // ...more
}

type FactorComparable interface {
    Factor
    ConstrainedBy(constraints interface{}) (peer Peer, c Constrainable, satisfied bool)
}

type FactorString string

func (s FactorString) Factor() string { return string(s) }

const DummyFactor FactorString = ""

type Constrainable interface {
    CanConstrain(o interface{}) (yes bool)
    Check(o interface{}) (satisfied bool)
    Peer
}
```

To avoid getting lost in design details, here are the key points:

  • Peer: a back-end node.
  • Balancer: the load balancer, which holds a group of Peers.
  • Factor: a reference object that the caller supplies when asking the Balancer to select a peer via Next(factor); the Balancer may use it as one of the inputs to its selection algorithm.
  • When you are the caller and want to call Next but have no suitable "factor" to provide, just pass DummyFactor.
  • Constrainable: it may get a separate introduction at the end of this series; for now it is safe to ignore.

Random algorithm

As the name suggests, the random algorithm picks one peer from the back-end list at random. It is the simplest one. With the preliminaries above in mind, the following example should be easy to read in full:

```go
package main

import (
    "fmt"
    mrand "math/rand"
    "sync"
    "sync/atomic"
    "time"

    "github.com/hedzr/lb/lbapi"
)

type randomS struct {
    peers []lbapi.Peer
    count int64
}

func (s *randomS) Next(factor lbapi.Factor) (next lbapi.Peer, c lbapi.Constrainable) {
    l := int64(len(s.peers))
    ni := atomic.AddInt64(&s.count, inRange(0, l)) % l
    next = s.peers[ni]
    return
}

func main() {
    lb := &randomS{
        peers: []lbapi.Peer{
            exP("172.16.0.7:3500"), exP("172.16.0.8:3500"), exP("172.16.0.9:3500"),
        },
        count: 0,
    }

    sum := make(map[lbapi.Peer]int)
    for i := 0; i < 300; i++ {
        p, _ := lb.Next(lbapi.DummyFactor)
        sum[p]++
    }

    for k, v := range sum {
        fmt.Printf("%v: %v\n", k, v)
    }
}

var seededRand = mrand.New(mrand.NewSource(time.Now().UnixNano()))
var seedmu sync.Mutex

func inRange(min, max int64) int64 {
    seedmu.Lock()
    defer seedmu.Unlock()
    return seededRand.Int63n(max-min) + min
}

type exP string

func (s exP) String() string { return string(s) }
```

randomS implements a miniature, simple random-algorithm LB.

Although I could have trimmed it down to just a few lines of code, a little seasoning was added to make it live and runnable. That is why the example is a bit long and carries some irrelevant bits.

Running it may produce something like:

```
$ go run ./_examples/simple/random/
172.16.0.8:3500: 116
172.16.0.7:3500: 94
172.16.0.9:3500: 90
```

Well, a random number generator only looks uniform at larger sample sizes, on the order of 5K or even 100K. So the distribution here is not perfectly even, but it is close enough.

The formal version

It should be noted that the official random LB code is a bit more complex than the core shown above, because we also need to meet two other design goals:

  1. Thread safety
  2. Nestability (it can be nested)

Precisely because the key part of the random LB algorithm is only three lines:

```go
l := int64(len(s.peers))
ni := atomic.AddInt64(&s.count, inRange(0, l)) % l
next = s.peers[ni]
```

there is room to show the official version of the code in full. The nearly complete snippet is this:

```go
package random

import (
    mrand "math/rand"
    "sync"
    "sync/atomic"
    "time"

    "github.com/hedzr/lb/lbapi"
)

var seededRand = mrand.New(mrand.NewSource(time.Now().UnixNano()))
var seedmu sync.Mutex

func inRange(min, max int64) int64 {
    seedmu.Lock()
    defer seedmu.Unlock()
    return seededRand.Int63n(max-min) + min
}

// New makes a new load-balancer instance with the random algorithm
func New(opts ...lbapi.Opt) lbapi.Balancer {
    return (&randomS{}).init(opts...)
}

type randomS struct {
    peers []lbapi.Peer
    count int64
    rw    sync.RWMutex
}

func (s *randomS) init(opts ...lbapi.Opt) *randomS {
    for _, opt := range opts {
        opt(s)
    }
    return s
}

func (s *randomS) Next(factor lbapi.Factor) (next lbapi.Peer, c lbapi.Constrainable) {
    next = s.miniNext()
    if fc, ok := factor.(lbapi.FactorComparable); ok {
        next, c, ok = fc.ConstrainedBy(next)
    } else if nested, ok := next.(lbapi.BalancerLite); ok {
        next, c = nested.Next(factor)
    }
    return
}

func (s *randomS) miniNext() (next lbapi.Peer) {
    s.rw.RLock()
    defer s.rw.RUnlock()

    l := int64(len(s.peers))
    ni := atomic.AddInt64(&s.count, inRange(0, l)) % l
    next = s.peers[ni]
    return
}

func (s *randomS) Count() int {
    s.rw.RLock()
    defer s.rw.RUnlock()
    return len(s.peers)
}

func (s *randomS) Add(peers ...lbapi.Peer) {
    for _, p := range peers {
        s.AddOne(p)
    }
}

func (s *randomS) AddOne(peer lbapi.Peer) {
    if s.find(peer) {
        return
    }

    s.rw.Lock()
    defer s.rw.Unlock()
    s.peers = append(s.peers, peer)
}

func (s *randomS) find(peer lbapi.Peer) (found bool) {
    s.rw.RLock()
    defer s.rw.RUnlock()
    for _, p := range s.peers {
        if lbapi.DeepEqual(p, peer) {
            return true
        }
    }
    return
}

func (s *randomS) Remove(peer lbapi.Peer) {
    s.rw.Lock()
    defer s.rw.Unlock()
    for i, p := range s.peers {
        if lbapi.DeepEqual(p, peer) {
            s.peers = append(s.peers[0:i], s.peers[i+1:]...)
            return
        }
    }
}

func (s *randomS) Clear() {
    s.rw.Lock()
    defer s.rw.Unlock()
    s.peers = nil
}
```

Comparing the two versions before and after, you can see just how much #%$&#%@ seasoning production-grade code needs, :)

This is the only time the full listing is shown; it will be omitted in the sections below.

Round-Robin

You may not actually know who robin is.

It is actually no one. Originally it was the French word ruban, meaning "ribbon". But time washed everything away, and somehow it gradually evolved into robin.

The term round-robin is derived from the French term ruban , meaning " ribbon ". Over a long period of time, the term was corrupted and idiomized to robin .

Round-robin tournament

The round and ruban that began with the round-robin system were eventually just called round-robin. As for the "round-robin letter" mentioned in the Chinese version of the Wikipedia article, it is worth a look. Of course, whether it has anything to do with that Robin is purely a question for idle time.

Okay, none of that matters at all. What matters is that the round-robin algorithm selects peers one after another, in turn. That's it.

So the core of its algorithm is roughly like this:

```go
func (s *rrS) miniNext() (next lbapi.Peer) {
    ni := atomic.AddInt64(&s.count, 1)
    ni %= int64(len(s.peers))
    next = s.peers[ni]
    return
}
```

s.count is always incremented and is never reduced modulo in place. The intent is that, if the peers array undergoes a small increase or decrease, the resulting selection stays relatively smooth rather than abruptly skewed.

In Go, when s.count reaches the maximum of int64 and keeps incrementing, it simply wraps around; this overflow behavior is shared with most mainstream compiled languages, since it is what the CPU itself provides.
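
To make the idea concrete, here is a self-contained round-robin sketch in the same spirit as the earlier random example; the types and names are illustrative only and are not the library's API.

```go
package main

import (
    "fmt"
    "sync/atomic"
)

// rrS is a minimal round-robin sketch (illustrative only).
type rrS struct {
    peers []string
    count int64
}

// Next advances the shared counter and maps it onto the peers slice.
func (s *rrS) Next() string {
    ni := atomic.AddInt64(&s.count, 1)
    ni %= int64(len(s.peers))
    return s.peers[ni]
}

func main() {
    lb := &rrS{peers: []string{"172.16.0.7:3500", "172.16.0.8:3500", "172.16.0.9:3500"}}
    sum := make(map[string]int)
    for i := 0; i < 9; i++ {
        sum[lb.Next()]++
    }
    fmt.Println(sum) // an even 3/3/3 split is expected
}
```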

Least Connections

If a back-end service currently has the fewest active connections among all back ends, it is the one chosen.

No example accompanies this algorithm, because in many cases keeping track of the number of active connections is an unusual chore: it is troublesome and consumes extra resources (and not merely the cost of an addition or a modulo), so apart from professional proxy servers such as nginx, it is actually not used much.

As long as your system can properly track the number of connections to each back end, there is no algorithmic difficulty in implementing a Least Connections LB, as the sketch below suggests.
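
Since the article ships no example here, the following is only a minimal sketch of the idea, under the assumption that each peer tracks its own active-connection counter; all names are hypothetical and not part of the library.

```go
package main

import (
    "fmt"
    "sync/atomic"
)

// lcPeer is a hypothetical peer that tracks its own active-connection count.
type lcPeer struct {
    addr   string
    active int64
}

type lcS struct{ peers []*lcPeer }

// Next picks the peer with the fewest active connections, marks one more
// connection active on it, and returns a done() to call when the request ends.
// For brevity the scan is not fully race-free; a real implementation would lock.
func (s *lcS) Next() (p *lcPeer, done func()) {
    for _, it := range s.peers {
        if p == nil || atomic.LoadInt64(&it.active) < atomic.LoadInt64(&p.active) {
            p = it
        }
    }
    atomic.AddInt64(&p.active, 1)
    return p, func() { atomic.AddInt64(&p.active, -1) }
}

func main() {
    lb := &lcS{peers: []*lcPeer{{addr: "172.16.0.7"}, {addr: "172.16.0.8"}, {addr: "172.16.0.9"}}}
    p1, done1 := lb.Next()
    p2, done2 := lb.Next()
    fmt.Println(p1.addr, p2.addr) // two different peers: the first already holds one active connection
    done1()
    done2()
}
```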

Hashing and Consistent Hashing

Review

In the early years, before microservices were distinguished from monolithic applications, hash-based load balancing was often treated as a silver bullet, because session stickiness was frequently the key obstacle to growing a service horizontally. By hashing the user's session-id and scheduling on that hash value, requests carrying the same session-id always land on the same back-end server, which keeps the session valid.

Extending the hash algorithm, it is obvious that you can hash the client IP, host name, URL, or anything else you like; as long as you can obtain a hashCode, you can apply the hash algorithm. Identifiers such as the client IP or the client host name yield the same hashCode every time, so the corresponding back-end peer stays the same as well, which is exactly why the hash algorithm mattered so much in the session era.
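
As a concrete illustration of that classic, pre-consistent-hashing approach, here is a minimal sketch that hashes a client identifier and takes it modulo the back-end count; the names are illustrative only.

```go
package main

import (
    "fmt"
    "hash/crc32"
)

// pick hashes any stable identifier (session-id, client IP, URL, ...) and maps
// it onto the backend list with a plain modulo. The same key always lands on
// the same backend, until the backend list changes, which is exactly the
// weakness that consistent hashing addresses later in this article.
func pick(backends []string, key string) string {
    h := crc32.ChecksumIEEE([]byte(key))
    return backends[int(h)%len(backends)]
}

func main() {
    backends := []string{"172.16.0.7:3500", "172.16.0.8:3500", "172.16.0.9:3500"}
    for _, ip := range []string{"10.0.0.1", "10.0.0.2", "10.0.0.1"} {
        fmt.Println(ip, "->", pick(backends, ip)) // "10.0.0.1" maps to the same backend twice
    }
}
```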

Later on, people stopped using the browser-session approach and switched to the stateless model, so the hash algorithm became somewhat lonely.

The essence of the stateless model is carrying a token in the header. This token can be expanded into the identity of the logged-in user, which is equivalent to a browser session. In short, the original intent of stateless models such as JWT is to let the server farm scale horizontally.

Back in 1997, Karger at MIT published the paper on the so-called consistent hashing algorithm. The key differences from a traditional hashCode computation are: on the one hand, the hashCode is constrained to be a positive integer (int32/uint32/int64, etc.); on the other hand, the integer space [0, int.MaxValue] is treated as a loop that wraps around, the so-called Hash Ring, and the peers to be selected are distributed evenly on this ring, so that each selection can smoothly pick every peer.

As for how the index value is computed at selection time, there is no restriction, so you can attach optional strategies to the index-computation scheme.

In the load-balancing field, the consistent hash algorithm also adds a Replica factor: when computing a peer's hash value, an index-number suffix is appended to the peer's host name, and the index is incremented Replica times, so you effectively obtain Replica copies of that peer. This expands the original n peers to a scale of n x Replica, which helps further improve the smoothness of selection.

Code

So having said so much, the core of the algorithm is roughly like this:

```go
type Hasher func(data []byte) uint32

// hashS is an impl of the ketama consistent-hashing algorithm
type hashS struct {
    hasher   Hasher // default is crc32.ChecksumIEEE
    replica  int    // default is 32
    hashRing []uint32
    keys     map[uint32]lbapi.Peer
    peers    map[lbapi.Peer]bool
    rw       sync.RWMutex
}

func (s *hashS) Next(factor lbapi.Factor) (next lbapi.Peer, c lbapi.Constrainable) {
    var hash uint32
    hash = s.hasher([]byte(factor.Factor()))

    ix := sort.Search(len(s.hashRing), func(i int) bool {
        return s.hashRing[i] >= hash
    })
    if ix == len(s.hashRing) {
        ix = 0
    }

    hashValue := s.hashRing[ix]
    if p, ok := s.keys[hashValue]; ok {
        if _, ok = s.peers[p]; ok {
            next = p
        }
    }
    return
}

func (s *hashS) Add(peers ...lbapi.Peer) {
    for _, p := range peers {
        s.peers[p] = true
        for i := 0; i < s.replica; i++ {
            hash := s.hasher(s.peerToBinaryID(p, i))
            s.hashRing = append(s.hashRing, hash)
            s.keys[hash] = p
        }
    }

    sort.Slice(s.hashRing, func(i, j int) bool {
        return s.hashRing[i] < s.hashRing[j]
    })
}

func (s *hashS) peerToBinaryID(p lbapi.Peer, replica int) []byte {
    str := fmt.Sprintf("%v-%05d", p, replica)
    return []byte(str)
}
```

The hashRing structure is built in Add. Although it is conceptually a ring, it is implemented with an array plus taking the index modulo its length. In addition, the keys map records the mapping from a peer's hash value back to the peer, so that later (in Next) the corresponding peer can be obtained immediately after picking a point on the hashRing.

Next mainly computes the hash value of the factor; the result is mapped to a point pt on the hashRing. If no peer sits exactly on pt, it looks along the ring for the nearest peer at or after pt, wrapping to the beginning if necessary.

Summary of the algorithm idea

Actually, I didn't plan to write this part, because the pictures other people have drawn are already beautiful and thorough.

For example, this picture (from Wayne's Blog, System Design 101 - Consistent Hashing):

Drawn with Keynote; I'm drooling over it, and I really plan to learn it.

If you are interested in every detail of this algorithm, you can also watch these two videos:

Earlier we mentioned replica, the technique of creating virtual nodes for peers on the hashRing. The idea is to minimize the unevenness of selection results caused by changes in the set of peers. It basically originates from libketama, a memcached library, so in most contexts the same idea is called the Ketama hashing algorithm. You can refer to it here.
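
To make the ring-plus-replica idea tangible, here is a small self-contained sketch (not the library's code) that builds virtual nodes with crc32 and looks keys up on the sorted ring; all names are illustrative.

```go
package main

import (
    "fmt"
    "hash/crc32"
    "sort"
)

// ring is a tiny hash ring with virtual nodes, in the spirit of the ketama idea.
type ring struct {
    points []uint32          // sorted hash values of all virtual nodes
    owner  map[uint32]string // virtual-node hash -> real peer
}

func newRing(replica int, peers ...string) *ring {
    r := &ring{owner: map[uint32]string{}}
    for _, p := range peers {
        for i := 0; i < replica; i++ {
            h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s-%05d", p, i)))
            r.points = append(r.points, h)
            r.owner[h] = p
        }
    }
    sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
    return r
}

// pick hashes the key and returns the owner of the first ring point at or after it.
func (r *ring) pick(key string) string {
    h := crc32.ChecksumIEEE([]byte(key))
    ix := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
    if ix == len(r.points) {
        ix = 0 // wrap around the ring
    }
    return r.owner[r.points[ix]]
}

func main() {
    r := newRing(32, "172.16.0.7:3500", "172.16.0.8:3500", "172.16.0.9:3500")
    for _, key := range []string{"alice", "bob", "carol", "alice"} {
        fmt.Println(key, "->", r.pick(key)) // the same key always maps to the same peer
    }
}
```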

Weighted Round-robin

Adding weights is a very important feature, so in this chapter on basic LB algorithms we take weighted round-robin (wrr) as the example worth mentioning.

The weighted round-robin algorithm attaches a weight to each peer and uses that weight, on top of plain round-robin, as an adjustment.

The most famous implementations of weighted round-robin are those of Nginx and LVS.

Nginx smooth weighted round-robin

Nginx uses a balancing method based on the difference between the total weight and each node's weight: on every selection, each node's currentWeight is increased by its own weight value, the node with the largest currentWeight is selected, and that node's currentWeight is then reduced by the total of all weights before the next round of increments. Repeated over and over, nodes with larger weights are selected more often because their currentWeight grows back faster.

This algorithmic idea can be proven rigorously in mathematics, but I won't go into that here; the main point is the core implementation:

```go
func (s *wrrS) Next() (best lbapi.Peer) {
    total := 0
    for _, node := range s.peers {
        if node == nil {
            continue
        }

        total += s.mUpdate(node, 0, false)
        if s.mTest(best, node) {
            best = node
        }
    }

    if best != nil {
        s.mUpdate(best, -total, true)
    }
    return
}

// sm maps each peer to its record of static weight and currentWeight
func (s *wrrS) mUpdate(node lbapi.Peer, delta int, success bool) (total int) {
    if delta == 0 {
        delta = sm[node].weight
    }
    sm[node].current += delta
    return sm[node].weight
}

func (s *wrrS) mTest(best, node lbapi.Peer) bool {
    return best == nil || sm[node].current > sm[best].current
}
```

The code above has been trimmed down; the actual code is more complicated, because we also need to take locks.
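
For completeness, here is a self-contained, runnable sketch of the smooth weighted round-robin idea described above; the names are illustrative and not the library's API.

```go
package main

import "fmt"

// wNode carries a static weight and the running currentWeight.
type wNode struct {
    name            string
    weight, current int
}

// next implements the smooth WRR step: raise every current by its weight,
// pick the largest current, then lower the winner by the total weight.
func next(nodes []*wNode) *wNode {
    total := 0
    var best *wNode
    for _, n := range nodes {
        n.current += n.weight
        total += n.weight
        if best == nil || n.current > best.current {
            best = n
        }
    }
    if best != nil {
        best.current -= total
    }
    return best
}

func main() {
    nodes := []*wNode{{name: "A", weight: 5}, {name: "B", weight: 1}, {name: "C", weight: 1}}
    for i := 0; i < 7; i++ {
        fmt.Print(next(nodes).name, " ")
    }
    fmt.Println() // prints: A A B A C A A  (A takes 5 of every 7 picks, spread out smoothly)
}
```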

LVS weighted round-robin

As for LVS's weighted round-robin, its core idea relies on the gcd (greatest common divisor).

```c
/*
  Supposing that there is a server set S = {S0, S1, ..., Sn-1};
  W(Si) indicates the weight of Si;
  i indicates the server selected last time, and i is initialized with -1;
  cw is the current weight in scheduling, and cw is initialized with zero;
  max(S) is the maximum weight of all the servers in S;
  gcd(S) is the greatest common divisor of all server weights in S;
*/
while (true) {
    i = (i + 1) mod n;
    if (i == 0) {
        cw = cw - gcd(S);
        if (cw <= 0) {
            cw = max(S);
            if (cw == 0)
                return NULL;
        }
    }
    if (W(Si) >= cw)
        return Si;
}
```
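
For readers who prefer Go, here is a rough, runnable transcription of that pseudocode; it is a sketch only, and the variable names follow the pseudocode rather than any real LVS code.

```go
package main

import "fmt"

type server struct {
    name   string
    weight int
}

func gcd(a, b int) int {
    for b != 0 {
        a, b = b, a%b
    }
    return a
}

// lvsWrr keeps the state of the gcd-based weighted round-robin loop.
type lvsWrr struct {
    servers []server
    i, cw   int // index of last selection, current weight threshold
    maxW, g int // max(S) and gcd(S)
}

func newLvsWrr(servers []server) *lvsWrr {
    w := &lvsWrr{servers: servers, i: -1}
    for _, s := range servers {
        if s.weight > w.maxW {
            w.maxW = s.weight
        }
        w.g = gcd(w.g, s.weight)
    }
    return w
}

func (w *lvsWrr) next() *server {
    n := len(w.servers)
    for {
        w.i = (w.i + 1) % n
        if w.i == 0 {
            w.cw -= w.g
            if w.cw <= 0 {
                w.cw = w.maxW
                if w.cw == 0 {
                    return nil
                }
            }
        }
        if w.servers[w.i].weight >= w.cw {
            return &w.servers[w.i]
        }
    }
}

func main() {
    lb := newLvsWrr([]server{{"A", 4}, {"B", 2}, {"C", 2}})
    for i := 0; i < 8; i++ {
        fmt.Print(lb.next().name, " ")
    }
    fmt.Println() // prints: A A B C A A B C  (A is picked twice as often as B and C)
}
```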

You can understand it like this: for three nodes A, B and C with weights x, y and z, imagine an array filled with x copies of A, y copies of B, and z copies of C, then scan that array with plain round-robin; the resulting selection ratio naturally matches the weight distribution.
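
That array-expansion intuition can be sketched in a few lines (illustrative only); it also shows why very large weight values make this naive approach wasteful.

```go
package main

import "fmt"

// Naive weighted round-robin by array expansion: fill an array with `weight`
// copies of each node, then scan it round-robin. Memory grows with the weight
// values, which is exactly why LVS prefers the gcd-based loop shown above.
func main() {
    weights := map[string]int{"A": 3, "B": 2, "C": 1}

    var expanded []string
    for name, w := range weights {
        for i := 0; i < w; i++ {
            expanded = append(expanded, name)
        }
    }

    count := map[string]int{}
    for i := 0; i < 600; i++ {
        count[expanded[i%len(expanded)]]++
    }
    fmt.Println(count) // map[A:300 B:200 C:100], proportions match the weights
}
```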

In fact, many years ago my first implementation of wrr was done exactly like that. Later, when the weight values got very large, it annoyed me a lot. It was not until I saw the description of the LVS algorithm one day that I sighed: the textbook way really is better.

Summary

The main basic LB algorithms have been explained above.

These basic algorithms can be further combined and evolved into, for example:

  • Weighted random
  • (Request) source hashing
  • Destination hashing
  • Weighted least connections
  • and so on

We want to achieve these goals in the new class library, so further code design and reorganization are needed. For reasons of space, next time we will discuss the kind of thinking required to build such a library.

:end: