MySQL database interview questions


Basic knowledge of databases

Why use a database?

Data stored in memory: 1) Advantage: fast access. 2) Disadvantages: data cannot be saved permanently, and querying data is inconvenient.

Data stored in a database: 1) Data is saved permanently. 2) Queries are convenient and efficient using SQL statements. 3) Data management is convenient.

What is SQL?

Structured Query Language, abbreviated SQL, is a database query language. Function: used for accessing data and for querying, updating, and managing relational database systems.

What is MySQL?

MySQL is a relational database management system developed by the Swedish company MySQL AB and now a product of Oracle. MySQL is one of the most popular relational database management systems, and for web applications it is among the best RDBMS (Relational Database Management System) software. It is very commonly used in Java enterprise development, because MySQL is open source, free, and easy to extend.

What are the three major database paradigms (normal forms)?

First normal form: each column must be atomic; it cannot be split further.

Second normal form: on the basis of the first, every non-primary-key column must depend on the whole primary key, not on only part of it.

Third normal form: on the basis of the second, non-primary-key columns depend only on the primary key, not on other non-primary-key columns.

When designing the database structure, try to comply with the three normal forms; if you do not, there must be a sufficient reason, such as performance. In practice, we often compromise normalization for performance.
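For example, a minimal sketch with hypothetical tables: the first table below violates the third normal form, because customer_city depends on customer_id (a non-key column) rather than on the order's primary key; splitting it restores compliance.

```sql
-- Violates 3NF: customer_city is transitively dependent on customer_id
CREATE TABLE orders_denormalized (
  order_id      INT PRIMARY KEY,
  customer_id   INT,
  customer_city VARCHAR(64)
);

-- 3NF-compliant split: the city lives with the customer it depends on
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  city        VARCHAR(64)
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT,
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```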

Which MySQL permission tables are there?

The MySQL server controls user access to the database through permission tables, which are stored in the mysql database and initialized by the mysql_install_db script. These permission tables are user, db, table_priv, columns_priv, and host. Their structure and content:

- user: records the user accounts allowed to connect to the server; the permissions in it are at the global level.
- db: records each account's operation permissions on each database.
- table_priv: records table-level operation permissions.
- columns_priv: records column-level operation permissions.
- host: works together with the db table to apply finer database-level control of operation permissions on a given host. This permission table is not affected by the GRANT and REVOKE statements.
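A quick way to inspect these tables (a sketch; it assumes you have sufficient privileges, and 'app_user'@'localhost' is a hypothetical account):

```sql
SELECT user, host FROM mysql.user;        -- global-level accounts
SELECT user, host, db FROM mysql.db;      -- per-database permissions
SHOW GRANTS FOR 'app_user'@'localhost';   -- effective grants for one account
```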

How many binlog logging formats does MySQL have? What are the differences?

There are three formats: statement, row, and mixed.

- statement: every SQL statement that modifies data is recorded in the binlog. There is no need to record the change of every row, which reduces binlog volume, saves I/O, and improves performance. However, since SQL execution is contextual, the relevant context must be saved along with the statement, and some statements, such as ones using certain functions, cannot be recorded and replicated correctly.
- row: the context of the SQL statement is not recorded; only which records were modified is saved. The unit of recording is each row change, so essentially everything can be recorded, but operations that touch many rows (such as ALTER TABLE) generate a huge number of row changes, so this format saves too much information and the log volume is very large.
- mixed: a compromise. Ordinary operations are recorded as statements, and row is used when statement cannot be. In addition, newer MySQL versions optimize the row format: when only the table structure changes, a statement is recorded instead of row-by-row changes.
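Inspecting and switching the format (a sketch; changing it requires the appropriate privilege):

```sql
SHOW VARIABLES LIKE 'binlog_format';    -- current format
SET SESSION binlog_format = 'ROW';      -- or 'STATEMENT' / 'MIXED'
SHOW BINARY LOGS;                       -- list the binlog files
```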

What data types does MySQL have?

1. Integer types: TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT, occupying 1, 2, 3, 4, and 8 bytes respectively. Any integer type can take the UNSIGNED attribute, indicating that the data is unsigned, i.e., a non-negative integer.

Length: an integer type can be given a length, for example INT(11) is an INT with display width 11. The length is meaningless in most scenarios: it does not limit the legal range of values, only the number of characters displayed, and it only matters when used together with the UNSIGNED ZEROFILL attribute. For example, suppose a column is INT(5) with the UNSIGNED ZEROFILL attribute; if the user inserts 12, the database actually stores and displays 00012.

2. Real number types: FLOAT, DOUBLE, and DECIMAL. DECIMAL can store integers larger than BIGINT and can store exact decimals. FLOAT and DOUBLE have value ranges and support standard floating-point approximate calculations; they are more efficient than DECIMAL in calculations. DECIMAL can be understood as being processed like a character string.

3. String types: VARCHAR, CHAR, TEXT, and BLOB.

VARCHAR stores variable-length strings and saves more space than fixed-length types. It uses an extra 1 or 2 bytes to store the string length: 1 byte when the column length is at most 255 bytes, otherwise 2 bytes. When the content stored in VARCHAR exceeds the declared length, it is truncated.

CHAR is fixed-length: sufficient space is allocated according to the defined string length, and values are padded with spaces as needed to facilitate comparison. CHAR suits very short strings, or columns whose values are all close to the same length. Content stored in CHAR that exceeds the declared length is also truncated.

Usage strategy: for frequently changing data, CHAR is better than VARCHAR, because CHAR is less prone to fragmentation; for very short columns, CHAR is also more space-efficient than VARCHAR. Allocate only the space needed: longer columns consume more memory when sorting. Try to avoid the TEXT/BLOB types; queries on them use temporary tables, causing serious performance overhead.

4. Enumeration type (ENUM), which stores values from a predefined set. Sometimes ENUM can replace the commonly used string types. ENUM storage is very compact: the list of values is compressed into one or two bytes, and internally an integer is actually stored. Avoid using numbers as ENUM constants, because it is easy to get confused. Sorting follows the internally stored integer.

5. Date and time types. Prefer TIMESTAMP where possible; its space efficiency is higher than DATETIME's. Storing timestamps as plain integers is usually inconvenient. If you need to store microseconds, you can use BIGINT storage.

Engines

How do the MySQL storage engines MyISAM and InnoDB differ?

A storage engine is how data, indexes, and other objects are stored in MySQL; it is the implementation of a set of file structures. Commonly used storage engines:

- InnoDB: provides ACID transaction support for the database, as well as row-level locks and foreign key constraints. Its design goal is to handle database systems with large data volumes.
- MyISAM (MySQL's original default engine): provides no transaction support, and supports neither row-level locks nor foreign keys.
- MEMORY: all data is kept in memory, so data processing is fast, but data safety is low (the data is lost when the server restarts).
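A sketch pulling the type advice together (user_profile is a hypothetical table):

```sql
CREATE TABLE user_profile (
  id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  nickname   VARCHAR(32) NOT NULL,                -- short variable-length text
  country    CHAR(2)     NOT NULL,                -- fixed-length code
  gender     ENUM('M','F','U') NOT NULL DEFAULT 'U',
  balance    DECIMAL(12,2) NOT NULL DEFAULT 0.00, -- exact decimal for money
  score      DOUBLE,                              -- approximate value is fine
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE = InnoDB;                                -- transactional default engine
```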

The difference between MyISAM and InnoDB

In short, as above: InnoDB supports transactions, row-level locks, and foreign keys; MyISAM supports none of these.

What is the difference between MyISAM indexes and InnoDB indexes?

- An InnoDB index is a clustered index, while a MyISAM index is a non-clustered index.
- The leaf nodes of InnoDB's primary key index store the row data itself, so the primary key index is very efficient.
- The leaf nodes of a MyISAM index store the address of the row data, which requires another lookup to reach the data.
- The leaf nodes of InnoDB's non-primary-key indexes store the primary key plus the other indexed columns, so covering-index queries are very efficient.

The 4 major features of the InnoDB engine

Insert buffer, double write, adaptive hash index (AHI), and read ahead.

Storage engine selection

If there is no special requirement, use the default InnoDB. MyISAM: read-heavy applications, such as blog systems and news portals. InnoDB: frequent update and delete operations, or where data integrity must be guaranteed; high concurrency with transactions and foreign keys, such as an OA office-automation system.

What is an index?

An index is a special kind of file (in InnoDB, the indexes on a data table are an integral part of the tablespace) containing reference pointers to all records in the data table.

An index is a data structure. A database index is a sorted data structure in the database management system that helps to quickly query and update the data in a database table. Indexes are usually implemented with the B tree and its variant, the B+ tree.

More plainly, an index is the equivalent of a table of contents: to find the contents of a book conveniently, a catalog is built by indexing the contents. An index is a file, and it does occupy physical space.

What are the advantages and disadvantages of indexes?

Advantages: an index can greatly speed up data retrieval, which is the main reason for creating one; through the index, the query optimizer can be used during queries to improve system performance.

Disadvantages: Time: it takes time to create and maintain indexes; when data in a table is added, deleted, or modified, the indexes must also be maintained dynamically, which reduces the efficiency of inserts, updates, and deletes. Space: indexes occupy physical space.

Index usage scenarios (key point)

where

Consider querying records by id. Because the id field has only the primary key index, the only index available to this SQL is the primary key index; if several were available, the better one would be chosen as the basis for retrieval.

```sql
-- Add a column that is not indexed
ALTER TABLE innodb1 ADD sex CHAR(1);
-- When filtering by sex, no index is available (possible_keys is NULL)
EXPLAIN SELECT * FROM innodb1 WHERE sex = 'male';
```

When queries on an unindexed field are inefficient, you can try creating an index on that field (alter table table_name add index(field_name)) and run the same SQL again: you will find that query efficiency improves noticeably (the larger the data volume, the more obvious), as the sketch below shows.
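A before/after sketch with EXPLAIN (innodb1 as above; note that a low-selectivity column like sex is normally a poor index choice, so this only illustrates the mechanics):

```sql
EXPLAIN SELECT * FROM innodb1 WHERE sex = 'male';  -- full table scan (type: ALL)
ALTER TABLE innodb1 ADD INDEX idx_sex (sex);
EXPLAIN SELECT * FROM innodb1 WHERE sex = 'male';  -- uses idx_sex (type: ref)
```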

Joins can be made more efficient by indexing the fields involved in the join's matching condition (the ON clause of the join statement); see the sketch below.
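A sketch with hypothetical tables, indexing the column used in the ON clause:

```sql
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

EXPLAIN
SELECT o.order_id, c.name
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id;   -- the join key is indexed
```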

Index coverage

If the fields being queried are all covered by an index, the engine answers the query directly in the index table without accessing the original data (otherwise, as soon as one field is not in the index, a full row lookup is needed). This is called index coverage (a covering index). Therefore, write only the necessary fields after SELECT, to increase the probability of index coverage. Note that this does not mean creating an index for every field: part of an index's advantage lies in its small size.

What are the types of indexes?

- Primary key index: values must not repeat and must not be NULL; a table can have only one primary key.
- Unique index: values must not repeat, but NULL values are allowed; a table may create unique indexes on multiple columns. ALTER TABLE table_name ADD UNIQUE (column); creates a unique index.
- Ordinary index: ALTER TABLE table_name ADD INDEX index_name (column); creates an ordinary index, and ALTER TABLE table_name ADD INDEX index_name (column1, column2, column3); creates a composite index.
- Full-text index: a key technology used by today's search engines. ALTER TABLE table_name ADD FULLTEXT (column); creates a full-text index.

(These are consolidated in the sketch after this answer.)

The data structures of indexes (B-tree, hash)

The index data structure is tied to the specific storage engine's implementation. The most used indexes in MySQL are the hash index and the B+ tree index, and the default index of the InnoDB storage engine we commonly use is the B+ tree index. For a hash index, the underlying data structure is a hash table, so when most queries are single-record lookups, a hash index gives the fastest query performance; for most other scenarios, a BTree index is recommended.

1) B-tree index

MySQL fetches data through the storage engine, and roughly 90% of deployments use InnoDB. By implementation, InnoDB currently has only two index types: the BTREE index and the HASH index. The B-tree index is the most frequently used index type in MySQL; essentially all storage engines support it. The index we usually speak of is, unsurprisingly, the BTree index (actually implemented as a B+ tree; since MySQL always prints BTREE when showing a table's indexes, it is referred to as a B-tree index).
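A consolidated sketch of creating the index types above (t and its columns are hypothetical):

```sql
ALTER TABLE t ADD PRIMARY KEY (id);                  -- primary key index
ALTER TABLE t ADD UNIQUE (email);                    -- unique index
ALTER TABLE t ADD INDEX idx_name (name);             -- ordinary index
ALTER TABLE t ADD INDEX idx_nac (name, age, city);   -- composite index
ALTER TABLE t ADD FULLTEXT (bio);                    -- full-text index
```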

Query path: the primary key index area (PI) holds the data addressed by the primary key, while an ordinary (secondary) index area (SI) holds the associated id, through which the row is then reached. Therefore, querying by primary key is fastest.

B+ tree properties:

1) A node with n subtrees contains n keys; the keys are not used to store data but to index it.
2) All leaf nodes contain information about all keys, together with pointers to the records containing them, and the leaf nodes themselves are linked in ascending key order.
3) All non-terminal nodes can be regarded as the index part; a node contains only the largest (or smallest) key of each of its subtrees.
4) In a B+ tree, insertion and deletion of data objects are performed only on leaf nodes.
5) A B+ tree has two head pointers: one to the root node of the tree, and one to the leaf node with the smallest key.

2) Hash index

Briefly, it is similar to the simple hash-table implementation from data structures. When MySQL uses a hash index, a hash algorithm (common ones include direct addressing, mid-square, folding, modulo, and random-number methods) converts the column's data into a fixed-length hash value and stores the row pointer of that data at the corresponding position of the hash table; if a hash collision occurs (two different keys have the same hash value), the entries are stored as a linked list under the corresponding hash key.
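Explicit hash indexes are only supported by some engines; a sketch using MEMORY (session_cache is a hypothetical table):

```sql
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  user_id    INT      NOT NULL,
  PRIMARY KEY (session_id) USING HASH   -- explicit hash index
) ENGINE = MEMORY;
-- InnoDB builds its adaptive hash index automatically; it cannot be created by hand.
```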

The basic principle of an index

An index is used to quickly find records with specific values; without one, the whole table is generally traversed when a query executes. The principle of indexing is very simple: turn unordered data into ordered queries. 1. Sort the contents of the indexed column. 2. Generate an inverted list from the sorted results. 3. Attach the data address chain to the entries of the inverted list. 4. When querying, first consult the inverted list, then take out the data address chain, and thus reach the specific data.

What indexing algorithms are there?

The indexing algorithms are the BTree algorithm and the Hash algorithm.

BTree algorithm: BTree is the most commonly used MySQL database index algorithm and also MySQL's default. It can be used not only with the comparison operators =, >, >=, <, <= and BETWEEN, but also with the LIKE operator, as long as its query condition is a constant that does not begin with a wildcard. For example:

-- the index can be used: the constant does not start with a wildcard
select * from user where name like 'jack%';
-- if the pattern starts with a wildcard, or no constant is used, the index will not be used:
select * from user where name like '%jack';

Hash algorithm: a hash index can only be used for equality comparison, such as the = and <=> (equivalent to =) operators. Since it locates the data in one step, unlike the BTree index which needs multiple I/O accesses from the root node through branch nodes and finally to the leaf page, its retrieval efficiency is far higher than the BTree index's.

The principles of index design?

1. Columns suited to indexing are those appearing in the WHERE clause or specified in join conditions.
2. Columns with small cardinality index poorly; there is no need to index them.
3. Use short indexes: if you index a long string column, specify a prefix length, which saves a lot of index space.
4. Do not over-index. Indexing needs additional disk space and lowers write performance; when table content is modified, the indexes are updated or even rebuilt, and the more indexed columns there are, the longer this takes. Maintain only the indexes the queries need.

The principles of index creation (the most important part)

Indexes are good, but they are not to be used without limit; it is best to comply with the following principles:

1) The leftmost-prefix matching principle, a very important principle for composite indexes: MySQL keeps matching to the right until it encounters a range query (>, <, BETWEEN, LIKE) and stops matching. For a = 1 AND b = 2 AND c > 3 AND d = 4, if an index in (a, b, c, d) order is built, d does not use the index; if (a, b, d, c) is built, all of them can be used, and the order of a, b, d can be adjusted arbitrarily.
2) Create indexes for fields frequently used as query conditions.
3) Frequently updated fields are not suitable for indexing.
4) Columns that cannot effectively distinguish data are not suitable for index columns (such as gender: male, female, unknown — at most three values, so the selectivity is too low).
5) Extend an existing index as much as possible rather than creating a new one. For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.
6) Data columns with foreign keys must be indexed.
7) Do not create indexes for columns rarely involved in queries, nor for columns with many duplicate values.
8) Do not create indexes for columns defined as the text, image, or bit data types.

The three ways to create an index, and how to delete an index

The first way: create the index when executing CREATE TABLE:

CREATE TABLE user_index2 (
  id INT auto_increment PRIMARY KEY,
  first_name VARCHAR(16),
  last_name VARCHAR(16),
  id_card VARCHAR(18),
  information text,
  KEY name (first_name, last_name),
  FULLTEXT KEY (information),
  UNIQUE KEY (id_card)
);

The second way: use the ALTER TABLE command to add an index:

ALTER TABLE table_name ADD INDEX index_name (column_list);

ALTER TABLE can be used to create an ordinary index, a UNIQUE index, or a PRIMARY KEY index. Here table_name is the name of the table to which the index is added, and column_list indicates which columns to index; when there are multiple columns, they are separated by commas. The index name index_name can be chosen by yourself; by default, MySQL assigns a name based on the first index column. In addition, ALTER TABLE allows multiple tables to be changed in a single statement, so multiple indexes can be created at the same time.

The third way: use the CREATE INDEX command:

CREATE INDEX index_name ON table_name (column_list);

CREATE INDEX can add an ordinary index or a UNIQUE index to a table (however, it cannot create a PRIMARY KEY index).

Deleting an index

Delete an ordinary index, unique index, or full-text index by its name: alter table table_name drop KEY index_name, for example:

alter table user_index drop KEY name;
alter table user_index drop KEY id_card;
alter table user_index drop KEY information;

Delete the primary key index: alter table table_name drop primary key (no index name is needed, because there is only one primary key). Note that if the primary key is auto-increment, this cannot be done directly (auto-increment depends on the primary key index):

You need to cancel the auto-increment first and then delete the primary key:

alter table user_index
-- redefine the field
MODIFY id int,
drop PRIMARY KEY;

But the primary key is usually not deleted, because a primary key should be designed to have nothing to do with business logic.

What should I pay attention to when creating an index?

- Non-null fields: declare the column NOT NULL unless you actually want to store NULL. In MySQL, columns containing NULL values are difficult to optimize queries on, because they make indexes, index statistics, and comparison operations more complicated. Replace NULL with 0, a special value, or an empty string.
- Fields with widely dispersed values: put the column whose values differ the most at the front of a composite index. You can check a field's distinctness with the count() function: the larger the return value, the more unique values the field has and the higher its dispersion.
- The smaller the indexed field, the better: database storage is page-based, so the more data one page holds, the more data one I/O operation fetches, and the higher the efficiency.

Does querying through an index definitely improve query performance? Why?

Usually, querying data through an index is faster than a full table scan, but we must also mind its cost. An index needs space to store and needs regular maintenance: whenever a record is added to or removed from the table, or an indexed column is modified, the index itself must be modified as well. This means each INSERT, DELETE, and UPDATE costs 4 or 5 more disk I/Os. Because indexes need additional storage space and processing, unnecessary indexes slow query response time down. Using index queries therefore may not improve query performance. Index range scans (INDEX RANGE SCAN) are suitable in two situations: retrieval over a range, where the query generally returns a result set smaller than 30% of the table's rows; and retrieval through a non-unique index.

How to delete data at the scale of millions of rows or more

About the index: the index needs extra maintenance cost, because the index file is a separate file, so when we add, modify, or delete data, extra operations on the index file are produced; these operations consume extra I/O and lower the efficiency of inserts/updates/deletes. The official MySQL manual indicates that the speed of deleting data is directly proportional to the number of indexes created. So when we want to delete millions of rows:

1. First drop the indexes (this takes more than three minutes).
2. Then delete the useless data (this process takes less than two minutes).
3. After the deletion is complete, re-create the indexes (there is less data now); index creation is also very fast, about ten minutes.
4. This is certainly much faster than deleting directly; not to mention that if a direct deletion is interrupted, everything rolls back, which is an even bigger pit.

A sketch of the procedure follows below.
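A sketch of the bulk-delete procedure (table and index names are hypothetical):

```sql
ALTER TABLE big_table DROP INDEX idx_a, DROP INDEX idx_b;        -- 1. drop indexes
DELETE FROM big_table WHERE created_at < '2019-01-01';           -- 2. delete old rows
ALTER TABLE big_table ADD INDEX idx_a (a), ADD INDEX idx_b (b);  -- 3. rebuild indexes
-- In practice, deleting in batches (e.g., with LIMIT in a loop) keeps each
-- transaction small, so an interruption only rolls back the current batch.
```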
Prefix index

Syntax: index(field(10)) uses the first 10 characters of the field's values to build the index; the default is to use the field's entire content.

Prerequisite: the prefix must have a high degree of distinctiveness. For example, passwords are suitable for a prefix index, because passwords are almost always different from one another.

The difficulty in practice lies in the length of the prefix to take. We can use

select count(*)/count(distinct left(password, prefixLen));

and, by adjusting the value of prefixLen (incrementing from 1), view the average matching degree of different prefix lengths; it is good enough when the value is close to 1 (the first prefixLen characters of a password then almost uniquely determine a record). A sketch follows at the end of this answer.

What is the leftmost-prefix principle? What is the leftmost matching principle?

As the name suggests, leftmost comes first: when creating a multi-column index, put the column most frequently used in the WHERE clause, according to business needs, at the leftmost position.

The leftmost-prefix matching principle is a very important principle: MySQL keeps matching to the right until it encounters a range query (>, <, BETWEEN, LIKE) and then stops matching. For a = 1 AND b = 2 AND c > 3 AND d = 4, if an index in (a, b, c, d) order is created, d does not use the index; if (a, b, d, c) is created, all of them can be used, and the order of a, b, d can be adjusted arbitrarily.

= and IN can be in any order: for example, a = 1 AND b = 2 AND c = 3 can use an (a, b, c) index with the conditions in any order, because MySQL's query optimizer will rewrite the query into a form the index can recognize.

The difference between the B tree and the B+ tree

In a B tree, keys and values can be stored in both internal nodes and leaf nodes; in a B+ tree, internal nodes hold only keys, without values, while leaf nodes store both keys and values. The leaf nodes of a B+ tree are connected by a chain, whereas the leaf nodes of a B tree are independent of each other.
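A sketch of tuning prefixLen (user and password are hypothetical names):

```sql
SELECT count(*) / count(DISTINCT left(password, 4)) AS ratio_4,
       count(*) / count(DISTINCT left(password, 6)) AS ratio_6,
       count(*) / count(DISTINCT left(password, 8)) AS ratio_8
FROM user;
-- Pick the smallest length whose ratio is close to 1, then build the prefix index:
ALTER TABLE user ADD INDEX idx_password (password(8));
```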

Benefits of using a B tree

A B tree can store both keys and values in internal nodes, so frequently accessed data can be placed close to the root node, which greatly improves query efficiency for hot data. This feature makes B trees more efficient in scenarios where specific data is repeatedly queried many times.

Benefits of using a B+ tree

Because the internal nodes of a B+ tree store only keys, not values, one read can fetch more keys from a memory page, which helps narrow the search range faster. The leaf nodes of a B+ tree are connected by a chain, so when a full traversal of the data is needed, the B+ tree only needs O(log N) time to find the smallest node and can then perform the O(N) sequential traversal along the chain, whereas a B tree needs to traverse every level of the tree, which requires more memory page replacements and therefore more time.

What is the difference, or what are the pros and cons, between a Hash index and a B+ tree index?

First we must know their underlying implementations. The bottom layer of a hash index is a hash table: a lookup calls the hash function once to get the corresponding key's slot, then goes back to the table to fetch the actual data. The bottom layer of a B+ tree index is a multi-way balanced search tree: each query starts from the root node, descends to a leaf node to get the key, and then decides, according to the query, whether it needs to go back to the table to fetch the data. From this, the following differences can be seen:

- A hash index performs equality queries faster (in general), but it cannot perform range queries. After the hash function builds the index, the order of entries cannot stay consistent with the original order, so range queries cannot be supported; all the nodes of a B+ tree follow an order (the left node is smaller than the parent, the right node larger, generalized for a multi-way tree), so ranges are naturally supported.
- A hash index does not support using the index for sorting; the reason is the same as above.
- A hash index does not support fuzzy queries or the leftmost-prefix matching of multi-column indexes, also because the hash function is unpredictable: the hashes of AAAA and AAAAB are unrelated.
- A hash index can never avoid going back to the table to fetch the data, whereas a B+ tree can complete the query purely through the index when certain conditions are met (clustered index, covering index, etc.).
- Although a hash index is faster for equality queries, it is unstable and its performance is unpredictable: when a key value repeats many times, hash collisions occur, and efficiency may then be extremely poor. The query efficiency of a B+ tree is relatively stable: every query goes from the root node to a leaf node, and the height of the tree is low.

Therefore, in most cases, directly choosing a B+ tree index yields stable and good query speed, and there is no need to use a hash index.

Why does the database use B+ trees instead of B trees?

- A B tree is only suitable for random retrieval, while a B+ tree supports both random retrieval and sequential retrieval.
- A B+ tree has higher space utilization, which reduces the number of I/Os; the cost of disk reads and writes is lower. Generally the index itself is too large to be stored entirely in memory, so indexes are often stored on disk as index files, in which case disk I/O is incurred during the index lookup. The internal nodes of a B+ tree have no pointers to the records' specific information and serve purely as an index; they are smaller than B tree nodes, a disk page can hold more keys, more keys are read into memory at once, and correspondingly the number of I/O reads and writes falls. The number of I/O reads and writes is the biggest factor affecting index retrieval efficiency. (A rough worked example follows after this answer.)
- B+ tree query efficiency is more stable. A B tree search may end at a non-leaf node: the closer to the root node, the shorter the search, and finding the keyword is enough to confirm the record exists; its performance is equivalent to a binary search over the full key set. In a B+ tree, any keyword lookup must take a path from the root node to a leaf node; the search path length is the same for all keywords, so the query efficiency of every keyword is comparable, and sequential search is also straightforward.
- While improving disk I/O performance, the B tree does not solve the inefficiency of element traversal. The leaf nodes of a B+ tree are connected together in pointer order, and the whole tree can be traversed by walking the leaves. Range-based queries are very frequent in databases, and B trees do not support such operations efficiently.
- Adding and deleting files (nodes) is more efficient. Because the leaf nodes of a B+ tree contain all keywords stored in an ordered linked-list structure, the efficiency of addition and deletion improves.
- When a B+ tree satisfies a clustered index or a covering index, it does not need to go back to the table to fetch the data. In a B+ tree index, a leaf node may store only the current key value, or it may store the current key value plus the entire row of data: this is the difference between a non-clustered and a clustered index. In InnoDB, only the primary key index is a clustered index; if there is no primary key, a unique key is chosen to build the clustered index, and if there is no unique key either, a key is implicitly generated to build it. When a query uses the clustered index, the entire row of data is available at the corresponding leaf node, so no further back-to-table query is needed.

What is a clustered index? When to use a clustered index versus a non-clustered index?

- Clustered index: the data storage and the index are put together; finding the index finds the data.
- Non-clustered index: the data is stored apart from the index structure; the leaf nodes of the index structure point to the corresponding rows of the data. MyISAM caches the index in memory through the key_buffer; when it needs to access the data (through the index), it searches the index in memory first and then finds the corresponding data on disk through the index. This is why queries are slow when the index is not hit in the key_buffer.
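A rough worked example of why B+ tree height stays low (the figures are illustrative assumptions, not measurements of MySQL internals): assume a 16 KB InnoDB page, an 8-byte BIGINT key, and roughly 6 bytes per child pointer. An internal node then holds about 16384 / 14 ≈ 1170 entries. If a leaf page holds about 16 rows of 1 KB each, a tree of height 3 can index roughly 1170 × 1170 × 16 ≈ 21.9 million rows, so a primary key lookup typically costs only a handful of page reads.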

Will a non-clustered index definitely cause a back-to-table query?

Not necessarily. It depends on whether all the fields required by the query statement hit the index. If all the fields hit the index, there is no need for a back-to-table query. A simple example: suppose we have built an index on the age column of an employee table; then for the query select age from employee where age < 20, the leaf nodes of the index already contain the age information, and no back-to-table query occurs.

What is a joint index? Why do we need to pay attention to the order of columns in a joint index?

MySQL can use multiple fields to create one index at the same time, called a joint index. In a joint index, to hit the index you need to use the fields one by one in the order in which the index was created, otherwise the index cannot be hit. The specific reason: when MySQL uses an index, the index needs to be ordered. Suppose a joint index on "name, age, school" is created; the index is ordered first by name, then by age where names are equal, and then by school where ages are also equal. When querying, the index is strictly ordered only by name at first, so the name field must be used for an equality lookup first; then, within the matched rows, the entries are strictly ordered by the age field, so the age field can then be used for an index search, and so on. Therefore, pay attention to the order of the index columns when creating a joint index: generally, put the columns with frequent query demands or high selectivity first. Beyond that, individual adjustments can be made for special-case queries or the table structure.

Transactions

What is a database transaction?

A transaction is an indivisible sequence of database operations and the basic unit of database concurrency control. The result of its execution must take the database from one consistent state to another consistent state. A transaction is a logical group of operations that are either all executed or none executed.

The most classic example often cited for transactions is a money transfer.

Suppose Xiao Ming wants to transfer 1000 yuan to Xiao Hong. The transfer involves two key operations: decrease Xiao Ming's balance by 1000 yuan, and increase Xiao Hong's balance by 1000 yuan. If an error suddenly occurs between these two operations, such as a banking system crash, so that Xiao Ming's balance decreases while Xiao Hong's balance does not increase, that would be wrong. The transaction guarantees that these two key operations either both succeed or both fail.
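The transfer as a transaction (a sketch; account is a hypothetical table):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 1000 WHERE name = 'xiaoming';
UPDATE account SET balance = balance + 1000 WHERE name = 'xiaohong';
COMMIT;   -- both updates become permanent together
-- On any error, ROLLBACK undoes both updates.
```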

What are the four characteristics (ACID) of a transaction? Relational databases need to follow the ACID rules, whose specific content is as follows:

1. Atomicity: a transaction is the smallest unit of execution, and splitting it is not allowed. The atomicity of a transaction ensures that its actions are either all completed or have no effect at all.
2. Consistency: before and after the transaction executes, the data remains consistent; multiple transactions reading the same data obtain the same result.
3. Isolation: when the database is accessed concurrently, one user's transaction is not interfered with by other transactions; the database is independent among concurrent transactions.
4. Durability: after a transaction is committed, its changes to the data in the database are permanent; even if the database fails, they should not be affected.

What are dirty reads, phantom reads, and non-repeatable reads?

Dirty read: one transaction has updated a piece of data, and another transaction reads that same data at this moment; if, for some reason, the first transaction then rolls back, the data read by the second transaction is incorrect.

Non-repeatable read: the data read by two queries within one transaction is inconsistent. This may be because a transaction that updated the original data committed in between the two queries.

Phantom read: the number of rows returned by the two queries of one transaction is inconsistent. For example, one transaction queries several rows of data while another transaction inserts new rows of data at that moment; in its next query, the first transaction will find rows of data that it did not see before.

What are the isolation levels of a transaction? What is MySQL's default isolation level?

To achieve the four characteristics of transactions, the database defines 4 different transaction isolation levels, from low to high: Read uncommitted, Read committed, Repeatable read, and Serializable. These four levels address problems such as dirty reads, non-repeatable reads, and phantom reads.

The SQL standard defines four isolation levels:

- READ-UNCOMMITTED: the lowest isolation level; it allows reading data changes that have not been committed, which may result in dirty reads, phantom reads, or non-repeatable reads.
- READ-COMMITTED: allows reading data already committed by concurrent transactions; it can prevent dirty reads, but phantom reads or non-repeatable reads may still occur.
- REPEATABLE-READ: multiple reads of the same field give consistent results, unless the data is modified by the transaction itself; it can prevent dirty reads and non-repeatable reads, but phantom reads may still occur.
- SERIALIZABLE: the highest isolation level, fully compliant with the ACID isolation requirements. All transactions execute one by one, so transactions cannot interfere with one another; this level prevents dirty reads, non-repeatable reads, and phantom reads.

Note here: MySQL uses the REPEATABLE-READ isolation level by default, while Oracle uses READ-COMMITTED by default.
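Checking and setting the level (a sketch; the variable is tx_isolation before MySQL 5.7.20 and transaction_isolation afterwards):

```sql
SELECT @@transaction_isolation;                          -- current session level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- change for this session
```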

The transaction isolation mechanism is implemented on the basis of the lock mechanism and concurrency scheduling. Concurrency scheduling uses MVCC (Multi-Version Concurrency Control), which supports features such as consistent reads under concurrency and rollback by keeping the old versions of modified data.

Because the lower the isolation level, the fewer locks a transaction requests, the isolation level of most database systems is READ-COMMITTED. But what you need to know is that the InnoDB storage engine uses **REPEATABLE-READ** by default, with no performance loss.

The InnoDB storage engine generally uses the **SERIALIZABLE (serializable)** isolation level in the case of distributed transactions.

Do you know about MySQL locks? When the database has concurrent transactions, data inconsistency may occur; some mechanism is then needed to guarantee the order of access, and the lock mechanism is such a mechanism.

It is like a hotel room: if everyone could enter and exit at will, multiple people would fight over the same room; with a lock installed on the room, the person who obtains the key can check in and lock the room, and others can use it only after he is done with it.

The relationship between isolation levels and locks: at the Read Uncommitted level, reading data does not require taking a shared lock, so it does not conflict with exclusive locks on data being modified.

Under the Read Committed level, read operations need to take a shared lock, but the shared lock is released as soon as the statement finishes executing.

At the Repeatable Read level, read operations need a shared lock, but the shared lock is not released before the transaction commits; that is, the shared lock is released only after the transaction completes.

SERIALIZABLE is the most restrictive isolation level, because this level locks the entire range of keys and holds the lock until the transaction is completed.

By the granularity of locks, what database locks are there? The locking mechanism and InnoDB's lock algorithms

In relational databases, database locks can be divided by granularity into row-level locks (the InnoDB engine), table-level locks (the MyISAM engine), and page-level locks (the BDB engine).

Locks used by the MyISAM and InnoDB storage engines: MyISAM uses table-level locking; InnoDB supports both row-level locking and table-level locking, defaulting to row-level.

Row-level locks

Compared with table-level and page-level locks, the row-level lock is the finest locking granularity in MySQL: only the row being operated on is locked. Row-level locks can greatly reduce conflicts in database operations. Their locking granularity is the smallest, but their locking overhead is also the largest. Row-level locks are divided into shared locks and exclusive locks.

Features: high overhead and slow locking; deadlocks can occur; the smallest locking granularity, the lowest probability of lock conflicts, and the highest degree of concurrency.

Table-level locks Table-level locks are the coarsest-grained locks in MySQL, locking the entire table for the current operation. They are simple to implement, consume few resources, and are supported by most MySQL engines; both MyISAM and InnoDB, the most commonly used engines, support table-level locking. Table-level locks are divided into table shared read locks (shared locks) and table exclusive write locks (exclusive locks).

Features: low overhead, fast locking; no deadlocks; large locking granularity, the highest probability of lock conflicts, and the lowest concurrency.

Page-level locks Page-level locks have a locking granularity between row-level and table-level locks in MySQL. Table-level locking is fast but conflict-prone, while row-level locking has few conflicts but is slow, so page-level locking is a compromise: a set of adjacent records is locked at a time.

Features: overhead and locking time are between those of table locks and row locks; deadlocks can occur; locking granularity is between table locks and row locks, and the degree of concurrency is average.

What kinds of locks does MySQL have, by lock category? Doesn't locking as described above hinder concurrency? From the perspective of lock type, there are shared locks and exclusive locks.

Shared lock: also known as read lock. When the user wants to read data, add a shared lock to the data. Multiple shared locks can be added at the same time.

Exclusive lock: also called write lock. When a user wants to write data, an exclusive lock is placed on the data. Only one exclusive lock can be held at a time, and it is mutually exclusive with all other exclusive locks and shared locks.
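In InnoDB both lock types can be requested explicitly in SQL; a minimal sketch (the account table is hypothetical; since MySQL 8.0, LOCK IN SHARE MODE can also be written FOR SHARE):

select * from account where id = 1 lock in share mode;  -- shared (read) lock: other transactions may also read-lock this row
select * from account where id = 1 for update;          -- exclusive (write) lock: blocks other shared and exclusive locks on this row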

Using the hotel example again, there are two kinds of user behavior. One is viewing the room: it is fine for multiple users to view the room together. The other is actually staying the night: during that period, nobody else may check in or even view the room.

The granularity of the lock depends on the specific storage engine; InnoDB implements both row-level locks and table-level locks.

In that order, their locking overhead goes from large to small, and their degree of concurrency also goes from high to low. How is the row lock of the InnoDB engine implemented in MySQL? Answer: InnoDB implements row locks on top of indexes.

Example: select * from tab_with_index where id = 1 for update;

for update acquires row locks according to the condition, and id is an indexed column. If id were not an indexed column, InnoDB would fall back to locking the whole table, and there would be no concurrency. The InnoDB storage engine has three row-lock algorithms. Record lock: a lock on a single index record. Gap lock: a lock on a range between index records, excluding the records themselves. Next-key lock: record lock + gap lock, i.e. a lock on a range including the record itself. Key points: 1. For row queries, InnoDB uses next-key locks. 2. Next-key locking exists to solve the Phantom Problem (phantom reads). 3. When the queried index has a unique attribute, InnoDB downgrades the next-key lock to a record lock. 4. Gap locks are designed to prevent multiple transactions from inserting records into the same range, which would otherwise cause phantom reads. 5. There are two ways to explicitly disable gap locks (after which, except for foreign key constraint checks and duplicate-key checks, only record locks are used): A. set the transaction isolation level to RC (READ-COMMITTED); B. set the parameter innodb_locks_unsafe_for_binlog to 1. What is a deadlock? How to deal with it? A deadlock is a phenomenon in which two or more transactions each hold a lock on a resource and request a lock held by the other, leading to a cycle of mutual waiting.

Common ways to solve deadlock

1. If different programs will access multiple tables concurrently, try to agree to access the tables in the same order, which can greatly reduce the chance of deadlock.

2. In the same transaction, try to lock all the required resources at once to reduce the probability of deadlock;

3. For the business part that is very prone to deadlock, you can try to upgrade the lock granularity, and reduce the probability of deadlock through table-level locking;

4. If the business cannot handle locking well, distributed transaction locks or optimistic locking can be used. What are the database's optimistic and pessimistic locks? How are they implemented? The task of concurrency control in a database management system (DBMS) is to ensure that when multiple transactions access the same data at the same time, neither the isolation and consistency of the transactions nor the consistency of the database is violated. Optimistic concurrency control (optimistic locking) and pessimistic concurrency control (pessimistic locking) are the main techniques used for concurrency control.

Pessimistic locking: assume that concurrency conflicts will occur, and block any operation that might violate data integrity. The data is locked when it is queried, and the lock is held until the transaction commits. Implementation: the lock mechanisms built into the database.

Optimistic locking: assume that no concurrency conflict will occur, and only check whether data integrity has been violated when the operation is committed. The data is not locked while being modified; instead, a version check is performed on write. Implementation: generally a version-number mechanism or a CAS algorithm.
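A minimal sketch of the version-number mechanism (the account table, its columns, and the version value 3 are hypothetical):

select balance, version from account where id = 1;  -- suppose this returns version 3
update account set balance = 90, version = version + 1
where id = 1 and version = 3;  -- succeeds only if nobody changed the row in the meantime
-- if the update affects 0 rows, the version check failed: re-read and retry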

Usage scenarios of the two kinds of locks

From the above introduction, we know that both kinds of locks have advantages and disadvantages; neither is simply better than the other. Optimistic locking is suitable for write-light (read-heavy) scenarios, i.e. when conflicts rarely occur; this saves the overhead of locking and increases the overall throughput of the system.

However, in write-heavy scenarios conflicts occur frequently, causing the upper-level application to retry constantly, which actually reduces performance, so pessimistic locking is more appropriate there. Why use views? What is a view? To improve the reusability of complex SQL statements and the security of table operations, the MySQL database management system provides the view feature. A view is essentially a virtual table that does not exist physically. Its content is similar to that of a real table, containing a series of named columns and rows of data. However, a view does not exist in the database as stored data values; its rows and columns come from the base tables referenced by the query that defines the view, and are generated dynamically when the view is referenced.

Views enable developers to focus only on the specific data and tasks they care about: users can see only the data defined in the view, not the data in the tables referenced by the view, thereby improving the security of the data in the database. What are the characteristics of a view? They are as follows:

View columns can come from different tables; a view is an abstraction of tables and a new relationship established in a logical sense. A view is a table (virtual table) generated from base tables (real tables). Creating and deleting a view does not affect the base tables. Updates (insert, delete, modify) to the content of the view directly affect the base tables. When the view comes from multiple base tables, inserting and deleting data through it is not allowed.

Operations on views include creating, viewing, deleting, and modifying a view. What are the usage scenarios for views? The basic purpose of a view is to simplify SQL queries and improve development efficiency; another use is compatibility with an old table structure.

The following are common usage scenarios for views:

Reuse SQL statements; Simplify complex SQL operations. After writing a query, it can be easily reused without knowing its basic query details; Use the components of the table instead of the entire table; Protect data. You can grant users access to specific parts of the table instead of access to the entire table; Change the data format and presentation. The view can return data that is different from the representation and format of the underlying table.
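As a minimal sketch (the employee table and its columns are hypothetical):

create view v_employee_public as
select id, name, dept_id from employee;  -- expose only the non-sensitive columns
select * from v_employee_public where dept_id = 2;  -- callers query the view like a table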

Advantages of views: 1. Simplified queries: views can simplify users' operations. 2. Data security: views let users see the same data from multiple angles and can provide protection for confidential data. 3. Logical data independence: views provide a certain degree of logical independence when the database is restructured. Disadvantages of views: 1. Performance: the database must convert a query on the view into a query on the base tables. If the view is defined by a complex multi-table query, then even a simple query on the view becomes a complex combination that takes time to execute.

2. Modification restrictions: when the user tries to modify certain rows of a view, the database must convert this into a modification of certain rows of the base tables; the same applies when inserting into or deleting from a view. For simple views this is convenient, but more complex views may not be modifiable at all.

Such non-modifiable views have the following characteristics: 1. views with set operators such as UNION; 2. views with a GROUP BY clause; 3. views with aggregate functions such as AVG, SUM, or MAX; 4. views using the DISTINCT keyword; 5. views that join tables (with some exceptions). What is a cursor? A cursor is a data buffer opened by the system for the user to store the execution results of SQL statements. Each cursor area has a name. Through the cursor, the user can fetch records one by one and assign them to host variables for further processing by the host language. What are stored procedures and functions? What are their advantages and disadvantages? A stored procedure is a pre-compiled collection of SQL statements. Its advantage is that it allows modular design: it only needs to be created once and can then be called many times from programs. If a certain operation needs to execute SQL many times, using a stored procedure is faster than executing plain SQL statements. Advantages:

1) The stored procedure is pre-compiled, and the execution efficiency is high.

2) The code of the stored procedure is directly stored in the database, and the stored procedure name is directly called to reduce network communication.

3) High security, users with certain permissions are required to execute stored procedures.

4) Stored procedures can be reused, reducing the workload of database developers.
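To make this concrete, a minimal MySQL stored procedure sketch (the employee table, its columns, and the procedure name are all hypothetical):

delimiter //
create procedure get_employees_by_dept(in p_dept_id int)
begin
  -- the body is pre-compiled on the server and can be called repeatedly
  select id, name from employee where dept_id = p_dept_id;
end //
delimiter ;
call get_employees_by_dept(2);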

Disadvantages

1) Debugging is troublesome, although debugging with PL/SQL Developer is very convenient and makes up for this shortcoming.

2) Portability: database-side code is of course tied to the database. But for a typical engineering project, there is basically no migration requirement.

3) Recompilation: because the server-side code is compiled before running, if an object it references changes, the affected stored procedures and packages need to be recompiled (though automatic compilation at runtime can also be configured).

4) If a system uses a large number of stored procedures, the data structures will change as user requirements evolve after delivery, followed by problems in the related stored procedures; maintaining such a system can become very difficult and costly. What is a trigger? What are the usage scenarios of triggers? A trigger is a special, event-driven stored procedure defined by the user on a relational table; it is a piece of code that executes automatically when an event fires. Usage scenarios: cascading changes can be implemented through related tables in the database; changes to a certain field in a table can be monitored in real time and handled accordingly, for example to generate numbers for certain business records. Be careful not to abuse triggers, otherwise maintaining the database and applications becomes difficult. Keep the basic knowledge points above in mind; the focus is understanding the difference between the CHAR and VARCHAR data types and the difference between the InnoDB and MyISAM storage engines. What triggers are there in MySQL? There are six types of triggers in the MySQL database: Before Insert, After Insert, Before Update, After Update, Before Delete, After Delete. Which types of SQL statements are commonly used? Data definition language DDL (Data Definition Language): CREATE, DROP, ALTER

These statements mainly operate on logical structures, including table structures, views, and indexes.

Data query language DQL (Data Query Language) SELECT

This is easy to understand: query operations, using the select keyword. Simple queries, join queries, and so on all belong to DQL.

Data manipulation language DML (Data Manipulation Language) INSERT, UPDATE, DELETE

These statements operate on the data itself. Together with the query operations above, DQL and DML make up the add, delete, modify, and query operations most commonly used by junior programmers; the query is a special case classified as DQL.

Data control language DCL (Data Control Language): GRANT, REVOKE, COMMIT, ROLLBACK

These statements operate on the security and integrity of the database and can be understood simply as permission control, etc. What are super keys, candidate keys, primary keys, and foreign keys? Super key: a set of attributes that can uniquely identify a tuple in a relation is called a super key of the relation schema. A single attribute can be a super key, and a combination of attributes can also be a super key. Super keys include candidate keys and primary keys. Candidate key: a minimal super key, i.e. a super key with no redundant attributes. Primary key: the column or combination of columns chosen to uniquely and completely identify a stored data object in a table. A table can have only one primary key, and the primary key value cannot be missing, i.e. it cannot be NULL. Foreign key: a column in one table that references the primary key of another table is called a foreign key of this table.
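A minimal sketch of primary and foreign keys (both tables and all column names are hypothetical):

create table dept (
  id int primary key,          -- primary key: unique and not null
  name varchar(50) not null
);
create table employee (
  id int primary key,
  id_card char(18) unique,     -- (id, id_card) together would be a super key
  dept_id int,
  foreign key (dept_id) references dept(id)  -- foreign key: must match a dept.id
);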

What kinds of SQL constraints are there?

NOT NULL: the field's content must not be empty (NULL). UNIQUE: the field's content cannot repeat; a table allows multiple UNIQUE constraints. PRIMARY KEY: also enforces non-repeating content, but only one is allowed per table. FOREIGN KEY: prevents actions that would destroy the links between tables, and also prevents illegal data from being inserted into the foreign key column, because the value must be one of the values in the table it points to. CHECK: used to restrict the range of values of a field. Six types of related queries: cross join (CROSS JOIN), inner join (INNER JOIN), outer join (LEFT JOIN/RIGHT JOIN), union query (UNION and UNION ALL), full join (FULL JOIN). Cross join (CROSS JOIN): SELECT * FROM A,B(,C) or SELECT * FROM A CROSS JOIN B (CROSS JOIN C); with no join condition the result is the Cartesian product, so the result set is large and usually meaningless, and cross joins are rarely used. Inner join (INNER JOIN): SELECT * FROM A,B WHERE A.id=B.id or SELECT * FROM A INNER JOIN B ON A.id=B.id; the set of records in multiple tables that satisfy the condition at the same time. INNER JOIN can be abbreviated as JOIN.

Inner joins fall into three categories. Equi-join: ON A.id=B.id. Non-equi-join: ON A.id > B.id. Self-join: SELECT * FROM A T1 INNER JOIN A T2 ON T1.id=T2.pid.

Outer join (LEFT JOIN/RIGHT JOIN). Left outer join: LEFT OUTER JOIN, with the left table as the base; the left table is queried first, the right table is matched according to the join condition after ON, and unmatched rows are filled with NULL; it can be abbreviated as LEFT JOIN. Right outer join: RIGHT OUTER JOIN, with the right table as the base; the right table is queried first, the left table is matched according to the join condition after ON, and unmatched rows are filled with NULL; it can be abbreviated as RIGHT JOIN. Union query (UNION and UNION ALL): SELECT * FROM A UNION SELECT * FROM B UNION ... gathers multiple result sets together, with the result before UNION as the benchmark; note that the number of columns in a union query must be equal, and duplicate rows are merged. If UNION ALL is used, duplicate records are not merged, so UNION ALL is more efficient than UNION. Full join (FULL JOIN): MySQL does not support FULL JOIN; you can combine LEFT JOIN, UNION, and RIGHT JOIN instead: SELECT * FROM A LEFT JOIN B ON A.id=B.id UNION SELECT * FROM A RIGHT JOIN B ON A.id=B.id. Interview question on joins: there are 2 tables, R and S; R has three columns A, B, C, S has two columns C, D, and each table has three records. R table

S table

1. Cross join (Cartesian product): select r.*, s.* from r, s

2. Inner join result: select r.*, s.* from r inner join s on r.c = s.c

3. Left join result: select r.*, s.* from r left join s on r.c = s.c

4. Right join result: select r.*, s.* from r right join s on r.c = s.c

5. Full join result (MySQL does not support it, Oracle does): select r.*, s.* from r full join s on r.c = s.c

What is a subquery? 1. Condition: the query result of one SQL statement is used as the condition or query result of another query statement. 2. Nesting: multiple SQL statements are nested, and the inner SQL query statement is called a subquery. Three situations for subqueries: 1. The subquery returns a single row and single column: the result set is one value, and the parent query uses operators such as =, <, >. Example: query who is the employee with the highest salary: select * from employee where salary=(select max(salary) from employee); 2. The subquery returns multiple rows and a single column: the result set is similar to an array, and the parent query uses the in operator.

Example: query the highest-paid employee of each department: select * from employee where salary in (select max(salary) from employee group by dept_id);

3. The subquery returns multiple rows and multiple columns: the result set is similar to a virtual table and cannot be used in a where condition; it is used as a derived table in the from clause. Example: 1) query the information of employees hired after 2011; 2) query all department information, compare it with the virtual table above, and find all employees together with their department. select * from dept d, (select * from employee where join_date > '2011-1-1') e where e.dept_id = d.id;
- Using a table join instead: select d.*, e.* from dept d inner join employee e on d.id = e.dept_id where e.join_date > '2011-1-1'

The difference between in and exists in MySQL: exists loops over the outer table, and on each iteration queries the inner table. It has long been believed that exists is more efficient than in, but this statement is not accurate; it depends on the situation. 1. If the two tables being queried are of similar size, there is little difference between in and exists. 2. If one table is smaller and the other is large, use exists when the subquery table is the larger one, and in when the subquery table is the smaller one. 3. not in vs not exists: if not in is used, both the inner and outer tables are scanned in full and no index is used, whereas a not exists subquery can still use indexes on the table. So not exists is faster than not in regardless of table size.

The difference between varchar and char. Characteristics of char: char represents a fixed-length string, and the length is fixed; if the inserted data is shorter than the fixed length, it is padded with spaces; because the length is fixed, access is much faster than varchar, even 50% faster, but since the length is fixed it can occupy extra space, a space-for-time approach; for char, the maximum number of characters that can be stored is 255, regardless of encoding. Characteristics of varchar: varchar represents a variable-length string whose length is variable; the inserted data is stored at exactly its own length; varchar is the opposite of char for access: it is slower because the length is not fixed, but for the same reason it occupies no extra space, a time-for-space approach; for varchar, the maximum number of characters that can be stored is 65532.

In short, combining the performance perspective (char is faster) with the disk-space perspective (varchar is smaller), the appropriate choice depends on the specific database design.

The meaning of 50 in varchar(50): at most 50 characters can be stored. varchar(50) and varchar(200) occupy the same space when storing "hello", but the latter consumes more memory when sorting, because order by col uses fixed_length to calculate the column length (the memory engine behaves the same way). In earlier versions of MySQL, 50 represented the number of bytes; it now represents the number of characters.

The meaning of 20 in int(20): it is the display width. 20 means the maximum display width is 20, but the column still occupies 4 bytes of storage and the storage range is unchanged; it does not affect internal storage and only takes effect for int columns defined with zerofill, controlling how many 0s are padded in front, which is convenient for report display. int(1) and int(20) are stored and calculated identically.

The difference between int(10), char(10), and varchar(10) in MySQL: the 10 in int(10) indicates the display width of the data, not the size of the stored data; the 10 in char(10) and varchar(10) indicates the storage size, i.e. how many characters are stored. int(10): display width 10, int type, occupying 4 bytes. char(10): a fixed-length string of 10 characters, padded with spaces when shorter; the padding spaces are placeholders, not characters, and it occupies more storage space. varchar(10): a variable-length string of up to 10 characters, storing exactly as many as given; a space is stored as one character, which differs from the padding spaces of char(10).

What is the difference between FLOAT and DOUBLE? FLOAT data can store at most 8 decimal digits and occupies 4 bytes in memory. DOUBLE data can store at most 18 decimal digits and occupies 8 bytes in memory.

The difference between drop, delete, and truncate: all three delete, but there are differences between them: delete is DML, removes rows one at a time, can take a WHERE clause, and can be rolled back inside a transaction; truncate is DDL, quickly removes all rows, resets the auto-increment counter, and cannot be rolled back; drop is DDL and removes the whole table, including its structure, data, and indexes.

Therefore, when a table is no longer needed, use drop; when you want to delete some rows, use delete; when you want to keep the table but delete all data, use truncate. What is the difference between UNION and UNION ALL? UNION merges duplicate records while UNION ALL does not, so UNION ALL is more efficient than UNION. SQL optimization: how to locate and optimize the performance of a SQL statement? How do you know whether the created index was used, or why a statement runs slowly? For locating low-performance SQL statements, the most important and effective method is the execution plan: MySQL provides the explain command to view the execution plan of a statement. We know that whatever the database or database engine, many optimizations are applied while executing a SQL statement; for query statements, the most important optimization is using indexes. The execution plan displays the details of how the database engine executes the SQL statement, including whether an index is used, which index is used, and related information about that index.

Information contained in the execution plan: id consists of a set of numbers and represents the execution order of each subquery in the query. Rows with the same id execute from top to bottom; with different ids, the larger the id value, the higher the priority and the earlier it executes. When id is NULL, it represents a result set that does not itself need to be queried; it often appears in queries such as union. select_type is the query type of each subquery; there are several common query types.

table: the data table being queried; when data is queried from a derived table, <derivedX> is displayed, where X is the id of the corresponding execution plan. partitions: table partitions; at table creation time you can specify the column by which the table is partitioned. For example:

create table tmp (id int unsigned not null AUTO_INCREMENT, name varchar(255), PRIMARY KEY (id)) engine = innodb partition by key (id) partitions 5;

type (very important; it shows whether an index is used) is the access type: ALL scans the entire table; index traverses the whole index; range performs an index range scan; index_subquery uses an index lookup in a subquery; ref looks up data via a non-unique index; eq_ref uses a PRIMARY KEY or UNIQUE NOT NULL index in a join query. possible_keys: the indexes that might be used; note that they may not actually be used. If an index exists on a field involved in the query, it is listed here. When this column is NULL, consider whether the current SQL needs optimization.

key shows the index actually used by MySQL in the query. If the index is not used, it is displayed as NULL.

TIPS: if the query uses a covering index (covering index: the index contains all the data the query needs), that index appears only in the key column.

key_len: the length of the index used.

ref represents the join matching condition of the table above, i.e. which columns or constants are used to look up values on the index column.

rows returns the estimated number of result sets, which is not an accurate value.

The extra information is very rich, the common ones are:

1. Using index: a covering index is used. 2. Using where: a where clause is used to filter the result set. 3. Using filesort: file sort is used; it appears when sorting by non-indexed columns, is very costly, and should be optimized away where possible. 4. Using temporary: a temporary table is used. For the goals of SQL optimization, refer to the Alibaba development manual:

[Recommended] The goal of SQL performance optimization: reach at least the range level; the requirement is the ref level; consts is best if achievable. Notes: 1) consts: there is at most one matching row (primary key or unique index) in a single table, and the data can be read during the optimization phase. 2) ref: a normal (non-unique) index is used. 3) range: a range scan is performed on the index. Counter-example: an explain result with type=index, i.e. a full scan of the index file, is very slow; this index level is lower than range and not much different from a full table scan.
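A minimal sketch of reading the type column (table t, column num, and index idx_num are hypothetical):

-- assumed table: create table t (id int primary key, num int, key idx_num (num)) engine = innodb;
explain select id from t where id = 1;    -- type=const: at most one row via the primary key
explain select id from t where num = 10;  -- type=ref: lookup via a normal secondary index
explain select id from t where num > 10;  -- type=range: index range scan
explain select * from t;                  -- type=ALL: full table scan, the level to avoid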

The life cycle of a SQL statement? 1. The application server establishes a connection with the database server. 2. The database process receives the SQL request. 3. The statement is parsed, an execution plan is generated, and it is executed. 4. Data is read into memory and logically processed. 5. The result is sent to the client through the connection from step 1. 6. The connection is closed and resources are released.

How to optimize queries on a large table? 1. Optimize the schema, the SQL statements, and the indexes. 2. Add a cache, such as memcached or redis. 3. Master-slave replication and read-write separation. 4. Vertical splitting: according to how coupled your modules are, divide one large system into multiple small systems, i.e. a distributed system. 5. Horizontal splitting: for tables with a large amount of data, this step is the most troublesome and most tests technical skill. A reasonable sharding key must be chosen; for good query efficiency the table structure must be changed to add some redundancy, and the application must change as well: try to carry the sharding key in the SQL so that the data is located in a specific table, instead of scanning all tables.

How to deal with very large paging? Large paging is generally solved from two directions. At the database level, which is our main focus (although the effect is not as great), a query like select * from table where age > 20 limit 1000000, 10 still has room for optimization: this statement loads 1,000,000 rows and discards essentially all of them, keeping only 10, so of course it is slow. We can rewrite it as select * from table where id in (select id from table where age > 20 limit 1000000, 10). Although this also walks a million rows, thanks to index covering (all fields queried by the subquery are in the index) it is much faster. And if the IDs are continuous, we can also use select * from table where id > 1000000 limit 10; there are many possible optimizations, but the core idea is the same: reduce the data that is loaded. The other direction is reducing such requests from the perspective of requirements: mainly, do not make such requirements (no direct jumps to a page several million pages in; only allow viewing page by page or along a given route, which is predictable and cacheable), and prevent ID leakage and continuous malicious attacks.

The practical solution to large paging is based on caching: predictable content is computed in advance, cached in a KV store such as redis, and returned directly. Alibaba's "Java Development Manual" gives a solution to large paging similar to the first one mentioned above.

[Recommended] Use delayed association or subqueries to optimize very deep paging scenarios. Note: MySQL does not skip offset rows; it fetches offset+N rows, discards the first offset rows, and returns N rows. When the offset is particularly large, efficiency is very low. Either control the total number of pages returned, or rewrite the SQL for page numbers beyond a certain threshold. Positive example: first quickly locate the id range that is needed, then associate: SELECT a.* FROM table1 a, (select id from table1 where condition LIMIT 100000,20) b where a.id=b.id

MySQL paging: the LIMIT clause can be used to force a SELECT statement to return a specified number of records. LIMIT accepts one or two numeric arguments, which must be integer constants. With two arguments, the first specifies the offset of the first returned row and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1). mysql> SELECT * FROM table LIMIT 5,10; // retrieves rows 6-15. To retrieve all records from an offset to the end of the record set, you can specify -1 as the second argument: mysql> SELECT * FROM table LIMIT 95,-1; // retrieves rows 96 to last. With only one argument, it means the maximum number of rows to return: mysql> SELECT * FROM table LIMIT 5; // retrieves the first 5 rows. In other words, LIMIT n is equivalent to LIMIT 0,n.

The slow query log records SQL whose execution time exceeds a given threshold, so that slow queries can be located quickly as a reference for optimization. Enable the slow query log:

Configuration item: slow_query_log

You can use show variables like 'slow_query_log' to check whether it is enabled. If the status value is OFF, you can use set GLOBAL slow_query_log = on to enable it; a xxx-slow.log file will be generated under the datadir.

Set critical time

Configuration item: long_query_time

View: show VARIABLES like 'long_query_time' (unit: seconds)

Setting: set long_query_time=0.5

In practice the threshold should be lowered gradually from a longer time to a shorter one, optimizing the slowest SQL first.
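Putting these settings together, a minimal session sketch (the 0.5-second threshold is only an example):

set global slow_query_log = on;         -- writes xxx-slow.log under the datadir
set global long_query_time = 0.5;       -- takes effect for new connections
show variables like 'slow_query_log%';  -- verify the switch and the log file path
show variables like 'long_query_time';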

Check the log: once a SQL statement exceeds the threshold we set, it is recorded in xxx-slow.log.

Have you paid attention to time-consuming SQL in the business system? Have you analyzed slow queries, and how do you optimize them? In the business system, apart from queries that use the primary key, I test the time consumption of the others on a test database; slow query statistics are mainly done by operations, who regularly feed the slow queries in the business back to us.

When optimizing slow queries, first understand the reason for the slowness: does the query condition miss the index? Are unneeded data columns loaded? Or is the amount of data simply too large? Optimization therefore targets these three directions. First, analyze the statement to see whether extra data is loaded: perhaps extra rows are queried and discarded, or many columns not needed in the result are loaded; analyze and rewrite the statement accordingly. Second, analyze the statement's execution plan to see how it uses indexes, then modify the statement or the indexes so that the statement can hit indexes as much as possible. Third, if the statement can no longer be optimized, consider whether the amount of data in the table is too large; if so, split the table horizontally or vertically.

Why should we try to set a primary key? The primary key is how the database guarantees the uniqueness of data rows in the whole table. Even if the table has no primary key in business terms, it is recommended to add an auto-increment ID column as the primary key. With a primary key set, subsequent deletions, modifications, and queries may be faster, and the safety of the operated data range is ensured.

Should the primary key use an auto-increment ID or a UUID? An auto-increment ID is recommended rather than a UUID.

Because in the InnoDB storage engine the primary key index is a clustered index, meaning the primary key index and all the data (in order) are stored on the leaf nodes of the primary key index's B+ tree, an auto-increment primary key only needs to keep appending at the end. With a UUID, since the relative order of incoming IDs is unpredictable, inserts cause a lot of data movement and memory fragmentation, which degrades insert performance.

In short, in the case of a larger amount of data, the performance of using an auto-increment primary key will be better.

Regarding the primary key being a clustered index: if there is no primary key, InnoDB chooses a unique key as the clustered index; if there is no unique key either, an implicit primary key is generated.

Why is it required that fields be defined as NOT NULL? NULL values take up more bytes and cause many unexpected behaviors in programs.

If you want to store a user's password hash, which field type should you use? Fixed-length strings such as password hashes, salts, and ID-card numbers should be stored as char rather than varchar, which saves space and improves retrieval efficiency.

Optimize data access during the query process. Accessing too much data degrades query performance. Determine whether the application is retrieving more data than needed, whether too many rows or too many columns, and whether the MySQL server is analyzing a large number of unnecessary rows. Avoid the following SQL mistakes: querying unneeded rows (solution: use LIMIT); returning all columns in multi-table joins (solution: specify column names); always returning all columns (solution: avoid SELECT *); querying the same data repeatedly (solution: cache the data and read the cache next time); scanning extra records (solution: analyze with explain). If a query scans a large amount of data but returns only a few rows, the following techniques can help: use covering index scans, putting all needed columns in the index so the storage engine does not have to go back to the table to fetch rows; change the database and table structure, adjusting the normalization of the tables; rewrite the SQL statement so the optimizer can execute the query in a better way.

Optimize long and difficult query statements: one complex query, or multiple simple queries? MySQL can scan millions of rows of data in memory per second; by contrast, returning data to the client is much slower. Using as few queries as possible is good, but sometimes it is necessary to decompose one large query into multiple smaller queries. Split queries: divide a large query into several identical smaller ones; deleting 10 million rows at once is more costly than deleting 10,000 at a time with pauses in between. Decompose join queries: this makes the cache more efficient, executing a single query at a time reduces lock contention, doing joins at the application layer makes the database easier to split, query efficiency itself improves, and fewer redundant records are queried.

Optimize specific types of query statements: count(*) ignores all columns and counts all rows directly, so do not use count(column_name). In MyISAM, count(*) without any where condition is very fast; with a where condition, MyISAM's count may be no faster than other engines. You can use explain to obtain an approximate row count as a substitute for count(*), add summary tables, or use a cache. Optimize join queries: check whether there is an index on the columns in the ON or USING clause; ensure that GROUP BY and ORDER BY refer to columns of only one table, so MySQL may use an index. Optimize subqueries: replace them with join queries where possible. Optimize GROUP BY and DISTINCT: both can be optimized using indexes, which is the most effective method.
In join queries, grouping by an identity column is more efficient. If ORDER BY is not required, add ORDER BY NULL when doing GROUP BY and MySQL will no longer perform a file sort. WITH ROLLUP super-aggregation can be moved to the application for processing. Optimize LIMIT paging: when the LIMIT offset is large, query efficiency is low; record the largest ID of the last query and make the next query start directly from that ID. Optimize UNION queries: UNION ALL is more efficient than UNION.

How to answer questions about optimizing the WHERE clause: for this type of question, first explain how to locate inefficient SQL statements, then troubleshoot according to the possible causes, starting with the index; if there is no index problem, consider the aspects above (data access problems, long difficult queries, or specific types of queries) and answer them one by one.

Some methods of SQL statement optimization: 1. To optimize queries, avoid full table scans as much as possible; first consider building indexes on the columns used in where and order by. 2. Avoid NULL-value tests on fields in the where clause, otherwise the engine abandons the index and scans the whole table, e.g.: select id from t where num is null. You can set a default value of 0 on num, make sure the num column contains no NULLs, and query like this instead: select id from t where num=0. 3. Avoid the != or <> operators in the where clause, otherwise the engine abandons the index and scans the whole table. 4. Avoid joining conditions with or in the where clause, otherwise the engine abandons the index and scans the whole table, e.g.: select id from t where num=10 or num=20. You can query like this instead: select id from t where num=10 union all select id from t where num=20. 5. Use in and not in cautiously, otherwise they lead to full table scans, e.g.: select id from t where num in(1,2,3). For continuous values, use between instead of in: select id from t where num between 1 and 3. 6. The following query also leads to a full table scan: select id from t where name like '%abc%'. To improve efficiency, consider full-text search. 7. Using parameters in the where clause also causes a full table scan, because SQL resolves local variables only at runtime while the optimizer must choose an access plan at compile time; when the plan is built at compile time the value of the variable is unknown and cannot be used as input for index selection. For example, the following statement performs a full table scan: select id from t where num=@num. It can be changed to force the query to use an index: select id from t force index(index_name) where num=@num. 8. Avoid expression operations on fields in the where clause, which make the engine abandon the index and scan the whole table, e.g.: select id from t where num/2=100 should be changed to: select id from t where num=200. 9. Avoid function operations on fields in the where clause, which make the engine abandon the index and scan the whole table, e.g.: select id from t where substring(name,1,3)='abc' (ids whose name starts with abc) should be changed to: select id from t where name like 'abc%'. 10. Do not perform functions, arithmetic, or other expression operations on the left side of the "=" in the where clause, otherwise the system may not be able to use the index correctly.

Why should databases be optimized? The system's throughput bottleneck often appears in the access speed of the database; as the application runs, the data in the database grows and processing slows correspondingly; and data is stored on disk, whose read/write speed cannot compare with memory. Optimization principle: reduce system bottlenecks, reduce resource usage, and increase system response speed.

Database structure optimization: a good database design plan often has a multiplier effect on database performance.

Need to consider many aspects such as data redundancy, query and update speed, and whether the data type of the field is reasonable.

Decompose a table with many fields into multiple tables

For tables with many fields, if the frequency of use of some fields is very low, these fields can be separated to form a new table.

Because when a table has a large amount of data, queries are slowed by the presence of these low-frequency fields.

Increase the intermediate table

For tables that require frequent joint queries, intermediate tables can be established to improve query efficiency.

By creating an intermediate table, insert the data that needs to be queried through a union into the intermediate table, and then change the original union query to a query on the intermediate table.
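A minimal sketch of the intermediate-table idea (all table and column names are hypothetical):

create table order_stats (dept_id int, order_cnt int, stat_date date);
-- run the expensive join periodically instead of on every request
insert into order_stats (dept_id, order_cnt, stat_date)
select e.dept_id, count(*), current_date
from orders o inner join employee e on o.emp_id = e.id
group by e.dept_id;
-- the original join query becomes a cheap single-table query
select * from order_stats where stat_date = current_date;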

Add redundant fields

When designing data tables, you should follow the norms of the paradigm theory as much as possible, and reduce redundant fields as much as possible to make the database design look delicate and elegant. However, reasonable addition of redundant fields can improve the query speed.

The higher the degree of normalization of the table, the more the relationship between the table and the table, the more cases that need to be connected and the query, and the worse the performance.

note:

If the value of a redundant field is modified in one table, you must find a way to update it in the other tables as well, otherwise data inconsistency results. What should be done if the MySQL database's CPU soars to 500%? When the CPU soars to 500%, first run the operating system's top command to see whether mysqld is the cause; if not, find the high-usage process and handle it accordingly.

If it is caused by mysqld, run show processlist to see whether any expensive SQL is running; find the SQL with high consumption and check whether the execution plan is accurate, whether an index is missing, or whether there is simply too much data.

Generally, these threads should be killed (while watching whether CPU usage drops), and after making the appropriate adjustments (such as adding indexes, changing the SQL, or changing memory parameters), run the SQL again.
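A minimal sketch of the commands involved (the thread id 12345 is hypothetical):

show full processlist;  -- find long-running statements and their thread ids
kill query 12345;       -- terminate only the running statement
kill 12345;             -- or terminate the whole connection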

It is also possible that no single SQL statement consumes many resources, but a large number of sessions connect suddenly, making the CPU soar; in this case, analyze together with the application side why the number of connections surges, and then adjust accordingly, for example by limiting the number of connections. How to optimize a large table? A table has nearly tens of millions of rows and CRUD is slow: how to optimize? How do you split databases and tables? What problems does splitting bring? Is middleware useful? Do you know how they work? When the number of records in a single MySQL table is too large, the CRUD performance of the database degrades significantly. Some common optimization measures are as follows:

Restrict the range of the data: be sure to prohibit query statements that carry no restriction on the data range. For example, when users query order history, we can restrict it to one month. Read/write separation: the classic database splitting scheme; the master handles writes and the slaves handle reads. Caching: use the MySQL cache, and for heavyweight, rarely-updated data consider an application-level cache. There is also optimization through splitting databases and tables, mainly vertical splitting and horizontal splitting.

Vertical partition:

Split according to the relatedness of the data tables in the database. For example, if the user table contains both the user's login information and the user's basic information, the user table can be split into two separate tables, or even put into a separate database (sub-database). Simply put, vertical splitting is the splitting of table columns: a table with many columns is split into multiple tables.

The advantages of vertical splitting: it can make the row data smaller, reduce the number of blocks read during query, and reduce the number of I/Os. In addition, vertical partitioning can simplify the structure of the table and is easy to maintain.

Disadvantages of vertical split: the primary key will be redundant, redundant columns need to be managed, and will cause Join operations, which can be solved by joining at the application layer. In addition, vertical partitioning will make transactions more complicated;

The vertical split table puts the primary key and some columns in one table, and then puts the primary key and other columns in another table
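A minimal sketch of a vertical table split (the user table and its columns are hypothetical):

-- before: user(id, name, password, last_login_ip, bio, preferences)
create table user_login (     -- hot columns used on every login
  id int primary key,
  name varchar(50),
  password char(64),
  last_login_ip varchar(45)
);
create table user_profile (   -- cold columns read only occasionally
  id int primary key,         -- the same primary key links the two tables
  bio text,
  preferences text
);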

Applicable scenarios: 1. some columns of a table are commonly used while others are not; 2. row data can be made smaller, so a data page can hold more rows and the number of I/Os during queries decreases. Disadvantages: 1. some splitting strategies are based on the application layer's logic; once the logic changes, the entire splitting logic changes, so scalability is poor; 2. for the application layer, the routing logic increases development cost; 3. redundant columns must be managed, and querying all the data requires a join operation. Horizontal partition:

Keep the data table structure unchanged, and store data fragments through a certain strategy. In this way, each piece of data is scattered into different tables or libraries, achieving the purpose of distribution. Horizontal split can support a very large amount of data. Horizontal splitting refers to the splitting of data table rows. When the number of table rows exceeds 2 million rows, it will slow down. At this time, the data of one table can be divided into multiple tables for storage. For example: we can split the user information table into multiple user information tables, so that we can avoid the performance impact caused by the excessive amount of data in a single table.

Horizontal splitting can support a very large amount of data. One thing to note: splitting tables only solves the problem of a single table's data being too large; since the split tables' data is still on the same machine, it does not do much for MySQL's concurrency, so horizontal splitting is best combined with splitting across databases.

Horizontal splitting can support a very large amount of data storage, and there are few application-side transformations, but fragmentation transactions are difficult to solve, the cross-border join performance is poor, and the logic is complicated.
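A minimal sketch of hash-based horizontal splitting (four shards and the user table are hypothetical; the routing rule lives in the application or middleware):

create table user_0 (id bigint primary key, name varchar(50));
create table user_1 (id bigint primary key, name varchar(50));
create table user_2 (id bigint primary key, name varchar(50));
create table user_3 (id bigint primary key, name varchar(50));
-- routing rule: table index = id % 4, so id 10 lands in user_2
insert into user_2 (id, name) values (10, 'alice');
select * from user_2 where id = 10;  -- always carry the sharding key in the SQL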

The author of "The Practice of Java Engineers" recommends not to fragment the data as much as possible, because the split will bring various complexity of logic, deployment, operation and maintenance, and the general data table can support less than tens of millions when properly optimized. The amount of data is not a big problem. If you really want to fragment, try to choose the client-side fragmentation architecture, which can reduce the network I/O once and middleware.

Horizontal sub-table: the table is very large; after splitting, the number of data and index pages that need to be read during a query is reduced, the number of index levels decreases, and query speed improves.

Applicable scenarios: 1. the data in the table has natural independence, for example the table records data from different regions or different periods, especially when some data is commonly used and some is not; 2. the data needs to be stored on multiple media. Disadvantages of horizontal splitting: 1. it adds complexity to the application; queries usually need multiple table names, and querying all the data requires a UNION operation; 2. in many database applications, this complexity outweighs the advantages it brings, since queries add one more level of index disk reads. Here are two common architectures for database sharding:

1. Client proxy: the sharding logic lives on the application side, encapsulated in a jar package, implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Ali's TDDL are two commonly used implementations. 2. Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained uniformly in the middleware service. Mycat, 360's Atlas, and Netease's DDB are all implementations of this architecture.

Problems faced after splitting databases and tables. Transactions: after splitting, a transaction becomes a distributed transaction. Relying on the database's own distributed transaction management to execute transactions carries a high performance price; having the application assist in control, forming program-level logical transactions, imposes a programming burden. Cross-database joins: as long as you split, cross-node join problems are unavoidable, but good design and segmentation can reduce such cases. The common practice is to implement the query in two steps: find the ids of the associated data in the result set of the first query, then issue a second request to fetch the associated data by those ids. Cross-node count, order by, group by, and aggregate functions: these are one class of problem, because they must be computed over the whole data set. Most proxies do not automatically merge the results; the solution, similar to cross-node joins, is to merge on the application side after obtaining the results from each node. Unlike joins, each node's query can run in parallel, so this is often much faster than a single large table; but if the result set is large, application memory consumption becomes a problem. Data migration, capacity planning, and expansion: one scheme from Taobao's integrated business platform team uses the forward compatibility of multiples of 2 (for example, a remainder of 1 mod 4 is also 1 mod 2) to allocate data, avoiding row-level data migration; but table-level migration is still needed, and there are limits on the scale of expansion and the number of split tables. In general, none of these solutions is ideal, which reflects the difficulty of scaling out under sharding. ID problem: once the database is split across multiple physical nodes, we can no longer rely on the database's own primary key generation mechanism. On the one hand, an ID generated by one partition cannot be guaranteed globally unique; on the other hand, the application needs the ID before inserting data in order to route the SQL. Some common primary key generation strategies: UUID: using a UUID as the primary key is the simplest scheme, but the drawbacks are obvious: UUIDs are very long and take a lot of storage, and the main problem is the index, since both building the index and querying through it perform poorly. Twitter's distributed auto-increment ID algorithm Snowflake: in distributed systems there are many occasions that need a globally unique ID; Twitter's snowflake meets this need, and the implementation is quite simple: apart from configuration information, the core is a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence number. Cross-shard sorting and paging: generally, paging needs to sort by a specified field. When the sort field is the shard field, the shard rule easily locates the specified shard; when the sort field is not the shard field, the situation becomes more complicated.
Sorting and paging across shards: generally speaking, paging must sort by a specified field. When the sort field is the sharding field, the sharding rules make it easy to locate the target shard; when the sort field is not the sharding field, the situation becomes more complicated. For the accuracy of the final result, the data must be sorted and returned within each shard node, and the result sets returned by the different shards must then be summarized, re-sorted, and finally returned to the user, as in the sketch below.
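The original figure is not reproduced here; in its place, a sketch of the merge step, assuming each shard returns a result set already sorted by the sort field (the generic record type and comparator are placeholders):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    // K-way merge of per-shard result sets that each arrive already sorted.
    public class ShardMerger {

        // One cursor per shard: the current head element plus the iterator it came from.
        private static final class Cursor<T> {
            T head;
            final Iterator<T> rest;
            Cursor(T head, Iterator<T> rest) { this.head = head; this.rest = rest; }
        }

        // The heap always exposes the globally smallest head, giving a correct
        // global ordering without re-sorting everything from scratch.
        public static <T> List<T> merge(List<List<T>> sortedShardResults, Comparator<T> order, int limit) {
            PriorityQueue<Cursor<T>> heap = new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
            for (List<T> shard : sortedShardResults) {
                Iterator<T> it = shard.iterator();
                if (it.hasNext()) heap.add(new Cursor<>(it.next(), it));
            }
            List<T> merged = new ArrayList<>();
            while (!heap.isEmpty() && merged.size() < limit) {
                Cursor<T> c = heap.poll();
                merged.add(c.head);
                if (c.rest.hasNext()) { c.head = c.rest.next(); heap.add(c); }
            }
            return merged;
        }
    }

Note that for page N, each shard must return its first N*pageSize rows before the merge can select the correct page, which is why deep paging across shards is expensive.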

MySQL replication principle and process

Master-slave replication: DDL and DML operations in the master database are transferred to the slave database through the binary log (binlog) and then re-executed (redone) there, so that the slave's data stays consistent with the master's.

The role of master-slave replication: 1. If the master database has a problem, you can switch to the slave database. 2. Reads and writes can be separated at the database level. 3. Daily backups can be performed on the slave database.

Problems solved by MySQL master-slave replication: data distribution (start or stop replication at will and place data backups in different geographic locations); load balancing (reduce the pressure on a single server); high availability and failover (help applications avoid a single point of failure); upgrade testing (run a higher version of MySQL as the slave database).

The working principle of MySQL master-slave replication: the master records data changes to its binary log; the slave copies the master's binary log into its own relay log; the slave reads the events from the relay log and replays them against its own data. The basic process involves three threads. Master: the binlog thread records all statements that change data into the binlog. Slave: the IO thread, after START SLAVE, is responsible for pulling binlog content from the master and putting it into its own relay log. Slave: the SQL execution thread executes the statements in the relay log. A sketch of the IO thread's pull model follows; the copying process itself is then described step by step.
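The pull model of the IO thread can be observed with the open-source mysql-binlog-connector-java library, which connects to a master over the same replication protocol a replica uses; a sketch, with host and credentials as placeholder assumptions:

    import com.github.shyiko.mysql.binlog.BinaryLogClient;

    // Connects to a master the way a replica's IO thread does and streams
    // binlog events; here they are merely printed instead of being written
    // to a relay log. Host, port and credentials are placeholders.
    public class BinlogTailer {
        public static void main(String[] args) throws Exception {
            BinaryLogClient client = new BinaryLogClient("127.0.0.1", 3306, "repl_user", "repl_pass");
            client.registerEventListener(event -> System.out.println(event)); // a real slave appends to its relay log
            client.connect(); // blocks; sleeps until the master produces new events, like the IO thread
        }
    }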

Binary log: the binary log of the master database. Relay log: the relay log of the slave server.

Step 1: before each transaction that updates data completes, the master serially writes the operation record to the binlog file.

Step 2: the slave opens an I/O thread, which establishes an ordinary client connection to the master; the master serves it through a binlog dump process. If the read progress has caught up with the master, the thread enters the sleep state and waits for the master to generate new events. The ultimate goal of the I/O thread is to write these events to the relay log.

Step 3: the SQL thread reads the relay log and executes the SQL events in the log sequentially, so that the slave's data stays consistent with the master database.

What are the solutions for read-write separation? Read-write separation relies on master-slave replication, and master-slave replication in turn serves read-write separation. Master-slave replication requires that the slave only reads and never writes (if you perform a write operation on the slave, show slave status will show Slave_SQL_Running=NO, and the slave must be resynchronized manually as mentioned above).

Scheme 1: use the mysql-proxy proxy. Advantages: read-write separation and load balancing are achieved directly, without code changes, and master and slave use the same account. Disadvantages: reduced performance and no transaction support; MySQL officially does not recommend using it in actual production.

Scheme 2: use AbstractRoutingDataSource+aop+annotation to determine the data source at the dao layer. If mybatis is used, read-write separation can be placed in the ORM layer: a mybatis plugin can intercept SQL statements so that all insert/update/delete statements go to the master library and all selects go to the slave library, transparently. The plugin can choose master or slave through annotations or by analyzing whether the statement is a read or a write. One problem remains: transactions are not supported, so DataSourceTransactionManager must be rewritten so that read-only transactions go to the read library and everything else goes to the write library.

Scheme 3: use AbstractRoutingDataSource+aop+annotation to determine the data source at the service layer, which can support transactions. Disadvantage: when methods inside a class call each other via this.xx(), AOP does not intercept, and special handling is required. A minimal sketch of the routing core is shown below.
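A minimal sketch of the routing core used by Scheme 2 and Scheme 3: Spring's AbstractRoutingDataSource is the real class, while the ThreadLocal holder and the way the AOP advice sets the key are the pieces each project writes itself:

    import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

    // The AOP advice (not shown) inspects the annotation on the intercepted
    // method and calls DbContext.use("master") or DbContext.use("slave")
    // before the method runs, then DbContext.clear() afterwards.
    class DbContext {
        private static final ThreadLocal<String> KEY = ThreadLocal.withInitial(() -> "master");
        static void use(String key) { KEY.set(key); }
        static String current() { return KEY.get(); }
        static void clear() { KEY.remove(); }
    }

    // Spring calls determineCurrentLookupKey() on every getConnection();
    // the returned key selects one of the DataSources registered via
    // setTargetDataSources(Map) in the application configuration.
    public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {
        @Override
        protected Object determineCurrentLookupKey() {
            return DbContext.current();
        }
    }

Because the key is resolved per connection, any statement issued inside a transaction bound at the service layer consistently hits the same library, which is why Scheme 3 can support transactions.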
Backup plan, implementation principles of mysqldump and xtrabackup

(1) The backup plan depends on the size of the library. Generally speaking, for a library within 100G you can consider mysqldump, because it is lightweight and flexible; schedule the backup during off-peak business hours, and you can take a full backup every day (the files mysqldump produces are relatively small, and smaller still after compression). For a library above 100G, consider xtrabackup, whose backup speed is obviously faster than mysqldump's; generally take one full backup a week and incremental backups every day for the rest, again during off-peak business hours.

(2) Backup recovery time: physical backups restore quickly, logical backups restore slowly. The speed also depends on the machine, especially the hard disk; the following figures are for reference only: 20G in 2 minutes (mysqldump), 80G in 30 minutes (mysqldump), 111G in 30 minutes (mysqldump), 288G in 3 hours (xtrabackup), 3T in 4 hours (xtrabackup). Logical import time is generally more than 5 times the backup time.

(3) How to deal with backup and recovery failures: first of all, make sufficient preparations before recovery to avoid errors during the recovery itself, for example validity checks, permission checks, and space checks after the backup. If an error is reported, make the corresponding adjustments according to the error message.

(4) The implementation principles of mysqldump and xtrabackup. mysqldump is a logical backup. Adding the --single-transaction option produces a consistent backup: the background process first sets the session's transaction isolation level to REPEATABLE READ (SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ), then explicitly starts a transaction (START TRANSACTION WITH CONSISTENT SNAPSHOT), which ensures that the data read within the transaction is the transaction's snapshot, and then reads the table data. If --master-data=1 is also given, it first takes a global read lock (FLUSH TABLES WITH READ LOCK), records the binlog position of the database right after the transaction is opened (SHOW MASTER STATUS), unlocks immediately, and then reads the table data. When all the data has been exported, the transaction can end.

xtrabackup is a physical backup: it copies the tablespace files directly while continuously scanning and saving the redo log as it is generated. After the InnoDB backup finally completes, it performs a flush engine logs operation (old versions had a bug: skipping this operation on 5.6 could lose data) to ensure that all redo logs have been flushed to disk (this touches on two-phase commit: because xtrabackup does not copy the binlog, all redo logs must be on disk, otherwise the data of the last group of committed transactions may be lost). That point in time is when InnoDB finishes its backup. Although the data files are not consistent at that moment, the redo collected during the backup can make them consistent (this is what happens during restore). Then it runs FLUSH TABLES WITH READ LOCK, backs up the tables of other engines such as MyISAM, and unlocks after the backup completes. In this way a consistent hot backup is achieved.

What are the ways to repair a damaged data table? 1) Repair with myisamchk. Steps: stop the mysql service first; open a command line and change to mysql's /bin directory; run myisamchk --recover database_path/*.MYI. 2) Repair with the REPAIR TABLE or OPTIMIZE TABLE commands: REPAIR TABLE table_name repairs a table; OPTIMIZE TABLE table_name optimizes a table. REPAIR TABLE is used to repair damaged tables, while OPTIMIZE TABLE is used to reclaim idle database space: when rows are deleted from a table, the disk space they occupied is not reclaimed immediately; after OPTIMIZE TABLE runs, that space is reclaimed and the data rows are rearranged on disk (note: this happens on disk, not in the database).