1. Primary key, super key, candidate key, foreign key
Primary key: a column or combination of columns whose values uniquely and completely identify each row stored in a database table. A table can have only one primary key, and a primary key value cannot be missing, that is, it cannot be NULL.
Super key: an attribute set that uniquely identifies a tuple in a relation is called a super key of the relation schema. A single attribute can be a super key, and so can a combination of attributes. Super keys include candidate keys and primary keys.
Candidate key: a minimal super key, that is, a super key with no redundant attributes.
Foreign key: a column in one table that holds the primary key of another table is called a foreign key of this table.
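A minimal sketch of primary and foreign keys in action, using SQLite through Python's sqlite3 module (the `dept`/`emp` tables and column names are illustrative, not from the text; note SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES dept(dept_id))""")  # dept_id is a foreign key
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO emp VALUES (10, 'lee', 1)")  # valid: dept 1 exists

try:
    conn.execute("INSERT INTO emp VALUES (11, 'zhang', 99)")  # dept 99 missing
    fk_violation_caught = False
except sqlite3.IntegrityError:
    fk_violation_caught = True  # foreign key constraint rejected the row

try:
    conn.execute("INSERT INTO dept VALUES (1, 'HR')")  # duplicate primary key
    pk_violation_caught = False
except sqlite3.IntegrityError:
    pk_violation_caught = True  # a table can have only one row per key value

print(fk_violation_caught, pk_violation_caught)
```

Both inserts are rejected: the primary key forbids duplicates and NULLs, and the foreign key forbids values that do not exist in the referenced table.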
2. The four characteristics (ACID) of database transactions
ACID names the four properties required for the correct execution of a database transaction: Atomicity, Consistency, Isolation, Durability.
Atomicity : All operations in the entire transaction are either completed or not completed at all, and it is impossible to stop at a certain link in the middle. If an error occurs during the execution of the transaction, it will be rolled back to the state before the transaction started, as if the transaction had never been executed.
Consistency : Before the start of the transaction and after the end of the transaction, the integrity constraints of the database have not been destroyed.
Isolation : Transactions execute as if each were the only operation the system performed in that period of time. If two transactions run concurrently and perform the same function, isolation ensures that each transaction behaves as if it alone were using the system. This property is sometimes called serializability: to prevent transaction operations from interfering with each other, requests must be serialized so that at any given moment only one request operates on the same data.
Durability : After a transaction completes, the changes it made to the database are persisted in the database and will not be rolled back.
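Atomicity can be sketched with SQLite via Python's sqlite3 module (the `accounts` table and `transfer` helper are illustrative): a failure midway through a transfer rolls everything back, as if the transaction had never run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount, crash=False):
    """Move money between accounts; roll back everything if anything fails."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if crash:
            raise RuntimeError("simulated failure mid-transaction")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()  # both updates become durable together
    except RuntimeError:
        conn.rollback()  # atomicity: the half-done debit is undone

transfer(conn, "alice", "bob", 50, crash=True)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # alice keeps 100: the partial update was rolled back
```

After the rollback the database is exactly as it was before the transaction began, illustrating both atomicity and consistency.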
3. The role of the view, can the view be changed?
A view is a virtual table, which is different from a table that contains data. The view only contains queries that dynamically retrieve data when used; it does not contain any columns or data. Using views can simplify complex SQL operations, hide specific details, and protect data; after views are created, they can be used in the same way as tables.
A view cannot be indexed, nor can it have associated triggers or default values. If the view itself contains an ORDER BY, an ORDER BY in the query that uses the view overrides it.
Create a view: create view view_name as select ...;
Some views can be updated: those that do not use joins, subqueries, grouping, aggregate functions, DISTINCT, or UNION. Updating such a view updates the underlying base table. But views are mainly used to simplify retrieval and to protect data, not for updating, and most views cannot be updated.
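A short sketch using SQLite via Python's sqlite3 module (the `orders` table and `customer_totals` view are illustrative): the view stores only a query and re-runs it each time it is used; because it uses GROUP BY and SUM, this particular view is not updatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 'lee', 90), (2, 'zhang', 100), (3, 'lee', 70)])

# The view contains no data of its own, only the query text.
conn.execute("""CREATE VIEW customer_totals AS
                SELECT customer, SUM(amount) AS total
                FROM orders GROUP BY customer""")

# It is queried exactly like a table.
rows = conn.execute(
    "SELECT * FROM customer_totals ORDER BY customer").fetchall()
print(rows)  # [('lee', 160), ('zhang', 100)]
```

Inserting another `lee` order and re-querying the view would immediately reflect the new total, because the query is evaluated dynamically on each use.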
4. The difference between drop, delete and truncate
DROP deletes the table itself directly. TRUNCATE deletes all the data in the table and resets the auto-increment id so that new inserts start from 1 again. DELETE deletes data from the table and can take a WHERE clause.
(1) The DELETE statement deletes one row from the table at a time, and each row deletion is recorded in the log as part of the transaction so that it can be rolled back. TRUNCATE TABLE deletes all data from the table at once and does not record individual row deletions in the log; deleted rows cannot be recovered, DELETE triggers on the table are not activated, and execution is fast.
(2) Space occupied by tables and indexes: after TRUNCATE, the space occupied by the table and its indexes is restored to the initial size, while DELETE does not reduce the space occupied by the table or its indexes. DROP releases all the space the table occupied.
(3) In terms of speed, generally: DROP > TRUNCATE > DELETE.
(4) Scope of application: TRUNCATE can only be used on tables; DELETE can be used on tables and views.
(5) TRUNCATE and DELETE only delete data, while DROP deletes the entire table (structure and data).
(6) TRUNCATE, and DELETE without a WHERE clause, delete only the data, not the table's structure (definition). DROP deletes the structure together with the constraints, triggers, and indexes that depend on it; stored procedures/functions that depend on the table are retained, but their status becomes invalid.
(7) DELETE is DML (Data Manipulation Language): the operation is placed in the rollback segment and takes effect only after the transaction is committed. If there is a corresponding trigger, it fires when the statement executes.
(8) TRUNCATE and DROP are DDL (Data Definition Language): the operation takes effect immediately, the original data is not placed in the rollback segment, and it cannot be rolled back.
(9) Use DROP and TRUNCATE with caution when there is no backup. To delete some of the rows, use DELETE with a WHERE clause to restrict the scope of impact, and make sure the rollback segment is large enough. To delete the table, use DROP. To keep the table but delete its data: if transactions are not involved, use TRUNCATE; if transactions are involved, or you want triggers to fire, use DELETE.
(10) Truncate table table name is fast and efficient, because:
TRUNCATE TABLE is functionally the same as a DELETE statement without a WHERE clause: both delete all rows in the table. But TRUNCATE TABLE is faster than DELETE and uses fewer system and transaction log resources. DELETE deletes one row at a time and writes a transaction log entry for every deleted row; TRUNCATE TABLE deletes data by deallocating the data pages used to store it, recording only the page deallocations in the transaction log.
(11) TRUNCATE TABLE deletes all rows in a table, but the table structure and its columns, constraints, indexes, and so on remain unchanged. The counter used for new row identity values is reset to the column's seed. To keep the identity counter, use DELETE instead. To delete the table definition together with its data, use the DROP TABLE statement.
(12) For tables referenced by a FOREIGN KEY constraint, TRUNCATE TABLE cannot be used; use a DELETE statement without a WHERE clause instead. Because TRUNCATE TABLE is not logged row by row, it cannot activate triggers.
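The DELETE-vs-DROP contrast can be sketched with SQLite via Python's sqlite3 module (the table `t` is illustrative; note SQLite has no TRUNCATE statement — a DELETE without WHERE plays that role there, so only DELETE and DROP are shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [('a',), ('b',), ('c',)])

# DELETE targets rows with WHERE; the table definition survives.
conn.execute("DELETE FROM t WHERE v = 'b'")
remaining = [r[1] for r in conn.execute("SELECT id, v FROM t ORDER BY id")]
print(remaining)  # ['a', 'c']

# DROP removes structure and data; the table no longer exists at all.
conn.execute("DROP TABLE t")
try:
    conn.execute("SELECT * FROM t")
    table_exists = True
except sqlite3.OperationalError:
    table_exists = False  # "no such table"
print(table_exists)
```

After the DROP, even the schema is gone, matching point (5) above: DELETE removes data, DROP removes the entire table.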
5. The working principle and types of indexes
A database index is a sorted data structure in the database management system that helps quickly query and update the data in a database table. Indexes are usually implemented with the B-Tree and its variant, the B+Tree.
In addition to data, the database system also maintains data structures that meet specific search algorithms. These data structures reference (point to) data in a certain way, so that advanced search algorithms can be implemented on these data structures. This data structure is the index.
There is a price to pay for setting up an index for a table: one is to increase the storage space of the database, and the other is to spend more time when inserting and modifying data (because the index will also change accordingly).
The figure (omitted here) shows one possible way of indexing. On the left is the data table, with two columns and seven records; the leftmost column is the physical address of each record (note that logically adjacent records are not necessarily physically adjacent on disk). To speed up searches on Col2, a binary search tree like the one on the right can be maintained: each node holds an index key value and a pointer to the physical address of the corresponding data record, so binary search can locate the data in O(log₂ n) time.
Creating an index can greatly improve the performance of the system.
1. By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
2. It can greatly speed up data retrieval, which is the main reason for creating an index.
3. It can speed up joins between tables, which is particularly meaningful for enforcing the referential integrity of the data.
4. When using grouping and sorting clauses for data retrieval, the time spent grouping and sorting in the query can be significantly reduced.
5. By using indexes, the query optimizer can be used during query processing to improve system performance.
Someone may ask: if indexes have so many advantages, why not create an index on every column in the table? Because adding indexes also has many disadvantages.
1. It takes time to create and maintain indexes, and this time increases as the amount of data grows.
2. Indexes occupy physical space. Besides the data space occupied by the table itself, each index occupies a certain amount of physical space; a clustered index requires even more.
3. When data in the table is inserted, deleted, or modified, the indexes must also be maintained dynamically, which slows down data maintenance.
Indexes are built on certain columns of a database table, so when creating indexes you should consider which columns can usefully be indexed and which cannot. Generally speaking, indexes should be created on:
1. Columns that frequently need to be searched, to speed up searches.
2. Columns used as the primary key, to enforce the column's uniqueness and to organize the arrangement of data in the table.
3. Columns often used in joins, mostly foreign keys, to speed up joins.
4. Columns often searched by range, because an index is already sorted and the specified range is contiguous.
5. Columns that often need to be sorted, because an index is already sorted, so queries can use the index's ordering to cut sorting time.
6. Columns often used in WHERE clauses, to speed up the evaluation of conditions.
Similarly, indexes should not be created for some columns. Generally speaking, these columns that should not be indexed have the following characteristics:
1. Indexes should not be created on columns that are rarely used or referenced in queries. Since these columns are seldom used, indexing them does not improve query speed; on the contrary, the added indexes slow down system maintenance and increase space requirements.
2. Indexes should not be created on columns with few distinct values, such as the gender column of a personnel table. Because such columns have very few values, the rows matching a query make up a large proportion of the table's rows, and an index cannot significantly speed up retrieval.
3. Columns defined with text, image, or bit data types should not be indexed, because their data is either quite large or has few distinct values.
4. When modification performance matters far more than retrieval performance, indexes should not be created. Modification and retrieval performance are in conflict: adding indexes improves retrieval but hurts modification, and removing indexes does the reverse. Therefore, when modifications dominate, indexes should not be created.
A unique index is an index that does not allow any two rows to have the same index value.
When existing data contains duplicate key values, most databases do not allow a newly created unique index to be saved with the table, and they also prevent the insertion of new data that would create duplicate key values. For example, if a unique index is created on the last name (lname) column of an employee table, no two employees can share the same last name.
Primary key index : A database table often has a column or combination of columns whose value uniquely identifies each row; this column is the table's primary key. Defining a primary key for a table in the database diagram automatically creates a primary key index, a specific type of unique index that requires every primary key value to be unique. It also allows fast access to data when the primary key is used in a query.
Clustered index : In a clustered index, the physical order of the rows in the table is the same as the logical (index) order of the key values. A table can contain only one clustered index.
If an index is not clustered, the physical order of the rows in the table does not match the logical order of the key values. Clustered indexes usually provide faster data access than non-clustered indexes.
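A small sketch of unique indexes and optimizer behavior, using SQLite via Python's sqlite3 module (the `emp` table and `idx_lname` index are illustrative; `EXPLAIN QUERY PLAN` is SQLite's way of showing whether an index is used):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, lname TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 'smith', 'sales'), (2, 'jones', 'hr')])

# A unique index forbids two rows with the same index value.
conn.execute("CREATE UNIQUE INDEX idx_lname ON emp(lname)")
try:
    conn.execute("INSERT INTO emp VALUES (3, 'smith', 'it')")
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False  # a second 'smith' is rejected

# The query plan shows the optimizer searching the index, not scanning.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE lname = 'jones'").fetchall()
print(plan)
```

The plan output mentions `idx_lname`, confirming the WHERE lookup is served by the sorted index structure rather than a full table scan.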
The principle of locality and disk read-ahead
Due to the characteristics of the storage medium, disk access is much slower than main memory: on top of the cost of mechanical movement, disk access speed is often only a few hundredths of main memory's. Therefore, to improve efficiency, disk I/O must be minimized. To this end, the disk is usually not read strictly on demand; instead, every read reads ahead: even if only one byte is needed, the disk reads a certain length of data sequentially from that position into memory. The theoretical basis for this is the well-known principle of locality in computer science: when a piece of data is used, nearby data is usually used soon after, so the data needed while a program runs is usually concentrated.
Since the disk sequential read efficiency is very high (no seek time is required, only a small rotation time is required), for localized programs, pre-reading can improve I/O efficiency.
The read-ahead length is generally an integral multiple of a page. A page is a logical block of memory managed by the computer: hardware and the operating system often divide main memory and the disk storage area into contiguous blocks of equal size, each called a page (in many operating systems the page size is usually 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered: the system sends a read signal to the disk, the disk finds the starting position of the data and reads one or more contiguous pages into memory, the exception returns, and the program continues to run.
Performance analysis of B-/+Tree index
At this point, we can finally analyze the performance of the B-/+Tree index.
As mentioned above, the number of disk I/Os is generally used to evaluate the quality of an index structure. Starting with the B-Tree: by its definition, a single retrieval visits at most h nodes. Database system designers cleverly exploit disk read-ahead by setting the size of a node equal to one page, so that each node can be fully loaded with a single I/O. To achieve this, the actual implementation of a B-Tree also uses the following technique:
Each time a new node is created, a whole page of space is requested directly, which guarantees that a node is physically stored within a single page; since the computer's storage allocation is page-aligned, a node then needs only one I/O.
A single search in a B-Tree requires at most h − 1 I/Os (the root node is resident in memory), with asymptotic complexity O(h) = O(log_d N). In practical applications, the degree d is a very large number, usually more than 100, so h is very small (usually no more than 3).
With a structure like a red-black tree, h is obviously much larger. Moreover, since logically adjacent nodes (parent and child) may be physically far apart, locality cannot be exploited; so although the red-black tree's I/O asymptotic complexity is also O(h), its efficiency is clearly much worse than the B-Tree's.
In summary, the efficiency of using B-Tree as an index structure is very high.
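The height argument above can be checked with a few lines of arithmetic (a sketch: `min_height` computes the smallest h with d^h ≥ N, i.e. the number of levels a search must pass through when each node holds up to d keys):

```python
def min_height(n, d):
    """Smallest h such that d**h >= n: the number of tree levels needed
    to address n records when each node fans out to d children."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= d
        h += 1
    return h

# B-Tree with a page-sized node easily reaches degree d = 100:
print(min_height(10**6, 100))   # 1 million records  -> height 3
print(min_height(10**8, 100))   # 100 million records -> height 4

# A binary structure (d = 2), as in a red-black tree, is far deeper:
print(min_height(10**8, 2))     # ~27 levels, hence ~27 potential disk I/Os
```

This is exactly the O(log_d N) vs O(log₂ N) gap in the text: with d over 100, even huge tables need only 3-4 node visits per lookup.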
6. Types of joins
Execute in the query analyzer:
--Create table table1, table2:
create table table1(id int,name varchar(10))
create table table2(id int,score int)
insert into table1 select 1,'lee'
insert into table1 select 2,'zhang'
insert into table1 select 4,'wang'
insert into table2 select 1,90
insert into table2 select 2,100
insert into table2 select 3,70
The tables now contain:
table1            table2
id   name         id   score
1    lee          1    90
2    zhang        2    100
4    wang         3    70
The following are executed in the Query Analyzer
1. Outer joins. Concept: outer joins include the left outer join, the right outer join, and the full outer join.
2. Left join: left join or left outer join
(1) The result set of a left outer join includes all rows of the left table specified in the LEFT OUTER JOIN clause, not just the rows matched by the join columns. If a row of the left table has no matching row in the right table, all select-list columns from the right table are NULL in that result row.
(2) sql statement
select * from table1 left join table2 on table1.id=table2.id
Note: includes all rows of table1 and returns the corresponding fields of table2 according to the join condition; non-matching fields are shown as NULL
3. Right join: right join or right outer join
(1) A right outer join is the reverse of a left outer join: all rows of the right table are returned. If a row of the right table has no matching row in the left table, NULL is returned for the left table's columns.
(2) sql statement
select * from table1 right join table2 on table1.id=table2.id
Note: includes all rows of table2 and returns the corresponding fields of table1 according to the join condition; non-matching fields are shown as NULL
4. Full outer join: full join or full outer join
(1) A full outer join returns all rows from both the left and right tables. When a row has no matching row in the other table, the other table's select-list columns contain NULL; when rows match, the result row contains the data values from both base tables.
(2) sql statement
select * from table1 full join table2 on table1.id=table2.id
Note: returns the union of the left and right join results (see the left and right joins above)
2. inner join
1. Concept: an inner join uses comparison operators to compare the values of the columns being joined
2. Inner join: join or inner join
select * from table1 join table2 on table1.id=table2.id
Note: only the rows of table1 and table2 that satisfy the condition are returned
4. Equivalent (same effect as the following)
A:select a.*,b.* from table1 a,table2 b where a.id=b.id
B: select * from table1 cross join table2 where table1.id=table2.id (Note: a condition after cross join can only use where, not on)
3. Cross join (Cartesian product)
1. Concept: a cross join without a WHERE clause produces the Cartesian product of the joined tables; the number of rows in the first table multiplied by the number of rows in the second equals the size of the Cartesian product result set. (The cross join of table1 and table2 produces 3*3=9 records)
2. Cross join: cross join (without a where... condition)
select * from table1 cross join table2
Note: Return 3*3=9 records, that is, Cartesian product
4. Equivalent (same effect as the following execution)
A: select * from table1, table2
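The join examples above can be run end to end with SQLite via Python's sqlite3 module, using the same table1/table2 data from the text (FULL JOIN is skipped here because older SQLite versions lack it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 'lee'), (2, 'zhang'), (4, 'wang')])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(1, 90), (2, 100), (3, 70)])

# Inner join: only ids 1 and 2 match.
inner = conn.execute(
    "SELECT * FROM table1 JOIN table2 ON table1.id = table2.id").fetchall()

# Left join: all 3 rows of table1; wang (id=4) gets NULLs from table2.
left = conn.execute(
    "SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id").fetchall()

# Cross join: Cartesian product, 3*3 = 9 rows.
cross = conn.execute("SELECT * FROM table1 CROSS JOIN table2").fetchall()

print(len(inner), len(left), len(cross))  # 2 3 9
```

The unmatched left-join row comes back as `(4, 'wang', None, None)`, which is exactly the "non-matching fields shown as NULL" behavior described above.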
7. Database normal forms
1 First Normal Form (1NF)
In any relational database, the first normal form (1NF) is the basic requirement of the relational model. A database that does not meet the first normal form (1NF) is not a relational database.
The so-called first normal form (1NF) means that every column of a database table is an indivisible basic data item: the same column cannot hold multiple values, that is, an attribute of an entity cannot have multiple values or repeating attributes. If repeating attributes exist, it may be necessary to define a new entity composed of them, with a one-to-many relationship between the new entity and the original one. In first normal form (1NF), each row of the table contains only one instance's information. In short, the first normal form means columns with no repeating groups.
2 Second Normal Form (2NF)
The second normal form (2NF) builds on the first: to satisfy 2NF, 1NF must be satisfied first. 2NF requires that every instance (row) in a database table be uniquely distinguishable. To achieve this, a column is usually added to store a unique identifier for each instance; this unique attribute column is called the primary key.
2NF also requires that the entity's attributes depend fully on the primary key. Full dependence means no attribute may depend on only part of the primary key; if one does, that attribute and that part of the primary key should be split off to form a new entity, with a one-to-many relationship between the new entity and the original. In short, the second normal form says that non-key attributes must not be partially dependent on the primary key.
3 Third Normal Form (3NF)
To satisfy the third normal form (3NF), the second normal form (2NF) must be satisfied first. In short, 3NF requires that a database table not contain non-primary-key information that is already held in another table. For example, suppose there is a department table in which each department has a department number (dept_id), department name, department profile, and so on. Once the department number appears in the employee table, department-related information such as the department name and profile must not be added to the employee table again. If no department table exists, one should be created according to 3NF; otherwise there will be a great deal of data redundancy. In short, the third normal form says that attributes must not depend on other non-key attributes. (My understanding: eliminate redundancy.)
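The department/employee example can be sketched with SQLite via Python's sqlite3 module (table and column names follow the text; the `dept_profile` column is illustrative): each department's details are stored once, and the employee table carries only `dept_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# 3NF layout: department facts live only in dept.
conn.execute("""CREATE TABLE dept (
    dept_id INTEGER PRIMARY KEY,
    dept_name TEXT,
    dept_profile TEXT)""")
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES dept(dept_id))""")

conn.execute("INSERT INTO dept VALUES (1, 'R&D', 'builds the product')")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, 'lee', 1), (2, 'zhang', 1)])

# The department name is stored once, however many employees reference it;
# a join reconstructs the combined view on demand.
rows = conn.execute("""SELECT e.name, d.dept_name
                       FROM employee e JOIN dept d ON e.dept_id = d.dept_id
                       ORDER BY e.emp_id""").fetchall()
print(rows)  # [('lee', 'R&D'), ('zhang', 'R&D')]
```

Renaming the department now means updating one row in `dept`, instead of every employee row, which is the redundancy 3NF eliminates.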
8. The idea of database optimization
1. SQL statement optimization
1) Try to avoid using the != or <> operator in the where clause, otherwise the engine will give up using the index and perform a full table scan.
2) Try to avoid NULL checks on a field in the where clause, otherwise the engine will give up using the index and perform a full table scan, such as:
select id from t where num is null
You can set a default value of 0 on num to ensure the num column contains no NULL values, and then query like this:
select id from t where num=0
3) In many cases, replacing IN with EXISTS is a good choice
4) Use the WHERE clause instead of HAVING where possible, because HAVING filters the result set only after all records have been retrieved
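Tip 3) can be demonstrated with SQLite via Python's sqlite3 module (the `t`/`allowed` tables are illustrative): the IN and EXISTS forms return the same rows, so the rewrite is safe where it helps the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 0), (2, 5), (3, 0)])
conn.execute("CREATE TABLE allowed (num INTEGER)")
conn.execute("INSERT INTO allowed VALUES (0)")

# Subquery with IN ...
in_rows = conn.execute(
    """SELECT id FROM t
       WHERE num IN (SELECT num FROM allowed)
       ORDER BY id""").fetchall()

# ... and the equivalent correlated EXISTS form.
exists_rows = conn.execute(
    """SELECT id FROM t
       WHERE EXISTS (SELECT 1 FROM allowed a WHERE a.num = t.num)
       ORDER BY id""").fetchall()

print(in_rows == exists_rows, in_rows)  # True [(1,), (3,)]
```

Whether EXISTS actually beats IN depends on the engine and the data distribution, so it is worth checking the query plan rather than rewriting blindly.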
2. Index optimization
See above index
3. Database structure optimization
1) Normalization: e.g. eliminate redundancy (saves space...).
2) Denormalization: e.g. add redundancy where appropriate (reduces joins).
3) Table partitioning: partitioning physically separates the data; data in different partitions can be stored in data files on different disks. Queries against the table then scan only the relevant partition rather than the whole table, significantly shortening query time; moreover, partitions on different disks spread the table's data transfer across different disk I/O channels, so a carefully planned partitioning scheme evenly distributes the contention for data transfer and disk I/O. This method suits large time-based tables, where partitions can be built automatically by month.
4) Table splitting, which comes in two kinds: vertical and horizontal. Case: a simple shopping system involves the following tables: 1. product table (data volume 10w, stable); 2. order table (data volume 200w and growing); 3. user table (data volume 100w and growing). Taking MySQL as an example (MySQL can comfortably handle up to tens of millions of rows of static data):
Vertical split. Solves: I/O contention between tables. Does not solve: the pressure of growing data volume in a single table. Scheme: put the product table and the user table on one server, and the order table on a separate server.
Horizontal split. Solves: the pressure of growing data volume in a single table. Does not solve: I/O contention between tables. Scheme: split the user table by gender into a male-user table and a female-user table, and split the order table into completed and uncompleted orders; then, for example, put the product table and the uncompleted orders on one server, the completed orders and the male-user table on another, and the female-user table on its own server (women love shopping, haha).
4. Server hardware optimization
This one is simply a matter of spending money!
9. The difference between stored procedures and triggers
Triggers are very similar to stored procedures: a trigger is also a set of SQL statements. The only difference between the two is that a trigger cannot be called with an EXECUTE statement; it is triggered (activated) automatically when the user executes certain Transact-SQL statements. A trigger is a stored procedure that executes when data in a specified table is modified. Triggers are usually created to enforce referential integrity and consistency among logically related data in different tables. Because users cannot bypass triggers, they can be used to enforce complex business rules and ensure data integrity. Triggers differ from stored procedures in that triggers are executed mainly by events, while stored procedures can be called directly by the stored procedure's name. When operations such as UPDATE, INSERT, or DELETE are performed on a table, SQL Server automatically executes the SQL statements defined by the trigger, ensuring that data handling complies with the rules defined by those statements.
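The contrast can be sketched with SQLite via Python's sqlite3 module (the `orders`/`audit` tables and the trigger name are illustrative): the trigger is never invoked by name; it fires automatically on INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE audit (order_id INTEGER, action TEXT)")

# Defined once; from now on it runs whenever a row is inserted into orders.
conn.execute("""CREATE TRIGGER trg_order_insert AFTER INSERT ON orders
                BEGIN
                    INSERT INTO audit VALUES (NEW.id, 'inserted');
                END""")

conn.execute("INSERT INTO orders VALUES (1, 500)")  # trigger fires by itself
audit_rows = conn.execute("SELECT * FROM audit").fetchall()
print(audit_rows)  # [(1, 'inserted')]
```

Nothing in the INSERT mentions the trigger, and there is no way to call `trg_order_insert` directly; a stored procedure, by contrast, runs only when explicitly invoked by name.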