HashMap (only for JDK8)

HashMap (only for JDK8)


Before understanding HashMap, first understand its top-level interface Map

Map: It is a set of mappings, composed of key-value pairs, and Value can be obtained based on Key. The Key in the Map is unique and non-repeatable, but the value can be repeated. A key corresponds to a unique value.

Common implementation classes of Map include HashMap, Hashtable, Properties, TreeMap, LinkedHashMap, etc.

HashMap: 1, Map class based on hash table implementation

2. Data access is disordered

3. Both Key and Value support null values

4. Not safe in a multi-threaded environment

Hashtable: 1, Map class based on hash table implementation

2. Data access is disordered

3. Neither Key nor Value support null values

4. Safe in a multi-threaded environment

Properties: 1. Properties is the only set combined with IO streams, and also represents a persistent set of properties. ProPerties can be saved in the stream or loaded from the stream

2, Key and Value are fixed to String type

3. Use the load method to read the files in the hard disk (the file must be in the form of key-value pairs) into the collection for use

4. Use the store method to write the data in the collection to the specified file (in the form of key-value pairs)

TreeMap: 1. Implementation of NavigableMap class based on red-black tree

2, can be sorted and output according to Key's natural sort or compareTo method

3. Not safe in a multi-threaded environment

LinkedHashMap: 1, Map class based on hash table and doubly linked list

2. Ordered, output according to the order of insertion

3. Both Key and Value support null values

4. Not safe in a multi-threaded environment

HashMap analysis

Before parsing the HashMap, first learn about the hash table

Hash table (also called hash table) is a data structure that is directly accessed based on the key value. In other words, it accesses the record by mapping the key code value to a location in the table to speed up the search. This mapping function is called a hash function, and the array storing records is called a hash table. Given table M, there is a function f(key). For any given key value key, if the address of the record in the table containing the key can be obtained after substituting the function into the function, then table M is called a hash (Hash) Table, the function f (key) is a hash (Hash) function. ---Baidu Encyclopedia

To put it simply, the key in the key-value pair to be stored is mapped to a certain position in the array through the hash function, and the value is placed in that position for storage

As shown below

From the above, it can be seen that the quality of the hash function design directly affects the quality of the hash table. A good hash function can ensure simple calculation and uniform hash address distribution as much as possible, but the best hash function It must be guaranteed that the key address is unique. The answer is no . It is precisely because of this that hash collisions , also called hash collisions , will occur .

There are several ways to resolve hash conflicts:

1. Open address law

2. Re-hashing

3. Chain address method

4. Public overflow zone law

The method of HashMap to resolve hash conflicts is to use the chain address method.

Chain address method : Each hash table node has a next pointer, multiple hash table nodes can use the next pointer to form a singly linked list, and multiple nodes assigned to the same index can be connected by this singly linked list stand up.

HashMap creates a Node class to store data and Node addresses that point to the same hash value

![]( gulizackking.oss-cn-beijing.aliyuncs.com/Network/Screenshot... 2021-06-15 155428.png)

From the source code, we can see that the Node class implements the Map.Entry interface and has four internal attributes

1. Hash: This attribute is used to store the hash value corresponding to the key

2. Key: used to store keys

3. Value: used to store the value

4. next: used to point to the next Node address with the same hash value

Through the above, we can know the underlying data structure of HashMap as shown below

In general, HashMap is a collection of data structures composed of an array + a linked list. The array is the most important part of the HashMap, and the linked list exists to solve the problem of hash conflicts. If there is no linked list found in a certain position of the array, the time complexity of its query is O(1). If there is a linked list, the time complexity of its query becomes O(n). Because the existence of the linked list is to solve the problem of hash conflicts, but if the linked list is too long, will its efficiency become low? The answer is yes, because the linked list is not continuous in physical space, it is only continuous in logical space, and the next element is pointed to by a pointer, so its random query efficiency will be low. So how to solve this problem? In JDK8, HashMap adds the data structure of red-black tree. The increase of red-black tree is to solve the problem of inefficient query caused by too long linked list.

Red black tree

Red Black Tree is a self-balancing binary search tree. The red-black tree is a specialized AVL tree (balanced binary tree), which maintains the balance of the binary search tree through specific operations during insertion and deletion, so as to obtain higher search performance.

Although it is complicated, its worst-case running time is also very good, and it is efficient in practice: it can do search, insert and delete in O(log n) time, where n is in the tree The number of elements. ---Baidu Encyclopedia

From the definition of Baidu Encyclopedia, we can get a cure. The red-black tree is a special balanced binary tree, which needs to meet some additional requirements.

Nature 1. The node is red or black.

Property 2. The root node is black.

Nature 3. All leaves are black. (Leaves are NIL nodes)

Property 4. The two child nodes of each red node are black. (There cannot be two consecutive red nodes on all paths from each leaf to the root)

Property 5. All paths from any node to each leaf contain the same number of black nodes.

Properties of HashMap

/* Number of key-value pairs in HashMap */ transient int size; /* The number of times the structure of the HashMap has been modified. The structure is modified to change the number of key-value pairs in the HashMap or modify its internal structure in other ways (such as re-hashing). Since HashMap is an unsafe collection in a multi-threaded environment, when the HashMap is modified in structure in a multi-threaded environment, ConcurrentModificationException will be thrown according to this attribute. */ transient int modCount; /* Threshold, threshold=capacity * load factor. When this threshold is exceeded, HashMap will automatically expand */ int threshold; /* Load factor, the default is 0.75. The load factor exists to alleviate the inefficiency caused by hash conflicts. Use with threshold and capacity */ final float loadFactor; /* Node array used to store HashMap data */ Transient the Node <K, V> [] Table; duplicated code

HashMap constructor

/* Empty parameter structure, initial value is assigned to loadFactor, so that loadFactor=0.75 */ public HashMap () /* This construction method first assigns an initial value to loadFactor to make loadFactor, and then calls the putMapEntries(Map<? extends K,? Extends V> m, boolean evict) method. This method is to make a copy of the passed map, so that the object created using this construction method has an initial value */ public HashMap (Map<? extends K,? extends V> m) /* initialCapacity: the initial bucket size loadFactor: load factor This method first judges whether the initialCapacity is less than 0, and throws an exception. If it is greater than MAXIMUM_CAPACITY, then initialCapacity=MAXIMUM_CAPACITY will be set. After judging the initialCapacity, and then judging whether loadFactor is less than or equal to 0 or judging whether it is NaN, an exception will be thrown. If the judgment is unsuccessful, assign a value to loadFactor and threshold */ public HashMap ( int initialCapacity, float loadFactor) /* initialCapacity: the initial bucket size The realization of the construction method is to call the public HashMap(int initialCapacity, float loadFactor) method, initialCapacity=initialCapacity loadFactor=DEFAULT_LOAD_FACTOR(DEFAULT_LOAD_FACTOR=0.75) */ public HashMap ( int initialCapacity) Copy code

In the construction method, the Node array is not initialized, but for example, loadFactor and threshold are initialized. The assignment of the Node array occurs in the put method.

put method

public V put (K key, V value) { return putVal(hash(key), key, value, false , true ); } //When the put method is called, the putval method is actually called Copy code
/* hash: This parameter is the hash value calculated by the Key value according to the hash method key: the key value passed in by put value: the value corresponding to the key passed in by put onlyIfAbsent: Do you want to change the existing value evict: whether it is in creation mode */ final V putVal ( int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; if ((tab = table) == null || (n = tab.length) == 0 ) //Judge HashMap Whether the object has a Node array, that is, whether the current HashMap is the first put value. If it is, it will perform a resize operation, which is an initialization operation. n = (tab = resize()).length; if ((p = tab[i = (n- 1 ) & hash]) == null ) //If the Node array in the current HashMap is not null, then the hash value corresponding to the key put in will be calculated. The index of the Node array. Use (n-1)&hash to calculate the index position of the Key in the Node array. Determine whether the current Node is null according to the current index position. If it is null, create a new Node directly at the current index and assign the value tab[i] = newNode(hash, key, value, null ); else { //If the last judgment is not true, then it means that a hash conflict has occurred, because as mentioned before, the method of HashMap to solve the hash conflict is the chain address method, that is, the method of using a linked list to store the same value of the hash value. Node<K,V> e; K k; if (p.hash == hash &&((k = p.key) == key || (key != null && key.equals(k)))) //First determine the current inserted value and the first node value The judgment basis is: //1, whether the hash value of the first node is the same as the inserted hash value //2, whether the key value of the first node is the same as the inserted key value (the comparison is the address ) Or whether the key value of the first node is the same as the inserted key value (the comparison is the specific value) //When the above two judgment conditions are met at the same time, it means that the inserted value is the same as the value of the first node, then The first node is assigned to the temporary node, and the newly inserted value value will be overwritten on the first node later e = p; else if (p instanceof TreeNode) //As mentioned earlier, HashMap became an array + linked list + red-black tree in JDK8. When the length of the linked list is greater than 7 and the array length is greater than or equal to 64, it will be converted to a red-black tree, but If the length of the linked list is greater than 7 but the length of the array is less than 64, only the expansion operation will be performed, and the linked list will not be converted into a red-black tree. This judgment is to check whether the current data structure is a linked list or a red-black tree. It is the red-black tree to insert the red-black tree //The returned TreeNode is assigned to the temporary node, and the corresponding operation will be made according to the temporary node. e = ((TreeNode<K,V>)p).putTreeVal ( this , tab, hash, key, value); else { //If the above two conditions are not satisfied, then it means that the current link is still used to resolve hash conflicts for ( int binCount = 0 ;; ++binCount ) { //Loop through the linked list until the last node position if ((e = p.next) == null ) { //Insert a new node at the end of the linked list p.next = newNode(hash, key, value, null ); //Judging whether the length of the linked list after insertion is >=7, if it is >=7, then convert the linked list to the red-black tree if (binCount >= TREEIFY_THRESHOLD- 1 ) //-1 for 1st treeifyBin(tab, hash); break ; } //When looping through the linked list, it will judge whether the looped node is the same as the inserted value, and if it is the same, it will jump out of the loop. Since the temporary node is used to store the nodes of the current linked list when the node is circulating, the temporary node will be assigned to the node if the current condition is judged to be true (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) break ; p = e; } } //From here, you can know the role of the temporary node. If the temporary node is not empty, it means that there is a node with the same key value as the inserted node in the current HashMap if (e != null ) { //existing mapping for key //Remove the value of the temporary node V oldValue = e.value; //Determine whether onlyIfAbsent is false, or whether oldValue is null. //onlyIfAbsent value is false by default when it is passed in, so the condition must be true if (!onlyIfAbsent || oldValue == null ) //overwrite the inserted value to the same node as the inserted value e.value = value; //This method has no effect in HashMap afterNodeAccess(e); //Return the value before the node coverage return oldValue; } } //The put operation that did not find the same node as the inserted value will make modCount+1 ++modCount; //First carry out size+1, and then determine whether the value of size is greater than the threshold. threshold=capacity * loadfactor, without modifying the capacity and loadfactor, the threshold default size is 12 if (++size> threshold) //If it is larger than the threshold, perform the expansion operation resize(); //This method has no effect in HashMap afterNodeInsertion(evict); return null ; } Copy code

Through a general explanation of the put method in HashMap, you will have a general understanding of the put method. Next, I will sort out the other methods in this method (the method involving red-black trees will not be available for the time being, because it will not yet...)

hash method

//Calculate the hash value of the key static final int hash (Object key) { int h; //It can be found by looking at the source code that because the HashMap can insert a null value, the HashMap puts the value of the key as null into the Node array 0 index on //(h = key.hashCode()) ^ (h >>> 16) just draw a picture to understand return (key == null )? 0 : (h = key.hashCode()) ^ (h >>> 16 ); } Copy code

In JDK8, the XOR operation is performed through the high 16 bits and low 16 bits of hashCode() (h = key.hashCode()) ^ (h >>> 16). This calculation takes into account that when the length of the Node array is small, it can also ensure that both the high and low bits can participate in the calculation.

The reason for using & to calculate the index of the array is because the length of the Node array is always the n-th power of 2, so (n-1)&hash==hash%(n-1), but the & operation is faster than the% operation.

resize method

final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; //Mainly used to determine whether the current operation is an initialization operation or an expansion operation int oldCap = (oldTab == null )? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0 ; //To determine whether the current operation is Initialization or expansion operation if (oldCap> 0 ) { //If the current bucket size is greater than the maximum bucket size, then let the threshold of HashMap be the largest. When the threshold is exceeded next time, directly report an error and not perform expansion operations if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } //Double the current bucket size. If the size of the bucket size after expansion is less than the maximum bucket size and the bucket size before the expansion is greater than or equal to the default initialization bucket size //then double the threshold value else if ((newCap = oldCap << 1 ) <MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1 ; //double threshold } //Initial capacity was placed in threshold. This situation occurs when the object is created by the parameterized construction method. else if (oldThr> 0 ) //initial capacity was placed in threshold newCap = oldThr; //When both the threshold and the bucket size are 0, it means that there is no specified value. The bucket size will be assigned to the default value of 16. The threshold is assigned to the default load factor * default initialization bucket size=12. This situation occurs when no parameters are used. When constructing an object, else { //zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = ( int )(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } //If the new threshold is still 0 after the above judgment, it means that the parameterized construction method is used to create the object, and the new threshold needs to be assigned if (newThr == 0 ) { float ft = ( float )newCap * loadFactor; newThr = (newCap <MAXIMUM_CAPACITY && ft <( float )MAXIMUM_CAPACITY? ( int )ft: Integer.MAX_VALUE); } threshold = newThr; //Set the threshold to the new threshold @SuppressWarnings({"rawtypes","unchecked"}) //Initialize the Node array Node<K,V>[] newTab = (Node<K,V>[] ) new Node[newCap]; table = newTab; //Assign the new Node array to table //This judgment is used to determine whether it is an initialization operation or an expansion operation. The initialization operation does not perform the operation in the if statement if (oldTab != null ) { //Expansion operation for ( int j = 0 ; j <oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null ) { //Judging whether the looped Node is empty oldTab[j] = null ; //Empty the original hash bucket for GC if (e.next == null ) //If the current Node node is not connected to the linked list or the red-black tree, recalculate the index position of the node after the expansion //The index position is calculated with the hash value of the node and the length of the expanded new Node array newTab[e.hash & (newCap- 1 )] = e; else if (e instanceof TreeNode) //The current node is a red-black tree node, then the current Node before expansion is an array + red-black tree //red Black tree expansion assignment ((TreeNode<K,V>)e).split( this , newTab, j, oldCap); else {//preserve order //Entering this statement means that the underlying structure of the current HashMap before expansion is array + linked list //Because the expansion is doubled, the nodes on the original linked list may be in the original position or in the new position //The head node and tail node of the low linked list Node<K,V> loHead = null , loTail = null ; //The head node and tail node of the high linked list Node<K,V> hiHead = null , hiTail = null ; Node<K,V> next; //The temporary node stores the next node of e do { next = e.next; //Using the efficient nature of & operation, calculate the result of the hash value and oldCap de-modulo operation. If it is equal to 0, it means that the result is less than oldCap and put in the low position, otherwise it will be put in the high position if ((e.hash & oldCap) == 0 ) { if (loTail == null ) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null ) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null ); //The low-order linked list remains unchanged if (loTail != null ) { loTail.next = null ; newTab[j] = loHead; } //High-level linked list index position is the original index +oldCap if (hiTail != null ) { hiTail.next = null ; newTab[j + oldCap] = hiHead; } } } } } return newTab; } Copy code

The unsafe thread of HashMap lies in the expansion operation, so thread-safe ConcurrentHashMap needs to be used in a concurrent environment.

get method

public V get (Object key) { Node<K,V> e; //Get the Node node through the getNode method, if not, return null return (e = getNode(hash(key), key)) == null ? Null : e.value; } Copy code
final Node<K,V> getNode ( int hash, Object key) { //tab: Temporary storage bucket. //first: Temporarily store the hash value to calculate the first node corresponding to the index position //e: Temporary node //n: The size of the bucket //k: Store the val corresponding to the first node Node<K,V>[] tab ; Node<K,V> first, e; int n; K k; //Perform bucket size judgment and assign values to tab, n, first if ((tab = table) != null && (n = tab.length )> 0 && (first = tab[(n- 1 ) & hash]) != null ) { //By checking whether the hash value is the same and the key value is the same to judge whether the frist is the node to be looked up if (first.hash == hash && //always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; //assign the temporary node e to the next node first if ((e = first.next) != null ) { //Determine whether the current is a red-black tree if (first instanceof TreeNode) //Search the red-black tree according to the getTreeNode method return ((TreeNode<K,V>) first).getTreeNode(hash, key); do { //traverse the linked list to find if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null ); } } return null ; } Copy code


1. Try to use bit operators to perform operations, which is more efficient

2. There are two conditions for converting a linked list to a red-black tree . One is that the length of the linked list is greater than or equal to 8, and the second condition is that the current bucket size is greater than 64. Only when these two conditions are met at the same time, will the linked list conversion red-black tree be carried out

3. The capacity is doubled every time the capacity is expanded, so the node is either still in the original position after doubling, or in the original position + the original bucket size. Whether the location changes is related to the node hash value

4. When expanding the capacity, the original bucket will be set to null to facilitate GC

5. The expansion operation is not safe in a concurrent environment, so use ConcurrentHashMap in a concurrent environment

6. LoadFactor and initialCapacity can be specified according to actual requirements to improve certain efficiency

7. HashMap allows null values, and the index position where the null value is stored is stored at index 0