Source code analysis of HashMap collection in JDK1.8


Introduction

  1. HashMap is mainly used to store key-value pairs. It is implemented based on the Map interface of the hash table and is one of the commonly used Java collections.
  2. Before JDK 1.8, HashMap was composed of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist mainly to resolve hash conflicts (the "zipper method", i.e. separate chaining).
  3. Since JDK 1.8, HashMap also uses red-black trees: once a bucket's linked list reaches a length of 8 and the array length is at least 64, the list is converted to a red-black tree to speed up searches.
  4. HashMap is not thread safe.
  5. When HashMap saves data, it determines the storage location by calculating the hash value of the key.
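
As a quick illustration of these points, here is a minimal usage sketch (class and variable names are my own, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapIntro {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);                 // stored at a position derived from the key's hash
        ages.put("bob", 25);
        System.out.println(ages.get("alice")); // 30
        // HashMap is not thread safe; for concurrent access,
        // java.util.concurrent.ConcurrentHashMap is the usual choice.
    }
}
```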

```java
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable
```
  • Implements the Cloneable interface, i.e. it supports clone().
  • Extends AbstractMap, which provides a skeletal implementation of the Map interface.
  • Implements the java.io.Serializable interface, which means HashMap supports serialization and can be transmitted in serialized form.

When reading the source code, one thing that always confused me is that HashMap already extends AbstractMap, and AbstractMap implements the Map interface, so why does HashMap implement the Map interface again? The same structure also appears in ArrayList and LinkedList.

According to Josh Bloch, the creator of the Java collection framework, this way of writing was a mistake. It appears in many places in the Java collection framework: when he first wrote it, he thought declaring the interface explicitly might be valuable somewhere, until he realized he was wrong. Evidently the JDK maintainers did not think this small mistake was worth fixing, so it simply survived.

Internal structure


  • Before JDK 1.8, the bottom layer of HashMap used an array and linked lists together, i.e. chained hashing.

  • HashMap obtains the hash value from the key's hashCode after it has been processed by the perturbation function, and then uses (n - 1) & hash to determine where the element is stored (n is the length of the array). If there is already an element at that position, its hash value and key are compared with those of the element to be stored: if both match, the value is overwritten directly; otherwise the conflict is resolved with the zipper method.

  • The so-called perturbation function is HashMap's hash method. Using the hash method, i.e. the perturbation function, guards against poorly implemented hashCode() methods; after applying it, collisions are reduced.
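
A small sketch (my own illustration, not JDK source) of how the perturbed hash and (n - 1) & hash produce a bucket index:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16;                       // table length, always a power of two
        String key = "example";
        int h = key.hashCode();
        int hash = h ^ (h >>> 16);        // the JDK 1.8 perturbation
        int index = (n - 1) & hash;       // equivalent to hash % n when n is a power of two
        System.out.println("bucket index: " + index);
    }
}
```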

Perturbation function

  • The JDK 1.8 hash method is more streamlined than the JDK 1.7 version, but the principle is the same. The JDK 1.7 hash method performs a bit worse because, after all, it perturbs the value 4 times.

  • When HashMap stores an element, the following code processes the hash value. This is the Java 8 hash perturbation function, used to improve the hashing effect:

Hash source code of HashMap in JDK 1.8

```java
static final int hash(Object key) {
    int h;
    // ^  : bitwise exclusive OR
    // >>>: unsigned right shift; the sign bit is ignored and vacated bits are filled with 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

**Hash source code of HashMap in JDK 1.7**

```java
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
```

Why use a perturbation function?

  1. Theoretically, a key's hashCode is an int and could be used directly as an array index without any collision. But an int's range is [-2147483648, 2147483647], spanning nearly 4 billion values; nobody can initialize an array that large, and it would not fit in memory.

  2. The default initial size of the Map is 16 (DEFAULT_INITIAL_CAPACITY = 1 << 4), so the hash value cannot be used directly as an index; it has to be combined with the array length to produce one.

  3. The HashMap source therefore does not use the hash value as-is: it performs a perturbation first, (h = key.hashCode()) ^ (h >>> 16). Shifting the hash right by 16 bits, exactly half of its 32-bit length, and XORing with the original value mixes the high and low bits of the hash, increasing randomness.

Demo:

```java
import java.util.Random;

public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            int hash = hash(new Random().nextInt());
            System.out.println(hash);
        }
    }

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
}
```

  • The purpose of the perturbation function is to increase randomness so that elements hash more evenly and collisions are reduced.
  • It distributes the data more evenly, i.e. the hashing effect is better, reducing hash collisions and making storage and retrieval more efficient.
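
A contrived sketch (hash values chosen by me) of why the perturbation helps: keys whose hashCodes differ only in the high 16 bits all collide in a 16-bucket table without perturbation, but spread out with it:

```java
public class PerturbDemo {
    public static void main(String[] args) {
        int n = 16; // table length
        for (int i = 0; i < 4; i++) {
            int h = i << 16;                         // hashes that differ only in high bits
            int plain = h & (n - 1);                 // always 0: the high bits are masked away
            int mixed = (h ^ (h >>> 16)) & (n - 1);  // 0, 1, 2, 3: the high bits now take part
            System.out.println(h + " -> plain " + plain + ", mixed " + mixed);
        }
    }
}
```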

Zipper method

The so-called "zipper method" is to combine a linked list with an array. In other words, create a linked list array, and each cell in the array is a linked list. If you encounter a hash conflict, add the conflicted value to the linked list.

  • Since JDK 1.8, the way hash conflicts are resolved has changed significantly.

  • When the linked list length reaches the threshold (default 8), the treeifyBin() method is called first. Based on the HashMap's array, this method decides whether to convert: only when the array length is at least 64 is the list actually converted to a red-black tree (to reduce search time); otherwise the resize mechanism is executed instead.

```java
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {

    private static final long serialVersionUID = 362498820763181265L;

    /** The default initial capacity: 16 */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    /** The maximum capacity */
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** The default load factor */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    /** When the number of nodes in a bucket exceeds this value, the list is converted to a red-black tree */
    static final int TREEIFY_THRESHOLD = 8;

    /** When the number of nodes in a bucket drops below this value, the tree is converted back to a linked list */
    static final int UNTREEIFY_THRESHOLD = 6;

    /** The minimum table size at which a bucket may be converted to a red-black tree */
    static final int MIN_TREEIFY_CAPACITY = 64;

    /** The array that stores the elements */
    transient Node<K,V>[] table;

    /** Holds the cached entry set */
    transient Set<Map.Entry<K,V>> entrySet;

    /** The number of stored elements */
    transient int size;

    /** Counter incremented on every structural modification of the map */
    transient int modCount;

    /** The critical value: when the actual size exceeds (capacity * load factor), the table is resized */
    int threshold;

    /** The load factor */
    final float loadFactor;
    // ... constructors and methods follow
```

loadFactor load factor

  • The default capacity is 16 and the load factor is 0.75. As the Map is used, data keeps being stored into it; when the number of entries reaches 16 * 0.75 = 12, the current capacity of 16 needs to be expanded. Expansion involves rehashing, copying data, and other operations, so it costs a lot of performance.

  • In HashMap, the load factor determines how full the table may get before expansion. Consider a small example: we store 7 elements, yet 3 positions may end up empty while 2 positions each hold 2 elements. So even if there are more entries than array slots, the data will not necessarily fill the array evenly; instead, some positions suffer many collisions and can only store their elements in a linked list at the same slot, losing the performance of the array. A reasonable expansion point is therefore needed. The default of 0.75 means that when the number of entries reaches 3/4 of the capacity, the table is expanded promptly to reduce hash collisions.

threshold

  • threshold = capacity * loadFactor. When size >= threshold, expansion of the array must be considered; in other words, the threshold is the yardstick for deciding whether the array needs to grow.
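
To make the arithmetic concrete, a tiny sketch (illustrative values only, mirroring the defaults):

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;                 // DEFAULT_INITIAL_CAPACITY
        float loadFactor = 0.75f;          // DEFAULT_LOAD_FACTOR
        int threshold = (int) (capacity * loadFactor);
        System.out.println(threshold);     // 12: the 13th entry triggers a resize
        // After resizing, the capacity doubles to 32 and the threshold doubles to 24.
    }
}
```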

Node node class source code:

```java
/**
 * Node is a node of a singly linked list; it implements the Map.Entry interface.
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // hash value, used to compare with other elements' hashes
    final K key;        // key
    V value;            // value
    Node<K,V> next;     // next node

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // Override the hashCode method
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Override the equals() method
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
```

TreeNode node class source code:

```java
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node
    TreeNode<K,V> left;    // left child
    TreeNode<K,V> right;   // right child
    TreeNode<K,V> prev;    // needed to unlink the next node upon deletion
    boolean red;           // node color

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // Returns the root of the tree containing this node
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
    // ... the remaining tree methods are omitted here
```

Source code analysis


1. Construction method

4 constructors:

```java
// The default constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// Specify the initial capacity
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Build from another Map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

// Specify both the initial capacity and the load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
```

putMapEntries method:

```java
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Check whether the table has been initialized
        if (table == null) { // pre-size
            // Not initialized; s is the actual number of elements in m
            float ft = ((float) s / loadFactor) + 1.0F;
            int t = ((ft < (float) MAXIMUM_CAPACITY) ? (int) ft : MAXIMUM_CAPACITY);
            // If t is greater than the current threshold, recompute the threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // The table is initialized and m has more elements than the threshold: resize
        else if (s > threshold)
            resize();
        // Add all elements of m to this HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
```

2. put method

HashMap only exposes put for adding elements; the putVal method is just an internal method that put calls, and it is not provided to users.

```java
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
```

Analysis of how putVal adds an element:

  1. If the computed array position holds no element, insert directly.
  2. If the computed array position already holds an element, compare it with the key to be inserted. If the keys are the same, overwrite the value directly. If the keys differ, check whether p is a tree node: if so, insert via e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value); if not, traverse the linked list and insert at its tail.

```java
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is uninitialized or has length 0, resize (expand) it
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // Compute the bucket index from the hash; if the bucket is empty,
    // place a newly created node directly into the array
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Otherwise the bucket already contains at least one element
    else {
        Node<K,V> e; K k;
        // If the first element in the bucket has an equal hash and an equal key,
        // remember it so its value can be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The hashes/keys are not equal; if the first node is a red-black tree node,
        // insert into the tree
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        // Otherwise it is a linked list node
        else {
            // Insert at the tail of the linked list
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the list
                if ((e = p.next) == null) {
                    // Insert a new node at the tail
                    p.next = newNode(hash, key, value, null);
                    // When the node count reaches the threshold (8), call treeifyBin.
                    // That method converts the list to a red-black tree only if the
                    // array length is at least 64 (to reduce search time); otherwise
                    // it resizes the array instead.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Exit the loop
                    break;
                }
                // If a node in the list has the same key as the inserted element,
                // break out of the loop
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // A node with the same key and hash as the inserted element was found
        if (e != null) { // existing mapping for key
            // Record the old value of e
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // The new value replaces the old value
                e.value = value;
            // Post-access callback (used by LinkedHashMap)
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize if the new size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```

Summary:

  1. When put is called, the perturbation function (h = key.hashCode()) ^ (h >>> 16) computes the final hash value.

  2. The bucket array index tab[i = (n - 1) & hash] is computed from the key's hash to locate the key's position.

    2.1 If there is no data at this position, generate a new node to hold the new data, and return null.

    2.2 If the node at this position is a red-black tree node, perform the corresponding tree insert/update.

    2.3 If the node at this position is a linked list, loop through it. If the next element is null, append a new node at the tail (tail insertion); if the list length then reaches 8, call treeifyBin(tab, hash), which decides whether to expand the array or convert the list to a red-black tree. While looping, compare each node's key with the inserted key; if the list already contains the key, update that node with the new data and return the old data.
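
The overwrite-and-return-old-value behavior in step 2.3 can be observed directly; a small usage sketch:

```java
import java.util.HashMap;

public class PutReturnDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> m = new HashMap<>();
        System.out.println(m.put("a", 1)); // null: no previous mapping
        System.out.println(m.put("a", 2)); // 1: old value returned, new value stored
        System.out.println(m.get("a"));    // 2
    }
}
```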

Why use the tail insertion method?


Regarding insertion into HashMap's linked lists: before Java 8, the head insertion method was used.

  • Head insertion: the newly inserted node becomes the head of the bucket, and the existing nodes are pushed down the list.
  • Under multithreading it can produce an infinite loop.

Expansion mechanism of HashMap

  • Create a new, empty Node array twice the length of the original.
  • Traverse the original Node array and re-hash every Node into the new array.

Why recalculate the hash index

```java
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
```

When computing a Node's array index, (length - 1) & hash is used. With an original length of 8, a given hash might yield index 2; with the new length of 16, the same hash can clearly yield a different value. So after expansion, the position derived from every element's hash has to be recomputed.
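
A sketch (hash values chosen by me) of how the index changes when the capacity doubles: the extra bit tested by (hash & oldCap) decides whether an element stays at its old index or moves to index + oldCap, which is exactly the split performed in resize:

```java
public class SplitDemo {
    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53}; // all land in bucket 5 when the capacity is 16
        for (int h : hashes) {
            int oldIndex = h & (oldCap - 1);
            // After doubling, the bit tested by (h & oldCap) decides the new bucket
            int newIndex = ((h & oldCap) == 0) ? oldIndex : oldIndex + oldCap;
            System.out.println(h + ": " + oldIndex + " -> " + newIndex);
        }
    }
}
```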

With head insertion into a singly linked list, a new element for a given position is always placed at the head of that list. Elements that shared a Node chain in the old array may end up at different positions in the new array once the index is recalculated.

If several threads perform this adjustment at the same time, a circular linked list can appear. If a value is then looked up, an infinite loop results...


  • Head insertion changes the order of the linked list, whereas tail insertion preserves the original order of the elements during expansion, so the list cannot form a ring.
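
A contrived sketch, with hand-rolled nodes rather than HashMap internals, contrasting the two insertion orders:

```java
public class InsertOrderDemo {
    static class Node { int v; Node next; Node(int v) { this.v = v; } }

    public static void main(String[] args) {
        Node head = null, tail = null, headIns = null;
        for (int v = 1; v <= 3; v++) {
            // Tail insertion: append after the current tail
            Node t = new Node(v);
            if (tail == null) head = tail = t;
            else { tail.next = t; tail = t; }
            // Head insertion: the new node becomes the head
            Node h = new Node(v);
            h.next = headIns;
            headIns = h;
        }
        print("tail-insert order: ", head);     // 1 2 3 (original order preserved)
        print("head-insert order: ", headIns);  // 3 2 1 (order reversed)
    }

    static void print(String label, Node n) {
        StringBuilder sb = new StringBuilder(label);
        for (; n != null; n = n.next) sb.append(n.v).append(' ');
        System.out.println(sb);
    }
}
```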

3. get method

```java
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
```
```java
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Check that the table and the target bucket are non-empty
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node in the bucket (the array element) matches, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // There is more than one node in the bucket
        if ((e = first.next) != null) {
            // For a red-black tree node, search in the tree
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            // Otherwise, search the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
```

4. Resize method

  • Expansion is accompanied by redistributing hashes and traversing every element in the hash table, which is very time-consuming. When writing a program, try to avoid triggering resize.
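
Since resizing is expensive, callers can pre-size the map. A hedged sketch of the usual rule of thumb, choosing a capacity so that expected <= capacity * 0.75 (values are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 1000;
        // Pick an initial capacity large enough that filling the map
        // with `expected` entries never crosses the resize threshold.
        Map<Integer, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++)
            map.put(i, i);
        System.out.println(map.size()); // 1000, with no resize along the way
    }
}
```
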
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // The old capacity already reached the maximum (1 << 30 = 1073741824):
        // stop growing, set the threshold to Integer.MAX_VALUE (2147483647),
        // and simply let collisions happen
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Otherwise double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold if it was not set above
    if (newThr == 0) {
        // loadFactor defaults to 0.75 (can be passed in at construction);
        // e.g. 16 * 0.75 = 12 key-value pairs can be stored
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    // Set the threshold to the new critical value
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    table = newTab;
    // If the old table has data, copy it to the new table
    if (oldTab != null) {
        // Move each bucket to the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Get the j-th element of the array
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // clear the old slot to help GC
                // A single node: place it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // A red-black tree node: expand via the split method
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Linked list optimization: split into a "low" list (same
                    // index) and a "high" list (index + oldCap)
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```

If a Node is a TreeNode (red-black tree node) during expansion, the split method below is called (annotated source from the author's original link):

```java
// The threshold at which a red-black tree reverts to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * Called when HashMap resizes: ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
 * @param map   the HashMap being resized
 * @param tab   the newly created array that receives the data migrated from the old array
 * @param index the index in the old array
 * @param bit   the length of the old array, used in the bitwise AND below
 */
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    // This method is invoked as ((TreeNode<K,V>) e).split(...),
    // so "this" refers to the (TreeNode<K,V>) e object
    TreeNode<K,V> b = this;
    // Head and tail of the low list
    TreeNode<K,V> loHead = null, loTail = null;
    // Head and tail of the high list
    TreeNode<K,V> hiHead = null, hiTail = null;
    // lc and hc count the nodes in each list; their sizes decide whether
    // the red-black tree should be converted back to a linked list
    int lc = 0, hc = 0;
    // Traverse the whole red-black tree starting from node b (via the next pointers)
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        // Save e's next node for the following iteration
        next = (TreeNode<K,V>) e.next;
        // After taking e's next node, clear the link to help GC
        e.next = null;
        // The bitwise AND with the old capacity splits the nodes into two lists
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            // Count how many elements end up in the low list
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            // Count how many elements end up in the high list
            ++hc;
        }
    }
    // If the head of the low list is not null, the low list exists
    if (loHead != null) {
        // If the list has no more than 6 elements, convert the red-black tree
        // back to a linked list; the low list keeps the same index in the new array
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            // The low list keeps the same index in the new array;
            // pull the whole low list to this index
            tab[index] = loHead;
            // If the high head is not empty, the original red-black tree was
            // split into two lists, so a new red-black tree must be built
            if (hiHead != null)
                loHead.treeify(tab);
        }
    }
    // If the head of the high list is not null, the high list exists
    if (hiHead != null) {
        // If the list has no more than 6 elements, convert back to a linked list;
        // the high list moves to [old index + old array length] in the new array
        if (hc <= UNTREEIFY_THRESHOLD)
            tab[index + bit] = hiHead.untreeify(map);
        else {
            // Pull the whole high list to the new index [old index + old array length]
            tab[index + bit] = hiHead;
            // If the low head is not empty, the original red-black tree was
            // split into two lists, so a new red-black tree must be built
            if (loHead != null)
                hiHead.treeify(tab);
        }
    }
}
```

Personal blog address: blog.yanxiaolong.cn/