@[toc]
Introduction
 HashMap is mainly used to store key-value pairs. It is a hash-table-based implementation of the Map interface and one of the most commonly used Java collections.
 Before JDK 1.8, HashMap was composed of an array plus linked lists. The array is the main body of the HashMap, and the linked lists exist to resolve hash conflicts (the "zipper method", i.e. separate chaining).
 Since JDK 1.8, HashMap also uses red-black trees: when a bucket's linked list grows past the threshold (8) and the array length reaches 64, the list is converted to a red-black tree to speed up lookups.
 HashMap is not thread safe.
 When HashMap stores data, it determines the storage location by computing the hash value of the key.
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable
 Implements the Cloneable interface, i.e. it can be cloned.
 Extends AbstractMap, which provides a skeletal implementation of the Map interface.
 Implements the java.io.Serializable interface, which means that HashMap supports serialization and can be transmitted in serialized form.
When reading the source code, one thing that always confused me: HashMap already extends AbstractMap, and AbstractMap implements the Map interface, so why does HashMap implement Map again? The same structure also appears in ArrayList and LinkedList.
According to Josh Bloch, the creator of the Java collection framework, this way of writing is a mistake. There are many such declarations in the Java collection framework. When he first wrote it, he thought the redundant interface declaration might be valuable in some places, until he realized he was wrong. Evidently the JDK maintainers did not think this small mistake was worth fixing, so it simply survived.
Internal structure

Before JDK 1.8, the bottom layer of HashMap was an array combined with linked lists, i.e. chained hashing.

HashMap obtains the hash value by passing the key's hashCode through the perturbation function, then uses (n - 1) & hash to determine where the current element is stored (n is the length of the array). If an element already occupies that position, its hash and key are compared with those of the element to be stored: if they are the same, the value is overwritten directly; if not, the conflict is resolved with the zipper method.

The so-called perturbation function is simply HashMap's hash method. Using the hash method, i.e. the perturbation function, guards against poorly implemented hashCode() methods; after perturbation, collisions are reduced.
Perturbation function

The hash method in JDK 1.8 is more streamlined than in JDK 1.7, but the principle is the same. The JDK 1.7 hash method performs slightly worse, since after all it perturbs the value 4 times.

When HashMap stores an element, the following code processes the hash value. This is the perturbation function of Java 8, used to improve the hashing effect.
Hash source code of HashMap in JDK 1.8:
static final int hash(Object key) {
    int h;
    // ^  : bitwise exclusive OR
    // >>>: unsigned right shift; the sign bit is ignored and vacated bits are filled with 0
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
**Hash source code of HashMap in JDK 1.7**
static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
Why use a perturbation function

Theoretically, the hashCode of a string is an int value and could be used directly as an array subscript without collisions. But the value range of hashCode is [-2147483648, 2147483647], covering nearly 4 billion values. No one can initialize an array that large; it would not fit in memory.

The default initial capacity of the Map is 16 (DEFAULT_INITIAL_CAPACITY = 1 << 4), so the hash value cannot be used directly as a subscript. It must be combined with the array length to produce an index.
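To make this concrete, here is a minimal sketch (the class and method names are mine, not from the JDK) of how a hash value is reduced to an array index when the table length is a power of two:

```java
public class IndexDemo {
    // For a power-of-two length n, (n - 1) & hash equals hash % n for
    // non-negative hashes, because n - 1 is an all-ones bit mask.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // DEFAULT_INITIAL_CAPACITY
        System.out.println(indexFor(17, n)); // 1
        System.out.println(indexFor(33, n)); // 1: collides with 17
        System.out.println(indexFor(-1, n)); // 15: negative hashes still map into [0, 15]
    }
}
```

Note the last case: unlike hash % n, the bit mask never produces a negative index, which is one reason HashMap keeps its capacity a power of two.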

The HashMap source therefore does not use the raw hash value directly but first performs a perturbation calculation, (h = key.hashCode()) ^ (h >>> 16): the hash value is shifted right by 16 bits, exactly half its width, then XORed with the original, thereby mixing the high and low bits of the original hash value and increasing randomness.
Demo:
import java.util.Random;

public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            int hash = hash(new Random().nextInt());
            System.out.println(hash);
        }
    }

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
}
 The purpose of the perturbation function is to increase randomness, hash the data elements more evenly, and reduce collisions.
 Evenly distributed data means a better hashing effect, fewer hash collisions, and more efficient storage and retrieval.
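A small experiment (my own construction, not from the source article) makes the benefit concrete: take keys whose hashCodes differ only in the high 16 bits, and compare the bucket indices computed with and without the perturbation step:

```java
import java.util.HashSet;
import java.util.Set;

public class PerturbDemo {
    // The JDK 8 perturbation function, as shown above
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // How many distinct buckets do 16 such keys occupy in a table of length 16?
    static int distinctBuckets(boolean perturb) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16; // Integer keys whose hashCodes differ only in the high bits
            buckets.add((16 - 1) & (perturb ? hash(h) : h));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(false)); // 1: all 16 keys collide in bucket 0
        System.out.println(distinctBuckets(true));  // 16: keys spread over all buckets
    }
}
```

Without perturbation, the mask (16 - 1) only ever sees the low 4 bits, so high-bit-only differences are lost entirely; the XOR folds those high bits back in.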
Zipper method
The so-called "zipper method" combines linked lists with an array. In other words, the array is an array of linked lists: each cell of the array heads a linked list, and when a hash conflict occurs, the conflicting value is appended to the list at that index.

After JDK 1.8, the way hash conflicts are resolved changed significantly.

When the length of a linked list reaches the threshold (default 8), the treeifyBin() method is called. This method decides, based on the HashMap array, whether to convert to a red-black tree: only when the array length is greater than or equal to 64 is the conversion to a red-black tree performed, to reduce search time; otherwise the expansion mechanism runs.
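The decision just described can be sketched as follows. This is a simplified model of treeifyBin's choice, not the actual JDK code; only the constants mirror the real source:

```java
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // What happens when a bucket's linked list reaches binCount nodes?
    static String decide(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD)
            return "keep linked list";
        if (tableLength < MIN_TREEIFY_CAPACITY)
            return "resize";  // expand the array instead of treeifying
        return "treeify";     // convert the list to a red-black tree
    }

    public static void main(String[] args) {
        System.out.println(decide(5, 16)); // keep linked list
        System.out.println(decide(8, 16)); // resize
        System.out.println(decide(8, 64)); // treeify
    }
}
```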
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {
    private static final long serialVersionUID = 362498820763181265L;
    /**
     * The default initial capacity is 16
     */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    /**
     * Maximum capacity
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;
    /**
     * Default load factor
     */
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    /**
     * When the number of nodes in a bucket is greater than this value, the list is converted to a red-black tree
     */
    static final int TREEIFY_THRESHOLD = 8;
    /**
     * When the number of nodes in a bucket drops below this value, the tree is converted back to a linked list
     */
    static final int UNTREEIFY_THRESHOLD = 6;
    /**
     * Minimum table size for which buckets may be converted to red-black trees
     */
    static final int MIN_TREEIFY_CAPACITY = 64;
    /**
     * The array that stores the elements
     */
    transient Node<K,V>[] table;
    /**
     * Holds the cached set of entries
     */
    transient Set<Map.Entry<K,V>> entrySet;
    /**
     * The number of stored elements
     */
    transient int size;
    /**
     * Counter of structural modifications to the map (including each expansion)
     */
    transient int modCount;
    /**
     * Critical value: when the actual size exceeds this value (capacity * load factor), the table is resized
     */
    int threshold;
    /**
     * Load factor
     */
    final float loadFactor;
loadFactor (load factor)

The default capacity is 16 and the load factor is 0.75. As the Map stores more and more data, once the count reaches 16 * 0.75 = 12, the current capacity of 16 needs to be expanded. Expansion involves rehashing and copying data, so it costs a lot of performance.

In HashMap, the load factor determines how much data can be stored before expansion. Recall the HashMap example made earlier: we prepared 7 elements, yet in the end 3 positions remained empty while 2 positions each held 2 elements. So even if your data volume exceeds the array capacity, the array may not be evenly filled; instead, some indices collide heavily and their entries can only be chained into a linked list at the same position, losing the array's fast-lookup performance. A reasonable expansion point must therefore be chosen. The default of 0.75 means that once entries occupy 3/4 of the capacity, the table expands promptly to reduce hash collisions.
threshold
 threshold = capacity * loadFactor. When size >= threshold, expansion of the array must be considered; in other words, threshold is the yardstick for deciding whether the array needs to grow.
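The arithmetic can be checked directly (the helper name is mine; the formula is the one from the source above):

```java
public class ThresholdDemo {
    // threshold = capacity * loadFactor, as in the HashMap source
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // 12: the default capacity resizes at 12 entries
        System.out.println(threshold(32, 0.75f)); // 24: after one doubling
    }
}
```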
Node node class source code:
/**
 * Node is a singly linked list node; it implements the Map.Entry interface
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;    // hash value, cached for comparison with other elements' hashes
    final K key;       // key
    V value;           // value
    Node<K,V> next;    // next node

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    // Override the hashCode method
    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    // Override the equals() method
    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>) o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}
TreeNode node class source code:
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node
    TreeNode<K,V> left;    // left child
    TreeNode<K,V> right;   // right child
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;           // node color

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    // Return the root of the tree
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }
}
Source code analysis
1. Construction method
4 constructors:
// The default constructor
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

// Specify the initial capacity
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

// Construct from another Map
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

// Specify both capacity and load factor
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}
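The last constructor stores tableSizeFor(initialCapacity) into threshold. For reference, here is that method as it appears in the JDK 8 source, wrapped in a demo class of my own (MAXIMUM_CAPACITY is written out as 1 << 30):

```java
public class TableSizeForDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // JDK 8 tableSizeFor: the smallest power of two >= cap.
    // The shifted ORs smear the highest set bit of cap - 1 into every
    // lower bit, producing a mask of the form 2^k - 1; adding 1 yields 2^k.
    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

This is why the table length is always a power of two, no matter what initial capacity the caller requests.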
putMapEntries method:
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > 0) {
        // Determine whether the table has been initialized
        if (table == null) { // pre-size
            // Not initialized; s is the actual number of elements in m
            float ft = ((float) s / loadFactor) + 1.0F;
            int t = ((ft < (float) MAXIMUM_CAPACITY) ?
                     (int) ft : MAXIMUM_CAPACITY);
            // If t is greater than the threshold, compute a new threshold
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        // The table is initialized and m holds more elements than the threshold: resize
        else if (s > threshold)
            resize();
        // Add all elements of m to this HashMap
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}
2. put method
HashMap only exposes put for adding elements; putVal is merely an internal method called by put and is not provided to users.
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}
The putVal method adds an element as follows:
 If the computed array position holds no element, insert directly.
 If the position holds an element, compare it with the inserted key. If the keys are the same, overwrite the value directly. If they differ, check whether p is a tree node: if so, call e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value) to add the element; if not, traverse the linked list and insert at the tail.
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // If the table is not initialized or has length 0, resize it
    if ((tab = table) == null || (n = tab.length) == 0)
        // Expansion operation
        n = (tab = resize()).length;
    // Compute the index from the hash value to determine which bucket stores the element;
    // if the bucket is empty, place the newly created node in it (i.e. in the array)
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    // Otherwise the bucket already holds an element
    else {
        Node<K,V> e; K k;
        // If the first element in the bucket has an equal hash and an equal key,
        // the key already exists and its value will be overwritten
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Assign the element to e for later use
            e = p;
        // Hashes are unequal, i.e. the keys are not equal; check for a red-black tree node
        else if (p instanceof TreeNode)
            // Insert into the tree
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);
        // Otherwise it is a linked-list node
        else {
            // Insert at the tail of the linked list
            for (int binCount = 0; ; ++binCount) {
                // Reached the end of the linked list
                if ((e = p.next) == null) {
                    // Insert a new node
                    p.next = newNode(hash, key, value, null);
                    // When the node count reaches the threshold of 8, call treeifyBin.
                    // That method decides, based on the HashMap array, whether to convert
                    // to a red-black tree: only when the array length is >= 64 does it
                    // treeify (to reduce search time); otherwise it resizes the array
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    // Break out of the loop
                    break;
                }
                // Check whether a node in the list has a key equal to the inserted key
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    // Equal: break out of the loop
                    break;
                p = e;
            }
        }
        // A node with the same key and hash as the inserted element was found in the bucket
        if (e != null) { // existing mapping for key
            // Record the old value of e
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                // The new value replaces the old value
                e.value = value;
            // Callback after access
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // Resize if the size exceeds the threshold
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
Summary:

When put is called, the perturbation function (h = key.hashCode()) ^ (h >>> 16) is applied to obtain the final hash value.

The bucket index tab[i = (n - 1) & hash] is then computed from key.hash to locate the key's position:
2.1 If there is no data at this position, create a new node to hold the new data and return null.
2.2 If the data at this position is a red-black tree, perform the corresponding tree insert/update operation.
2.3 If the data at this position is a linked list, loop over it: if the next element is null, append a new node at the tail (tail insertion); once the list length reaches 8, treeifyBin(tab, hash) decides whether to resize the array or convert the list to a red-black tree. While looping, compare each node's key with the inserted key; if the list already contains the key, update that node with the new data and return the old data.
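The overwrite-and-return-old-value behavior described above can be observed directly from user code (a plain usage example, not JDK internals):

```java
import java.util.HashMap;

public class PutDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        System.out.println(map.put("a", 1)); // null: the key was absent
        System.out.println(map.put("a", 2)); // 1: old value returned, new value stored
        System.out.println(map.get("a"));    // 2
    }
}
```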
Why use tail insertion?
Regarding insertion into HashMap's linked lists: before Java 8, head insertion was used.
 Head insertion: the new node takes the head of the list, and the original head node is pushed down into the list.
 Under multithreading this can produce an infinite loop.
Expansion mechanism of HashMap
 Create a new empty Node array twice the length of the original.
 Traverse the original Node array and rehash every Node into the new array.
Why recalculate the hash
Node<K,V>[] tab; Node<K,V> p; int n, i;
if ((tab = table) == null || (n = tab.length) == 0)
    n = (tab = resize()).length;
if ((p = tab[i = (n - 1) & hash]) == null)
    tab[i] = newNode(hash, key, value, null);
The index into the Node array is computed with (length - 1) & hash. With the original length of 8 the computed value might be 2; with the new length of 16 the bitwise result is obviously different, so the positions derived from the hash values of all existing data must change.
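A concrete check (the numbers are my own choosing): for hash 10, the index differs between a table of length 8 and one of length 16:

```java
public class RehashIndexDemo {
    public static void main(String[] args) {
        int hash = 10;
        System.out.println((8 - 1) & hash);  // 2: index in the old table of length 8
        System.out.println((16 - 1) & hash); // 10: index in the doubled table of length 16
    }
}
```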
With head insertion into a singly linked list, a new element at a given position is always placed at the head of that list. Elements that shared one Node chain in the old array may land at different positions in the new array after their indices are recalculated.
Once several threads interleave while making this adjustment, a circular linked list can appear. If a value is then looked up, the traversal spins in an infinite loop...
 Head insertion reverses the order of the linked list; with tail insertion, the original order of the list's elements is preserved during expansion, so the list cannot form a ring.
3. get method
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // Check that the table and the target bucket are non-null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // If the first node in the bucket (the array element) matches, return it
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // There is more than one node in the bucket
        if ((e = first.next) != null) {
            // If it is a red-black tree node
            if (first instanceof TreeNode)
                // Search in the red-black tree
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            // Otherwise search in the linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
4. Resize method
 Expansion is accompanied by a redistribution of hashes and traverses every element in the hash table, which is very time-consuming. When writing a program, try to avoid triggering resize.
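One practical way to avoid resizes, assuming the number of entries is known up front, is to pre-size the map so that the expected count stays at or below capacity * 0.75. The sizing expression below is a common idiom derived from the threshold formula, not something taken from this article's source code:

```java
import java.util.HashMap;

public class PresizeDemo {
    // Build a pre-sized map and fill it with `expected` entries
    static HashMap<Integer, Integer> fill(int expected) {
        // Capacity chosen so that expected <= capacity * 0.75: no resize while filling
        HashMap<Integer, Integer> map = new HashMap<>((int) (expected / 0.75f) + 1);
        for (int i = 0; i < expected; i++)
            map.put(i, i * i);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(100).size()); // 100
    }
}
```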
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        // If the old capacity already reached the maximum (1073741824), stop expanding and
        // just let collisions happen: set the threshold to Integer.MAX_VALUE = 2147483647
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // If the maximum is not exceeded, double the capacity
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else { // zero initial threshold signifies using defaults
        // Default initialization
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int) (DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    // Compute the new resize threshold
    if (newThr == 0) {
        // loadFactor defaults to 0.75 and can be passed in at construction;
        // 16 * 0.75 = 12 key-value pairs can be stored before the next resize
        float ft = (float) newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float) MAXIMUM_CAPACITY ?
                  (int) ft : Integer.MAX_VALUE);
    }
    // Install the new critical value
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    // Allocate the new, larger table
    Node<K,V>[] newTab = (Node<K,V>[]) new Node[newCap];
    table = newTab;
    // If the old table has data, copy it into the new table
    if (oldTab != null) {
        // Move each bucket to the new table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Take the j-th element of the array
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null; // clear the old bucket at j to help GC
                // If the bucket holds a single node, assign it directly
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // If it is a red-black tree node, expand via the split method
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
                else { // preserve order
                    // Linked-list rehash optimization: split into two lists
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
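The lo/hi split above relies on one observation: when the capacity doubles, an entry's new index is either its old index j or j + oldCap, and the single bit tested by hash & oldCap decides which. That equivalence can be verified directly (the helper names below are mine, not from the JDK):

```java
public class SplitBitDemo {
    // New index computed the lo/hi way: stay at j, or move to j + oldCap
    static int viaSplitBit(int hash, int oldCap) {
        int j = (oldCap - 1) & hash;
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    // New index computed directly with the doubled mask
    static int viaNewMask(int hash, int oldCap) {
        return (2 * oldCap - 1) & hash;
    }

    public static void main(String[] args) {
        // Verify the equivalence over a range of hashes, including negative ones
        for (int hash = -64; hash < 64; hash++)
            if (viaSplitBit(hash, 16) != viaNewMask(hash, 16))
                throw new AssertionError("mismatch at hash " + hash);
        System.out.println("both formulas agree");
    }
}
```

This is why resize() never recomputes (newCap - 1) & hash for list nodes: testing one bit is enough to route each node, and nodes within each of the two resulting lists keep their relative order.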
If a Node is a TreeNode (a red-black tree node) during expansion, the source code is as follows (author's original link):
// The threshold for a red-black tree to revert to a linked list
static final int UNTREEIFY_THRESHOLD = 6;

/** This method is called when HashMap is resized: ((TreeNode<K,V>) e).split(this, newTab, j, oldCap);
 * @param map   the HashMap being resized
 * @param tab   the newly created array that receives data migrated from the old array
 * @param index the index in the old array
 * @param bit   the length of the old array, used for the bitwise AND operation below
 */
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    // split() is invoked on the ((TreeNode<K,V>) e) object, so `this`
    // refers to that (TreeNode<K,V>) e object; assign it accordingly
    TreeNode<K,V> b = this;
    // Head and tail of the low list
    TreeNode<K,V> loHead = null, loTail = null;
    // Head and tail of the high list
    TreeNode<K,V> hiHead = null, hiTail = null;
    // Two counters lc and hc, initially 0; their sizes later decide
    // whether each resulting tree should be converted back to a linked list
    int lc = 0, hc = 0;
    // This loop traverses the entire red-black tree starting from node b
    // (if the loop assignment is unclear, refer to the blog linked as @1)
    for (TreeNode<K,V> e = b, next; e != null; e = next) {
        // Take the next node of e for the next iteration
        next = (TreeNode<K,V>) e.next;
        // Clear e.next to help GC
        e.next = null;
        // A bitwise AND with the old capacity pulls the nodes apart into two
        // linked lists (for details, refer to the blog linked as @2)
        if ((e.hash & bit) == 0) {
            if ((e.prev = loTail) == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
            // Count how many elements end up in the low list
            ++lc;
        }
        else {
            if ((e.prev = hiTail) == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
            // Count how many elements end up in the high list
            ++hc;
        }
    }
    // If the head of the low list is not null, the low list exists
    if (loHead != null) {
        // If the low list has 6 or fewer elements
        if (lc <= UNTREEIFY_THRESHOLD)
            // Convert the red-black tree back to a linked list; the low list
            // migrates to the new array at the unchanged index (same as in the old array)
            tab[index] = loHead.untreeify(map);
        else {
            // The low list keeps the same index in the new array; hang the whole list there
            tab[index] = loHead;
            // If the high head is not null, the original red-black tree was split into two lists,
            if (hiHead != null)
                // so a new red-black tree must be built
                loHead.treeify(tab);
        }
    }
    // If the head of the high list is not null, the high list exists
    if (hiHead != null) {
        // If the high list has 6 or fewer elements
        if (hc <= UNTREEIFY_THRESHOLD)
            // Convert the red-black tree back to a linked list; the high list
            // migrates to the new array at index [old index + old array length]
            tab[index + bit] = hiHead.untreeify(map);
        else {
            // The high list migrates to index [old index + old array length];
            // hang the whole list under that new index
            tab[index + bit] = hiHead;
            // If the low head is not null, the original red-black tree was split into two lists,
            if (loHead != null)
                // so a new red-black tree must be built
                hiHead.treeify(tab);
        }
    }
}
Personal blog address: blog.yanxiaolong.cn/