HashTable analysis of JDK collection source code

Comparison of HashTable and HashMap

For an analysis of the HashMap source code, you can refer to: HashMap analysis of JDK collection source code (top), HashMap analysis of JDK collection source code (bottom), and HashMap bottom red-black tree implementation (implement a simple red-black tree by yourself).

1. The inheritance hierarchies of the two are different

(Figure: Hashtable inheritance diagram)

(Figure: HashMap inheritance diagram)

As the figures show, both implement the Map interface, and both implement the Cloneable and Serializable interfaces, so both can be cloned and serialized. However, the parent class of HashMap is AbstractMap, while the parent class of Hashtable is Dictionary.

The Dictionary class is an obsolete class (see the comments in its source code). With its parent class abandoned, its subclass Hashtable is naturally used less and less.
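
For reference, the two class declarations in the JDK source look like this (bodies elided):

//java.util.Hashtable extends the obsolete Dictionary
public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable { ... }

//java.util.HashMap extends AbstractMap
public class HashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable { ... }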

2. HashTable provides elements() and contains() methods that HashMap does not have

  • elements(): inherited from the parent class Dictionary, this method returns an enumeration of the values in this Hashtable.
  • contains(): determines whether the Hashtable contains the given value. Its behavior is identical to containsValue(); in fact, containsValue() simply calls contains().

//Determine whether a given value exists in the Hashtable; return true if it exists, otherwise false
public synchronized boolean contains(Object value) {
    if (value == null) {
        throw new NullPointerException();
    }

    Entry<?,?> tab[] = table;
    for (int i = tab.length; i-- > 0;) {
        for (Entry<?,?> e = tab[i]; e != null; e = e.next) {
            if (e.value.equals(value)) {
                return true;
            }
        }
    }
    return false;
}

public boolean containsValue(Object value) {
    return contains(value);
}
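
A minimal usage sketch (with made-up keys and values) showing that the two methods behave identically:

import java.util.Hashtable;

public class ContainsDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();
        table.put("a", 1);

        //Both report whether a value (not a key) is present
        System.out.println(table.contains(1));       //true
        System.out.println(table.containsValue(1));  //true
        System.out.println(table.containsValue(2));  //false
    }
}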

3. The two have different support for null keys and null values

In Hashtable, neither the key nor the value can be null, as can be seen from the put() method:

public synchronized V put(K key, V value) {
    //value cannot be null, otherwise a NullPointerException is thrown
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    //The key cannot be null either: when key is null, calling hashCode() on it throws a NullPointerException
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    for (; entry != null; entry = entry.next) {
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }

    addEntry(hash, key, value, index);
    return null;
}
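
A quick sketch of how this difference shows up in practice (class name and entries are made up):

import java.util.HashMap;
import java.util.Hashtable;

public class NullKeyValueDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put(null, "v");   //OK: one null key is allowed
        map.put("k", null);   //OK: null values are allowed

        Hashtable<String, String> table = new Hashtable<>();
        try {
            table.put("k", null); //throws NullPointerException: value is null
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null values");
        }
        try {
            table.put(null, "v"); //throws NullPointerException: key.hashCode() on null
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}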

In HashMap, null can be used as a key, and only one such key can exist; one or more keys can also map to the value null. When the get() method returns null, it may mean that the key does not exist in the HashMap, or that the value mapped to the key is null. Therefore, in HashMap, get() cannot be used to determine whether a key exists; use the containsKey() method instead.
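
A short sketch of this ambiguity (keys are made up):

import java.util.HashMap;

public class GetVsContainsKeyDemo {
    public static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("present", null);

        //Both calls return null, for different reasons
        System.out.println(map.get("present"));  //null: key exists, value is null
        System.out.println(map.get("absent"));   //null: key does not exist

        //containsKey() distinguishes the two cases
        System.out.println(map.containsKey("present")); //true
        System.out.println(map.containsKey("absent"));  //false
    }
}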

4. HashMap is not thread-safe; HashTable is thread-safe

  • Hashtable is thread-safe : the synchronized keyword is added to each of its methods. In a multi-threaded concurrent environment, you can use Hashtable directly without synchronizing its methods yourself.
  • HashMap is not thread-safe : problems such as infinite loops may occur in a multi-threaded concurrent environment. When using HashMap across threads, you must add the synchronization yourself.

Although HashMap is not thread-safe, it is much more efficient than Hashtable. This design is reasonable: in daily use, most operations are single-threaded, and HashMap is freed from the locking overhead. When multi-threaded access is required, the thread-safe ConcurrentHashMap can be used instead. Although ConcurrentHashMap is also thread-safe, it is many times more efficient than Hashtable, because ConcurrentHashMap uses segment locks rather than locking the entire table.
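
A minimal sketch of the usual choices for concurrent use (class name and entries are made up):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentChoicesDemo {
    public static void main(String[] args) {
        //Preferred: fine-grained locking internally, no single global lock
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        concurrent.put("a", 1);

        //Also possible: wrap a HashMap, but every call takes one shared lock,
        //which is essentially the same bottleneck as Hashtable
        Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());
        wrapped.put("b", 2);
    }
}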

5. The two differ in initial capacity and in how the capacity grows on expansion

  • The default initial size of Hashtable is 11 ; after each expansion, the capacity becomes 2n+1 (twice the original plus one).
  • The default initial size of HashMap is 16 ; after each expansion, the capacity becomes twice the original.

At creation time, if an initial capacity is given, Hashtable uses the given size directly, while HashMap expands it to a power of two. In other words, Hashtable tends to use prime and odd numbers, while HashMap always uses a power of 2 as the size of the hash table.

The reason for this difference is that Hashtable and HashMap have different design priorities. Hashtable prioritizes a more uniform hash distribution, which reduces hash collisions: when the size of the hash table is prime, simple modulo hashing distributes keys more evenly. HashMap prioritizes the computational efficiency of hashing: when the modulus is a power of 2, the remainder can be obtained directly with a bit operation instead of a division, which is much faster. To speed up hashing, HashMap fixes the size of the hash table to a power of 2. Of course, this introduces the problem of an uneven hash distribution, so HashMap modifies the hash algorithm to compensate. This is why Hashtable and HashMap calculate hash values differently.
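
A small sketch comparing the two indexing styles (simplified from the JDK sources; the key and lengths here are arbitrary):

public class IndexingDemo {
    public static void main(String[] args) {
        int hash = "someKey".hashCode();

        //Hashtable style: mask off the sign bit, then take the remainder of a prime/odd length
        int tableLength = 11;
        int hashtableIndex = (hash & 0x7FFFFFFF) % tableLength;

        //HashMap style: the length is a power of two, so the remainder is just a bit mask
        int mapLength = 16;
        int hashMapIndex = hash & (mapLength - 1);

        System.out.println(hashtableIndex + " " + hashMapIndex);
    }
}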

6. The two calculate hash values differently

To locate an element, a hash value is first calculated from the element's key, and that hash value is then used to calculate the final position.

Hashtable directly uses the object's hashCode, an int value that the JDK computes from the object's address, string, or numeric value. It then uses the division-with-remainder (modulo) method to get the final position. Hashtable therefore performs a division every time it locates an element, and division is relatively time-consuming.

In order to improve calculation efficiency, HashMap fixes the size of the hash table to a power of 2, so that the modulo operation needs no division, only a bit operation. Bit operations are much more efficient than division.

Although HashMap's efficiency is improved by this, hash conflicts also increase, because there is a higher probability that the low bits of two hash values are the same.

To solve this problem, HashMap recomputes the hash value from the hashCode, performing a few extra operations to spread the bits out, so that the resulting positions are more scattered and hash conflicts are reduced. Of course, to stay efficient, HashMap only does some simple bit processing, so as not to cancel out the efficiency gained from using a power of two.
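
For reference, JDK 8's HashMap computes the perturbed hash by XORing the high 16 bits of the hashCode into the low 16 bits. A runnable sketch of the same computation (the demo class and key are made up):

public class HashDisturbDemo {

    //Same computation as java.util.HashMap.hash() in JDK 8:
    //XOR the high 16 bits into the low 16 bits so that the high bits
    //also influence the bucket index
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int raw = "someKey".hashCode();
        System.out.println(Integer.toBinaryString(raw));             //raw hashCode
        System.out.println(Integer.toBinaryString(hash("someKey"))); //perturbed hash
    }
}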

HashTable source code

Properties of HashTable

//Entry[] array. Each Entry is a node of a "zipper" (separate chaining) list and represents
//one key-value pair; all key-value pairs of the hash table are stored in this array
private transient Entry<?,?>[] table;

//The size of the Hashtable. Note: this is not the capacity of the table,
//but the number of Entry key-value pairs it contains
private transient int count;

//The threshold of the Hashtable, used to decide whether the capacity needs to be adjusted.
//threshold = capacity * load factor
private int threshold;

//load factor
private float loadFactor;

//Used to implement the "fail-fast" mechanism. In a concurrent setting, if another thread
//makes a structural modification while an iteration is in progress, the iterator notices
//immediately and throws ConcurrentModificationException, instead of waiting until the
//iteration finishes to tell you something went wrong
private transient int modCount = 0;

HashTable constructor

public Hashtable(int initialCapacity, float loadFactor) {
    //Validate the initial capacity
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
    //Validate the load factor
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal Load: " + loadFactor);

    if (initialCapacity == 0)
        initialCapacity = 1;

    this.loadFactor = loadFactor;
    //Initialize the table: an array of size initialCapacity
    table = new Entry<?,?>[initialCapacity];
    //Calculate the threshold ---> threshold = capacity * load factor
    threshold = (int)Math.min(initialCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
}

public Hashtable(int initialCapacity) {
    this(initialCapacity, 0.75f);
}

//No-argument constructor; the initial capacity is 11
public Hashtable() {
    this(11, 0.75f);
}

//Initialize the Hashtable directly from the given Map; the capacity is the larger of
//twice the given Map's size and 11
public Hashtable(Map<? extends K, ? extends V> t) {
    this(Math.max(2 * t.size(), 11), 0.75f);
    putAll(t);
}

private int hash(Object k) {
    return hashSeed ^ k.hashCode();
}

Several commonly used member methods of HashTable

1. put method

//Acquire the synchronized lock
public synchronized V put(K key, V value) {
    //If value is null, throw an exception
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    //Calculate the hash value and index of the key
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    //Traverse the linked list at the index position
    for (; entry != null; entry = entry.next) {
        //If the key already exists, update the value
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }

    //Add a new Entry element
    addEntry(hash, key, value, index);
    return null;
}
Steps:
  • 1. Acquire the synchronized lock.
  • 2. The put method does not allow null values; if the value is null, an exception is thrown directly.
  • 3. Calculate the hash value and index of the key.
  • 4. Traverse the linked list at the corresponding position; if a node with the same hash and key already exists, update its value and return the old value.
  • 5. If no Entry node with the same key exists, call the addEntry method to add a new node.

2. addEntry method

private void addEntry(int hash, K key, V value, int index) {
    //Modification count++
    modCount++;

    Entry<?,?> tab[] = table;
    //The current size has reached the threshold; the table needs to be expanded
    if (count >= threshold) {
        // Rehash the table if the threshold is exceeded (time-consuming)
        rehash();

        tab = table;
        //Recompute the hash value from the key
        hash = key.hashCode();
        //Locate the bucket (equivalent to HashMap locating the slot):
        //0x7FFFFFFF is the maximum int value, 0111 1111 1111 1111 1111 1111 1111 1111
        //in binary: all ones except the sign bit (the highest bit of an integer is the
        //sign bit: 0 means positive, 1 means negative).
        //(hash & 0x7FFFFFFF) therefore always produces a non-negative integer,
        //and (hash & 0x7FFFFFFF) % tab.length falls within the table length.
        //With an odd table length, the modulo causes fewer hash conflicts; an even
        //length would cause more. HashMap instead uses h & (length - 1) with a
        //power-of-2 length to reduce conflicts.
        index = (hash & 0x7FFFFFFF) % tab.length;
    }

    // Creates the new entry.
    @SuppressWarnings("unchecked")
    //Get the Entry list in the bucket at index; the head node is e
    Entry<K,V> e = (Entry<K,V>) tab[index];
    //Create a new node and insert it at the head of the list; its next node is e
    tab[index] = new Entry<>(hash, key, value, e);
    //Number of elements++
    count++;
}
Steps:
  1. If the current number of elements has reached the threshold, expand the table first.
  2. Create a new node and insert it at the head of the linked list.

3. Rehash method (expansion method)

Equivalent to HashMap's resize() method.

protected void rehash() {
    //Get the current table array capacity (array length)
    int oldCapacity = table.length;
    //oldMap: the current table array
    Entry<?,?>[] oldMap = table;

    //Expand the capacity to twice the original plus 1 ---> i.e. 2n + 1,
    //which guarantees that the capacity stays odd.
    //An odd capacity helps index = (hash & 0x7FFFFFFF) % tab.length spread entries
    //across buckets and reduces hash conflicts.
    //Compare with HashMap's hash & (tab.length - 1), where a power-of-2 capacity
    //reduces conflicts.
    int newCapacity = (oldCapacity << 1) + 1;
    //Check whether the maximum capacity is exceeded
    if (newCapacity - MAX_ARRAY_SIZE > 0) {
        if (oldCapacity == MAX_ARRAY_SIZE)
            // Keep running with MAX_ARRAY_SIZE buckets
            return;
        newCapacity = MAX_ARRAY_SIZE;
    }
    //newMap: the new table array
    Entry<?,?>[] newMap = new Entry<?,?>[newCapacity];

    //Structural modification count++
    modCount++;
    //Calculate the next rehash threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAX_ARRAY_SIZE + 1);
    table = newMap;

    //Re-hash the key-value pairs of the old table into the new table
    //(a double loop: time-consuming!).
    //Similar to how HashMap copies the old array into the new one after expansion,
    //recalculating each entry's bucket position along the way.
    for (int i = oldCapacity; i-- > 0; ) {
        for (Entry<K,V> old = (Entry<K,V>)oldMap[i]; old != null; ) {
            Entry<K,V> e = old;
            old = old.next;

            //Recalculate the bucket position using the new capacity newCapacity
            int index = (e.hash & 0x7FFFFFFF) % newCapacity;
            e.next = (Entry<K,V>)newMap[index];
            //Insert at the head of the list in the bucket
            newMap[index] = e;
        }
    }
}
Steps:
  1. Grow the array to twice its length plus one (if the result exceeds the upper limit, cap it at the upper limit); the sketch below shows the resulting growth sequence.
  2. Update the expansion threshold of the hash table.
  3. Traverse the nodes of the old table, calculate each node's index in the new table, and insert it at the head of the linked list at the corresponding position.
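
A quick sketch of how the capacity grows from the default 11 under the 2n + 1 rule above (the demo class is made up):

public class CapacityGrowthDemo {
    public static void main(String[] args) {
        int capacity = 11; //Hashtable's default initial capacity
        for (int i = 0; i < 4; i++) {
            System.out.print(capacity + " -> ");
            capacity = (capacity << 1) + 1; //same growth rule as rehash()
        }
        System.out.println(capacity); //prints: 11 -> 23 -> 47 -> 95 -> 191
    }
}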

4. get method

public synchronized V get(Object key) {
    Entry tab[] = table;
    //First calculate the hash value from the key
    int hash = hash(key);
    //Then find the index from the hash value
    int index = (hash & 0x7FFFFFFF) % tab.length;
    //Traverse the Entry chain
    for (Entry<K,V> e = tab[index]; e != null; e = e.next) {
        //If the key is found, return the corresponding value
        if ((e.hash == hash) && e.key.equals(key)) {
            return e.value;
        }
    }
    //Otherwise return null
    return null;
}
Steps:
  1. Acquire the synchronized lock first.
  2. Calculate the hash value and index of the key.
  3. In the linked list at the corresponding position, find the node with the same hash and key, and return that node's value.
  4. If no node is found by the end of the traversal, return null.

5. remove method

public synchronized V remove(Object key) {
    Entry<?,?> tab[] = table;
    //Calculate the hash value and index of the key
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> e = (Entry<K,V>)tab[index];
    //Traverse the linked list at the corresponding position to find the node to delete
    for (Entry<K,V> prev = null; e != null; prev = e, e = e.next) {
        if ((e.hash == hash) && e.key.equals(key)) {
            modCount++;
            //Update the predecessor's next to point to e's next
            if (prev != null) {
                prev.next = e.next;
            } else {
                tab[index] = e.next;
            }
            count--;
            V oldValue = e.value;
            e.value = null;
            //Return the value of the deleted node
            return oldValue;
        }
    }
    //If the key does not exist, return null
    return null;
}
Steps:
  1. Acquire the synchronized lock first.
  2. Calculate the hash value and index of the key.
  3. Traverse the linked list at the corresponding position to find the node to delete. If it exists, let e denote the node to delete and prev its predecessor; if it does not exist, return null.
  4. Update the predecessor's next to point to e's next, and return the value of the deleted node.
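
Finally, a minimal end-to-end sketch of put, get, and remove (keys and values are made up):

import java.util.Hashtable;

public class HashtableUsageDemo {
    public static void main(String[] args) {
        Hashtable<String, Integer> table = new Hashtable<>();

        table.put("one", 1);                       //returns null: no previous mapping
        System.out.println(table.put("one", 11));  //1: the old value is returned
        System.out.println(table.get("one"));      //11
        System.out.println(table.remove("one"));   //11: the value of the removed node
        System.out.println(table.get("one"));      //null: the key is gone
    }
}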