(Java intern) 10 daily interview questions to punch in-Java simple collection

  • With autumn recruitment approaching and summer internships to prepare for, I wish everyone gets a little better every day!
  • This article summarizes interview questions related to Java collections and will be updated daily~


1. How to make ArrayList thread safe?

The answer is as follows:

```java
// Method 1: Collections.synchronizedList
// The returned wrapper synchronizes methods such as get/add/remove on an internal mutex.
List<Object> list = Collections.synchronizedList(new ArrayList<>());

// Method 2: use the thread-safe CopyOnWriteArrayList.
// Its add/remove/set methods also lock internally:
//     final ReentrantLock lock = this.lock;
List<Object> cowList = new CopyOnWriteArrayList<>();

// Method 3: write your own wrapper class that extends ArrayList
// and locks the add/set/remove methods according to your business needs.
```
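As a rough illustration of method 3, here is a minimal sketch of a wrapper that synchronizes the mutating methods. The class name and lock object are my own choices for this example, not from the original article, and iteration would still need external synchronization just like with Collections.synchronizedList:

```java
import java.util.ArrayList;

// A minimal sketch of "method 3": extend ArrayList and lock the mutating methods.
// SynchronizedArrayList is a hypothetical name chosen for this example.
public class SynchronizedArrayList<E> extends ArrayList<E> {

    private final Object lock = new Object();

    @Override
    public boolean add(E e) {
        synchronized (lock) {
            return super.add(e);
        }
    }

    @Override
    public E set(int index, E element) {
        synchronized (lock) {
            return super.set(index, element);
        }
    }

    @Override
    public E remove(int index) {
        synchronized (lock) {
            return super.remove(index);
        }
    }
}
```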

2. Tell me about the difference between Vector and ArrayList?

  • An ArrayList created with the no-argument constructor starts with an empty backing array, and the default capacity of 10 is only applied when the first element is added; a Vector created with the no-argument constructor allocates an array of capacity 10 up front.
  • When a Vector expands, if the capacity increment `capacityIncrement` is not specified or is not greater than 0, the capacity doubles; otherwise it grows by `capacityIncrement`.
  • When an ArrayList expands, the new capacity is 1.5 times the old one.
  • Vector is a thread-safe collection: methods such as `add` and `remove` are declared with the `synchronized` keyword. ArrayList is not a thread-safe collection. (A small sketch of the two growth formulas follows this list.)
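The two growth rules can be sketched as plain arithmetic. This is a simplified illustration of the formulas above, written by me, not the exact JDK source:

```java
// Simplified sketch of the growth formulas described above (not the exact JDK code).
public class GrowthSketch {

    static int growVector(int oldCapacity, int capacityIncrement) {
        // Vector: grow by capacityIncrement if it is positive, otherwise double.
        return oldCapacity + (capacityIncrement > 0 ? capacityIncrement : oldCapacity);
    }

    static int growArrayList(int oldCapacity) {
        // ArrayList: the new capacity is roughly 1.5 times the old one.
        return oldCapacity + (oldCapacity >> 1);
    }

    public static void main(String[] args) {
        System.out.println(growVector(10, 0));  // 20 (doubles)
        System.out.println(growVector(10, 5));  // 15 (grows by the increment)
        System.out.println(growArrayList(10));  // 15 (1.5x)
    }
}
```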

Reference articles: ArrayList analysis of JDK collection source code (with example interview questions) , Vector analysis of JDK collection source code


3. Does CopyOnWriteArrayList need to expand when adding new elements? How is it done?

  • The array underlying CopyOnWriteArrayList is not expanded dynamically; thread safety is guaranteed by a reentrant lock (ReentrantLock).
  • When an element is added to a CopyOnWriteArrayList, the thread first acquires the lock. Inside `add`, a new array with capacity `old capacity + 1` is created, the old array data is copied into it, and the newly added element is placed at the end of the new array.

The code is as follows:

```java
public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        // Create a new array of length len + 1 and copy the old data into it.
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        // Place the new element at the end of the new array.
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}
```

CopyOnWriteArrayList is suitable for read-heavy, write-light scenarios (read-write separation), because every call to a method that modifies the array structure re-creates the whole array, so write performance is low!
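A small usage sketch of my own, assuming the standard java.util.concurrent API: readers can iterate safely while writes happen, because each iterator works on a snapshot of the array:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        // Iteration works on a snapshot taken when the iterator was created,
        // so adding during iteration does not throw ConcurrentModificationException.
        for (String s : list) {
            list.add("c"); // expensive: copies the whole array on every call
            System.out.println(s); // prints only "a" and "b" (the snapshot)
        }
        System.out.println(list); // [a, b, c, c]
    }
}
```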

Article reference: CopyOnWriteArrayList analysis of JDK collection source code


4. What is the difference between HashMap and HashTable?

HashMap:

  • HashMap: the underlying structure is an array + linked list + red-black tree. It is not thread-safe, the default capacity is 16, and null keys and values are allowed.
  • The initial `size` is 16; on expansion, `newsize = oldsize << 1`, and the size must always be a power of 2.
  • When the total number of elements in the map exceeds 75% of the capacity of the Entry array, an expansion is triggered. To keep linked lists short, the bucket index is computed so that elements are distributed as evenly as possible: `index = hash & (tab.length - 1)`.
  • Expansion applies to the whole map: each time it happens, every element of the old array is recalculated and re-inserted into the new array.

HashTable:

  • HashTable: the underlying structure is an array + linked list. Neither keys nor values may be `null`. It is thread-safe: data-modifying methods are marked `synchronized`, which locks the entire HashTable, so efficiency is low; ConcurrentHashMap optimizes this.
  • The initial `size` is 11; on expansion, `newsize = (oldsize << 1) + 1`.
  • The bucket index is computed as `index = (hash & 0x7FFFFFFF) % tab.length`.

The difference between the two:

  • HashMap is not thread-safe; HashTable is thread-safe (its methods are modified with `synchronized`).

  • HashMap allows `null` as the `key` or `value` of an entry; Hashtable does not.

  • HashMap recalculates the hash value of the key (as in the code below), while Hashtable uses the key's `hashCode` directly.

```java
// The algorithm HashMap uses to recalculate the hash value
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```

  • Hashtable inherits from the Dictionary class, while HashMap is an implementation class of the Map interface.

The bucket-index addressing algorithms of HashMap and HashTable:

  • HashMap: `index = hash & (tab.length - 1)`
  • HashTable: `index = (hash & 0x7FFFFFFF) % tab.length`

Both formulas aim to scatter the computed bucket indexes as much as possible, which reduces hash collisions.
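As a rough illustration (my own demo, not JDK code), the two addressing formulas can be compared directly:

```java
// A small demo (not JDK source) comparing the two addressing formulas.
public class IndexDemo {
    public static void main(String[] args) {
        int hash = "interview".hashCode();

        // HashMap-style addressing: the capacity is a power of 2, e.g. 16.
        int hashMapIndex = hash & (16 - 1);

        // Hashtable-style addressing: the capacity is usually odd, e.g. 11;
        // masking with 0x7FFFFFFF clears the sign bit so the remainder is non-negative.
        int hashtableIndex = (hash & 0x7FFFFFFF) % 11;

        System.out.println(hashMapIndex);   // always in [0, 15]
        System.out.println(hashtableIndex); // always in [0, 10]
    }
}
```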

In HashTable :

  • `0x7FFFFFFF` is `0111 1111 1111 1111 1111 1111 1111 1111`: all bits are 1 except the sign bit.
  • `(hash & 0x7FFFFFFF)` always produces a non-negative integer.
  • `(hash & 0x7FFFFFFF) % tab.length` therefore always yields a non-negative index that is guaranteed to fall within the range of `tab.length`!
  • Using an odd array length in HashTable causes fewer hash conflicts, while an even length causes more! That is why the initial capacity is 11 and expansion uses `newsize = (oldsize << 1) + 1` (i.e. 2n + 1), which keeps each expanded capacity odd.

In HashMap :

  • The initial capacity is 16; when the number of stored entries reaches 0.75 times the array capacity, an expansion is triggered.
  • The bucket index formula is `index = hash & (tab.length - 1)`. For this to work, the capacity must be a power of 2 (and therefore even), which reduces hash conflicts. Expansion uses `newsize = oldsize << 1` (i.e. 2n), so the result is also a power of 2.
  • In addition, when the length of a linked list in a bucket exceeds 8 and the array length has reached 64, the list is converted into a red-black tree; when the length drops below 6, the tree is converted back into a list.
  • Before JDK 1.8, HashMap inserted new nodes at the head of the linked list. The advantage is that head insertion is faster than tail insertion, because the list does not need to be traversed first.
  • Since JDK 1.8, tail insertion is used. The reason is that the list has to be traversed anyway to check whether its length exceeds 8, in which case treeification should be considered.
  • HashMap resolves hash conflicts with the chaining (linked list) method.
  • HashMap inserts the data first and then checks whether an expansion is needed!

Article reference: HashMap analysis of JDK collection source code (top) , HashMap analysis of JDK collection source code (bottom) , HashTable analysis of JDK collection source code


5. What is the difference between HashMap and TreeMap?

  • HashMap was covered above, so let's just look at TreeMap.
  • The bottom layer of TreeMap is a balanced binary search tree (a red-black tree), and it supports custom sorting rules: by passing a Comparator to `TreeMap(Comparator c)` you can conveniently sort the internal elements in various ways, but its performance is worse than HashMap's (see the sketch below).
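A minimal sketch of a TreeMap with a custom Comparator; the keys and the descending ordering are my own example:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapDemo {
    public static void main(String[] args) {
        // Sort keys in descending order instead of the natural ascending order.
        TreeMap<Integer, String> map = new TreeMap<>(Comparator.reverseOrder());
        map.put(1, "one");
        map.put(3, "three");
        map.put(2, "two");

        System.out.println(map); // {3=three, 2=two, 1=one}
    }
}
```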

6. The relationship between Set and Map

  • At their core, neither stores duplicate elements; both store a group of unique objects.
  • Each implementation of Set corresponds to a wrapper around an implementation of Map (see the sketch after this list).
  • For example, HashSet wraps a HashMap underneath, and TreeSet wraps a TreeMap.
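A rough sketch of the idea, simplified by me from the JDK with only the relevant parts kept: HashSet stores its elements as the keys of an internal HashMap and uses a shared dummy object as the value:

```java
import java.util.HashMap;

// Simplified sketch of how HashSet wraps HashMap (not the full JDK source).
public class MyHashSet<E> {

    private static final Object PRESENT = new Object(); // shared dummy value

    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e) {
        // put() returns null if the key was not present before,
        // so add() returns true only when the element is new.
        return map.put(e, PRESENT) == null;
    }

    public boolean contains(E e) {
        return map.containsKey(e);
    }

    public boolean remove(E e) {
        return map.remove(e) == PRESENT;
    }
}
```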

7. Why does HashMap use a red-black tree at the bottom instead of another tree, such as a plain binary search tree? And why is the red-black tree not used from the start, but only when the linked list length reaches 8 and the array capacity is greater than 64?

  • A binary search tree can also degenerate into a linear structure in special cases (for example, when keys are inserted in sorted order). It then has the same deep-traversal problem as a long linked list, and lookups become slow.
  • The red-black tree is used mainly to speed up lookups. A red-black tree is a kind of balanced binary tree: after new data is inserted (new nodes are initially inserted as red), it rebalances itself through left rotations, right rotations, and recoloring, which solves the query-depth problem of a long singly linked list.
  • The red-black tree is not used from the start because, when a linked list holds only a few elements, traversing the linear list consumes fewer resources than traversing a red-black tree, so a linear list is used at first.
  • The reason 8 is the treeification threshold is that, after a lot of testing, 8 turned out to be the most appropriate value. Ideally, with random hash codes, the number of nodes in a hash bucket follows a Poisson distribution. According to the Poisson distribution formula, the probability of a bucket's linked list reaching 8 nodes is about 0.00000006, which is low enough; and once a bucket does reach 8 nodes, the performance advantage of the red-black tree starts to show, so 8 is a reasonable number (see the sketch after this list).
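As a rough check of that number, here is a sketch of my own using the Poisson formula with the parameter of about 0.5 quoted in the JDK HashMap implementation notes:

```java
// Sketch: Poisson probabilities P(k) = exp(-lambda) * lambda^k / k! with lambda ~ 0.5,
// the parameter quoted in the JDK HashMap implementation notes.
public class PoissonSketch {
    public static void main(String[] args) {
        double lambda = 0.5;
        double p = Math.exp(-lambda); // the k = 0 term
        for (int k = 0; k <= 8; k++) {
            if (k > 0) {
                p = p * lambda / k; // multiply the previous term by lambda / k
            }
            System.out.printf("k = %d, P = %.8f%n", k, p);
        }
        // For k = 8 the printed probability is about 0.00000006.
    }
}
```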

Since the red-black tree has been mentioned, the interviewer is bound to ask about its 5 properties:

Properties of the red-black tree:
Property 1: Every node is either black or red.
Property 2: The root node is black.
Property 3: Every leaf node (NIL) is black.
Property 4: Both children of every red node are black, i.e. two red nodes can never be connected to each other.
Property 5: Every path from any node to each of its leaf nodes contains the same number of black nodes, commonly called the "black height".

Red-black tree example diagram :

The red-black tree is not a perfectly balanced binary search tree. It can be seen from the figure that the left subtree of the root node P is obviously higher than the right subtree.

But the number of layers of black nodes in the left subtree and the right subtree is equal, that is, the path from any node to each leaf node contains the same number of black nodes (Property 5) .

That is why the red-black tree is said to be perfectly black-balanced.

So when a hash conflict occurs at a bucket position, why is a linked list used first instead of going straight to a red-black tree?

  • First of all, a linked list is not an array: its storage addresses are not contiguous, so to retrieve data you must follow the `next` pointers one by one until the target is found.
  • If a bucket has only a few hash conflicts, traversing the list does not take much time. But once the number of conflicts grows and a long linked list forms in the bucket, traversing that long list becomes expensive!
  • If that long list is converted into a red-black tree, which is a special kind of binary search tree, the search space can be halved at each step, so ideally the query time drops sharply.
  • The red-black tree is not used from the start because a tree structure occupies noticeably more storage space than a linked list, so when the list is short there is no need to treeify it!

In short, the choice between the linked list and the red-black tree is entirely a trade-off between time efficiency and space consumption~

Reference article: HashMap bottom red-black tree implementation (implement a simple red-black tree yourself)


8. Why must the capacity of HashMap be 2 to the power of N? What if the input value is not a power of 2, such as 10?

Question 1: Why must the capacity of HashMap be 2 to the power of N ?

  • The core purpose is to evenly distribute the inserted nodes and reduce hash conflicts .

The HashMap construction method can specify the initial capacity of the collection, such as:

```java
// Constructs an empty HashMap with the specified initial capacity
// and the default load factor (0.75).
HashMap(int initialCapacity)
```

When an element is added to the HashMap, its bucket position in the array must be determined from the hash value of the key (the addressing algorithm). In order to store data efficiently and reduce collisions, HashMap must distribute the data as evenly as possible, so that each linked list has roughly the same length. The key to this is the algorithm that decides which linked list (bucket) a piece of data is stored in.

This algorithm is actually a modulo operation: `hash % tab.length`. However, computing a remainder directly is not as efficient for the computer as bit operations, so the source code is optimized to use `hash & (tab.length - 1)` to find the bucket. In fact, `hash % length` equals `hash & (length - 1)` on the premise that length is a power of 2.

For example, when the array length is `tab.length = 8`: `3 & (8 - 1) = 3` and `2 & (8 - 1) = 2`, so the bucket positions (array indexes) are 3 and 2, two different positions, and no hash collision occurs.

Now look at a case where the array length (the number of buckets) is not a power of 2:

As can be seen from the figure above, when the array length is 9 (not a power of 2), different hash values easily produce the same array subscript from `hash & (length - 1)`, i.e. hash collisions occur easily.

So let's summarize the reasons why the HashMap array capacity uses a power of 2:

  • When the bucket index is determined by the addressing calculation on the hash value of the key, if the length of the HashMap array `tab.length` is a power of 2, the newly inserted data can be distributed evenly and every bucket position has a chance of receiving data. If the array length is not a power of 2, some bucket positions will never receive any data while others suffer frequent hash conflicts, which wastes array space and increases the probability of hash collisions.
  • Normally, to map a hash value to a bucket index we would use a modulo operation, i.e. `index = hash % length`. However, the computer performs modulo far less efficiently than bit operations, so the addressing is done with `hash & (length - 1)` instead. Essentially the two methods give the same result, namely `hash & (length - 1) == hash % length`, as long as length is a power of 2.

Therefore, the HashMap array capacity uses a power of 2 so that the addressing algorithm can spread newly inserted data evenly across the buckets and reduce the probability of frequent hash collisions in any single bucket. After all, the more hash conflicts a bucket has, the longer its linked list grows, which greatly reduces data-retrieval efficiency (a linear array lookup is definitely much faster than walking a linked list).
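A small verification sketch of my own (not from the article) that `hash & (length - 1)` matches `hash % length` only when the length is a power of 2:

```java
import java.util.Random;

// Demo: hash & (length - 1) equals hash % length only for power-of-2 lengths.
public class AddressingDemo {
    public static void main(String[] args) {
        Random random = new Random();
        int[] lengths = {8, 9, 16}; // 9 is not a power of 2

        for (int length : lengths) {
            boolean alwaysEqual = true;
            for (int i = 0; i < 100_000; i++) {
                int hash = random.nextInt(Integer.MAX_VALUE); // non-negative hash
                if ((hash & (length - 1)) != (hash % length)) {
                    alwaysEqual = false;
                    break;
                }
            }
            System.out.println("length = " + length + ", formulas agree: " + alwaysEqual);
        }
        // Expected output: true for 8 and 16, false for 9.
    }
}
```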

Question 2: What if the initial capacity passed when creating a HashMap object is 10, which is not a power of 2?

For example:

```java
HashMap<String, Integer> hashMap = new HashMap<>(10);
```

In this case, the HashMap constructor calls `tableSizeFor(initialCapacity)` to obtain the smallest power of 2 that is greater than or equal to the requested capacity (for example, the smallest power of 2 greater than or equal to 10 is 16).

  • The calculation of that smallest power of 2 is as follows:

```java
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
```

In this way, `tableSizeFor(initialCapacity)` yields the smallest power of 2 that is greater than or equal to `initialCapacity`.

The calculation here is ingeniously designed. Since this is an interview-question summary, I will not walk through the bit shifts step by step; a worked diagram is attached directly (and a short trace follows). For details, please refer to the source-code analysis article below:
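As a rough worked trace of my own, assuming `cap = 10`, the bit-smearing steps look like this:

```java
// Worked trace of tableSizeFor(10), step by step (my own illustration):
// cap = 10          -> n = cap - 1 = 9        = 0b1001
// n |= n >>> 1      -> 0b1001 | 0b0100        = 0b1101 (13)
// n |= n >>> 2      -> 0b1101 | 0b0011        = 0b1111 (15)
// n |= n >>> 4      -> 0b1111 | 0b0000        = 0b1111 (no further change)
// n |= n >>> 8, 16  -> still 0b1111           = 15
// return n + 1      -> 16, the smallest power of 2 that is >= 10
```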

Article reference: JDK collection source code analysis-HashMap (on)


9. How is HashMap designed to calculate the hash value of the key? Why should the high 16 bits of hashCode participate in the operation?

The method of recalculating the hash value in HashMap is as follows:

```java
static final int hash(Object key) {
    int h;
    // If the key is null, the hash value is 0.
    // Otherwise, call the key's hashCode() method and assign the result to h,
    // then XOR h with h shifted right (unsigned) by 16 bits to get the final hash.
    // This makes the resulting hash values more scattered: the more scattered they are,
    // the shorter each bucket's linked list, the fewer red-black trees are created
    // later, and the higher the retrieval efficiency!
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
```
  • As shown in the code: the high 16 bits and the low 16 bits of the key's hashCode are XORed, and the result is the new hash value.

The analysis of the method is as follows:

  1. `key.hashCode()` returns the key's hashcode, which we can treat as an essentially random value.
  2. `^` (bitwise XOR) works bit by bit: if the two bits on the same binary digit are the same, the result is 0; if they differ, the result is 1.
  3. `h >>> 16` shifts the value of h to the right by 16 bits, unsigned.

Question: Why do you want to do this?

We know that newly inserted data in HashMap goes through the addressing algorithm `index = hash & (tab.length - 1)` to determine the bucket subscript. `tab.length` is the array length, which we will call n here.

If n, the array length, is small, say n = 16, then n - 1 is 15, whose binary representation is 1111. ANDing such a value directly with the hashCode only uses the last 4 bits of the hash. If the high bits of the hash value vary a lot while the low bits vary little, hash conflicts become likely, so the high and low bits are mixed together to solve this problem. A small demo follows.
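A small demo of my own, assuming a table length of 16: two hash codes that differ only in their high bits collide without the XOR, but are separated once the high 16 bits are folded in:

```java
// Demo (my own): two hash codes that differ only in their high 16 bits.
public class HighBitsDemo {
    static int spread(int h) {
        return h ^ (h >>> 16); // the same mixing step HashMap uses
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00010005; // low bits ...0101
        int h2 = 0x00070005; // same low bits, different high bits

        // Without mixing, both land in bucket 5 -> collision.
        System.out.println(h1 & (n - 1)); // 5
        System.out.println(h2 & (n - 1)); // 5

        // After folding in the high 16 bits, they land in different buckets.
        System.out.println(spread(h1) & (n - 1)); // 4
        System.out.println(spread(h2) & (n - 1)); // 2
    }
}
```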

Let's look at an analysis chart:

From the picture above, you can see that if only the raw hash value returned by `key.hashCode()` is used, then when the high bits of the hash vary a lot but the low bits vary little, the bucket indexes produced by the addressing algorithm `hash & (tab.length - 1)` are much more prone to hash conflicts!


10. Tell me about your understanding of the hash algorithm? And what is hash conflict?

  • The basic idea of hashing is to map an input of arbitrary length to a fixed-length output through a hash algorithm.
  • In a program, you may encounter two different values that produce the same hash value after the hash algorithm is applied; this situation is called a hash conflict (see the small demo below).
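A classic small example, my own demo: the strings "Aa" and "BB" have the same hashCode in Java, so they collide:

```java
public class HashCollisionDemo {
    public static void main(String[] args) {
        // "Aa" and "BB" are different inputs but produce the same hash value.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}
```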

Finally, I recommend a harder HashMap analysis point: Why should the load factor be set to 0.75 ?


Supplementary question: talk about how to migrate the elements of the old array to the new array during resize expansion?

When a HashMap expands, the elements are re-hashed: all elements in the old array are traversed and migrated to the expanded new array. There are three cases of data migration from the old array, analyzed separately below:

Case 1: There is no hash conflict in the current bucket, and there is only one element:

In this case, HashMap's `rehash` approach is very clever: because each expansion doubles the capacity, the new index computed by `(n - 1) & hash` differs from the old one by at most one extra bit, so a node either stays at its original position or moves to the "original position + old capacity" position.

For example, when we expand from 16 to 32, the specific changes are as follows:

After the element's hash is recalculated against the new table, the array length n has doubled, so the mask n - 1 covers one more high bit (marked in red), and the new index changes accordingly.

Description :

In the figure above, 5 is the original index computed under the assumed hash. This verifies the statement above: after expansion, a node is either at its original position or at the "original position + old capacity" position.

Therefore, when expanding the HashMap we do not need to recompute the hash. We only need to look at whether the newly significant bit of the original hash value is 1 or 0: if it is 0, the index stays the same; if it is 1, the index becomes "original position + old capacity". The next figure shows the `resize` diagram for expanding from 16 to 32:

Precisely because of this ingenious `rehash` method, the time to recompute hash values is saved. At the same time, because the newly significant bit can be regarded as randomly 0 or 1, `resize` guarantees that after the `rehash` the number of nodes in each bucket is less than or equal to the number in the original bucket, so there will be no more serious hash conflicts after the `rehash`, and the previously conflicting nodes are spread evenly into the new buckets.
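A small sketch of my own illustrating the rule: the bit tested by `hash & oldCap` decides whether a node stays put or moves by oldCap:

```java
// Demo (my own): when the capacity doubles from 16 to 32,
// (hash & oldCap) tells us whether the node stays put or moves by oldCap.
public class ResizeIndexDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21}; // 21 = 16 + 5: same low 4 bits, different 5th bit

        for (int hash : hashes) {
            int oldIndex = hash & (oldCap - 1);
            int newIndex = hash & (newCap - 1);
            boolean stays = (hash & oldCap) == 0;
            System.out.println("hash=" + hash
                    + " oldIndex=" + oldIndex
                    + " newIndex=" + newIndex
                    + " stays=" + stays);
        }
        // hash=5:  oldIndex=5, newIndex=5,  stays=true
        // hash=21: oldIndex=5, newIndex=21, stays=false (5 + oldCap)
    }
}
```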

Case 2: A hash conflict has occurred at the current bucket position and a linked list has formed, but it has not yet become a red-black tree:

In this case, the linked list in the bucket is split into a "low" list and a "high" list, which are then placed into the new array in order. A text description is not as clear as the code itself:

```java
// Note: before the expansion, oldTab is not null.
if (oldTab != null) {
    // Traverse every bucket of the old hash table and recompute
    // the new position of the elements in that bucket.
    for (int j = 0; j < oldCap; ++j) {
        // The current node.
        Node<K,V> e;
        // There is data in this bucket, but it is not yet known whether it is
        // 1. a single node, 2. a linked list, or 3. a red-black tree.
        if ((e = oldTab[j]) != null) {
            // Clear the old slot to make GC easier.
            oldTab[j] = null;
            // Case 1: the node has no next reference (a single key-value pair).
            if (e.next == null)
                // Not a linked list, just a single entry, so place it directly
                // into the new table. e.hash & (newCap - 1) gives one of two
                // possible indexes:
                // 1. the same index as in the old table, or
                // 2. the old index + oldCap.
                newTab[e.hash & (newCap - 1)] = e;
            // Case 2: a red-black tree has formed at this bucket.
            else if (e instanceof TreeNode)
                // Conflicts are handled by a red-black tree, so call the
                // corresponding method to split the tree.
                // (The red-black tree details can be skipped for now.)
                ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
            // Case 3: a linked list has formed at this bucket.
            else {
                // Low list: nodes whose index in the new array is the same
                // as their index in the old array.
                Node<K,V> loHead = null, loTail = null;
                // High list: nodes whose index in the new array is
                // the old index + oldCap (the array length before expansion).
                Node<K,V> hiHead = null, hiTail = null;
                Node<K,V> next;
                // Compute each node's new position using the rule explained above.
                do {
                    next = e.next;
                    // If e.hash & oldCap == 0, this node does not move after resize.
                    // For example:
                    //   hash1  -> ...0 1111
                    //   oldCap -> ...1 0000 (16)
                    //   e.hash & oldCap == 0, so the index j stays the same:
                    //   use the low list.
                    if ((e.hash & oldCap) == 0) {
                        if (loTail == null)
                            loHead = e;
                        else
                            loTail.next = e;
                        loTail = e;
                    }
                    // Otherwise:
                    //   hash2  -> ...1 1111
                    //   oldCap -> ...1 0000 (16)
                    //   e.hash & oldCap != 0, so the new index is j + oldCap:
                    //   use the high list.
                    else {
                        if (hiTail == null)
                            hiHead = e;
                        else
                            hiTail.next = e;
                        hiTail = e;
                    }
                } while ((e = next) != null);
                // Put the low list into the new table at index j.
                if (loTail != null) {
                    loTail.next = null;
                    newTab[j] = loHead;
                }
                // Put the high list into the new table at index j + oldCap.
                if (hiTail != null) {
                    hiTail.next = null;
                    newTab[j + oldCap] = hiHead;
                }
            }
        }
    }
}
```

Case 3: A red-black tree has formed at the bucket position:

If the interviewer asks about migrating the red-black tree, then honestly, if it were me, I would just give up; that part is genuinely complicated.

If you want to fully understand the HashMap source code, please refer to these articles: HashMap analysis of JDK collection source code (top) , HashMap analysis of JDK collection source code (bottom) , HashMap bottom red-black tree implementation (implement a simple red-black tree yourself )


Summarizing these interview questions takes time too. The article will be updated from time to time, sometimes with a few more questions in a day. If it helps you review and consolidate your knowledge, please like, favorite, and share; more content will keep coming!