Analysis of linux slub allocator

In the Linux kernel, the SLAB allocator has been superseded by its simplified successor, SLUB. I recently spent some time reading the SLUB code and am writing down my understanding of it here.
Although SLUB is implemented inside the kernel, user-mode object pools can borrow the same approach.
The overall idea of SLUB is similar to that of SLAB: the memory in the object pool is allocated and reclaimed in units of "large blocks"; each large block is then divided into "small blocks" according to the object size, and user allocations and frees operate on these small blocks.
In addition, kmem_cache has the following members (explained later):

    int size;                              /* space occupied by each object */
    int objsize;                           /* size of the object itself */
    int offset;                            /* offset, within the object's space, of the free-list "next" pointer */
    int refcount;                          /* reference count */
    int inuse;                             /* size of the object excluding the next pointer */
    int align;                             /* alignment in bytes */
    void (*ctor)(void *);                  /* constructor */
    unsigned long min_partial;             /* minimum number of pages kept in kmem_cache_node */
    struct kmem_cache_order_objects oo;    /* preferred page-allocation strategy (how many contiguous pages to allocate, and how many objects they can be divided into) */
    struct kmem_cache_order_objects min;   /* fallback page-allocation strategy */
(There are further members that support various options, DEBUG builds, or periodic memory reclaim; they are not listed here.)
General structure
kmem_cache is an object pool manager, and each kmem_cache manages one type of object. All of these managers are linked together on a doubly linked list headed by slab_caches.
This is the same as the previous SLAB.  
kmem_cache manages its memory through the member kmem_cache_node. In a NUMA environment (Non-Uniform Memory Access), a kmem_cache maintains a set of kmem_cache_node structures, one per memory node, and each kmem_cache_node is responsible only for the memory of its own node. (If this is not a NUMA machine, there is only one kmem_cache_node.)
The actual memory is still represented by page structures. kmem_cache_node strings a set of pages together through its partial pointer (nr_partial is the length of this list); these pages hold the memory of the object pool. The page structure does not only describe memory: it also contains some union members that record how the corresponding memory is carved into objects (these fields are only meaningful while the page belongs to the SLUB allocator; in other contexts they have other interpretations).
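To visualize these relationships, here is a loose user-mode sketch of the structures involved (field names follow this article; the real kernel definitions in mm/slub.c have many more members and a different layout, so treat this purely as a mental model):

    /* Illustrative sketch only -- not the kernel's actual definitions. */
    struct page_sketch {                    /* one "large block" = 2^order contiguous pages */
        void *freelist;                     /* head of the list of free objects in this block */
        unsigned inuse;                     /* number of objects currently allocated */
        unsigned objects;                   /* total number of object slots in the block */
        struct page_sketch *next_partial;   /* linkage on the per-node partial list */
    };

    struct kmem_cache_node_sketch {         /* per-memory-node manager */
        unsigned long nr_partial;           /* length of the partial list */
        struct page_sketch *partial;        /* pages that are not full */
    };

    struct kmem_cache_sketch {              /* one object pool per object type */
        int size, objsize, offset, inuse, align, refcount;
        unsigned long min_partial;
        void (*ctor)(void *);
        struct kmem_cache_node_sketch *nodes;  /* one entry per memory node */
        struct kmem_cache_sketch *next;        /* linked into the global slab_caches list */
    };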
The original SLAB is more complicated: the page in SLAB only manages memory and has no notion of "objects"; instead, an additional control structure (slab) manages the objects and delimits their boundaries through pointer arrays inside the slab structure.
As mentioned earlier, the memory in the object pool is allocated and reclaimed in units of "large blocks", and a page is exactly such a large block. The page is divided into several small blocks, each holding one object. Free objects are kept on a linked list whose head is page->freelist; only unallocated objects are on this list. The "next" pointer of a free object is stored inside the object itself, at offset kmem_cache->offset.
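To make the freelist mechanism concrete, here is a small user-mode sketch: a large block is carved into size-byte slots, and each free slot stores a pointer to the next free slot at byte offset offset inside itself. The helper names below are invented for illustration, and the block is assumed to hold at least one object:

    #include <stddef.h>

    /* Read or write the "next free object" pointer stored inside a free object. */
    static void *get_freepointer(void *object, size_t offset)
    {
        return *(void **)((char *)object + offset);
    }

    static void set_freepointer(void *object, size_t offset, void *next)
    {
        *(void **)((char *)object + offset) = next;
    }

    /* Carve a large block into `objects` slots of `size` bytes each and chain
     * them into a freelist; returns the head of the list (the first slot). */
    static void *init_freelist(void *block, size_t size, size_t offset, unsigned objects)
    {
        char *p = block;
        for (unsigned i = 0; i + 1 < objects; i++, p += size)
            set_freepointer(p, offset, p + size);
        set_freepointer(p, offset, NULL);   /* last slot terminates the list */
        return block;
    }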
In SLAB, the "large block" is the slab structure, which carries the control information; the page structure only represents memory and is merely a resource referenced by the slab.
Each page here does not represent a single page, but 2^order contiguous pages, where order is taken from oo or min in kmem_cache. When allocating pages, SLUB first tries the order value in oo, which asks for contiguous pages of the preferred size (this value is computed when the kmem_cache is created and is chosen so that, after the memory is divided into "small blocks", little is left over as waste). If that allocation fails (after long uptime memory becomes fragmented and large runs of contiguous pages are hard to obtain), the order value in min is used instead, which requests the smallest number of contiguous pages that can still hold the object (this value is also computed when the kmem_cache is created).
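The trade-off between the two order values can be illustrated with a tiny calculation (PAGE_SIZE and the object size below are arbitrary example numbers; the kernel's real heuristic weighs more factors than just the waste):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL

    /* How many objects fit into 2^order contiguous pages ... */
    static unsigned long objects_per_slab(unsigned order, unsigned long size)
    {
        return (PAGE_SIZE << order) / size;
    }

    /* ... and how many bytes are left over as waste. */
    static unsigned long waste_per_slab(unsigned order, unsigned long size)
    {
        return (PAGE_SIZE << order) % size;
    }

    int main(void)
    {
        unsigned long size = 712;            /* hypothetical object size in bytes */
        for (unsigned order = 0; order <= 3; order++)
            printf("order %u: %lu objects, %lu bytes wasted\n",
                   order, objects_per_slab(order, size),
                   waste_per_slab(order, size));
        return 0;
    }

For this example size, a higher order wastes noticeably less memory per block, which is exactly why oo may prefer several contiguous pages while min falls back to the smallest workable order.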
kmem_cache_node strings a set of pages together through the partial pointer, and these pages must not be full (a page is divided into page->objects object-sized slots, of which page->inuse are already in use; if page->objects == page->inuse, the page is full). If a page becomes full, it is removed from the list. Conversely, if a page becomes completely free (page->inuse == 0), it is normally released as well, unless nr_partial (the list length) is below min_partial in kmem_cache. (Since this is a pool, it should keep a certain amount of stock, and min_partial is that minimum. This value, too, is computed when the kmem_cache is created; a larger object size yields a larger min_partial, because larger objects need more contiguous pages per allocation, and contiguous pages are harder to obtain than single pages, so more of them should be cached.)
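The keep-or-release decision for a page that has just become empty boils down to a check like this (a sketch of the rule as described above, not the kernel's actual code path):

    #include <stdbool.h>

    /* Keep an empty page as stock only while the partial list is shorter than
     * the cache's min_partial; otherwise return its pages to the system. */
    static bool keep_empty_page(unsigned long nr_partial, unsigned long min_partial)
    {
        return nr_partial < min_partial;
    }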
The original SLAB keeps three lists, holding "full", "partial", and "free" slabs respectively. In SLUB, "free" and "partial" are merged into the single partial list described above, and "full" pages are not tracked at all. There is really no need to track them: a full page can no longer satisfy allocations and only has to handle frees, and when an object is freed the corresponding page structure can be found from the object's address (the address of the page structure corresponds to the memory address; see "Linux Memory Management Analysis"). Tracking full pages does make it easier to inspect the allocator's state, so in DEBUG mode kmem_cache_node still provides a full list.
Allocation and release
Objects are not allocated and freed directly from kmem_cache_node, but through kmem_cache_cpu. A kmem_cache maintains a set of kmem_cache_cpu structures, one for each CPU in the system. kmem_cache_cpu acts as a per-CPU allocation cache, so that a CPU does not have to go to kmem_cache_node for every operation and contend with other CPUs. It also tends to keep the objects it caches on one CPU, which improves that CPU's cache hit rate. A kmem_cache_cpu caches only a single page.
The original SLAB gives each CPU an array_cache structure to cache objects. Objects are organized in array_cache differently from how they are organized in the slab, which adds complexity; SLUB organizes objects through the page structure everywhere, so the organization is the same in both places.
When allocating an object, SLUB first tries the kmem_cache_cpu. If that fails, it goes to kmem_cache_node and moves one page over to the kmem_cache_cpu. Allocation from kmem_cache_cpu can fail for two reasons: the page cached there is already full, or the memory node required for this allocation differs from the node of the cached page. If the page is full, it is simply dropped from kmem_cache_cpu (or, in DEBUG mode, moved to the full list of the corresponding kmem_cache_node); if the node does not match, the cached page is first moved back to the partial list of its kmem_cache_node (and further, if that page is completely free and the partial list is already at least min_partial long, the page is released).
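A very rough user-mode model of this allocation path might look like the following. The structures are stripped to a bare minimum, refill_from_node stands in for the whole slow path of taking a page off the node's partial list, and all names are invented for illustration:

    #include <stddef.h>

    struct page_sk {                     /* stands in for struct page */
        void *freelist;                  /* free objects still owned by the page */
        unsigned inuse, objects;
    };

    struct cpu_cache_sk {                /* stands in for kmem_cache_cpu */
        void *freelist;                  /* free objects taken over from the cached page */
        struct page_sk *page;            /* the page this CPU currently allocates from */
    };

    /* Fast path: pop an object from the per-CPU freelist.
     * Slow path: fetch another page from the per-node partial list. */
    static void *alloc_object(struct cpu_cache_sk *c, size_t next_offset,
                              struct page_sk *(*refill_from_node)(void))
    {
        if (!c->freelist) {                          /* nothing cached, or the cached page is exhausted */
            struct page_sk *page = refill_from_node();
            if (!page)
                return NULL;
            c->page = page;
            c->freelist = page->freelist;            /* take over the page's free objects */
            page->freelist = NULL;                   /* ...so the page's own list is now empty */
        }
        void *object = c->freelist;
        c->freelist = *(void **)((char *)object + next_offset);  /* advance to the next free object */
        c->page->inuse++;
        return object;
    }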
Conversely, when an object is freed, the corresponding page can be found from the object's address, and the object is put back on that page. There is some special-case logic, though. If the page is currently cached by a kmem_cache_cpu, nothing else needs to be done; otherwise the page must be locked while the object is put back, because other CPUs may be allocating from or freeing to it at the same time. In addition, if the page was full before the object was reclaimed, it becomes partial afterwards and must be added to the partial list of the corresponding kmem_cache_node; and if the page becomes completely free after the object is reclaimed, it should be released.
There is one more detail about freeing. Since the object is put back onto its page, what happens if that page is currently cached by another CPU (that CPU's kmem_cache_cpu is using the page)? It actually does not matter: kmem_cache_cpu and page each have their own freelist pointer. When a page is cached by a CPU, all objects on the page's freelist are moved onto the kmem_cache_cpu's freelist (in fact just a pointer assignment) and the page's freelist becomes NULL; objects freed later go onto the page's freelist, so the two lists do not affect each other. There does seem to be a problem here, however: if a cached page's freelist becomes non-NULL because objects were freed to it, the page may be picked up and cached by the kmem_cache_cpu of another CPU, so several kmem_cache_cpu structures may end up caching the same page. One CPU's hardware cache may then hold objects in use on other CPUs (because CPU cache lines and objects are not mutually aligned), and a write to an object on one CPU may invalidate cache lines on another.
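The release side, in the same illustrative style, simply pushes the object back onto the page's own freelist, which is why it never disturbs a per-CPU freelist built from that page earlier (names again invented for this sketch; locking is omitted):

    #include <stddef.h>

    struct page_sk {
        void *freelist;                  /* the page's own list of freed objects */
        unsigned inuse;
    };

    /* Put a freed object back onto the *page's* freelist. */
    static void free_object(struct page_sk *page, void *object, size_t next_offset)
    {
        *(void **)((char *)object + next_offset) = page->freelist;  /* object->next = old head */
        page->freelist = object;
        page->inuse--;
    }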
When a kmem_cache is created, SLUB computes a reasonable layout for the object pool based on various pieces of information. objsize is the size of the object itself; after alignment this becomes inuse; the object's next pointer (located by offset) may be stored immediately after inuse, which expands the object's effective footprint to size.
In fact, offset does not always point just past inuse (otherwise offset could simply be replaced by inuse). offset takes one of two values: inuse or 0. offset gives the position of the next pointer, and next is used to chain the object into the free list. The next pointer is only needed while the object is free, and in that state the object's own space is not being used, so the first word of the object can conveniently serve as the next pointer; in that case offset is 0. In some special cases, however, the object's space cannot be reused for the next pointer. For example, if the object pool provides a constructor ctor, the object's space has already been constructed; then offset equals inuse, and the next pointer has to be stored just past the object's space.
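The layout rules just described amount to a small calculation along these lines (a sketch that assumes the next pointer needs one machine word; ALIGN_UP and compute_layout are local names used only for illustration):

    #include <stddef.h>

    #define ALIGN_UP(x, a)  (((x) + (a) - 1) / (a) * (a))

    struct layout { size_t inuse, offset, size; };

    /* inuse  = object size rounded up to the alignment
     * offset = 0 if the free-list next pointer may reuse the object's first word,
     *          or inuse if a constructor forbids reusing the object's space
     * size   = total per-object footprint inside the large block */
    static struct layout compute_layout(size_t objsize, size_t align, int has_ctor)
    {
        struct layout l;
        l.inuse = ALIGN_UP(objsize, align);
        if (has_ctor) {
            l.offset = l.inuse;                  /* next pointer lives just past the object */
            l.size   = l.inuse + sizeof(void *);
        } else {
            l.offset = 0;                        /* next pointer reuses the object's first word */
            l.size   = l.inuse;
        }
        return l;
    }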
Compared with SLAB
SLUB has another interesting feature. When a new object pool is to be created, if an existing kmem_cache is found whose size equals or is only slightly larger than the requested size, no new kmem_cache is created; the existing one is reused instead. For this reason kmem_cache also maintains a refcount (reference count) recording how many times it has been reused.
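The reuse check can be imagined roughly like this (purely illustrative; the kernel's real merging logic also compares flags, alignment, constructors, and so on):

    #include <stddef.h>

    struct cache_sk {
        size_t size;                 /* per-object slot size of this existing pool */
        int refcount;                /* how many users share this cache */
        struct cache_sk *next;       /* link on the global list of caches */
    };

    /* Reuse an existing cache whose slot size equals or only slightly exceeds
     * the requested size instead of creating a new one. */
    static struct cache_sk *find_reusable(struct cache_sk *caches, size_t wanted)
    {
        for (struct cache_sk *s = caches; s; s = s->next) {
            if (s->size >= wanted && s->size <= wanted + sizeof(void *)) {
                s->refcount++;
                return s;
            }
        }
        return NULL;                 /* no match: the caller creates a new kmem_cache */
    }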
In addition, SLUB drops an interesting SLAB feature: coloring.
What is coloring? When a "large block" of memory is divided into "small blocks" according to the object size, the division may not come out exactly even, leaving some slack at the end. Coloring puts this slack to use: the starting address of the first "small block" is not always offset 0 within the "large block", but floats between 0 and the size of the slack. This way, objects of the same type show more variation in the low bits of their addresses.
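Mechanically, coloring can be pictured as rotating a starting offset each time a new large block is set up, roughly as in this sketch (the step size and wrap rule are simplified here; SLAB steps by a cache-line-related granularity):

    #include <stddef.h>

    /* Return the colour offset for the next slab and advance the rotation.
     * `leftover` is the slack left after dividing the block into objects;
     * `step` is the granularity (e.g. the alignment or cache line size). */
    static size_t next_colour(size_t *colour_next, size_t leftover, size_t step)
    {
        size_t offset = *colour_next;
        *colour_next += step;
        if (*colour_next > leftover)
            *colour_next = 0;        /* wrap once the slack has been used up */
        return offset;               /* the first object of this slab starts here */
    }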
Why do this? It is all about the CPU cache. As we learned when studying operating system principles, the CPU provides caches to speed up its access to memory, so there is a mapping from memory to the cache. When an instruction accesses a memory address, the CPU first checks whether that address is already cached.
How is the memory-to-cache mapping implemented? In other words, how does the CPU know whether a given memory address is cached?
One extreme design is fully associative mapping, in which any memory address can be cached at any cache location. Then, for a given address, the CPU would have to maintain a huge mapping table and spend a lot of time searching it to decide whether the address is cached. This is not very practical.
Therefore cache mappings always carry a restriction: a memory address can be mapped only to certain cache locations. In general, the low bits of the memory address determine where the memory may be cached (for example, cache_location = address % cache_size).
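As a toy example of such a restriction, a direct-mapped cache with 64-byte lines and 32 KiB capacity would pick the slot purely from the low bits of the address (the numbers are chosen arbitrarily for illustration):

    /* Toy direct-mapped cache: which of its 512 line slots an address maps to. */
    static unsigned long cache_slot(unsigned long address)
    {
        return (address / 64) % (32768 / 64);   /* line index comes from the low address bits */
    }

Two objects whose addresses differ only in higher bits land in the same slot and evict each other, which is exactly the collision that coloring tries to make less likely.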
Back to SLAB's coloring: coloring reduces the chance that objects of the same type share the same low address bits, and therefore reduces the chance that these objects collide when mapped into the cache.
What is that good for? Many objects of the same type are often used together: arrays, linked lists, vectors, and so on. When we traverse such a collection, if every object can stay in the CPU cache, the traversal will obviously run faster. That is the point of coloring.
SLUB drops coloring because it is more frugal with memory and tries to minimize the leftover slack in the first place, so there is little room left to color with. Moreover, since a kmem_cache can be reused for several object types of similar size, the more it is reused, the less meaningful coloring becomes.

Reprinted at: https://blog.51cto.com/10495372/1670843


Reference: https://blog.csdn.net/weixin_34255793/article/details/92702440