[Python basics] inventory of Python 10 commonly used data structures (part 2)

[Python basics] inventory of Python 10 commonly used data structures (part 2)

My road to construction

Although it is hard, there are readers to accompany you

Python commonly used data structures

This topic "Inventory of Python10 Common Data Structures" Directory:

  • Learning purpose
  • learning target
  • 1 list
  • 2 tuple
  • 3 set
  • 4 dict
  • 5 deque
  • 6 Counter
  • 7 OrderedDict
  • 8 heapq
  • 9 defaultdict
  • 10 ChainMap
  • summary

Learning purpose

For this topic, try to use the most concise text, with the help of typical cases to inventory the data structures commonly used in Python.

If you are still in the introductory stage of Python, you usually only need to master

list
,
tuple
,
set
,
dict
This kind of data structure can be used flexibly.

However, with the deepening of learning, the actual scenes usually become complicated, and it is necessary to understand the more powerful built-in data structure of Python

deque
,
heapq
,
Counter
,
OrderedDict
,
defaultDict
,
ChainMap
, Mastering them often allows you to write less code and implement functions more efficiently.

learning target

The first stage of learning data structures: master their basic usage and use them to solve some basic problems;

The second stage of learning: know which scenario to choose which is the most appropriate data structure to solve the problem;

The third stage of learning: understand the source code implementation behind the built-in data structure, connect with the knowledge in "Algorithm and Data Structure", and get through the two veins of Ren and Du.

Based on the three stages defined below, the following 10 most commonly used data structures are summarized :

1 list

Basic usage Not much nonsense, there is a separate topic in the front detailing the use of list.

Use Scenario List Use in scenarios that need to be queried and modified, and are very bad at scenarios that require frequent insertion and deletion of elements.

The realization principle The linear table corresponding to the data structure of the list. The length of the list does not need to be specified in the initial state. When the inserted element exceeds the initial length, dynamic expansion is started. When deleting, the element is especially located at the beginning of the list. The time complexity is O(n)

2 tuple

A tuple is a special list that does not allow adding or deleting elements, that is, once it is created, it is never allowed to add, delete, or modify.

Basic usage Tuples are widely used in packing and unpacking. For example, when a function has multiple return values, it is packed into a tuple, and when assigned to the variable on the left side of the equal sign, it is unpacked.

In [ 22 ]: t = 1 , 2 , 3                                           In [ 23 ]:  type (t)                               OUT [ 23 is ]: tuple Copy Code

Actually creates a tuple instance

Use scenarios If you are sure that your object will not be modified behind, you can use tuples boldly. why? This is especially important because it saves more memory than lists, tuple instances.

In [ 24 ]: from sys  import  getsizeof                                               In [ 25 ]: getsizeof(list())                                                       Out[ 25 ]:  72  # A list instance occupies 72 bytes In [ 26 ]: getsizeof(tuple())                                                      OUT [ 26 is ]:  56 is  # a tuple example occupies 56 is bytes duplicated code

So creating 100 instances, tuple can save 1K more bytes.

3 set

Basic usage A set is a data structure that cannot contain repeating elements. This feature is naturally used for deduplication of lists.

In [ 27 ]: a=[ 3 , 2 , 5 , 2 , 5 , 3 ]                                                         In [ 28 ]: set(a)                                                                  OUT [ 28 ]: { 2. 3. 5 } copy the code

In addition, it is also known that the set structure can be used for operations such as intersection, union, and difference of two set instances.

In [ 29 ]: a = { 2 , 3 , 5 }                                                             In [ 30 ]: b = { 3 , 4 , 6 , 2 }                                                           In [ 31 ]: a.interp(b) # Find the intersection                                                       OUT [ 31 is ]: { 2. 3 } copy the code

Usage scenarios If you just want to cache certain element values and require that the element values cannot be repeated, this structure is suitable for use. And the addition and deletion of elements are allowed in the set, and the efficiency is high.

Implementation principle Set internally hashes the value into an index, and then obtains the data according to the index, so the effect of deleting, adding, and querying elements is very high.

4 dict

Basic usage: dict is one of the most frequently used data structures in Python. The creation of a dictionary is through the dict function, {} writing, dictionary generation, etc. The efficiency of adding and deleting elements is very high.

d = { 'a' : 1 , 'b' : 2 } # {}Create a dictionary # List generation In [ 38 ]: d = {a:b  for  a,b in zip([ 'a' , 'b' ],[ 1 , 2 ])}                               In [ 39 ]: d                                                                       OUT [ 39 ]: { 'A'. 1'B'2 } copy the code

Usage scenarios The dictionary is especially suitable for scenarios where there are many queries, and the time complexity is O(1). For example, when the first problem of leetcode solves the sum of two numbers, the O(1) query time complexity of dict will be used.

At the same time, information such as attribute values in Python classes are also cached

__dict__
This dictionary data structure.

However, it is worth noting that dict occupies three or four times the number of bytes of list and tuple, so you should carefully consider scenarios with demanding memory.

In [ 40 ]: getsizeof(dict())                                                       OUT [ 40 ]:  248 Copy Code

Implementation principle A dictionary is a kind of hash table, which also stores key-value pairs.

I believe everyone is familiar with the above four data structures, so I will introduce them briefly and concisely. Next, we will introduce the following 6 data structures and their respective usage scenarios in detail, and will list more examples.

5 deque

Basic usage deque double-ended queue, based on the list to optimize the addition and deletion of data at both ends of the list. Basic usage:

from collections  import  deque In [ 3 ]: d = deque([ 3 , 2 , 4 , 0 ])                                                     In [ 4 ]: d.popleft() # Remove elements from the left, O( 1 ) time complexity                                                             Out[ 4 ]:  3 In [ 5 ]: d.appendleft( 3 ) # Add elements to the left, O( 1 ) time complexity                                                        In [ 6 ]: d                                                                        OUT [ . 6 ]: the deque ([ . 32. 40 ]) Copy the code

The time complexity of adding and deleting elements on the left side of the scene list is O(n), so do not use list when simulating a queue in Python. On the contrary, using a deque double-ended queue is very suitable for scenes that frequently operate at both ends of the list. However, the enhanced deque sacrifices space complexity, so the nested deque must be trade-off carefully:

In [ 9 ]: sys.getsizeof(deque())                                                   Out[ 9 ]:  640 In [ 10 ]: sys.getsizeof(list())                                                   OUT [ 10 ]:  72 Copy Code

Implementation principle Cpython implements deque to use an array of default length 64, remove 1 element from the left each time, and add 1 to leftindex. If the original memory block exceeds 64, the original memory block is released, and then a 64-length array is re-applied and the double-ended linked list block is used. Manage memory blocks.

6 Counter

Basic usage Counter is a data structure inherited from dict for counting the number of elements, also known as bag or multiset. Basic usage:

from collections  import  Counter In [ 14 ]: c = Counter([ 1 , 3 , 2 , 3 , 4 , 2 , 2 ]) # Count the number of occurrences of each element In [ 17 ]: c                                                                       Out[ 17 ]: Counter({ 11322341 }) # In addition, you can also count the most common items # Such as statistics on 1 of the most common items, and returns the number of elements in a tuple In [ 16 ]: c.most_common( 1 )                                                        OUT [ 16 ]: [( 2. 3 )] to copy the code

Use Scenarios Do n t use Counter for problems that basic dict can solve, but if you encounter a scenario where statistical elements appear frequently, don t use dict to implement it yourself, and use Counter decisively.

It should be noted that the elements counted by Counter must be hashable. In other words, it is not feasible to count the number of occurrences of the list, but can it be hashed if the list is transformed into a tuple?

Implementation principle Counter implementation is based on dict. It stores elements on keys and the number of occurrences is values.

7 OrderedDict

Basic usage It is a data structure inherited from dict, which can ensure that the values of keys are taken out in order. Basic usage:

In [ 25 ]: from collections  import  OrderedDict                                     In [ 26 ]: od = OrderedDict({ 'c' : 3 , 'a' : 1 , 'b' : 2 })                                   In [ 27 ]:  for  k,v in od.items():      ...:      print (k,v)      ...:                                                                         C  . 3. 12 duplicated code

Usage scenarios The basic dict cannot guarantee the order, the keys are mapped to hash values, and this value is not stored in the hash table in order. So when you encounter a scene to ensure that the dictionary keys are in order, you must use OrderedDict.

Implementation principle You will be curious about how OrderedDict ensures the order of the keys. Look at cpython and see that it maintains a doubly linked list.

self.__root
, It maintains the order of keys. Since a doubly linked list is used, careful readers may have questions: how to delete key-value pairs to ensure O(1) time to complete?

Cpython uses space in exchange for time and maintains one internally

self.__map
Dictionary, the key is key, and the value points to the node of the doubly linked list
link
. In this way, when a key-value pair is deleted, the link is found in O(1) through __map, and then it is removed from the doubly linked list __root in O(1).

8 heapq

Basic usage A data structure optimized based on list: heap queue, also known as priority queue. The characteristic of heap queue is that the smallest element is always at the root node: heap[0] Basic usage:

import  heapq In [ 41 ]: a = [ 3 , 1 , 4 , 5 , 2 , 1 ]                                                       In [ 42 ]: heapq.heapify(a) # Build a heap, and finish sorting a in place after building a heap In [ 43 ]: a[ 0 ] # a[ 0 ] must be the smallest element In [ 44 ]: a Out[ 44 ]: [ 113524 ] In [ 46 ]: heapq.nlargest( 3 ,a) # the first 3 largest elements of a                                                     Out[ 46 ]: [ 543 ] In [ 47 ]: heapq.nsmallest( 3 ,a) # the first 3 smallest elements of a                                                   OUT [ 47 ]: [ . 1. 12 ] copy the code

Usage scenario If you want to count the first few smallest (large) elements in the list, it is very convenient to use heapq, and it also provides the function of merging multiple ordered small lists into a large list.

Basic principle The heap is a binary tree, and the value of each parent node of it will only be less than or greater than (the value of) all child nodes. The principle is very similar to that of heap sorting.

9 defaultdict

Basic usage The defaultdict is a dict with a default factory. Readers who don't know much about design patterns may be confused about the term factory. To be precise, the factory is called an object factory. Let's experience its basic usage.

The value of the basic dict key does not have a default data type. If the value is a list, it must be created manually:

words=[ 'book' , 'nice' , 'great' , 'book' ] d = {} for  i,word in enumerate(words):      if  word in d:         d[word]. append (i)      else :         d[word]=[i] # display to create a list Copy code

But use defaultdict:

from collections  import  defaultdict d = defaultdict(list) # Create a dictionary whose dictionary value defaults to list for  i,word in enumerate(words):     d[word] = i  Copy code

Without a layer of if logic judgment, the code is clearer. The above defaultdict(list) line of code creates a dictionary whose value is list by default. You can also construct defaultdict(set), defaultdict(dict), etc. This mode is an object factory. Various objects can be manufactured in the factory: list, set, dict...

Usage scenarios As mentioned above, it is clear that it is applicable to scenarios where a default value must be specified for the value of the key, such as list, set, dict, etc. for the value of the key.

Implementation principle The basic principle is to call the factory function to provide the value of the missing key. The topic of design patterns will be discussed in detail later.

10 ChainMap

Basic usage If you want to merge multiple dicts into one big dict, ChainMap will be your choice, and its convenience is reflected in the synchronization changes. Let's look specifically at an example:

In [ 55 ]: from collections  import  ChainMap                                        In [ 56 ]: d1 = { 'a' : 1 , 'c' : 3 , 'b' : 2 }                                                In [ 57 ]: d2 = { 'd' : 1 , 'e' : 5 }                                                      In [ 58 ]: dm = ChainMap(d1,d2)                                                    In [ 59 ]: dm                                                                      OUT [ 59 ]: ChainMap ({ 'A'. 1'C'. 3'B'2 }, { 'D'. 1'E'. 5 }) copy the code

After ChainMap returns a big dict view, if the corresponding key-value pair is modified, the original small dict will also change:

In [ 86 ]: dm.maps # Return a dictionary list                                                                Out[ 86 ]: [{ 'a'2'c'3'b'2'd'10 }, { 'd'1'e'5 }] In [ 87 ]: dm.maps[ 0 ][ 'd' ]= 20    # Modify the key of the first dict to be equal to the value of'd ' as 20                                                    In [ 88 ]: dm                                                                      Out[ 88 ]: ChainMap({ 'a'2'c'3'b'2'd'20 }, { 'd'1'e'5 }) The In [ 89 ]: # original D1 becomes small dict keys 20 is                                                                      Out [ 89 ]: { 'A'2'C'. 3'B'2'D'20 is } copy the code

The  specific usage scenario is that we have multiple dictionaries or mappings and want to merge them into a single mapping. Some readers may say that they can be merged with update. The problem with this is that a new memory structure is created. In addition to wasting space, Another disadvantage is that our changes to the new dictionary will not be synchronized to the original dictionary.

Implementation principle Through maps, it can be observed that ChainMap combines multiple small dicts into the list. In fact, this is indeed achieved. An instance of lis is maintained internally, and its elements are small dicts.

summary

The above are 10 data structures commonly used in Python, 4 basic structures commonly used, and 6 structures adapted to specific scenarios based on their optimization. I will summarize them into three steps for learning them.

When I am determined to truly dedicate some fine originals to readers, and are actively praised, watched, and forwarded by readers, I will have greater motivation to continue writing. I hope this topic is helpful to everyone, welcome to like, watch, and forward.

Wonderful review of past issues Suitable for beginners to get started with artificial intelligence routes and data downloads, machine learning and deep learning notes and other materials, printers, learning online manuals, deep learning notes album, "statistical learning methods" code reproduction album AI Basics Download the mathematics foundation album of machine learning to get a discount coupon of Knowledge Planet on this site, copy the link and open it directly: https://t.zsxq.com/662nyZF qq group on this site 1003271085. To join the WeChat group, please scan the code to enter the group (if you are a Ph.D. or plan to read a Ph.D., please specify): Copy