A second, in-depth look at send and recv in TCP network programming

Article source: blog.csdn.net/yusiguyuan/...

First, a key concept: every TCP socket has a send buffer and a receive buffer in the kernel. TCP's full-duplex operation and its sliding window both depend on these two independent buffers and on how full they are. The receive buffer holds incoming data in the kernel: if the application process has not called read, the data stays cached in that socket's receive buffer. Put more plainly, whether or not the process reads the socket, data sent by the peer is received by the kernel and buffered in the socket's kernel receive buffer. All read does is copy data from the kernel buffer into the application's user-space buffer, nothing more. When the process calls send, in the simplest (and most common) case the data is copied into the socket's kernel send buffer and send returns to the caller. In other words, when send returns, the data has not necessarily reached the peer (much like writing to a file); send only copies data from the application buffer into the socket's kernel send buffer. I will cover the kernel actions behind read and send in a later article. A UDP socket has a receive buffer but, conceptually, no send buffer: a datagram goes out as soon as there is data, whether or not the peer can receive it correctly, so nothing has to be kept back and no send buffer is needed.
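To make "send only copies" concrete, here is a minimal C sketch, assuming a POSIX system with loopback networking. It opens a TCP connection to itself on 127.0.0.1, calls send() on one end before the other end has called recv(), and only afterwards reads. The helper names loopback_pair() and send_then_read() are hypothetical, written only for this illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1 (port chosen by
   the kernel). *cli is the connecting end, *srv the accepted end. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* send() on the accepted end while the peer has not called recv() at all.
   send() returns once the bytes are in srv's kernel send buffer; the later
   recv() copies them out of cli's kernel receive buffer. */
ssize_t send_then_read(const char *msg, size_t len, char *out, ssize_t *got)
{
    int cli, srv;
    ssize_t sent;

    if (loopback_pair(&cli, &srv) < 0) return -1;
    sent = send(srv, msg, len, 0);      /* copy into the send buffer, then return */
    *got = recv(cli, out, len, 0);      /* copy out of the receive buffer */
    close(cli);
    close(srv);
    return sent;
}
```

send_then_read("hello", 5, out, &got) is expected to return 5 even though nobody has read yet; the five bytes are then picked up from the peer's receive buffer by the later recv().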

Both TCP and UDP use the receive buffer to hold data arriving from the network until the application process reads it. For TCP, if the application does not read and the buffer fills up, TCP tells the peer to close its window. This is the sliding window at work: it guarantees that the TCP socket's receive buffer never overflows, which is part of what makes TCP transmission reliable, because the peer is not allowed to send more than the advertised window. This is TCP's flow control. If the peer ignores the window and sends more than it allows, the receiving TCP discards the excess. For UDP, when the socket receive buffer is full, a new datagram cannot enter it and is dropped. UDP has no flow control, so a fast sender can easily overwhelm a slow receiver and cause the receiver's UDP to drop datagrams.
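The flow-control behavior can be observed with a hypothetical sketch (POSIX, loopback only, same kind of helper as before): the peer never calls recv(), and the sender writes 4 KB chunks through a non-blocking socket until send() fails with EAGAIN. By then the receiver's buffer has filled, its window has closed, and the sender's own buffer is full as well, so the total should come out on the order of the two buffers combined; the exact figure depends on the kernel and its buffer auto-tuning. fill_until_blocked() is a made-up name.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* The peer (cli) never calls recv(). Keep writing 4 KB chunks through a
   non-blocking srv until the kernel refuses more (EAGAIN/EWOULDBLOCK).
   Returns the number of bytes the kernel accepted, or -1 on another error. */
long fill_until_blocked(void)
{
    int cli, srv;
    char chunk[4096];
    long total = 0;

    if (loopback_pair(&cli, &srv) < 0) return -1;
    memset(chunk, 'x', sizeof chunk);
    fcntl(srv, F_SETFL, fcntl(srv, F_GETFL, 0) | O_NONBLOCK);
    for (;;) {
        ssize_t n = send(srv, chunk, sizeof chunk, 0);
        if (n > 0) {
            total += n;                  /* copied into the send buffer */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                       /* no room left: peer window closed, our buffer full */
        total = -1;                      /* any other error */
        break;
    }
    close(cli);
    close(srv);
    return total;
}
```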

This is what makes TCP reliable and UDP unreliable.

TCP_CORK and TCP_NODELAY

These two options work in opposite directions: TCP_NODELAY turns off TCP's Nagle algorithm, while TCP_CORK holds back partial segments even more strictly than Nagle does. The following scenario explains them.

When a typical web server responds to a client, its application-layer code looks roughly like this:

    if (condition 1) {
        fill buffer_last_modified with the protocol content "Last-Modified: Sat, 04 May 2012 05:28:58 GMT";
        send(buffer_last_modified);
    }

    if (condition 2) {
        fill buffer_expires with the protocol content "Expires: Mon, 14 Aug 2023 05:17:29 GMT";
        send(buffer_expires);
    }

    . . .

    if (condition N) {
        fill buffer_N with the protocol content "...";
        send(buffer_N);
    }

With such an implementation, suppose that when an HTTP response runs this code, M (M <= N) of the conditions hold. There are then M consecutive send calls. Does the lower layer therefore send M TCP packets to the client, one after another? No. The number of packets cannot be controlled from the application layer, and the application layer does not need to control it.
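A sketch of the scenario above, under the same loopback assumption as the earlier examples: each header is written with its own send() call, exactly like the pseudocode, and the peer simply reads until it has all the bytes. The receiver only ever sees one continuous byte stream; nothing in it marks where one send() ended. send_headers_one_by_one() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* port 0: kernel picks one */
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* One send() per header (M calls), then read on the peer until every byte
   has arrived. Returns the number of bytes read into out, or -1. */
long send_headers_one_by_one(const char **hdrs, size_t m, char *out, size_t cap)
{
    int cli, srv;
    size_t expect = 0, got = 0;
    long result = -1;

    if (loopback_pair(&cli, &srv) < 0) return -1;
    for (size_t i = 0; i < m; i++) {
        size_t len = strlen(hdrs[i]);
        if (send(srv, hdrs[i], len, 0) != (ssize_t)len) goto done;
        expect += len;
    }
    while (got < expect && got < cap) {
        /* One recv() may return bytes from several send() calls at once. */
        ssize_t n = recv(cli, out + got, cap - got, 0);
        if (n <= 0) goto done;
        got += (size_t)n;
    }
    result = (long)got;
done:
    close(cli);
    close(srv);
    return result;
}
```

How many segments go on the wire, and how many recv() calls the reader needs, is up to the kernel; only the order of the bytes is guaranteed.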

The following four scenarios explain this answer.

TCP is a byte stream: each TCP connection has only a beginning (SYN) and an end (FIN), and the data sent in between has no boundaries. All that multiple consecutive send calls do is the following:

If the socket's file descriptor is in blocking mode and the send buffer has enough space for all the data in the application buffer passed to send, send copies that data from the application buffer into the kernel send buffer and returns.

If the socket's file descriptor is in blocking mode but the send buffer does not have enough space for all the data passed to send, send copies as much as it can and the process then sleeps. Once the TCP peer has received data and there is free space again, the sliding window mechanism (the other job of an ACK packet: opening the window) tells the local TCP, in effect, "I'm ready, you can send me another X bytes." The local kernel then wakes the process, which continues copying the remaining data into the send buffer while the kernel transmits TCP data to the peer. If the application data still cannot all be copied, the cycle repeats until everything has been copied, and then send returns.

Note that I describe send's behavior purely as copying. Whether send returns has nothing to do with whether the lower layer has sent any packets.

If the socket's file descriptor is in non-blocking mode and the send buffer has enough space for all the data passed to send, send copies the data from the application buffer into the kernel send buffer and returns.

If the socket's file descriptor is in non-blocking mode but the send buffer cannot hold all the data passed to send, send copies as much as it can and returns the number of bytes copied. Going a step further, there are two ways to handle this after send returns:

1. Loop, calling send again and again until all the data has been copied (this is rarely done in practice).

2. Pair the non-blocking socket with epoll or select: use them to detect when the socket becomes writable, and only then call send (the standard approach in high-performance servers).
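Approach 2 can be sketched in C with poll(), which plays the same role as select or epoll here (a swapped-in choice that keeps the example short). send_all_nonblocking() is a hypothetical helper: it accepts partial copies and, on EAGAIN, sleeps in poll() until the socket is writable again instead of spinning. The demo, also hypothetical, forks a reader process on a loopback connection so that the buffers keep draining.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* Push all len bytes through a NON-BLOCKING socket. Returns 0 or -1. */
int send_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        ssize_t n = send(fd, buf + off, len - off, 0);
        if (n > 0) {
            off += (size_t)n;            /* only part may have been copied */
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;               /* otherwise: woken up, buffer has room */
        } else if (!(n < 0 && errno == EINTR)) {
            return -1;
        }
    }
    return 0;
}

/* Demo: a child drains the connection while the parent pushes total bytes. */
int demo_send_all(size_t total)
{
    static char buf[1 << 20];
    int cli, srv, status = 0, rc;
    pid_t pid;

    if (total > sizeof buf || loopback_pair(&cli, &srv) < 0) return -1;
    if ((pid = fork()) < 0) return -1;
    if (pid == 0) {                      /* child: plain blocking reader */
        char tmp[8192];
        size_t got = 0;
        ssize_t n;
        close(srv);
        while ((n = recv(cli, tmp, sizeof tmp, 0)) > 0)
            got += (size_t)n;
        _exit(got == total ? 0 : 1);
    }
    close(cli);
    fcntl(srv, F_SETFL, fcntl(srv, F_GETFL, 0) | O_NONBLOCK);
    rc = send_all_nonblocking(srv, buf, total);
    close(srv);                          /* queued data still goes out, then FIN */
    if (waitpid(pid, &status, 0) < 0) return -1;
    return (rc == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```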

In summary, taking into account the SO_RCVBUF and SO_SNDBUF options mentioned earlier, you will find that in practice, how many TCP packets you send and how much data each one carries depends not only on your own server's configuration and the available bandwidth but also on the receiving state of the peer.

As for why "the application layer does not need to control the sending behavior":

Software systems are split into layers and modules so that each part has its own job. The application layer only cares about implementing and controlling the business logic, while data transmission is handled by a dedicated layer. This greatly reduces the size and complexity of application development and lowers development and maintenance costs accordingly.

Back to sending :) I said the application layer cannot precisely or completely control sending. Does that mean it should not try at all? No! Full control is impossible, but it can still influence sending as far as possible.

How? This brings us to the topic of this section: TCP_CORK and TCP_NODELAY.

cork: stopper, plug

nodelay: don't delay

TCP_CORK: try to accumulate data in the send buffer and send it only once a good amount has built up, which raises the network's effective payload. To put the payload problem crudely: if each packet carried only one byte of data, every byte would be wrapped in a comparatively thick TCP header, so the network would carry mostly headers and very little useful data, and on high-traffic servers bandwidth could easily run out. To raise the payload ratio, this option tells the TCP layer to accumulate as much data as it can and pack it into full TCP packets before sending. The cost is that data is no longer sent promptly: space and time are always at odds!!

TCP_NODELAY: try not to wait; as long as there is data in the send buffer and the send window is open, send it out to the network right away.

Clearly the two options are mutually exclusive. How do you choose between them in practice? Another illustration:

Web servers, download servers (for example an FTP server sending files) and other servers that need a lot of bandwidth: use TCP_CORK.

Interactive servers, such as the FTP server that receives commands, must use TCP_NODELAY. By default, Nagle's algorithm is enabled, so TCP may hold small writes back. Imagine the user types a few bytes of command each time while the lower layer keeps accumulating the data, waiting until there is a lot before sending it: the user would be kept waiting until they went mad. There is a special term for this terrible scene: "sticky packets" (粘包, nián bāo).
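Both options are ordinary setsockopt() calls at the IPPROTO_TCP level. The sketch below assumes Linux for TCP_CORK (hence the #ifdef guards; TCP_NODELAY is widely available), and all function names are hypothetical. send_corked() shows the web-server pattern of corking, writing headers and body with separate send() calls, and uncorking to flush; make_interactive() is the command-channel case.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Turn a boolean IPPROTO_TCP option (TCP_NODELAY, TCP_CORK) on or off. */
int set_tcp_flag(int fd, int opt, int on)
{
    return setsockopt(fd, IPPROTO_TCP, opt, &on, sizeof on);
}

/* Read a boolean IPPROTO_TCP option back: 1 = on, 0 = off, -1 = error. */
int get_tcp_flag(int fd, int opt)
{
    int v = 0;
    socklen_t len = sizeof v;
    return getsockopt(fd, IPPROTO_TCP, opt, &v, &len) < 0 ? -1 : (v != 0);
}

/* Bandwidth-oriented server: cork, write headers and body separately, then
   uncork. Error handling is kept minimal for the sketch. Returns 0 or -1. */
int send_corked(int fd, const char *hdr, size_t hlen, const char *body, size_t blen)
{
    int rc = 0;
#ifdef TCP_CORK
    set_tcp_flag(fd, TCP_CORK, 1);       /* hold partial segments back */
#endif
    if (send(fd, hdr, hlen, 0) < 0 || send(fd, body, blen, 0) < 0)
        rc = -1;
#ifdef TCP_CORK
    set_tcp_flag(fd, TCP_CORK, 0);       /* clearing the cork sends what is queued */
#endif
    return rc;
}

/* Interactive server: disable Nagle so each small write goes out at once. */
int make_interactive(int fd)
{
    return set_tcp_flag(fd, TCP_NODELAY, 1);
}
```

On Linux, the tcp(7) man page also notes a 200 ms ceiling on how long TCP_CORK holds output, so a forgotten uncork delays data rather than stalling it forever.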

==========

In this part, we use a blocking socket on a test machine to illustrate the topic. All screenshots in the article were captured on the test system.

3 concepts to understand

1. TCP socket buffer

Each TCP socket has a send buffer and a receive buffer in the kernel. TCP's full-duplex operation and its flow control both depend on these two independent buffers and on how full they are. The receive buffer holds incoming data in the kernel: if the application process has not called recv(), the data stays cached in that socket's receive buffer. Put more plainly, whether or not the process calls recv() on the socket, data sent by the peer is received by the kernel and buffered in the socket's kernel receive buffer. All recv() does is copy data from the kernel buffer into the application's buffer and return, nothing more. When the process calls send(), the simplest (and most common) case is that the data is copied into the socket's kernel send buffer and send() returns to the caller. In other words, when send() returns, the data has not necessarily reached the peer (much like writing a file); send() only copies the application's data into the socket's kernel send buffer. Actually transmitting it is TCP's job and has little to do with send(). TCP uses the receive buffer to hold data arriving from the network until the application reads it. If the application does not read and the receive buffer fills up, the receiver tells the sender that its receive window is closed (win=0). This is the sliding window at work: it guarantees that the TCP socket's receive buffer never overflows, which keeps TCP transmission reliable, because the peer is not allowed to send more than the advertised window. This is TCP's flow control. If the peer ignores the window and sends more than it allows, the receiving TCP discards the excess.
Check the socket send buffer size of the test machine with: cat /proc/sys/net/ipv4/tcp_wmem

The first value is a limit: the minimum size of the socket send buffer, in bytes;
the second value is the default;
the third value is a limit: the maximum size of the socket send buffer, in bytes.
According to actual tests, the send buffer defaults to 16384 bytes (16k) on the test system.
The values under the proc file system, like those in sysctl, are global. An application can call setsockopt() to change the send buffer size of an individual socket as needed; for details, see the article "send and recv in TCP", but that is a digression.
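The same numbers can be read from a program. This sketch assumes Linux: read_tcp_wmem() parses the three values from /proc/sys/net/ipv4/tcp_wmem, and set_sndbuf() changes one socket's buffer with setsockopt(SO_SNDBUF) and reads it back; both names are hypothetical. According to the socket(7) man page, Linux doubles the requested SO_SNDBUF value to leave room for bookkeeping, so the value read back is usually twice the request.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Parse min/default/max from /proc/sys/net/ipv4/tcp_wmem (Linux only).
   Returns 0 on success, -1 if the file is missing or malformed. */
int read_tcp_wmem(long v[3])
{
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_wmem", "r");
    int ok;

    if (!f) return -1;
    ok = fscanf(f, "%ld %ld %ld", &v[0], &v[1], &v[2]) == 3;
    fclose(f);
    return ok ? 0 : -1;
}

/* Request a per-socket send buffer of `bytes`, overriding the global default,
   and return what getsockopt() reports afterwards (-1 on error). */
int set_sndbuf(int fd, int bytes)
{
    int got = 0;
    socklen_t len = sizeof got;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0) return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &got, &len) < 0) return -1;
    return got;                          /* typically 2 * bytes on Linux */
}
```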

2. Receive window (sliding window)

When the TCP connection is established, the receiver's initial receive window is 14600, as shown in Figure 2 (129 is the receiver, 130 is the sender).

Figure 2

The receive window is TCP's sliding window. By advertising win=14600, the TCP receiver tells the sender: "my current receiving capacity is 14600 bytes."
During the rest of the transfer, the receiver keeps telling the sender the size of its receive window in its ACKs (for everything an ACK does, see the blog post "TCP's ACK Sending Scenario"), as shown in Figure 3. The sender sizes what it sends to the receive window and never sends more than the receiver can accept, which is how flow control works.

Figure 3

Figure 3 shows two packets, 21 and 22.
Packet 21 is an ACK sent by the receiver to the sender. The receiver confirms the first 7240 bytes of data it received; ACK=7241 means it expects the next data to start at byte 7241 (the acknowledgement number is one higher than the byte count because the SYN consumed one sequence number). At the same time, the receive window has grown through the slow-start phase from its initial 14656 (Figure 2) to 29120. This tells the sender that the receiver can accept 29120 bytes of data: on seeing this window, and before any new ACK arrives, the sender may send up to 29120 bytes to the receiver.
In packet 22, the receiver confirms the first 8688 bytes of data it received and announces that its receive window has grown further, to 32,000.

3. The relationship between the payload of a single TCP packet and the MSS

On Ethernet, the MSS is usually 1460 bytes, yet during the transfer below a single TCP packet carries at most 1448 bytes of data. For the relationship between the two, see the blog post "TCP 1460MSS and 1448 Load".
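The two numbers can be reproduced with a little arithmetic. The sketch assumes what is typical but not shown in the capture: an Ethernet MTU of 1500 bytes, 20-byte IPv4 and TCP base headers, and the TCP timestamp option on every data segment, which takes 10 bytes plus 2 bytes of padding. The function names are hypothetical.

```c
/* Assumed values, not read from the capture. */
enum { MTU_BYTES = 1500, IPV4_HDR = 20, TCP_BASE_HDR = 20, TCP_TS_OPTION = 12 };

/* MSS advertised in the SYN: what fits after the base IPv4 and TCP headers. */
int mss_bytes(void) { return MTU_BYTES - IPV4_HDR - TCP_BASE_HDR; }

/* Data actually carried per full segment once every segment has timestamps. */
int payload_bytes(void) { return mss_bytes() - TCP_TS_OPTION; }

/* Segments needed for `bytes` of data, each carrying at most payload_bytes(). */
int segments_for(int bytes) { return (bytes + payload_bytes() - 1) / payload_bytes(); }
```

With these values, mss_bytes() is 1460 and payload_bytes() is 1448, and the byte counts in the walkthrough below line up with multiples of 1448: 2896 and 4344 (packets 14 and 16), 65160 = 45 x 1448 (packet 71). Carrying 70k (71680 bytes) takes 50 such segments.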

A detailed send() example

What the example does: receiver 129 acts as the client and connects to sender 130. After connecting, it does not call recv(); instead it calls sleep(1000) to pause the process so that it reads no data, and the kernel buffers the incoming data in the receive buffer. Sender 130 acts as the server; as soon as it accepts the TCP connection, it executes the C statement ret = send(sock,buf,70k,0); to send 70k of data to the receiver.
Let's observe what happens. The Wireshark capture is shown in Figure 4.

Figure 4


Walking through Figure 4 in packet-number order:
1. The client sleeps before calling recv(), so that data piles up in its receive buffer. The server executes the C statement "ret = send(sock,buf,70k,0);" to send 70k of data to the receiver. Because the send buffer is 16k, send() cannot copy all 70k into it at once. It first copies 16k; since the send buffer now holds data, the kernel starts transmitting, while send() stays blocked in the application layer;
2. TCP packet No. 11: from here on, the sender sends 1448 bytes of data to the receiver;
3. TCP packet No. 12: the sender has not yet received an ACK for the 1448 bytes it just sent, but keeps going and sends another 1448 bytes;
4. TCP packet No. 13: the receiver sends the sender an ACK for 1448 bytes, meaning it has successfully received 1448 bytes in total. The receiver has not called recv(), so these 1448 bytes are now sitting in its receive buffer. Because the connection is in slow start, the win receive window keeps growing, showing that the receiving capacity is increasing and throughput keeps rising;
5. TCP packet No. 14: the receiver sends an ACK for 2896 bytes, meaning it has successfully received 2896 bytes in total. The receiver still has not called recv(), so 2896 bytes are now sitting in its receive buffer. Because of slow start, the win receive window keeps growing and throughput keeps rising;
6. TCP packet No. 15: the sender sends another 1448 bytes to the receiver;
7. TCP packet No. 16: the receiver sends an ACK for 4344 bytes, meaning it has successfully received 4344 bytes in total. The receiver still has not called recv(), so 4344 bytes are now sitting in its receive buffer. Because of slow start, the win receive window keeps growing and throughput keeps rising;
8. From here on I skip many packets; the process is similar to the above. As the receiver acknowledges the transmitted data with ACKs, space in the send buffer is gradually freed, and inside send() the data in the application buf keeps being copied into the send buffer and then transmitted, over and over. Until all 70k has been copied into the kernel, send() does not care whether the data has been transmitted, nor whether transmitted data has been acknowledged; it only cares whether the data in buf has made it into the send buffer. If not all of buf has been copied, send() stays blocked in the application layer and copies more of buf whenever the send buffer has free space; once all of buf has been copied into the send buffer, send() returns immediately.
9. After slow start, the receive window reaches a steady phase and TCP throughput levels off. The receiver is still asleep and never calls recv() to copy data from its kernel receive buffer to the application layer, so a large amount of data piles up in the receiver's receive buffer;
10. TCP packets No. 66 and No. 67: the sender keeps sending data to the receiver;
11. TCP packet No. 68: the receiver sends an ACK for the data received. ACK=62265 means the receiver has received 62264 bytes of data (62265 is the next byte it expects), all of which are sitting in its receive buffer. Compared with win=23296 in TCP packet No. 16, win=3456 shows that the receiver's window has been shrinking: the application has not read the receive buffer for a long time, so buffer space is running short, and the receiver shrinks its window to limit how much the sender sends. This is flow control;
12. TCP packets No. 69 and No. 70: within the amount allowed by the receive window, the sender sends two more segments of 1448 bytes each;
13. TCP packet No. 71: the receiver has now received 65160 bytes in total, all sitting in its receive buffer, and the receive window has shrunk further, to 1600 bytes;
14. TCP packet No. 72: within the amount allowed by the receive window, the sender sends another 1448 bytes;
15. TCP packet No. 73: the receiver has now received 66608 bytes in total (ACK=66609), all sitting in its receive buffer, and the receive window has shrunk to 192 bytes;
16. TCP packet No. 74 has nothing to do with our example; it was sent by another application;
17. TCP packet No. 75: within the amount allowed by the receive window, the sender sends 192 bytes of data;
18. TCP packet No. 76: the receiver has now received 66800 bytes in total (ACK=66801), all sitting in its receive buffer. win=0: the receive window is closed, the receive buffer is full, and no more data can be accepted;
19. TCP packets No. 77, No. 78 and No. 79 are triggered by keepalive; the ACKs sent in response keep the receive window at win=0, and ACK=66801 shows that 66800 bytes are backed up in the receiver's receive buffer.
20. From the process above, you should now be familiar with what the window field win advertises and what the ACK numbers confirm. We can also conclude that the receiver's receive buffer should be 66800 bytes in size (this conclusion is not the subject of this article).
send() was asked to send 70k. So far 66800 bytes have been transmitted and the send buffer is full again, so the part of the 70k that the application layer has not yet copied into the kernel, N = 70k - 66800 - (the bytes now queued in the send buffer), has to wait. The receiver is still asleep and cannot recv() any data, so its receive buffer stays full and its window stays at 0 (win=0). In this state the sender cannot transmit anything, the rest of send()'s data cannot be copied into the kernel send buffer, and send() stays blocked in the application layer;
21. send() stays blocked...

That completes the walkthrough of Figure 4 and what it means for send().
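The window arithmetic at the end of the capture can be checked with a small sketch. It assumes Wireshark's relative sequence numbers (the SYN uses sequence number 0, so an ACK value is one more than the byte count), treats win as a plain byte count, and assumes the sender had nothing unacknowledged in flight when each ACK arrived. bytes_received() and usable_window() are hypothetical names.

```c
#include <stdint.h>

/* An ACK names the next byte expected, so the bytes received are ack - 1. */
uint32_t bytes_received(uint32_t ack) { return ack - 1; }

/* How much the sender may still put on the wire: the right edge of the
   advertised window (ack + win) minus the next sequence number to send.
   Sketch only: ignores 32-bit sequence wrap-around. */
uint32_t usable_window(uint32_t ack, uint32_t win, uint32_t snd_nxt)
{
    uint32_t right_edge = ack + win;
    return right_edge > snd_nxt ? right_edge - snd_nxt : 0;
}
```

For packet 68 (ACK=62265, win=3456) this gives 3456 usable bytes, room for two 1448-byte segments, which matches packets 69 and 70. At ACK=66801, win=0 nothing is usable, which is the point where send() blocks.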
So when does send() return? There are three scenarios.

When send() returns

Scenario 1: we continue the example shown in Figure 4, this time following the process beyond what Figure 4 shows.

22. The receiver's sleep(1000) ends and the process wakes up; the code snippet is shown in Figure 5.

Figure 5
As the process keeps calling "recv(fd, buf, 2048, 0);" to copy data from the kernel receive buffer into the application buf, the receive buffer, whose window had been closed with win=0, gradually regains room. At that point the receiver proactively sends an ACK carrying "win=n (n>0)" to tell the sender that the receive window is open again;

23. After receiving the ACK carrying "win=n (n>0)", the sender resumes sending the data in its send buffer, within the amount the window allows;
24. The receiver keeps receiving data and acknowledges it with ACKs;
25. As ACKs arrive, the sender frees space in its send buffer, and send() keeps copying the remaining application data into the kernel send buffer;
26. The sending process above repeats;
27. Once all 70k of send()'s data has entered the kernel, send() returns successfully.
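The whole experiment, sleep included, can be approximated on one machine. The sketch below is a hypothetical reproduction over 127.0.0.1, not the author's test program: the connecting socket plays receiver 129, the accepted socket plays sender 130, a forked child sleeps for sleep_secs seconds instead of 1000, and both buffers are set to 16384 bytes with setsockopt() (an assumption, so that 70k cannot fit at once). With those buffers, the single send() of 70k is expected to block until the child wakes up and starts calling recv(fd, buf, 2048, 0), and then to return the full count, as in steps 22 to 27.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAYLOAD (70 * 1024)              /* the 70k from the example */

/* Returns what the single blocking send() returned, or -1 on any failure. */
ssize_t reproduce_70k_send(unsigned sleep_secs)
{
    static char buf[PAYLOAD];
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int small = 16384, status = 0, rcv, snd;
    int lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lst, (struct sockaddr *)&a, sizeof a) < 0 || listen(lst, 1) < 0 ||
        getsockname(lst, (struct sockaddr *)&a, &alen) < 0)
        return -1;
    if ((rcv = socket(AF_INET, SOCK_STREAM, 0)) < 0) return -1;   /* "129" */
    /* Shrink the receive buffer before connect() so the window reflects it. */
    setsockopt(rcv, SOL_SOCKET, SO_RCVBUF, &small, sizeof small);
    if (connect(rcv, (struct sockaddr *)&a, sizeof a) < 0) return -1;
    snd = accept(lst, NULL, NULL);                                 /* "130" */
    close(lst);
    if (snd < 0) return -1;
    setsockopt(snd, SOL_SOCKET, SO_SNDBUF, &small, sizeof small);

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                      /* child: the sleeping receiver */
        char tmp[2048];
        size_t got = 0;
        ssize_t n;
        close(snd);
        sleep(sleep_secs);               /* stands in for sleep(1000) */
        while ((n = recv(rcv, tmp, sizeof tmp, 0)) > 0)   /* recv(fd, buf, 2048, 0) */
            got += (size_t)n;
        _exit(got == PAYLOAD ? 0 : 1);
    }
    close(rcv);
    ssize_t ret = send(snd, buf, PAYLOAD, 0);   /* expected to block for a while */
    close(snd);                                 /* FIN after queued data: child sees EOF */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return ret;
}
```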

Scenario 2: we continue the example shown in Figure 4, but this time it departs from what Figure 4 shows.
22. If something goes wrong with the receiving process or its socket, the receiver sends an RST to the sender (see the blog post "");
23. When the kernel receives the RST, send() returns -1.

Scenario 3 has nothing to do with the example above: right after connecting, the program calls send() with 1k of data. That data fits into the send buffer in one copy, so send() returns immediately after copying it.

send() conclusions

Scenarios 1 and 2 in fact illustrate the same point:
send() is only responsible for copying. It returns as soon as the copy is done and does not wait for the data to be transmitted or acknowledged. If something goes wrong with the socket, an RST packet comes back. If the RST arrives before send() has put all of its data into the kernel, send() returns -1 and errno is set; if the RST arrives after send() has already returned, the resulting error is reported by the next send() or recv() call.
Scenario 3 shows clearly that send() returns successfully as soon as the copy is complete. If errors occur later while the data is being transmitted, they are reported by the next send() or recv() call.
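Scenario 2 and the "next call reports the error" rule can be reproduced with a sketch like the one below (POSIX, loopback). As an assumption for the demo, the receiver's failure is simulated with SO_LINGER set to a zero timeout, which makes close() send an RST instead of a FIN; the sender's next send() then fails. SIGPIPE is ignored because a write to a reset connection can otherwise kill the process. The function names are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* Returns the errno of the send() made after the peer's RST, 0 if that send()
   unexpectedly succeeded, or -1 if the setup failed. */
int errno_of_send_after_rst(void)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000000L };  /* 100 ms */
    int cli, srv, err = 0;

    if (loopback_pair(&cli, &srv) < 0) return -1;
    signal(SIGPIPE, SIG_IGN);            /* report EPIPE instead of dying */
    setsockopt(cli, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
    close(cli);                          /* abortive close: RST goes to srv */
    nanosleep(&pause, NULL);             /* give the kernel time to process it */
    if (send(srv, "x", 1, 0) < 0)
        err = errno;
    close(srv);
    return err;
}
```

On Linux, the first failing call typically reports ECONNRESET and later calls report EPIPE.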

Easily confused concepts

1. The TCP protocol itself guarantees reliable transmission, but that does not mean an application that sends data over TCP is automatically reliable; the application must still be fault-tolerant;
2. There is no fixed correspondence between send() and recv(): any number of send() calls may be consumed by any number of recv() calls. This may sound too basic to mention, but it needs saying, because beginners are easily puzzled by it;
3. The key point: send() is only responsible for copying; it copies the data into the kernel and returns, which is why I say "copy" throughout. Many articles claim that send() returns after the data has been sent successfully, where "sent successfully" would mean the data has been acknowledged by an ACK. In fact send() only copies and does not wait for an ACK;
4. An error triggered by a given send() call may be returned by that call itself, or only by the next network I/O call.
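Point 2 has a practical consequence: a reader that needs an exact number of bytes (a fixed-size header, say) has to loop. recv_exact() below is a common pattern rather than a standard function, and the demo is hypothetical: it again assumes a loopback connection and shows three send() calls of 3, 4 and 3 bytes being consumed as one 10-byte unit.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: a connected TCP pair over 127.0.0.1. */
int loopback_pair(int *cli, int *srv)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof a;
    int ok, lst = socket(AF_INET, SOCK_STREAM, 0);

    if (lst < 0) return -1;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    ok = bind(lst, (struct sockaddr *)&a, sizeof a) == 0 && listen(lst, 1) == 0 &&
         getsockname(lst, (struct sockaddr *)&a, &alen) == 0 &&
         (*cli = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
         connect(*cli, (struct sockaddr *)&a, sizeof a) == 0 &&
         (*srv = accept(lst, NULL, NULL)) >= 0;
    close(lst);
    return ok ? 0 : -1;
}

/* Read exactly len bytes unless the peer closes first. Returns len on
   success, the short count on EOF, or -1 on error. */
ssize_t recv_exact(int fd, void *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n == 0) break;               /* orderly shutdown by the peer */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Demo: three send() calls, read back as a single 10-byte unit. */
ssize_t demo_recv_exact(char *out)
{
    int cli, srv;
    ssize_t n;

    if (loopback_pair(&cli, &srv) < 0) return -1;
    send(srv, "abc", 3, 0);
    send(srv, "defg", 4, 0);
    send(srv, "hij", 3, 0);
    n = recv_exact(cli, out, 10);
    close(cli);
    close(srv);
    return n;
}
```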

 

In fact, once you understand blocking sockets, non-blocking ones are easy to understand.