Foreign-Language Translation: Ethernet and IEEE 802

TCP/IP Illustrated, Volume 1: The Protocols

Chapter 1. Introduction

1.3 TCP/IP Layering

There are more protocols in the TCP/IP protocol suite. Figure 1.4 shows some of the additional protocols that we talk about in this text.

TCP and UDP are the two predominant transport layer protocols. Both use IP as the network layer.

TCP provides a reliable transport layer, even though the service it uses (IP) is unreliable. Chapters 17 through 22 provide a detailed look at the operation of TCP. We then look at some TCP applications: Telnet and Rlogin in Chapter 26, FTP in Chapter 27, and SMTP in Chapter 28. The applications are normally user processes.

UDP sends and receives datagrams for applications. A datagram is a unit of information (i.e., a certain number of bytes of information that is specified by the sender) that travels from the sender to the receiver. Unlike TCP, however, UDP is unreliable. There is no guarantee that the datagram ever gets to its final destination. Chapter 11 looks at UDP, and then Chapter 14 (the Domain Name System), Chapter 15 (the Trivial File Transfer Protocol), and Chapter 16 (the Bootstrap Protocol) look at some applications that use UDP. SNMP (the Simple Network Management Protocol) also uses UDP, but since it deals with many of the other protocols, we save a discussion of it until Chapter 25.

IP is the main protocol at the network layer. It is used by both TCP and UDP. Every piece of TCP and UDP data that gets transferred around an internet goes through the IP layer at both end systems and at every intermediate router. In Figure 1.4 we also show an application accessing IP directly. This is rare, but possible. (Some older routing protocols were implemented this way. Also, it is possible to experiment with new transport layer protocols using this feature.) Chapter 3 looks at IP, but we save some of the details for later chapters where their discussion makes more sense. Chapters 9 and 10 look at how IP performs routing.

ICMP is an adjunct to IP.
It is used by the IP layer to exchange error messages and other vital information with the IP layer in another host or router. Chapter 6 looks at ICMP in more detail. Although ICMP is used primarily by IP, it is possible for an application to also access it. Indeed, we'll see that two popular diagnostic tools, ping and traceroute (Chapters 7 and 8), both use ICMP.

IGMP is the Internet Group Management Protocol. It is used with multicasting: sending a UDP datagram to multiple hosts. We describe the general properties of broadcasting (sending a UDP datagram to every host on a specified network) and multicasting in Chapter 12, and then describe IGMP itself in Chapter 13.

ARP (Address Resolution Protocol) and RARP (Reverse Address Resolution Protocol) are specialized protocols used only with certain types of network interfaces (such as Ethernet and token ring) to convert between the addresses used by the IP layer and the addresses used by the network interface. We examine these protocols in Chapters 4 and 5, respectively.

1.8 Client-Server Model

Most networking applications are written assuming one side is the client and the other the server. The purpose of the application is for the server to provide some defined service for clients.

We can categorize servers into two classes: iterative or concurrent. An iterative server iterates through the following steps.

I1. Wait for a client request to arrive.
I2. Process the client request.
I3. Send the response back to the client that sent the request.
I4. Go back to step I1.

The problem with an iterative server is when step I2 takes a while. During this time no other clients are serviced.

A concurrent server, on the other hand, performs the following steps.

C1. Wait for a client request to arrive.
C2. Start a new server to handle this client's request. This may involve creating a new process, task, or thread, depending on what the underlying operating system supports. How this step is performed depends on the operating system. This new server handles this client's entire request. When complete, this new server terminates.
C3. Go back to step C1.

The advantage of a concurrent server is that the server just spawns other servers to handle the client requests. Each client has, in essence, its own server. Assuming the operating system allows multiprogramming, multiple clients are serviced concurrently.

The reason we categorize servers, and not clients, is because a client normally can't tell whether it's talking to an iterative server or a concurrent server.

As a general rule, TCP servers are concurrent, and UDP servers are iterative, but there are a few exceptions. We'll look in detail at the impact of UDP on its servers in Section 11.12, and the impact of TCP on its servers in Section 18.11.

Chapter 2. Link Layer

2.1 Introduction

From Figure 1.4 we see that the purpose of the link layer in the TCP/IP protocol suite is to send and receive (1) IP datagrams for the IP module, (2) ARP requests and replies for the ARP module, and (3) RARP requests and replies for the RARP module. TCP/IP supports many different link layers, depending on the type of networking hardware being used: Ethernet, token ring, FDDI (Fiber Distributed Data Interface), RS-232 serial lines, and the like.

In this chapter we'll look at some of the details involved in the Ethernet link layer, two specialized link layers for serial interfaces (SLIP and PPP), and the loopback driver that's part of most implementations. Ethernet and SLIP are the link layers used for most of the examples in the book. We also talk about the MTU (Maximum Transmission Unit), a characteristic of the link layer that we encounter numerous times in the remaining chapters. We also show some calculations of how to choose the MTU for a serial line.

2.2 Ethernet and IEEE 802 Encapsulation

The term Ethernet generally refers to a standard published in 1982 by Digital Equipment Corp., Intel Corp., and Xerox Corp. It is the predominant form of local area network technology used with TCP/IP today.
It uses an access method called CSMA/CD, which stands for Carrier Sense, Multiple Access with Collision Detection. It operates at 10 Mbits/sec and uses 48-bit addresses.

A few years later the IEEE 802 Committee published a slightly different set of standards. 802.3 covers an entire set of CSMA/CD networks, 802.4 covers token bus networks, and 802.5 covers token ring networks. Common to all three of these is the 802.2 standard that defines the logical link control (LLC) common to many of the 802 networks. Unfortunately, the combination of 802.2 and 802.3 defines a frame format that differs from true Ethernet. ([…] covers all the details of these IEEE 802 standards.)

In the TCP/IP world, the encapsulation of IP datagrams is defined in RFC 894 for Ethernets and in RFC 1042 for IEEE 802 networks. Every Internet host connected to a 10 Mbits/sec Ethernet cable is required to meet the following:

1. Must be able to send and receive packets using RFC 894 (Ethernet) encapsulation.
2. Should be able to receive RFC 1042 (IEEE 802) packets intermixed with RFC 894 packets.
3. May be able to send packets using RFC 1042 encapsulation. If the host can send both types of packets, the type of packet sent must be configurable and the configuration option must default to RFC 894 packets.

RFC 894 encapsulation is most commonly used. Figure 2.1 shows the two different forms of encapsulation. The number below each box in the figure is the size of that box in bytes.

Both frame formats use 48-bit destination and source addresses. (802.3 allows 16-bit addresses to be used, but 48-bit addresses are normal.) These are what we call hardware addresses throughout the text. The ARP and RARP protocols (Chapters 4 and 5) map between the 32-bit IP addresses and the 48-bit hardware addresses.

The next 2 bytes are different in the two frame formats. The 802 length field says how many bytes follow, up to but not including the CRC at the end. The Ethernet type field identifies the type of data that follows; in the 802 frame the same type field occurs later, in the SNAP header. Fortunately none of the valid 802 length values is the same as the Ethernet type values, making the two frame formats distinguishable.
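The rule that separates the two frame formats can be sketched in Python (the language used for all sketches in this section). This is a minimal, illustrative classifier, not a driver. It assumes the frame starts at the destination address with the preamble and CRC already removed, and it treats values of 0x0600 (1536) and above as Ethernet types, a threshold taken from later IEEE practice rather than from this text. The padding helper applies the data-portion minimums given in the next paragraph (46 bytes for Ethernet, 38 bytes for 802.3 after the 8 bytes of LLC and SNAP). The function names are hypothetical.

```python
import struct

MAX_8023_LENGTH = 1500      # largest valid 802.3 length value (0x05DC)
MIN_ETHERNET_TYPE = 0x0600  # assumption: later IEEE practice, not stated in the text

def classify_frame(frame: bytes) -> str:
    """Return 'ethernet' (RFC 894) or '802.3' (RFC 1042) by looking at the
    2 bytes that follow the 6-byte destination and 6-byte source addresses."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte header")
    (field,) = struct.unpack("!H", frame[12:14])  # network byte order
    if field <= MAX_8023_LENGTH:
        return "802.3"      # a length: bytes that follow, excluding the CRC
    if field >= MIN_ETHERNET_TYPE:
        return "ethernet"   # a type, e.g. 0x0800 for IP
    raise ValueError(f"neither a valid length nor a type: {field:#06x}")

def pad_data(data: bytes, encapsulation: str) -> bytes:
    """Append zero pad bytes so the data portion meets the minimum size:
    46 bytes for Ethernet, 38 bytes for 802.3 (after LLC and SNAP)."""
    minimum = 46 if encapsulation == "ethernet" else 38
    return data + bytes(max(0, minimum - len(data)))
```

Because every valid length is at most 1500 and every type value is far above that, a receiver needs only this one comparison to pick the decoder for the rest of the frame.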
In the Ethernet frame the data immediately follows the type field, while in the 802 frame format 3 bytes of 802.2 LLC and 5 bytes of 802.2 SNAP follow. The DSAP and SSAP fields of the LLC header are followed by a control field and, in the SNAP header, a 3-byte org code; after these comes the same 2-byte type field that we had with the Ethernet frame format.

The CRC field is a cyclic redundancy check that detects errors in the rest of the frame.

There is a minimum size for 802.3 and Ethernet frames. This minimum requires that the data portion be at least 38 bytes for 802.3 or 46 bytes for Ethernet. To handle this, pad bytes are inserted to assure that the frame is long enough. We'll encounter this minimum when we start watching packets on the wire.

In this text we'll display the Ethernet encapsulation when we need to, because this is the most commonly used form of encapsulation.

Chapter 3. IP: Internet Protocol

3.1 Introduction

IP is the workhorse protocol of the TCP/IP protocol suite. All TCP, UDP, ICMP, and IGMP data gets transmitted as IP datagrams. A fact that amazes many newcomers to TCP/IP, especially those from an X.25 or SNA background, is that IP provides an unreliable, connectionless datagram delivery service.

By unreliable we mean there are no guarantees that an IP datagram successfully gets to its destination. IP provides a best effort service. When something goes wrong, such as a router temporarily running out of buffers, IP has a simple error handling algorithm: throw away the datagram and try to send an ICMP message back to the source. Any required reliability must be provided by the upper layers.

The term connectionless means that IP does not maintain any state information about successive datagrams. Each datagram is handled independently from all other datagrams. This also means that IP datagrams can get delivered out of order. If a source sends two consecutive datagrams (first A, then B) to the same destination, each is routed independently and can take different routes, with B arriving before A.
In this chapter we take a brief look at the fields in the IP header, describe IP routing, and cover subnetting. We also look at two useful commands: ifconfig and netstat. We leave a detailed discussion of some of the fields in the IP header for later, when we can see exactly how the fields are used.

Chapter 18. TCP Connection Establishment and Termination

18.3 Timeout of Connection Establishment

There are several instances when the connection cannot be established. In one example the server host is down. To simulate this scenario we issue our telnet command after disconnecting the Ethernet cable from the server's host. Figure 18.6 shows the tcpdump output.

The interesting point in this output is how frequently the client's TCP sends a SYN to try to establish the connection. The second segment is sent 5.8 seconds after the first, and the third is sent 24 seconds after the second.

Measured from the start of the telnet command until the client gives up, the time difference is 76 seconds. Most Berkeley-derived systems set a time limit of 75 seconds on the establishment of a new connection. We'll see in Section 21.4 that the third packet sent by the client would have timed out around 16:25:29, 48 seconds after it was sent, had the client not given up after 75 seconds.

First Timeout Period

One puzzling item in Figure 18.6 is that the first timeout period, 5.8 seconds, is close to 6 seconds but not exact, while the second period is almost exactly 24 seconds. Ten more of these tests were run, and the first timeout period took on various values between 5.59 seconds and 5.93 seconds. The second timeout period, however, was always 24.00.

What's happening here is that BSD implementations of TCP run a timer that goes off every 500 ms. This 500-ms timer is used for various TCP timeouts, all of which we cover in later chapters. When we type in the telnet command, an initial 6-second timer is established, but it may expire anywhere between 5.5 and 6 seconds in the future. Figure 18.7 shows what's happening.
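The tick mechanism described here and in the next paragraph can be simulated in a few lines. This is a minimal sketch assuming the 500-ms handler fires at exact multiples of 0.5 seconds; real kernels may run it slightly late, which is why the text says "about" every 500 ms. The function name `timer_expiry` is hypothetical.

```python
import math

TICK = 0.5  # seconds between runs of BSD TCP's 500-ms timer handler

def timer_expiry(set_at: float, ticks: int, tick: float = TICK) -> float:
    """Absolute time at which a timer of `ticks` ticks, set at `set_at`,
    fires if it is only decremented by a handler that runs at exact
    multiples of `tick` (assumption: no handler jitter)."""
    # First decrement: the next handler run strictly after set_at,
    # i.e. somewhere in (0, tick] seconds later.
    first = (math.floor(set_at / tick) + 1) * tick
    # The remaining ticks - 1 decrements each take one full tick.
    return first + (ticks - 1) * tick
```

A 12-tick (6-second) timer set at an arbitrary moment fires after more than 5.5 and at most 6 seconds. The 48-tick (24-second) timer set from inside the handler at that expiry fires after exactly 24 seconds, matching the two periods in Figure 18.6.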
Although the timer is initialized to 12 ticks, the first decrement of the timer can occur between 0 and 500 ms after it is set. From that point on the timer is decremented about every 500 ms, but the first period can be variable.

When that 6-second timer expires at the tick labeled 0 in Figure 18.7, the timer is reset for 24 seconds in the future. This next timer will be close to 24 seconds, since it was set at a time when TCP's 500-ms timer handler was called by the kernel.

Type-of-Service Field

Figure 18.6 also shows a notation for the type-of-service field in the IP datagram. The BSD/386 Telnet client sets this field for minimum delay.

18.4 Maximum Segment Size

The maximum segment size (MSS) is the largest “chunk” of data that TCP will send to the other end. When a connection is established, each end can announce its MSS. The values we've seen have all been 1024. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header.

Some texts refer to this as a “negotiated” option. It is not negotiated in any way. When a connection is established, each end has the option of announcing the MSS it expects to receive. If one end does not receive an MSS option from the other end, a default of 536 bytes is assumed.

In general, the larger the MSS the better, until fragmentation occurs. A larger segment size allows more data to be sent in each segment, amortizing the cost of the IP and TCP headers. When TCP sends a SYN segment, either because a local application wants to initiate a connection or because a connection request is received from another host, it can send an MSS value up to the outgoing interface's MTU, minus the size of the fixed TCP and IP headers. For an Ethernet this implies an MSS of up to 1460 bytes. Using IEEE 802.3 encapsulation, the MSS could go up to 1452 bytes.

The values of 1024 that we've seen in this chapter, for connections involving BSD/386 and SVR4, are because many BSD implementations require the MSS to be a multiple of 512. Other systems announce an MSS of 1460 when both ends are on a local Ethernet.
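The MSS arithmetic above, and the sending rule used in the SLIP example that follows, can be sketched as two small functions. This is a minimal illustration assuming fixed 20-byte IP and TCP headers with no options. The parameter `round_to` is hypothetical and only models the "multiple of 512" requirement mentioned for BSD. An 802.3 MTU of 1492 is implied by the 1452-byte figure (1500 minus 8 bytes of LLC and SNAP).

```python
from typing import Optional

IP_HEADER_LEN = 20   # fixed IP header, no options
TCP_HEADER_LEN = 20  # fixed TCP header, no options
DEFAULT_MSS = 536    # assumed when the peer sends no MSS option

def mss_for_mtu(mtu: int, round_to: Optional[int] = None) -> int:
    """Largest MSS that fits in one datagram on an interface with this MTU.
    `round_to` (hypothetical) rounds down to a multiple, e.g. 512 for BSD."""
    mss = mtu - IP_HEADER_LEN - TCP_HEADER_LEN
    if round_to:
        mss -= mss % round_to
    return mss

def max_segment_to_send(peer_mss: Optional[int], out_mtu: int) -> int:
    """Data bytes per segment we may send: no more than the peer announced
    (536 if it announced nothing) and no more than fits our outgoing MTU."""
    announced = DEFAULT_MSS if peer_mss is None else peer_mss
    return min(announced, mss_for_mtu(out_mtu))
```

For example, `mss_for_mtu(1500)` gives 1460, `mss_for_mtu(1500, round_to=512)` gives the 1024 seen with BSD/386 and SVR4, and a host whose outgoing MTU is 296 never sends more than 256 data bytes, whatever the peer announced.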
Measurements in […] show how an MSS of 1460 provides better performance on an Ethernet than an MSS of 1024.

If the destination IP address is “nonlocal,” the MSS normally defaults to 536. While it's easy to say that a destination whose IP address has the same network ID and the same subnet ID as ours is local, and a destination whose IP address has a totally different network ID from ours is nonlocal, a destination with the same network ID but a different subnet ID could be either local or nonlocal. Most implementations provide a configuration option that lets the system administrator specify whether different subnets are local or nonlocal. The setting of this option determines whether the announced MSS is as large as possible or the default of 536.

The MSS lets a host limit the size of datagrams that the other end sends it. When combined with the fact that a host can also limit the size of the datagrams that it sends, this lets a host avoid fragmentation when the host is connected to a network with a small MTU.

Consider our host slip, which has a SLIP link with an MTU of 296 to the router bsdi. Figure 18.8 shows these systems and the host sun.

The important fact here is that sun cannot send a segment with more than 256 bytes of data, since it received an MSS option of 256. Furthermore, since slip knows that the outgoing interface's MTU is 296, it will never send more than 256 bytes of data, regardless of the MSS that sun announced, to avoid fragmentation. It's OK for a system to send less than the MSS announced by the other end.

This avoidance of fragmentation works only if either host is directly connected to a network with an MTU of less than 576. If both hosts are connected to Ethernets, and both announce an MSS of 536, but an intermediate network has an MTU of 296, fragmentation will occur. The only way around this is to use the path MTU discovery mechanism.

18.11 TCP Server Design

We said in Section 1.8 that most TCP servers are concurrent. When a new connection request arrives at a server, the server accepts the connection and invokes a new process to handle the new client. Depending on the operating system, various techniques are used to invoke the new server.
Under Unix the common technique is to create a new process using the fork function. Lightweight processes (threads) can also be used, if supported.

What we're interested in is the interaction of TCP with concurrent servers. We need to answer the following questions: how are the port numbers handled when a server accepts a new connection request from a client, and what happens if multiple connection requests arrive at about the same time?

18.11.1 TCP Server Port Numbers

We can see how TCP handles the port numbers by watching any TCP server. We'll watch the Telnet server using the netstat command. The following output is on a system with no active Telnet connections.

The -a flag reports on all network end points, not just those that are ESTABLISHED. The -n flag prints IP addresses as dotted-decimal numbers, instead of trying to convert them to names, and prints numeric port numbers instead of service names. The -f inet option reports only TCP and UDP end points.

The local address is output as *.23, where the asterisk is normally called the wildcard character. This means that an incoming connection request will be accepted on any local interface. If the host were multihomed, we could specify a single IP address for the local IP address, and only connections received on that interface would be accepted. The local port is 23, the well-known port number for Telnet.

The foreign address is output as *.*, which means the foreign IP address and foreign port number are not known yet, because the end point is in the LISTEN state, waiting for a connection to arrive.

We now start a Telnet client on the host slip that connects to this server. Here are the relevant lines from the netstat output:

The first line for port 23 is the ESTABLISHED connection. All four elements of the local and foreign address are filled in for this connection: the local IP address and port number, and the foreign IP address and port number. The local IP address corresponds to the interface on which the connection request arrived.
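The port-number behavior described in this section can be observed with a small concurrent server. This is a minimal sketch, not the Telnet server: it uses Python threads in place of fork (the "lightweight processes" option mentioned above), binds the loopback address with port 0 so the kernel picks a free port instead of the privileged port 23, and bounds the accept loop so the demo ends. All function names are hypothetical.

```python
import socket
import threading

def serve_one_client(conn, records, lock):
    """The per-client 'new server' (step C2): record the connection's local
    and foreign address pair, echo one message, then terminate."""
    with conn:
        with lock:
            records.append(conn.getsockname() + conn.getpeername())
        conn.sendall(conn.recv(1024))

def concurrent_server(listener, n_clients, records):
    """Accept n_clients connections and start one thread per client.
    The listening socket stays in LISTEN and is never given to a client."""
    lock = threading.Lock()
    workers = []
    for _ in range(n_clients):          # the real server loops forever
        conn, _peer = listener.accept()
        worker = threading.Thread(target=serve_one_client,
                                  args=(conn, records, lock))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()

def demo(n_clients=2):
    """Connect n_clients clients from this host to one server port and
    return (server port, 4-tuples seen by the server)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))     # port 0: kernel picks a free port
    listener.listen()
    port = listener.getsockname()[1]
    records = []
    server = threading.Thread(target=concurrent_server,
                              args=(listener, n_clients, records))
    server.start()
    clients = [socket.create_connection(("127.0.0.1", port))
               for _ in range(n_clients)]
    for i, client in enumerate(clients):
        with client:
            client.sendall(b"hi %d" % i)
            client.recv(1024)
    server.join()
    listener.close()
    return port, records
```

Each record is (local IP, local port, foreign IP, foreign port). Every accepted connection reports the same local port as the listening socket, and only the clients' ephemeral ports differ. This is why TCP must use all four values to decide which end point receives a segment.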
The end point in the LISTEN state is left alone. This is the end point that the concurrent server uses to accept future connection requests. It is the TCP module in the kernel that creates the new end point in the ESTABLISHED state, when the incoming connection request arrives and is accepted. Also notice that the port number for the ESTABLISHED connection doesn't change: it's 23, the same as the LISTEN end point.

We now initiate another Telnet client from the same client to this server. Here is the relevant netstat output:

We now have two ESTABLISHED connections from the same host to the same server. Both have a local port number of 23. This is not a problem for TCP since the foreign port numbers are different. They must be different because each of the Telnet clients uses an ephemeral port, and the definition of an ephemeral port is one that is not currently in use on that host.

This example reiterates that TCP demultiplexes incoming segments using all four values that comprise the local and foreign addresses: destination IP address, destination port number, source IP address, and source port number. TCP cannot determine which process gets an incoming segment by looking at the destination port number only. Also, the only one of the three end points at port 23 that will receive incoming connection requests is the one in the LISTEN state. The end points in the ESTABLISHED state cannot receive SYN segments, and the end point in the LISTEN state cannot receive data segments.

Next we initiate a third Telnet client, from the host solaris that is across the SLIP link from sun, and not on its Ethernet. The local IP address of the first ESTABLISHED connection now corresponds to the interface address of the SLIP link on the multihomed host sun.

Chapter 21. TCP Timeout and Retransmission

21.1 Introduction

TCP provides a reliable transport layer. One of the ways it provides reliability is for each end to acknowledge the data it receives from the other end. But data segments and acknowledgments can get lost.
TCP handles this by setting a timeout when it sends data, and if the data isn't acknowledged when the timeout expires, it retransmits the data. A critical element of any implementation is the timeout and retransmission strategy. How is the timeout interval determined, and how frequently does a retransmission occur?

We've already seen two examples of timeout and retransmission: (1) In the ICMP port unreachable example in Section 6.5 we saw the TFTP client using UDP employing a simple timeout and retransmission strategy: it assumed 5 seconds was an adequate timeout period and retransmitted every 5 seconds. (2) In the ARP example to a nonexistent host, we saw that when TCP tried to establish the connection it retransmitted its SYN using a longer delay between each retransmission.

TCP manages four different timers for each connection.

1. A retransmission timer is used when expecting an acknowledgment from the other end. This chapter looks at this timer in detail, along with related issues such as congestion avoidance.
2. A persist timer keeps window size information flowing even if the other end closes its receive window. Chapter 22 describes this timer.
3. A keepalive timer detects when the other end on an otherwise idle connection crashes or reboots. Chapter 23 describes this timer.
4. A 2MSL timer measures the time a connection has been in the TIME_WAIT state. We described this state in Section 18.6.

In this chapter we start with a simple example of TCP's timeout and retransmission and then move to a larger example that lets us look at all the details involved in TCP's timer management. We look at how typical implementations measure the round-trip time of TCP segments and how TCP uses these measurements to estimate the retransmission timeout of the next segment it transmits. We then look at TCP's congestion avoidance (what TCP does when packets are lost) and follow through an actual example where packets are lost.
We also look at the newer fast retransmit and fast recovery algorithms, and see how they let TCP detect lost packets faster than waiting for a timer to expire.

21.2 Simple Timeout and Retransmission Example

Let's first look at the retransmission strategy used by TCP. We'll establish a connection, send some data to verify that everything is OK, disconnect the cable, send some more data, and watch what TCP does. Figure 21.1 shows the tcpdump output.

Lines 1, 2, and 3 correspond to the normal TCP connection establishment. Line 4 is the transmission of “hello, world” and line 5 is its acknowledgment. We then disconnect the Ethernet cable from svr4. Line 6 shows “and hi” being sent. Lines 7-18 are 12 retransmissions of that segment, and line 19 is when the sending TCP finally gives up and sends a reset.

Examine the time difference between successive retransmissions: with rounding they occur 1, 3, 6, 12, 24, 48, and then 64 seconds apart. We'll see later in this chapter that the first timeout is actually set for 1.5 seconds after the first transmission. (It is observed after about 1 second rather than 1.5 seconds because of the 500-ms timer granularity described in Section 18.3.) After this the timeout value is doubled for each retransmission, with an upper limit of 64 seconds.

This doubling is called an exponential backoff. Compare this to the TFTP example in Section 6.5, where every retransmission occurred 5 seconds after the previous.

The time difference between the first transmission of the packet and the reset is about 9 minutes. Modern TCPs are persistent when trying to send data!

21.3 Round-Trip Time Measurement

Fundamental to TCP's timeout and retransmission is the measurement of the round-trip time (RTT) experienced on a given connection. We expect this can change over time, as routes might change and as network traffic changes, and TCP should track these changes and modify its timeout accordingly.

First TCP must measure the RTT between sending a byte with a particular sequence number and receiving an acknowledgment that covers that sequence number.
Recall from the previous chapter that normally there is not a one-to-one correspondence between data segments and ACKs. In Figure 20.1 this means that one RTT that can be measured by the sender is the time between the transmission of segment 4 and the reception of segment 7, even though this ACK is for an additional 1024 bytes. We'll use M to denote the measured RTT.

The original TCP specification had TCP update a smoothed RTT estimator R using the low-pass filter

$$R \leftarrow \alpha R + (1 - \alpha) M$$

where $\alpha$ is a smoothing factor with a recommended value of 0.9. This smoothed RTT is updated every time a new measurement is made. Ninety percent of each new estimate is from the previous estimate and 10% is from the new measurement.

Given this smoothed estimator, which changes as the RTT changes, RFC 793 recommended the retransmission timeout value (RTO) be set to

$$\mathrm{RTO} = R\beta$$

where $\beta$ is a delay variance factor with a recommended value of 2.

[Jacobson 1988] details the problems with this approach, basically that it can't keep up with wide fluctuations in the RTT, causing unnecessary retransmissions. As Jacobson notes, unnecessary retransmissions add to the network load when the network is already loaded. It is the network equivalent of pouring gasoline on a fire.

What's needed is to keep track of the variance in the RTT measurements, in addition to the smoothed RTT estimator. Calculating the RTO based on both the mean and variance provides much better response to wide fluctuations in the round-trip times than just calculating the RTO as a constant multiple of the mean; this is what the calculations we show below do, by taking into account the variance of the round-trip times.

As described by Jacobson, the mean deviation is a good approximation to the standard deviation, but easier to compute. This leads to the following equations that are applied to each RTT measurement M:

$$\mathrm{Err} = M - A$$
$$A \leftarrow A + g\,\mathrm{Err}$$
$$D \leftarrow D + h\,(|\mathrm{Err}| - D)$$
$$\mathrm{RTO} = A + 4D$$

where A is the smoothed RTT and D is the smoothed mean deviation.
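The two estimators above, together with the exponential backoff from Section 21.2, can be sketched as follows. This is a minimal floating-point illustration (real implementations typically use scaled integer arithmetic). It uses the gains stated in the next paragraph, g = 1/8 and 0.25 for the deviation (called `h` here), and leaves the initial values of A and D to the caller because the text does not give them. The names `rfc793_update`, `JacobsonRTO`, and `backoff_schedule` are hypothetical.

```python
from dataclasses import dataclass

def rfc793_update(r: float, m: float, alpha: float = 0.9, beta: float = 2.0):
    """One step of the original scheme: R <- alpha*R + (1-alpha)*M,
    then RTO = R*beta. Returns (new R, RTO)."""
    r = alpha * r + (1 - alpha) * m
    return r, r * beta

@dataclass
class JacobsonRTO:
    a: float          # A: smoothed RTT in seconds (initial value from caller)
    d: float          # D: smoothed mean deviation in seconds
    g: float = 0.125  # gain for the average
    h: float = 0.25   # gain for the deviation

    def update(self, m: float) -> float:
        """Apply one RTT measurement M and return the new RTO."""
        err = m - self.a                        # Err = M - A
        self.a += self.g * err                  # A <- A + g*Err
        self.d += self.h * (abs(err) - self.d)  # D <- D + h(|Err| - D)
        return self.a + 4 * self.d              # RTO = A + 4D

def backoff_schedule(initial_rto: float, count: int, cap: float = 64.0) -> list:
    """Successive timeouts under exponential backoff: each one doubles,
    but never exceeds `cap` seconds."""
    timeouts, rto = [], min(initial_rto, cap)
    for _ in range(count):
        timeouts.append(rto)
        rto = min(2 * rto, cap)
    return timeouts
```

With a nominal 1.5-second first timeout, `backoff_schedule(1.5, 13)` yields 1.5, 3, 6, 12, 24, 48, and then 64 seconds. Assuming the sender waits one more full timeout after the 12th retransmission before sending the reset, the 13 intervals add up to 542.5 seconds, which matches the "about 9 minutes" observed in Section 21.2.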
Err is the difference between the measured value just obtained and the current RTT estimator. Both A and D are used to calculate the next retransmission timeout. The gain g is for the average and is set to 1/8. The gain for the deviation is set to 0.25. The larger gain for the deviation makes the RTO go up faster when the RTT changes.

TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols

Chapter 1. T/TCP Introduction

1.2 UDP Client-Server

We begin with a simple UDP client-server example, showing the client source code in Figure 1.1. The client sends a request to the server, and the server processes the request and sends back a reply.

Create a UDP socket
The socket function creates a UDP socket, returning a nonnegative descriptor to the process. The error-handling function err_sys is shown in Appendix B.2 of [Stevens 1992]. It accepts any number of arguments, formats them using vsprintf, prints the Unix error message corresponding to the errno value from the system call, and then terminates the process.

Fill in server's address
An Internet socket address structure is first zeroed out using memset and then filled with the IP address and port number of the server. For simplicity we require the user to enter the IP address as a dotted-decimal number on the command line when the program is run (argv[1]). We #define the server's port number (UDP_SERV_PORT) in the cliserv.h header, which is included at the beginning of all the programs in this chapter. This is done for simplicity and to avoid complicating the code with calls to gethostbyname and getservbyname.

Form request and send it to server
The client forms a request (which we show only as a comment) and sends it to the server using sendto.
This causes a single UDP datagram to be sent to the server. Once again, for simplicity, we assume a fixed-size request (REQUEST) and a fixed-size reply (REPLY). A real application would allocate room for its maximum-size request and reply, but the actual request and reply would vary and would normally be smaller.

Read and process reply from server
The call to recvfrom blocks the process (i.e., puts it to sleep) until a datagram arrives for the client. The client then processes the reply (which we show as a comment) and terminates.

Create UDP socket and bind local address
The call to socket creates a UDP socket, and an Internet socket address structure is filled in with the server's local address. The local IP address is set to the wildcard address (in case the server's host is multihomed, that is, has more than one network interface). The port number is set to the server's well-known port (UDP_SERV_PORT), which we said earlier is defined in the cliserv.h header. This local IP address and well-known port are bound to the socket by bind.

Process client requests
The server then enters an infinite loop, waiting for a client request to arrive (recvfrom), processing that request (which we show only as a comment), and sending back a reply (sendto).

1.3 TCP Client-Server

Our next example of a client-server transaction application uses TCP. Figure 1.5 shows the client program.

Create TCP socket and connect to server
A TCP socket is created by socket and then an Internet socket address structure is filled in with the IP address and port number of the server. The call to connect causes TCP's three-way handshake to occur, establishing a connection between the client and server. Chapter 18 of Volume 1 provides additional details on the packet exchanges when TCP connections are established and terminated.

Send request and half-close the connection
The client's request is sent to the server by write.
The client then closes one-half of the connection, the direction of data flow from the client to the server, by calling shutdown with a second argument of 1. This tells the server that the client is done sending data: it passes an end-of-file notification from the client to the server. A TCP segment containing the FIN flag is sent to the server. The client can still read from the connection; only one direction of data flow is closed. This is called TCP's half-close. Section 18.5 of Volume 1 provides additional details.

Read reply
The reply is read by our function read_stream, shown in Figure 1.6. Since TCP is a byte-stream protocol, without any form of record markers, the reply from the server's TCP can be returned in one or more TCP segments. This can be returned to the client process in one or more reads. Furthermore, we know that when the server has sent the complete reply, the server process closes the connection, causing its TCP to send a FIN segment to the client, which is returned to the client process by read returning an end-of-file (a return value of 0). To handle these details, the function read_stream calls read as many times as necessary, until either the input buffer is full or an end-of-file is returned by read. The return value of the function is the number of bytes read.

Create listening TCP socket
A TCP socket is created and the server's well-known port is bound to the socket. As with the UDP server, the TCP server binds the wildcard as its local IP address. The call to listen makes the socket a listening socket on which incoming connections will be accepted, and the second argument of SOMAXCONN specifies the maximum number of pending connections the kernel will queue for the socket.

Accept a connection and process request
The server blocks in the call to accept until a connection is established by the client's connect. The new socket descriptor returned by accept, sockfd, refers to the connection to the client.
The client's request is read by read_stream and the reply is returned by write.

TCP's TIME_WAIT State

TCP requires that the end point that sends the first FIN, which in our example is the client, must remain in the TIME_WAIT state for twice the maximum segment lifetime (MSL) once the connection is completely closed by both ends. The recommended value for the MSL is 120 seconds, implying a TIME_WAIT delay of 4 minutes. While the connection is in the TIME_WAIT state, that same connection (that is, the same local and foreign IP addresses and port numbers) cannot be opened again.

Reducing the Number of Segments with TCP

TCP can reduce the number of segments in the transaction shown in Figure 1.8 by combining data with the control segments, as we show in Figure 1.9. Notice that the first segment now contains the SYN, data, and FIN, not just the SYN as we saw in Figure 1.8. Similarly, the server's reply is combined with the server's FIN. Although this sequence of packets is legal under the rules of TCP, the author is not aware of a method for an application to cause TCP to generate this sequence of segments using the sockets API (hence the question mark that generates the first segment from the client, and the question mark that generates the final segment from the server) and knows of no implementations that actually generate this sequence of segments.
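The UDP and TCP client-server walkthroughs in Sections 1.2 and 1.3 can be sketched with Python's socket module, which wraps the same BSD calls the text names (socket, bind, sendto, recvfrom, connect, shutdown, listen, accept). This is a minimal sketch, not the book's C programs. The buffer sizes behind REQUEST and REPLY are assumed values, `process` is a hypothetical stand-in for the "process request" comment, and the servers handle a bounded number of requests instead of looping forever.

```python
import socket
import threading

REQUEST = 400  # fixed request buffer size (assumed value; the text only names REQUEST)
REPLY = 400    # fixed reply buffer size (assumed value; the text only names REPLY)

def process(request: bytes) -> bytes:
    """Hypothetical placeholder for the 'process request' comment."""
    return request.upper()

# Section 1.2: UDP
def udp_server(sock: socket.socket, n_requests: int) -> None:
    """Iterative server: recvfrom, process, sendto (bounded for a demo)."""
    for _ in range(n_requests):
        request, client_addr = sock.recvfrom(REQUEST)
        sock.sendto(process(request), client_addr)

def udp_client(server_addr, request: bytes) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(request, server_addr)  # a single UDP datagram
        reply, _ = sock.recvfrom(REPLY)    # sleeps until a datagram arrives
        return reply

# Section 1.3: TCP
def read_stream(sock: socket.socket, nbytes: int) -> bytes:
    """Like the text's read_stream: keep reading until the buffer is full
    or the peer's FIN shows up as end-of-file (recv returns b'')."""
    data = b""
    while len(data) < nbytes:
        chunk = sock.recv(nbytes - len(data))
        if not chunk:
            break
        data += chunk
    return data

def tcp_server(listener: socket.socket) -> None:
    conn, _ = listener.accept()               # blocks until a client connects
    with conn:                                # closing conn sends the server's FIN
        request = read_stream(conn, REQUEST)  # returns at the client's FIN
        conn.sendall(process(request))

def tcp_client(server_addr, request: bytes) -> bytes:
    with socket.create_connection(server_addr) as sock:  # three-way handshake
        sock.sendall(request)
        sock.shutdown(socket.SHUT_WR)  # half-close (SHUT_WR is 1): sends our FIN
        return read_stream(sock, REPLY)
```

In use, the server socket is bound first. The text binds the wildcard address; a local demo can instead bind 127.0.0.1 with port 0 so the kernel chooses a port. The server function then runs in a thread, and the client is called with the server socket's getsockname(). The half-close is what lets the server's read_stream return without knowing the request length in advance.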
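Volume 1: The Protocols
Chapter 1 Introduction
1.3 TCP/IP Layering
There are many protocols in the TCP/IP protocol suite. Figure 1-4 shows some of the additional protocols that we talk about in this book. TCP and UDP are the two predominant transport layer protocols. Both use IP as the network layer. TCP provides a reliable transport layer, even though the service it uses (IP) is unreliable. Chapters 17 through 22 provide a detailed look at the operation of TCP. We then look at some TCP applications: Telnet and Rlogin in Chapter 26, FTP in Chapter 27, and SMTP in Chapter 28. These applications are normally user processes.
UDP sends and receives datagrams for applications. A datagram is a unit of information (i.e., a certain number of bytes of information that is specified by the sender) that travels from the sender to the receiver. Unlike TCP, however, UDP is unreliable: there is no guarantee that the datagram ever gets to its final destination. Chapter 11 looks at UDP, and then Chapter 14 (DNS: the Domain Name System), Chapter 15 (TFTP: the Trivial File Transfer Protocol), and Chapter 16 (BOOTP: the Bootstrap Protocol) look at some applications that use UDP. SNMP also uses UDP, but since it deals with many of the other protocols, we save a discussion of it until Chapter 25.
IP is the main protocol at the network layer. It is used by both TCP and UDP. Every piece of TCP and UDP data that gets transferred around an internet goes through the IP layer at both end systems and at every intermediate router. In Figure 1-4 we also show an application accessing IP directly. This is rare, but possible. (Some older routing protocols were implemented this way. Also, it is possible to experiment with new transport layer protocols using this feature.) Chapter 3 looks at IP, but we save some of the details for later chapters where their discussion makes more sense. Chapters 9 and 10 look at how IP performs routing.
ICMP is an adjunct to IP. It is used by the IP layer to exchange error messages and other vital information with other hosts and routers. Chapter 6 looks at ICMP in more detail. Although ICMP is used primarily by IP, it is possible for an application to access it. We look at two popular diagnostic tools, Ping and Traceroute (Chapters 7 and 8), both of which use ICMP.
IGMP is the Internet Group Management Protocol. It is used with multicasting: sending a UDP datagram to multiple hosts. We describe the general properties of broadcasting (sending a UDP datagram to every host on a specified network) and multicasting in Chapter 12, and then describe IGMP itself in Chapter 13.
ARP (Address Resolution Protocol) and RARP (Reverse Address Resolution Protocol) are specialized protocols used only with certain types of network interfaces (such as Ethernet and token ring) to convert between the addresses used by the IP layer and the addresses used by the network interface. We examine these protocols in Chapters 4 and 5, respectively.
1.8 Client-Server Model
Most networking applications are written assuming one side is the client and the other the server. The purpose of the application is for the server to provide some defined service for clients.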
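A minimal illustration of the client-server idea over UDP, assuming Python's `socket` module on the loopback interface (the uppercasing "service", the function name `udp_service`, and the messages are arbitrary choices for the sketch). It also shows the datagram property from Section 1.3: each `sendto` produces one datagram, and each `recvfrom` returns exactly one of them, so two requests are never merged. On loopback nothing is lost, but over a real network UDP makes no such guarantee, hence the timeouts.

```python
import socket


def udp_service(requests):
    """Send each request as its own UDP datagram to a server that uppercases
    it, and return the replies the client receives."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))              # ephemeral port for the sketch
    server.settimeout(2.0)                     # UDP is unreliable: don't block forever
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    server_addr = server.getsockname()

    for req in requests:
        client.sendto(req, server_addr)        # one sendto() = one datagram

    for _ in requests:
        data, client_addr = server.recvfrom(65535)  # exactly one datagram
        server.sendto(data.upper(), client_addr)    # the "defined service"

    replies = [client.recvfrom(65535)[0] for _ in requests]
    client.close()
    server.close()
    return replies
```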
Servers can be divided into two types: iterative or concurrent. An iterative server iterates through the following steps:
I1. Wait for a client request to arrive.
I2. Process the client request.
I3. Send the response back to the client that sent the request.
I4. Go back to step I1.
The problem with an iterative server arises in step I2: while it is processing one request, it cannot service any other clients.
A concurrent server, on the other hand, performs the following steps:
C1. Wait for a client request to arrive.
C2. Start a new server to handle this client's request. This may involve creating a new process, task, or thread, depending on what the underlying operating system supports; how this step is performed depends on the operating system. The new server handles the client's entire request. When complete, the new server terminates.
C3. Go back to step C1.
The advantage of a concurrent server is that it spawns other servers to handle the client requests. Each client has, in essence, its own server. Assuming the operating system allows multiprogramming, multiple clients are serviced concurrently.
The reason we categorize servers, and not clients, is that a client normally can't tell whether it's talking to an iterative server or a concurrent server.
As a general rule, TCP servers are concurrent and UDP servers are iterative, but there are a few exceptions. We'll look in detail at the impact of UDP on its servers in Section 11.12, and the impact of TCP on its servers in Section 18.11.
Chapter 2 Link Layer
2.1 Introduction
From Figure 1-4 we see that the purpose of the link layer in the TCP/IP protocol suite is to (1) send and receive IP datagrams for the IP module, (2) send ARP requests and receive ARP replies for the ARP module, and (3) send RARP requests and receive RARP replies for RARP. TCP/IP supports many different link layers, depending on the type of networking hardware being used: Ethernet, token ring, FDDI (Fiber Distributed Data Interface), RS-232 serial lines, and the like. In this chapter we look at the details of the Ethernet link layer, two specialized link layers for serial interfaces (SLIP and PPP), and the loopback driver that's part of most implementations. Ethernet and SLIP are the link layers used for most of the examples in this book. We also introduce the MTU (Maximum Transmission Unit), a concept that we encounter many times in the remaining chapters, and discuss how to choose the MTU for a serial line.
2.2 Ethernet and IEEE 802 Encapsulation
The term Ethernet generally refers to a standard published in 1982 by Digital Equipment Corp., Intel Corp., and Xerox Corp. It is the predominant form of local area network technology used with TCP/IP today. It uses an access method called CSMA/CD, which stands for Carrier Sense, Multiple Access with Collision Detection. It operates at 10 Mb/s, and its addresses are
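48 bits long.
A few years later the IEEE (Institute of Electrical and Electronics Engineers) 802 Committee published a slightly different set of standards: 802.3 covers an entire set of CSMA/CD networks, 802.4 covers token bus networks, and 802.5 covers token ring networks. Common to all three is the 802.2 standard, which defines the logical link control (LLC) common to the 802 networks. Unfortunately the combination of 802.2 and 802.3 defines a frame format different from Ethernet's. [Stallings 1987] covers all the details of the IEEE 802 standards.
In the TCP/IP world, the encapsulation of IP datagrams is defined in RFC 894 [Hornig 1984] for Ethernets and in RFC 1042 [Postel and Reynolds 1988] for IEEE 802 networks. The Host Requirements RFC requires that every Internet host connected to a 10 Mb/s Ethernet cable:
1) must be able to send and receive packets using RFC 894 (Ethernet) encapsulation;
2) should be able to receive RFC 1042 (IEEE 802) packets intermixed with RFC 894 packets;
3) may be able to send packets using RFC 1042 encapsulation.
If the host can send both types of packets, the type of packet sent must be configurable, and the configurable option must default to RFC 894 packets. RFC 894 encapsulation is the one most commonly used.
Figure 2-1 shows the two different forms of encapsulation. The number below each box in the figure is the size of that box in bytes. Both frame formats use 48-bit (6-byte) destination and source addresses. (802.3 allows 16-bit addresses to be used, but 48-bit addresses are normal.) These are what we call hardware addresses throughout the text. The ARP and RARP protocols (Chapters 4 and 5) map between the 32-bit IP addresses and the 48-bit hardware addresses.
The next 2 bytes are different in the two frame formats. The 802 length field says how many bytes follow, up to but not including the CRC at the end. The Ethernet type field identifies the type of data that follows. In the 802 frame the same type field occurs later, in the SNAP (Sub-network Access Protocol) header. Fortunately none of the valid 802 length values is the same as any of the valid Ethernet type values, making the two frame formats distinguishable.
In the Ethernet frame the data immediately follows the type field, while in the 802 frame format 3 bytes of 802.2 LLC and 5 bytes of 802.2 SNAP follow. The DSAP (Destination Service Access Point) and SSAP (Source Service Access Point) are both set to 0xaa. The Ctrl field is set to 3. The next 3 bytes, the org code, are all 0. Following this is the same 2-byte type field that we had with the Ethernet frame format. (Additional type field values are given in RFC 1340 [Reynolds and Postel 1992].)
The CRC field is a cyclic redundancy check (a checksum) that detects errors in the rest of the frame. (This is also called the FCS or frame check sequence.)
There is a minimum size for 802.3 and Ethernet frames. This minimum requires that the data portion be at least 38 bytes for 802.3 or 46 bytes for Ethernet. To handle this, pad bytes are inserted to ensure that the frame is long enough. We'll encounter this minimum when we start watching packets on the wire. In this text we'll display the Ethernet encapsulation when we need to, because this is the most commonly used form of encapsulation.
Chapter 3 IP: Internet Protocol
3.1 Introduction
IP is the workhorse protocol of the TCP/IP protocol suite. All TCP, UDP, ICMP, and IGMP data gets transmitted as IP datagrams (Figure 1-4). A fact that amazes many newcomers to TCP/IP, especially those from an X.25 or SNA background, is that IP provides an unreliable, connectionless datagram delivery service.
By unreliable we mean there are no guarantees that an IP datagram successfully gets to its destination. IP provides a best-effort service. When something goes wrong, such as a router temporarily running out of buffers, IP has a simple error-handling algorithm: throw away the datagram and try to send an ICMP message back to the source. Any required reliability must be provided by the upper layers (e.g., TCP).
The term connectionless means that IP does not maintain any state information about successive datagrams. Each datagram is handled independently from all other datagrams. This also means that IP datagrams can get delivered out of order. If a source sends two consecutive datagrams (first A, then B) to the same destination, each is routed independently and can take different routes, with B arriving before A.
In this chapter we take a brief look at the fields in the IP header, describe IP routing, and cover subnetting. We also look at two useful commands: ifconfig and netstat. We leave a detailed discussion of some of the fields in the IP header until later, when we can see exactly how the fields are used. RFC 791 [Postel 1981a] is the official specification of IP.
Chapter 18 TCP Connection Establishment and Termination
18.3 Timeout of Connection Establishment
There are several circumstances in which a connection cannot be established. In one example the server host is down. To simulate this scenario we disconnect the server host's cable and then issue the telnet command to it. Figure 18-6 shows the tcpdump output.
The interesting point in this output is how frequently the client's TCP sends a SYN to try to establish the connection. The second SYN is sent 5.8 seconds after the first, and the third is sent 24 seconds after the second. As a side note, this example was run about 38 minutes after the client was rebooted. This corresponds to the initial sequence number of 291,008,001 (approximately 38 × 60 × 64,000 × 2). Recall that typical Berkeley-derived systems initialize the initial sequence number to 1 and then increment it by 64,000 every half-second. Also, because this is the first TCP connection since the system was bootstrapped, the client's port number is 1024.
What isn't shown in Figure 18-6 is how long the client's TCP keeps retransmitting the SYN before giving up. To see this we have to time the telnet command; the time difference is 76 seconds. Most Berkeley-derived systems set a time limit of 75 seconds on the establishment of a new connection. We'll see in Section 21.4 that the third packet sent by the client would have timed out around 16:25:29, 48 seconds after it was sent, had the client not given up after 75 seconds.
18.3.1 First Timeout Period
One puzzling item in Figure 18-6 is that the first timeout period, 5.8 seconds, is close to 6 seconds but not exact, while the second period is almost exactly 24 seconds. Running this test a dozen more times, the first timeout period varied between 5.59 and 5.93 seconds. The second timeout period, however, was always 24.00 seconds (to two decimal places).
What's happening here is that BSD implementations of TCP run a timer that goes off every 500 ms. This 500-ms timer is used for the various TCP timeouts that we cover in this chapter. When we type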
the telnet command, a 6-second timer is established (12 clock ticks), but it may expire anywhere between 5.5 and 6 seconds in the future. Figure 18-7 shows what happens.
Although the timer is initialized to 12 ticks, the first decrement of the counter can occur anywhere between 0 and 500 ms after it is set. From that point on the counter is decremented about every 500 ms, but the first period is variable. (We use the qualifier "about" because the moment when TCP gets control every 500 ms can be delayed by other interrupts being handled by the kernel.)
When the tick counter reaches 0, the 6-second timer expires (Figure 18-7) and the timer is reset for 24 seconds (48 ticks) in the future. This next timer will be close to 24 seconds, since it was set at a time when TCP's 500-ms timer handler was called by the kernel.
In Figure 18-6 the notation (tos 0x10) appears. This is the type-of-service (TOS) field in the IP datagram (Figure 3-2). The BSD/386 Telnet client sets this field to request minimum delay.
18.4 Maximum Segment Size
The maximum segment size (MSS) is the largest "chunk" of data that TCP will send to the other end. When a connection is established, each end can announce its MSS. The values we've seen have all been 1024. The resulting IP datagram is normally 40 bytes larger: 20 bytes for the TCP header and 20 bytes for the IP header.
Some texts refer to this as a "negotiated" option. It is not negotiated in any way. When a connection is established, each end has the option of announcing the MSS it expects to receive. (An MSS option can appear only in a SYN segment.) If one end does not receive an MSS option from the other end, a default of 536 bytes is assumed. (This default allows a 20-byte IP header and a 20-byte TCP header to fit into a 576-byte IP datagram.)
In general, the larger the MSS the better, until fragmentation occurs. (This is not always true; see Figures 24-3 and 24-4 for a counterexample.) A larger segment size allows more data to be sent in each segment, amortizing the cost of the IP and TCP headers. When TCP sends a SYN, either because a local application wants to initiate a connection or because a connection request has been received from another host, it can send an MSS value up to the outgoing interface's MTU minus the size of the fixed IP and TCP headers. For an Ethernet this implies an MSS of up to 1460 bytes. Using IEEE 802.3 encapsulation (Section 2.2), the MSS could go up to 1452 bytes.
The value of 1024 that we've seen in this chapter, for connections involving BSD/386 and SVR4, is because many BSD implementations require the MSS to be a multiple of 512. Other systems, such as SunOS 4.1.3, Solaris 2.2, and AIX 3.2.2, all announce an MSS of 1460 when both ends are on a local Ethernet. Measurements in [Mogul 1993] show that an MSS of 1460 provides better performance on an Ethernet than an MSS of 1024. If the destination IP address is "nonlocal," the MSS normally defaults to
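536. Whether a destination is local or nonlocal is decided as follows. A destination whose IP address has the same network ID and the same subnet ID as ours is local. A destination whose IP address has a totally different network ID from ours is nonlocal. A destination with the same network ID but a different subnet ID could be either local or nonlocal. Most implementations provide a configuration option (Appendix E and Figure E-1) that lets the system administrator specify whether different subnets are local or nonlocal. The setting of this option determines whether the announced MSS is as large as possible (up to the outgoing interface's MTU) or the default value of 536.
The MSS lets a host limit the size of datagrams that the other end sends it. Combined with the fact that a host can also limit the size of the datagrams that it sends, this lets a host avoid fragmentation when it is connected to a network with a small MTU. Consider our host slip, which has a SLIP link with an MTU of 296 to the router bsdi. Figure 18-8 shows these systems and the host sun.
We initiate a TCP connection from sun to slip and watch the segments using tcpdump. Figure 18-9 shows only the connection establishment (with the window size advertisements removed).
In this example, sun cannot send a segment with more than 256 bytes of data, since it received an MSS option of 256 (line 2). Furthermore, since slip knows that its outgoing interface's MTU is 296, it will never send a segment with more than 256 bytes of data, to avoid fragmentation, even though sun announced an MSS of 1460. It's OK for a system to send less than the MSS announced by the other end.
This avoidance of fragmentation works only if either host is directly connected to a network with an MTU of less than 576. If both hosts are connected to Ethernets, and both announce an MSS of 536, but an intermediate network has an MTU of 296, fragmentation will occur. The only way around this is to use the path MTU discovery mechanism (Section 24.2).
18.11 TCP Server Design
We said in Section 1.8 that most TCP servers are concurrent. When a new connection request arrives at a server, the server accepts the connection and invokes a new process to handle the new client. Depending on the operating system, various techniques are used to invoke the new server. Under Unix the common technique is to create a new process using the fork function. Lightweight processes (threads) can also be used, if supported.
What we're interested in is the interaction of TCP with concurrent servers. We need to answer the following questions: how are the port numbers handled when a server accepts a new connection request from a client, and what happens if multiple connection requests arrive at about the same time?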
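A minimal sketch of the concurrent design, assuming Python threads (one of the options mentioned above) in place of fork. Each accepted connection is handed to its own thread, following steps C1 to C3 of Section 1.8, and every per-client handler records the local and foreign port of its connection. All accepted sockets report the listening port as their local port, and only the clients' ephemeral ports differ, which is the behaviour the next subsection observes with netstat. The function names, the uppercasing reply, and the fixed client count (which only makes the demo terminate) are assumptions for illustration.

```python
import socket
import threading


def handle_client(conn, results, lock):
    """Per-client 'server' (step C2): record the connection's ports, read the
    request until the client's FIN, send a reply, then close."""
    with conn:
        with lock:
            results.append((conn.getsockname()[1], conn.getpeername()[1]))
        request = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:            # client half-closed its direction
                break
            request += chunk
        conn.sendall(request.upper())


def concurrent_server(listener, nclients, results):
    """Steps C1-C3, stopping after nclients so the demo ends."""
    lock = threading.Lock()
    workers = []
    for _ in range(nclients):
        conn, _addr = listener.accept()                     # C1: wait for a client
        worker = threading.Thread(target=handle_client, args=(conn, results, lock))
        worker.start()                                      # C2: hand it off
        workers.append(worker)                              # C3: back to accept()
    for worker in workers:
        worker.join()


def demo(messages):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(socket.SOMAXCONN)
    listen_port = listener.getsockname()[1]
    results = []
    server = threading.Thread(target=concurrent_server,
                              args=(listener, len(messages), results))
    server.start()

    clients = [socket.create_connection(("127.0.0.1", listen_port)) for _ in messages]
    client_ports = [c.getsockname()[1] for c in clients]   # ephemeral ports
    for c, msg in zip(clients, messages):
        c.sendall(msg)
        c.shutdown(socket.SHUT_WR)
    replies = []
    for c in clients:
        data = b""
        while True:
            chunk = c.recv(1024)
            if not chunk:
                break
            data += chunk
        replies.append(data)
        c.close()
    server.join()
    listener.close()
    return listen_port, client_ports, replies, results
```

The listening socket stays open in the accept loop the whole time; the kernel creates a separate connected socket for each client, all sharing the listening port.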
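18.11.1 TCP Server Port Numbers
We can see how TCP handles the port numbers by watching any TCP server. We'll watch the Telnet server using the netstat command. The following output is on a system with no active Telnet connections (we have deleted all the lines except the one showing the Telnet server):
    sun % netstat -a -n -f inet
    Active Internet connections (including servers)
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  *.23                   *.*                    LISTEN
The -a flag reports on all network endpoints, not just those that are ESTABLISHED. The -n flag prints IP addresses as dotted-decimal numbers, instead of using the DNS to convert addresses to names, and prints numeric port numbers (e.g., 23) instead of service names (e.g., Telnet). The -f inet option reports only TCP and UDP endpoints.
The local address is output as *.23, where the asterisk is normally called the wildcard character. This means that an incoming connection request (i.e., a SYN) will be accepted on any local interface. If the host were multihomed, we could specify a single one of its IP addresses as the local IP address, and only connections received on that interface would be accepted. (We'll see an example of this later in this section.) The local port is 23, the well-known port for Telnet.
The foreign address is output as *.*, which means the foreign IP address and foreign port number are not known yet, because the endpoint is in the LISTEN state, waiting for a connection request to arrive. We now start a Telnet client on the host slip (140.252.13.65) that connects to this server. Here are the relevant lines from the netstat output:
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  140.252.13.33.23       140.252.13.65.1029     ESTABLISHED
    tcp        0      0  *.23                   *.*                    LISTEN
The first line for port 23 is the ESTABLISHED connection. All four elements of the local and foreign addresses are filled in for this connection: the local IP address and port number, and the foreign IP address and port number. The local IP address corresponds to the interface on which the connection request arrived (the Ethernet interface, 140.252.13.33).
The endpoint in the LISTEN state is left alone. This is the endpoint that the concurrent Telnet server uses to accept future connection requests. It is the TCP module in the kernel that creates the new endpoint in the ESTABLISHED state when the incoming connection request arrives and is accepted. Also notice that the port number of the ESTABLISHED connection doesn't change: it's 23, the same as the LISTEN endpoint. We now start another Telnet client on the same host (slip) and connect to the same server. Here is the netstat output:
    Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
    tcp        0      0  140.252.13.33.23       140.252.13.65.1030     ESTABLISHED
    tcp        0      0  140.252.13.33.23       140.252.13.65.1029     ESTABLISHED
    tcp        0      0  *.23                   *.*                    LISTEN
We now have two ESTABLISHED connections from the same host to the same server. Both have a local port number of 23. This is not a problem for TCP, since the foreign port numbers are different. They must be different, because each Telnet client uses an ephemeral port, and an ephemeral port is by definition one that is not currently in use on that host (slip).
This example reiterates that TCP demultiplexes incoming segments using all four values that make up the local and foreign addresses: destination IP address, destination port number, source IP address, and source port number. TCP cannot determine which process gets an incoming segment by looking at the destination port number only. Also, the only one of the three endpoints at port 23 that will receive incoming connection requests is the one in the LISTEN state. The endpoints in the ESTABLISHED state cannot receive SYN segments, and the endpoint in the LISTEN state cannot receive data segments. Next we start a third Telnet client from the host solaris, which is connected to sun across a SLIP link rather than through the Ethernet.
Chapter 21 TCP Timeout and Retransmission
21.1 Introduction
TCP provides a reliable transport layer. One of the ways it provides reliability is for each end to acknowledge the data it receives from the other end. But data segments and acknowledgments can get lost. TCP handles this by setting a timeout when it sends data, and if the data isn't acknowledged when the timeout expires, it retransmits the data. A critical element of any implementation is the timeout and retransmission strategy: how is the timeout interval determined, and how frequently does a retransmission occur?
We've already seen two examples of timeout and retransmission: (1) in the ICMP port unreachable example in Section 6.5 we saw the TFTP client, using UDP, employ a simple timeout and retransmission strategy: it assumed 5 seconds was an adequate timeout period and retransmitted every 5 seconds; (2) in the ARP example to a nonexistent host (Section 4.5), we saw that when TCP was trying to establish the connection it retransmitted its SYN, using a longer delay between each retransmission.
TCP manages four different timers for each connection:
1) A retransmission timer is used when expecting an acknowledgment from the other end. This chapter looks at this timer in detail, along with related issues such as congestion avoidance.
2) A persist timer keeps window size information flowing even if the other end closes its receive window. Chapter 22 describes this timer.
3) A keepalive timer detects when the other end of an otherwise idle connection crashes or reboots. Chapter 23 describes this timer.
4) A 2MSL timer measures the time a connection has been in the TIME_WAIT state. We described this state in Section 18.6.
We start the chapter with a simple example of TCP's timeout and retransmission and then move to a larger example that lets us look at all the details of TCP's timer management. We see how typical implementations measure the round-trip time of TCP segments and how TCP uses these measurements to set the retransmission timeout of the next segment it transmits. We then look at TCP's congestion avoidance (what TCP does when packets are lost) and follow through an actual example where packets are lost. We also look at the newer fast retransmit and fast recovery algorithms, and see how they let TCP detect lost packets faster than waiting for a timer to expire.
21.2 Simple Timeout and Retransmission Example
Let's first look at the retransmission strategy used by TCP. We'll establish a connection, send some data to verify that everything is OK, disconnect the cable, send some more data, and watch what TCP does. Figure 21-1 shows the tcpdump output (we have removed the type-of-service information set by bsdi).
Figure 21-1 Simple example of TCP's timeout and retransmission.
Lines 1, 2, and 3 correspond to the normal TCP connection establishment. Line 4 is the transmission of "hello, world" (12 characters plus the carriage return and linefeed), and line 5 is its acknowledgment. We then disconnect the Ethernet cable from svr4, and line 6 shows "and hi" being sent. Lines 7-18 are 12 retransmissions of that segment, and line 19 is when the sending TCP finally gives up and sends a reset.
Now examine the time differences between successive retransmissions: rounded, they are 1, 3, 6, 12, 24, 48, and then 64 seconds. We'll see later in this chapter that the first timeout is actually set for 1.5 seconds after the first transmission. (It occurs 1.0136 seconds after the first transmission, and not exactly 1.5 seconds, for the reason explained with Figure 18-7.) After this the timeout value is doubled for each retransmission, with an upper limit of 64 seconds.
This doubling is called an exponential backoff. Compare this to the TFTP example in Section 6.5, where every retransmission occurred 5 seconds after the previous one.
The time difference between the first transmission of the packet (line 6, at time 24.480) and the reset (line 19, at time 566.488) is about 9 minutes. For most current implementations this total timeout is fixed and cannot be tuned. Solaris 2.2 allows the administrator to change it (the tcp_ip_abort_interval variable in Section E.4), and its default is only 2 minutes, not the more common 9 minutes.
21.3 Round-Trip Time Measurement
Fundamental to TCP's timeout and retransmission is the measurement of the round-trip time (RTT) experienced on a given connection. We expect this to change over time, as routes might change and as network traffic changes, and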
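TCP should track these changes and modify its timeout accordingly.
First TCP must measure the RTT between sending a byte with a particular sequence number and receiving an acknowledgment that covers that sequence number. Recall from the previous chapter that there is normally not a one-to-one correspondence between data segments and ACKs. In Figure 20-1 this means that one RTT the sender can measure is the time between the transmission of segment 4 (data bytes 1-1024) and the reception of segment 7 (the ACK of bytes 1-1024); we call this measured RTT M.
The original TCP specification had TCP update a smoothed RTT estimator (called R) using the low-pass filter
$R \leftarrow \alpha R + (1 - \alpha) M$
where α is a smoothing factor with a recommended value of 0.9. This smoothed RTT is updated every time a new measurement is made. Ninety percent of each new estimate comes from the previous estimate, and 10% from the new measurement.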
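The two calculations in Sections 21.2 and 21.3 can be sketched in a few lines of Python. `smooth_rtt` applies the original low-pass filter $R \leftarrow \alpha R + (1 - \alpha) M$ with the recommended α = 0.9, and `backoff_schedule` doubles an initial 1.5-second timeout on each retransmission, capped at 64 seconds, as in the Figure 21-1 example. The function names and the count of 13 timeouts (one before each of the 12 retransmissions on lines 7-18, plus the final one before the reset on line 19) are assumptions made for illustration; real BSD implementations drive these timers from the 500-ms tick described in Section 18.3.1, so measured intervals differ slightly.

```python
def smooth_rtt(measurements, initial, alpha=0.9):
    """Original TCP estimator: for each measured RTT M, R <- alpha*R + (1 - alpha)*M.
    Returns the smoothed estimate after each measurement."""
    r = initial
    history = []
    for m in measurements:
        r = alpha * r + (1 - alpha) * m
        history.append(r)
    return history


def backoff_schedule(first_timeout=1.5, cap=64.0, timeouts=13):
    """Exponential backoff: each retransmission doubles the timeout, up to cap."""
    schedule = []
    rto = first_timeout
    for _ in range(timeouts):
        schedule.append(rto)
        rto = min(rto * 2, cap)
    return schedule
```

The nominal schedule is 1.5, 3, 6, 12, 24, 48, and then 64 seconds seven times, which sums to 542.5 seconds (about 9 minutes). That is close to the 542.008 seconds between the first transmission at 24.480 and the reset at 566.488 in Figure 21-1; most of the gap comes from the first timeout firing after 1.0136 seconds instead of 1.5.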
