
SV: flowchart info for ping



David Whelan [dwhelan@auracom.com] wrote:

> Does anyone know where I can find high-level "what happens when" 
> documentation for ping? I mean clear information about what exactly 
> happens from the moment you enter a ping command to the returned message.

I don't know of any good high-level document. Most books about TCP/IP cover it, but you will have to read quite a lot before you get the picture.

I could try a brief description: 

You probably type something like   "ping www.google.com" in your command window.  

The first step is to translate the name "www.google.com" into a numerical internet address (IP = Internet Protocol). To this end, your computer (executing the ping program) sends a name lookup query to a name server. I shall skip the details of DNS (the "Domain Name System") for now. The name server replies with the IP address. In the particular case of Google, since they have so much traffic, you will get multiple addresses, and your computer will pick one of them at random. This provides crude load balancing, as each address corresponds to different computers at Google's centers.

You can see the responses using the command "host":

   ~$ host www.google.com
   www.google.com          CNAME   www.l.google.com
   www.l.google.com        A       216.239.59.103
   www.l.google.com        A       216.239.59.104
   www.l.google.com        A       216.239.59.147
   www.l.google.com        A       216.239.59.99

In my case my computer chose the last of the above addresses, "216.239.59.99".
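
The lookup-and-pick step can be sketched in a few lines of Python. This is only a rough illustration: the function names resolve_ipv4 and pick_address are made up for the example, the address list is copied from the "host" output above, and the random pick simply follows my description here rather than the code of any particular ping program.

```python
import random
import socket


def resolve_ipv4(hostname):
    """Ask the system resolver for the IPv4 addresses of hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in infos})


def pick_address(addresses, rng=random):
    """Pick one address at random (hypothetical helper; an assumption
    mirroring the description above, not ping's actual code)."""
    return rng.choice(list(addresses))


# Addresses copied from the "host" output above, so no network is needed.
addresses = ["216.239.59.103", "216.239.59.104",
             "216.239.59.147", "216.239.59.99"]
chosen = pick_address(addresses)
print("pinging", chosen)
```

On a machine with network access, resolve_ipv4("www.google.com") would return whatever addresses the name server hands out today, which will differ from the ones listed above.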

It turns out that DNS can translate both ways, from domain names (like "www.google.com") to IP addresses,  and from IP to domain name. In principle there should have been a one-to-one correspondence, but with all the network administrators playing all sorts of tricks, there is not. The author of the ping program apparently decided that the users would be better served if ping checked the reverse translation:

   $ host 216.239.59.99
   Name: gv-in-f99.google.com
   Address: 216.239.59.99

So the ping program determined that it would be exchanging messages with a computer called gv-in-f99.google.com, at IP address 216.239.59.99.
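
For the curious: a reverse lookup is an ordinary DNS query for a special name, built from the bytes of the address in reverse order under "in-addr.arpa". The sketch below shows that name for the address above. The helper names are invented for this example, and reverse_lookup is only defined, not called, because it needs a working network connection.

```python
import ipaddress
import socket


def reverse_query_name(ip):
    """Return the DNS name that is queried for a reverse (PTR) lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Ask the resolver for the host name of ip (needs a working network)."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


print(reverse_query_name("216.239.59.99"))
# prints 99.59.239.216.in-addr.arpa
```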

Then the ping program sends a message to this IP address. 

At this point we have to think of the messages not in terms of black characters on a white background, but in terms of bits. Your computer emits electrical signals on the wire (or radio signals through the air) representing a sequence of zeros and ones, making up an IP datagram. The exact formats of an IP datagram and of the ICMP messages that ping uses are described in these two documents:

   RFC 791  Postel, J. (ed.), "Internet Protocol - DARPA Internet Program Protocol Specification"
   RFC 792  Postel, J., "Internet Control Message Protocol"

Just google for e.g. "RFC792". The letters RFC stand for "Request for Comments"; these documents were originally issued as proposals or draft standards.

The datagram sent by your computer begins with an "Ethernet header", or some similar header. This header is designed to indicate four things:

  1.  The identity of the receiving network card
  2.  The identity of the sending network card
  3.  The nature of the data that follows - an IP datagram (There are other types of datagrams)
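  4.  The length of the data that follows (in Ethernet II frames the type field of item 3 takes its place, and the length is implied by the frame itself).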

This information is only relevant for the transmission to the next box, typically the router/modem that connects your computer to the net. The router must then discard the Ethernet header, create a new Ethernet (or some other type of) header suitable for the next leg of the IP datagram's journey, and send the message on with this header. The point is that each leg of the journey can use a different technology, so the per-leg header will differ from leg to leg.
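
To make the per-leg header concrete, here is a small Python sketch that builds and parses one. It assumes Ethernet II framing, where the 14-byte header holds the receiver's hardware address, the sender's hardware address and a 2-byte type field (0x0800 means that an IPv4 datagram follows). The hardware addresses and the helper names are made up for the example.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # type value meaning "an IPv4 datagram follows"
ETHERNET_HEADER_FORMAT = "!6s6sH"  # receiver, sender, type: 14 bytes


def format_mac(raw):
    """Render six raw bytes in the usual aa:bb:cc:dd:ee:ff notation."""
    return ":".join("%02x" % byte for byte in raw)


def build_ethernet_header(dst_mac, src_mac, ethertype):
    """Pack receiver address, sender address and type into 14 bytes."""
    return struct.pack(ETHERNET_HEADER_FORMAT, dst_mac, src_mac, ethertype)


def parse_ethernet_header(frame):
    """Unpack the first 14 bytes of a frame into readable fields."""
    dst, src, ethertype = struct.unpack(ETHERNET_HEADER_FORMAT, frame[:14])
    return {"dst": format_mac(dst), "src": format_mac(src),
            "ethertype": ethertype}


# Made-up hardware addresses for the router and for my own network card.
router_mac = bytes.fromhex("020000000001")
my_mac = bytes.fromhex("020000000002")

frame = build_ethernet_header(router_mac, my_mac, ETHERTYPE_IPV4)
frame += b"<the IP datagram would follow here>"
header = parse_ethernet_header(frame)
print(header)
```

A router receiving this frame would throw away these 14 bytes and put a fresh header, with new receiver and sender addresses, in front of the unchanged IP datagram.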

You may ask how the computer knows what to put in the field that identifies the receiver's network card. For this, google "ARP Address Resolution Protocol". Keep in mind that your computer is configured to know the IP address of its "default gateway" to the net. Most likely this configuration happened when your computer booted and negotiated its network configuration using DHCP (Dynamic Host Configuration Protocol). Google for DHCP.
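
The role of the default gateway can be sketched as a simple decision: if the destination is on my own local network, ARP is used to find the destination's network card directly; otherwise ARP is used to find the gateway's network card, and the gateway takes it from there. The network, the gateway address and the function name below are invented for the illustration.

```python
import ipaddress

# Hypothetical configuration, as it might have been handed out by DHCP.
LOCAL_NETWORK = "192.168.1.0/24"
DEFAULT_GATEWAY = "192.168.1.1"


def next_hop(destination, local_network=LOCAL_NETWORK,
             default_gateway=DEFAULT_GATEWAY):
    """Return the IP address whose hardware address ARP must find."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(local_network):
        return str(dest)       # a neighbour: deliver directly
    return default_gateway     # anything else goes via the gateway


print(next_hop("192.168.1.20"))   # prints 192.168.1.20
print(next_hop("216.239.59.99"))  # prints 192.168.1.1
```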

The various boxes that relay the datagram until it reaches Google's computer are normally connected to a number of communication lines, and the boxes must decide which line to use for the forwarding. In order to make this decision, they look at the first 20 bytes of the IP datagram proper. These constitute the IP header, and contain the four bytes of the destination IP address. Notice that this header contains addresses concerning the end-to-end communication between your computer and Google's computer, while the headers I have written about above concern a single "hop".
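
As an illustration of those 20 bytes, the following Python sketch packs a minimal IP (version 4) header without options and reads the interesting fields back out. The sender address 192.0.2.10 is a made-up example address, the helper names are invented, and the header checksum is left at zero to keep the sketch short, so this is not a header you could actually send.

```python
import socket
import struct

# version/header length, service type, total length, identification,
# flags/fragment offset, ttl, protocol, checksum, source, destination
IP_HEADER_FORMAT = "!BBHHHBBH4s4s"
PROTOCOL_ICMP = 1


def build_ipv4_header(src, dst, protocol, ttl, payload_length):
    """Pack a 20-byte IPv4 header (checksum left at 0 for brevity)."""
    version_and_length = (4 << 4) | 5  # version 4, header of 5 * 4 bytes
    return struct.pack(IP_HEADER_FORMAT, version_and_length, 0,
                       20 + payload_length, 0, 0, ttl, protocol, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))


def parse_ipv4_header(datagram):
    """Read ttl, protocol and both addresses from the first 20 bytes."""
    fields = struct.unpack(IP_HEADER_FORMAT, datagram[:20])
    return {"total_length": fields[2], "ttl": fields[5],
            "protocol": fields[6],
            "src": socket.inet_ntoa(fields[8]),
            "dst": socket.inet_ntoa(fields[9])}


raw_header = build_ipv4_header("192.0.2.10", "216.239.59.99",
                               PROTOCOL_ICMP, ttl=64, payload_length=64)
parsed = parse_ipv4_header(raw_header)
print(parsed)
```

The destination address sits in the last four bytes of the header, which is what each box along the way looks at when it chooses the outgoing line.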

Routers have large tables specifying ranges of addresses that should be forwarded on each of the available communication lines. Routers also exchange connectivity data regularly, and recompute the tables from time to time, as they learn about new links becoming available or old links going down, or about links becoming more "expensive" (congested). 
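
A toy version of such a table lookup might look like this in Python. The address ranges and line names are invented, and the rule used here (the most specific matching range wins, commonly called longest-prefix match) is the usual convention, not something taken from any particular router.

```python
import ipaddress

# Hypothetical routing table: (address range, outgoing line).
ROUTES = [
    ("0.0.0.0/0", "line-0"),        # default: everything else
    ("216.239.32.0/19", "line-1"),
    ("216.239.59.0/24", "line-2"),
    ("10.0.0.0/8", "line-3"),
]


def choose_line(destination, routes=ROUTES):
    """Return the line of the most specific range containing destination."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, line in routes:
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None
                                or network.prefixlen > best[0].prefixlen):
            best = (network, line)
    if best is None:
        raise LookupError("no route to " + destination)
    return best[1]


print(choose_line("216.239.59.99"))  # prints line-2
print(choose_line("216.239.40.1"))   # prints line-1
print(choose_line("8.8.8.8"))        # prints line-0
```

Real routers hold far larger tables and use cleverer data structures than a linear scan, but the decision they make for each datagram is of this kind.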

Eventually there will be some kind of electrical activity tickling a network interface at Google's computer, making the NIC (network interface card) collect the bits of your IP datagram. When the datagram is complete, the NIC emits another electrical signal, an interrupt, to the processor of that computer. The computer's processor interrupts whatever it was doing and services the interrupt, arranging for the IP datagram to be transferred to the computer's memory. The instructions that govern the computer while it does this belong to the operating system kernel. The kernel is what coordinates and schedules the various programs that are running on the computer at any time. The kernel keeps track of the various connections and "ports" that those programs are using. The kernel inspects the IP datagram to determine which of the running programs should be notified about the arrival.

At this point, the kernel will look first at the "protocol" field of the IP header. This field will contain a number that says that the datagram is an ICMP datagram. ICMP datagrams are not usually passed on to any running program, but are instead handled by the kernel itself. They are used to notify about errors, computers or networks having become unavailable, etc. The kernel next looks at the rest of the datagram, past the IP header. This is the ICMP datagram proper. It too contains a numeric field saying what particular kind of ICMP message it is. It will be an Echo Request message. An Echo Request message will normally be sent straight back by the kernel, only slightly modified so that it becomes an Echo Response message (RFC 792 officially calls it an "Echo Reply"). The IP header is also rewritten, interchanging the IP addresses of the sender and receiver.
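
The "only slightly modified" part can be shown in Python. According to RFC 792, an Echo Request has type 8 and the reply has type 0; the identifier, the sequence number and the data are returned unchanged, and the checksum is recomputed. The function names, the identifier value and the payload below are made up for this sketch, which covers only the ICMP part and not the swapping of the addresses in the IP header.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0
ICMP_HEADER_FORMAT = "!BBHHH"  # type, code, checksum, identifier, sequence


def internet_checksum(data):
    """One's-complement sum of 16-bit words, as used by IP and ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_icmp(icmp_type, ident, seq, payload):
    """Build an ICMP echo message with a correct checksum."""
    header = struct.pack(ICMP_HEADER_FORMAT, icmp_type, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack(ICMP_HEADER_FORMAT, icmp_type, 0, checksum,
                         ident, seq)
    return header + payload


def make_echo_reply(request):
    """Turn an Echo Request into the matching reply, as a kernel might."""
    icmp_type, _code, _checksum, ident, seq = struct.unpack(
        ICMP_HEADER_FORMAT, request[:8])
    if icmp_type != ICMP_ECHO_REQUEST:
        raise ValueError("not an Echo Request")
    return build_icmp(ICMP_ECHO_REPLY, ident, seq, request[8:])


request = build_icmp(ICMP_ECHO_REQUEST, ident=0x1234, seq=1,
                     payload=b"x" * 56)
reply = make_echo_reply(request)
print(len(reply), reply[0])  # prints 64 0
```

A message whose checksum field is correct sums to zero under internet_checksum, which gives the receiver a cheap way to detect damaged messages.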

The point here is that, since the kernel does not have to forward the message to any particular running program, it can respond very fast and use few computer cycles to do so.

When the Echo Response datagram eventually reaches your computer, the kernel in your computer must notify the running "ping" program, which then prepares a line of output on the screen:

   $ ping www.google.com
   PING www.l.google.com (216.239.59.99) 56(84) bytes of data.
   64 bytes from gv-in-f99.google.com (216.239.59.99): icmp_seq=1 ttl=244 time=45.6 ms

Here we see that it is upon receipt of the response that ping uses the reverse DNS lookup to translate the responding IP address (216.239.59.99) into a domain name (gv-in-f99.google.com).
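
The byte counts in that output follow from the header sizes: ping sends 56 bytes of data by default, the ICMP header adds 8 bytes (giving the "64 bytes" of the reply line), and an IP header without options adds another 20 (giving the 84 in "56(84)"). As a tiny Python check, with a made-up function name:

```python
ICMP_HEADER_BYTES = 8
IP_HEADER_BYTES = 20  # an IP header without options


def ping_sizes(payload_bytes=56):
    """Return (ICMP message size, IP datagram size) for a given payload."""
    icmp_bytes = payload_bytes + ICMP_HEADER_BYTES
    ip_bytes = icmp_bytes + IP_HEADER_BYTES
    return icmp_bytes, ip_bytes


print(ping_sizes(56))  # prints (64, 84)
```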

The line of text on the screen contains several pieces of information. Ping emits an Echo Request once per second. The requests have a sequence number field, shown as icmp_seq in the output, where ping inserts the sequence number of the request: 1, 2, 3, ... Since this number is unchanged in the Echo Response, ping can tell which of the emitted datagrams generated the response. This allows ping to compute how long the round trip took, in my case 45.6 milliseconds.

The IP datagram has a hop-count field, which is often initialized to 255 when the datagram is issued (some systems start at 64 or 128 instead). Each router that passes on the datagram reduces this number by one. If the routing tables were to become corrupted so that the datagram is sent in a circle, the hop count would eventually be reduced to zero. At that point, the router just drops the datagram. In this way the net is saved from eternal congestion by undeliverable datagrams. The hop countdown field is called Time-To-Live, ttl. We see that the echo response arrived with a ttl field of 244. Assuming Google's computer started at 255, it is 255-244=11 hops away from mine.
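
Both calculations are simple enough to sketch in Python. The class and function names are invented for the example, and the bookkeeping by sequence number follows my description above rather than the source code of any real ping program. The hop estimate rests on the assumption that the responder started its ttl at 255. A fake clock is used so that the example reproduces the 45.6 ms from my output.

```python
import time


class RoundTripTimer:
    """Remember when each request was sent, keyed by sequence number."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._sent = {}

    def sent(self, seq):
        """Note the moment request number seq was emitted."""
        self._sent[seq] = self._clock()

    def received(self, seq):
        """Return the round trip in milliseconds, or None if seq is unknown."""
        start = self._sent.pop(seq, None)
        if start is None:
            return None
        return (self._clock() - start) * 1000.0


def hops_travelled(received_ttl, initial_ttl=255):
    """Estimate the hop count, assuming the sender started at initial_ttl."""
    return initial_ttl - received_ttl


# Fake clock: the request leaves at t=0 s, the response arrives at t=0.0456 s.
ticks = iter([0.0, 0.0456])
timer = RoundTripTimer(clock=lambda: next(ticks))
timer.sent(1)
rtt = timer.received(1)
print("icmp_seq=1 time=%.1f ms" % rtt)  # prints icmp_seq=1 time=45.6 ms
print(hops_travelled(244))              # prints 11
```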

I hope this answers your question.
Regards.
