Architecture - Web Developer Boot Camp

Basic Architecture

The underlying components of the web are:

  • TCP/IP, which provides the network that connects computer programs on the Internet;

  • DNS, which locates the network address of a computer from the computer's name;

  • HTTP, the protocol web browsers use to request resources from web servers and web servers use to respond to those requests;

  • URLs, which identify the location of web pages and other resources to be retrieved from web servers.

TCP/IP

TCP/IP provides the mechanism whereby programs on any two computers can communicate on the Internet. The important concepts for us are the IP address of a computer and the ports used by programs.

Computers on the Internet each have a unique address called the computer's IP address which is used to locate the computer. Since there may be more than one connection between two computers, another form of address called a port is used to distinguish different connections to the same computer.

If you think of a computer as a company and a program as one of its employees, then the computer's IP address is like the company's telephone number and the port is like the employee's telephone extension.

Example IP address:
192.168.0.1
Example IP address with port:
192.168.0.1:80

Many programs use a default port if none is specified. For example, web browsers and servers use port 80 if no port is specified in the URL.
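To make the address-and-port notation concrete, here is a minimal Python sketch that splits an IPv4 address[:port] string and falls back to port 80 when no port is given. The helper name split_address is hypothetical, and IPv6 addresses, which use a different bracketed notation, are deliberately not handled.

```python
import ipaddress

DEFAULT_HTTP_PORT = 80  # assumed when no port is given, as web browsers do


def split_address(text: str) -> tuple:
    """Split an IPv4 'address[:port]' string into (address, port).

    Hypothetical helper for illustration; IPv6 literals are not handled.
    """
    address, sep, port_text = text.partition(":")
    # IPv4Address raises a ValueError subclass for malformed input,
    # e.g. the '192' left over from the typo '192:168.0.1:80'.
    ipaddress.IPv4Address(address)
    if not sep:
        return address, DEFAULT_HTTP_PORT
    port = int(port_text)  # ValueError if the port is not a number
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return address, port


print(split_address("192.168.0.1:80"))  # ('192.168.0.1', 80)
print(split_address("192.168.0.1"))     # ('192.168.0.1', 80)
```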

DNS

DNS is a service, running on special computers on the Internet, which translates computer names into IP addresses. DNS makes it possible for you to request a web page from www.mozilla.org rather than from 207.126.111.202.
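In Python, the standard socket.getaddrinfo function asks the operating system's resolver (which normally queries DNS) to translate a name into addresses. This is a minimal sketch: the helper name resolve is an assumption, and the addresses returned for www.mozilla.org today will almost certainly differ from the one quoted above.

```python
import socket


def resolve(hostname: str, port: int = 80) -> list:
    """Return the distinct IP addresses the system resolver gives for hostname."""
    results = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string for both IPv4 and IPv6.
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results})


# A numeric address needs no DNS lookup and comes back unchanged.
print(resolve("127.0.0.1"))  # ['127.0.0.1']

# With network access, a real name goes through DNS:
# print(resolve("www.mozilla.org"))
```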

HTTP

HTTP is a protocol used to communicate requests and responses between client computers (those running web browsers and the like) and server computers (those running web servers). Each request specifies a method and a URL: the URL identifies a resource, and the method identifies the action the client asks the server to perform on that resource.

The most common HTTP methods are GET, which asks the server to return the resource in its response, and POST, which sends information to the server to be processed by the specified resource. When you visit a web page, your web browser most likely uses GET to retrieve it. When you submit a form on a web page, your web browser uses either GET or POST, depending on how the author coded the form.
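The difference between GET and POST is easiest to see in the raw request text. The following Python sketch assembles minimal HTTP/1.1 requests by hand; build_request and the /search and /submit paths are hypothetical, and a real browser sends many more headers, as the log later in this document shows.

```python
from typing import Optional
from urllib.parse import urlencode

CRLF = "\r\n"  # HTTP lines end with carriage return + line feed, as in the log


def build_request(method: str, host: str, path: str,
                  form: Optional[dict] = None) -> bytes:
    """Build a minimal HTTP/1.1 request as raw bytes (illustrative sketch).

    With GET, form fields go into the URL's query string; with POST they
    are sent in the request body, URL-encoded the way an HTML form sends them.
    """
    body = ""
    if form and method == "GET":
        path = f"{path}?{urlencode(form)}"
    elif form and method == "POST":
        body = urlencode(form)
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('ascii'))}")
    # A blank line separates the headers from the (possibly empty) body.
    return (CRLF.join(lines) + CRLF + CRLF + body).encode("ascii")


print(build_request("GET", "example.com", "/search", {"firstname": "jane"}))
print(build_request("POST", "example.com", "/submit", {"firstname": "jane"}))
```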

URL

An absolute URL specifies a resource using the name or address of the machine where the resource is located and the full path to the resource on that machine. It has the form:

http://hostname-or-ip-address:port/path?querystring

http://

http is the protocol scheme, which tells the client to use the HTTP protocol when sending the request. http is always followed by ://.

hostname-or-ip-address

hostname-or-ip-address can be either the name of the machine (as registered with the DNS service) where the resource is located or the actual IP address of the machine. If the host name is used, it is not case-sensitive. For example, WWW.EXAMPLE.COM and www.example.com are equivalent host names.

:port

:port specifies the port on the server which should handle your request. port is a number between 1 and 65535. If :port is missing from the URL, it is assumed to be :80.

/path

/path specifies the location of the resource in the machine's web server directories.

/path is case-sensitive since the web server may be hosted on a UNIX machine which has case-sensitive paths and file names. The easiest way to make sure you do not have case-sensitivity issues in your URLs is to always write them in lower-case.

If /path contains references to sub-directories, for example /2004/09/25/, the slashes must be forward slashes (/). Backward slashes (\) may work in browsers such as Internet Explorer but will not work in Mozilla.
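Python's standard urllib.parse.urlsplit pulls an absolute URL apart into the pieces described above. This sketch (url_parts and the example URL are hypothetical) also applies the default port 80 and shows that the host name is normalized to lower case while the path is not.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80}  # only the scheme discussed in this section


def url_parts(url: str) -> dict:
    """Split an absolute http URL into scheme, host, port, path and query."""
    parts = urlsplit(url)
    port = parts.port  # None when :port is missing; ValueError if out of range
    return {
        "scheme": parts.scheme,
        # .hostname is lower-cased because host names are not case-sensitive
        "host": parts.hostname,
        "port": port if port is not None else DEFAULT_PORTS[parts.scheme],
        # the path keeps its case: the server may treat it as case-sensitive
        "path": parts.path,
        "query": parts.query,
    }


print(url_parts("http://WWW.EXAMPLE.COM/2004/09/25/Index.html?firstname=jane"))
```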

?querystring

?querystring is typically used when the request is the result of submitting a form using the GET method. It contains a sequence of name=value pairs separated by a special delimiter character. Many authors and web browsers use the ampersand (&) as the delimiter. Note that raw ampersands are illegal in XML and XHTML documents and should be coded as &amp;.

An example query string might look like ?firstname=jane&species=cat. The same query string should be coded as ?firstname=jane&amp;species=cat if it is to be used in an XML or XHTML document.
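The round trip between name=value pairs, a query string, and its XHTML-safe form can be sketched with Python's standard urllib.parse and html modules. The /form path in the generated link is hypothetical.

```python
import html
from urllib.parse import parse_qs, urlencode

fields = [("firstname", "jane"), ("species", "cat")]

# urlencode joins name=value pairs with raw ampersands, as a browser does
# when it submits a GET form.
query = urlencode(fields)          # 'firstname=jane&species=cat'

# Inside an XML or XHTML document, each & must be written as &amp;.
xhtml_query = html.escape(query)   # 'firstname=jane&amp;species=cat'

# On the server side, parse_qs maps each name to a list of its values.
parsed = parse_qs(query)           # {'firstname': ['jane'], 'species': ['cat']}

print(f'<a href="/form?{xhtml_query}">example</a>')
```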

Example HTTP session

The best way to see how the components of the web work together is to look at the actual network traffic between a web browser and a web server. The following is an Ethereal log of the network traffic generated when Mozilla requested the web page http://bclary.com/. My comments on the traffic follow each excerpt of the log.

No.     Time        Source                Destination           Protocol Info
      5 1.331408    192.168.1.100         net0116012.direcpc.com DNS      Standard query A bclary.com
      8 2.351255    net0116012.direcpc.com 192.168.1.100         DNS      Standard query response A 66.33.193.205
      9 2.351655    192.168.1.100         bclary.com            TCP      1497 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
     10 2.354552    bclary.com            192.168.1.100         TCP      http > 1497 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
     11 2.354562    192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0

Before the page could even be requested, the network address of bclary.com had to be looked up and a TCP connection established. Note that on this satellite-based internet connection, the time required was on the order of one second. You can improve the load time of your content by minimizing the number of DNS lookups (the number of different machine names) required to load it.
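One way to apply this advice is to count the distinct machine names a page pulls resources from, since each new name can cost the browser another lookup before it can open a connection. A minimal Python sketch; distinct_hosts and the images.example.com URL are hypothetical, and because resolvers often cache answers, a name looked up recently may cost little or nothing.

```python
from urllib.parse import urlsplit


def distinct_hosts(urls: list) -> set:
    """Return the distinct machine names referenced by a list of absolute URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # None for relative URLs such as '/style.css'
        if host:
            hosts.add(host)
    return hosts


page_resources = [
    "http://bclary.com/",
    "http://bclary.com/2004/07/21/style",
    "http://bclary.com/favicon.png",
    "http://images.example.com/logo.png",  # hypothetical second host
]
print(sorted(distinct_hosts(page_resources)))  # ['bclary.com', 'images.example.com']
```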

     12 2.354632    192.168.1.100         bclary.com            HTTP     GET / HTTP/1.1

    GET / HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n

Mozilla requested the URL http://bclary.com/. Note how Mozilla identifies itself in the User-Agent header and begins negotiating with the web server by sending its preferences for content types, languages, encodings and character sets, and by asking to keep the TCP connection open (Keep-Alive). Note also that Mozilla did not already have a copy of the document in its cache when this request was made: the request contains no If-Modified-Since or If-None-Match header.
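The q values in headers such as Accept and Accept-Language rank the browser's preferences from 0 to 1, with 1 as the default. A simplified Python sketch of reading them; parse_accept is a hypothetical helper that ignores other media-type parameters and the rule that more specific types outrank wildcards.

```python
def parse_accept(header: str) -> list:
    """Return (value, q) pairs from an Accept-style header, most preferred first."""
    prefs = []
    for item in header.split(","):
        value, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # q defaults to 1 when omitted
        for param in params:
            name, _, number = param.partition("=")
            if name.strip() == "q":
                q = float(number)
        prefs.append((value, q))
    # sorted() is stable, so entries with equal q keep the browser's order
    return sorted(prefs, key=lambda pair: -pair[1])


accept = ("text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,"
          "text/plain;q=0.8,image/png,*/*;q=0.5")
for media_type, q in parse_accept(accept):
    print(f"{q:.1f} {media_type}")
```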

     13 2.358182    bclary.com            192.168.1.100         TCP      http > 1497 [ACK] Seq=1 Ack=385 Win=4096 Len=0
     16 4.010677    bclary.com            192.168.1.100         HTTP     HTTP/1.1 200 OK (application/xhtml+xml)

    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:17 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Last-Modified: Tue, 21 Sep 2004 06:41:26 GMT\r\n
    ETag: "4f2e5d-29dc-414fcd16"\r\n
    Accept-Ranges: bytes\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: application/xhtml+xml\r\n
    Content-Encoding: gzip\r\n
    Content-Length: 3456\r\n
    \r\n

     17 4.011942    bclary.com            192.168.1.100         HTTP     Continuation
     18 4.011966    192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=385 Ack=2841 Win=17040 Len=0
     19 4.036299    192.168.1.100         bclary.com            TCP      1498 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
     20 4.039000    bclary.com            192.168.1.100         TCP      http > 1498 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
     21 4.039023    192.168.1.100         bclary.com            TCP      1498 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0

bclary.com responded with the HTTP headers and the contents of the document. Note how the web server gave the current date and the time the document was last modified, agreed to keep the TCP connection alive, and responded with the XHTML content type and a gzip-compressed version of the page. Mozilla also opened a second TCP connection to the server (packets 19 through 21), which it later used for the style sheet request. Note that since the retrieved document is compressed, Mozilla must wait for it to be fully downloaded before parsing the contents and issuing any additional requests for content referenced by the document.
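Content-Encoding: gzip means the body sent over the network is the compressed form of the document, and Content-Length counts those compressed bytes. A minimal Python sketch using the standard gzip module; the page content is made up for illustration.

```python
import gzip

# Hypothetical page body; repetitive markup compresses well.
page = ("<p>Hello from an XHTML page.</p>\n" * 200).encode("utf-8")

compressed = gzip.compress(page)

# The server labels the body as gzip-encoded, and Content-Length counts the
# compressed bytes actually transmitted, not the size of the original page.
headers = {
    "Content-Type": "application/xhtml+xml",
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# The browser decompresses the body to recover the original markup.
restored = gzip.decompress(compressed)
print(len(page), "bytes uncompressed,", len(compressed), "bytes sent")
```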

     22 4.039634    192.168.1.100         bclary.com            HTTP     GET /2004/07/21/style HTTP/1.1

    GET /2004/07/21/style HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/css,*/*;q=0.1\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Referer: http://bclary.com/\r\n
    \r\n

     23 4.042358    bclary.com            192.168.1.100         TCP      http > 1498 [ACK] Seq=1 Ack=349 Win=4096 Len=0
     24 4.077165    bclary.com            192.168.1.100         HTTP     Continuation
     25 4.213129    192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=385 Ack=3850 Win=16031 Len=0
     28 6.213411    bclary.com            192.168.1.100         HTTP     HTTP/1.1 200 OK (text/css)

    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:20 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Content-Location: style.css\r\n
    Vary: negotiate\r\n
    TCN: choice\r\n
    Last-Modified: Wed, 21 Jul 2004 14:01:17 GMT\r\n
    ETag: "be598-3d7-40fe772d;41427405"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 983\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/css\r\n
    \r\n

Once the original document was downloaded and parsed, Mozilla was able to request the CSS style sheet which was referenced in the XHTML of the page. Mozilla had to wait for the full document to download before making the request because the document was compressed by the web server. This saved the bandwidth required to download the file, but forced Mozilla to wait to request the style sheet rather than parsing the document as it downloaded and immediately requesting the style sheet when it was referenced. Note that it took Mozilla almost 5 seconds to download the page and the referenced CSS file.
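The timings quoted in these comments come from the Time column, which counts seconds since the capture started. A small Python sketch that reads two rows copied from the log above and computes the elapsed time between them; packet_times is a hypothetical helper.

```python
LOG = """\
      5 1.331408    192.168.1.100         net0116012.direcpc.com DNS      Standard query A bclary.com
     28 6.213411    bclary.com            192.168.1.100         HTTP     HTTP/1.1 200 OK (text/css)
"""


def packet_times(log: str) -> dict:
    """Map packet number -> capture time in seconds, from Ethereal summary rows."""
    times = {}
    for row in log.splitlines():
        fields = row.split()
        if len(fields) >= 2 and fields[0].isdigit():
            times[int(fields[0])] = float(fields[1])
    return times


times = packet_times(LOG)
elapsed = times[28] - times[5]  # from the first DNS query to the CSS response
print(f"{elapsed:.2f} s")       # 4.88 s
```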

     29 6.227674    192.168.1.100         bclary.com            HTTP     GET /favicon.png HTTP/1.1

    GET /favicon.png HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n

     30 6.230413    bclary.com            192.168.1.100         TCP      http > 1497 [ACK] Seq=3850 Ack=700 Win=4096 Len=0
     31 6.319948    192.168.1.100         bclary.com            TCP      1498 > http [ACK] Seq=349 Ack=1405 Win=15636 Len=0
     33 8.199361    bclary.com            192.168.1.100         HTTP     HTTP/1.1 200 OK (image/png)

    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:22 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Last-Modified: Wed, 21 Jul 2004 23:05:50 GMT\r\n
    ETag: "4e5287-10bc-40fef6ce"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 4284\r\n
    Keep-Alive: timeout=15, max=99\r\n
    Connection: Keep-Alive\r\n
    Content-Type: image/png\r\n
    \r\n

     34 8.200092    bclary.com            192.168.1.100         HTTP     Continuation
     35 8.200117    192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=700 Ack=6690 Win=17040 Len=0
     36 8.266306    bclary.com            192.168.1.100         HTTP     Continuation
     37 8.266708    bclary.com            192.168.1.100         HTTP     Continuation
     38 8.266721    192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=700 Ack=8490 Win=17040 Len=0

Once the original document and all of the referenced files such as the CSS style sheets were downloaded, Mozilla requested the favicon image which was referenced in the XHTML of the page. Note that it took Mozilla about two additional seconds (packets 29 through 38) to request and download the favicon image.

Web browsers cache files by temporarily keeping copies of downloaded files. If a file has not changed when a web page is revisited, the browser can use the cached copy, saving the time required to request and download a new copy as well as bandwidth for the web server. A detailed discussion of caching is beyond the scope of this document, but for more information I highly recommend Mark Nottingham's excellent Caching Tutorial for Web Authors and Webmasters.

To illustrate how caching works, I requested the page again. This time, Mozilla already had copies of the document, the CSS style sheet and the favicon image.

     40 14.362606   192.168.1.100         bclary.com            HTTP     GET / HTTP/1.1

    GET / HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    If-Modified-Since: Tue, 21 Sep 2004 06:41:26 GMT\r\n
    If-None-Match: "4f2e5d-29dc-414fcd16"\r\n
    Cache-Control: max-age=0\r\n
    \r\n

Note the If-Modified-Since and If-None-Match headers, which repeat the Last-Modified date and the ETag the server sent with the original response. Mozilla is telling the web server that it already has a copy of the document in its cache and is asking whether the document has been modified since Mozilla obtained that copy.
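The client side of this exchange amounts to copying two validators from the cached response into the new request. A minimal Python sketch (revalidation_headers is a hypothetical helper), fed with the Last-Modified and ETag values from the first response in the log:

```python
def revalidation_headers(cached: dict) -> dict:
    """Build conditional request headers from a cached response's validators."""
    headers = {}
    if "Last-Modified" in cached:
        # "Send the document only if it changed after this date."
        headers["If-Modified-Since"] = cached["Last-Modified"]
    if "ETag" in cached:
        # "Send the document only if its current ETag differs from this one."
        headers["If-None-Match"] = cached["ETag"]
    return headers


cached_response = {
    "Last-Modified": "Tue, 21 Sep 2004 06:41:26 GMT",
    "ETag": '"4f2e5d-29dc-414fcd16"',
}
print(revalidation_headers(cached_response))
```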


     41 14.365718   bclary.com            192.168.1.100         TCP      http > 1498 [ACK] Seq=1405 Ack=848 Win=4096 Len=0
     42 17.149742   bclary.com            192.168.1.100         HTTP     HTTP/1.1 304 Not Modified

    HTTP/1.1 304 Not Modified\r\n
    Date: Sun, 26 Sep 2004 03:44:31 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Connection: Keep-Alive\r\n
    Keep-Alive: timeout=15, max=99\r\n
    ETag: "4f2e5d-29dc-414fcd16"\r\n
    \r\n

The web server responded that Mozilla did not need to download the document again since the document had not changed since it was first downloaded.
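On the server, the choice between 304 Not Modified and a full 200 OK response can be sketched as follows. This is a simplified Python illustration, not how Apache implements it: conditional_status is hypothetical, only a single strong ETag is compared, and, as the current HTTP specification (RFC 9110) describes, If-None-Match takes precedence over If-Modified-Since when both are present.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is current, else 200.

    Simplified: weak validators (W/"...") and "*" are not handled.
    """
    if "If-None-Match" in request_headers:
        # When an ETag is offered, it alone decides.
        return 304 if request_headers["If-None-Match"] == etag else 200
    if "If-Modified-Since" in request_headers:
        since = parsedate_to_datetime(request_headers["If-Modified-Since"])
        if parsedate_to_datetime(last_modified) <= since:
            return 304
    return 200


etag = '"4f2e5d-29dc-414fcd16"'
last_modified = "Tue, 21 Sep 2004 06:41:26 GMT"
request = {"If-Modified-Since": last_modified, "If-None-Match": etag}
print(conditional_status(request, etag, last_modified))  # 304
```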

     43 17.173554   192.168.1.100         bclary.com            HTTP     GET /2004/07/21/style HTTP/1.1

    GET /2004/07/21/style HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/css,*/*;q=0.1\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Referer: http://bclary.com/\r\n
    If-Modified-Since: Wed, 21 Jul 2004 14:01:17 GMT\r\n
    If-None-Match: "be598-3d7-40fe772d;41427405"\r\n
    Cache-Control: max-age=0\r\n
    \r\n

     44 17.176624   bclary.com            192.168.1.100         TCP      http > 1497 [ACK] Seq=8490 Ack=1170 Win=4096 Len=0
     45 17.254806   192.168.1.100         bclary.com            TCP      1498 > http [ACK] Seq=848 Ack=1656 Win=17040 Len=0
     46 19.321286   bclary.com            192.168.1.100         HTTP     HTTP/1.1 304 Not Modified

    HTTP/1.1 304 Not Modified\r\n
    Date: Sun, 26 Sep 2004 03:44:33 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Connection: Keep-Alive\r\n
    Keep-Alive: timeout=15, max=98\r\n
    ETag: "be598-3d7-40fe772d;41427405"\r\n
    Content-Location: style.css\r\n
    Vary: negotiate\r\n
    \r\n

     47 19.461781   192.168.1.100         bclary.com            TCP      1497 > http [ACK] Seq=1170 Ack=8794 Win=16736 Len=0

Once Mozilla determined that it could use its cached copy of the document, it asked the web server to return the CSS style sheet only if it had been modified. Again, the web server replied that the copy of the style sheet in Mozilla's cache was current and could be reused. It took Mozilla about 5 seconds to determine that the cached copies of the document and its associated files were still current. This long delay is due to the extreme latency (the round-trip time for a request to travel from the client to the server and back) of a satellite internet connection: my connection takes about 0.9 seconds to send a request to a server and receive a single response.

For content such as style sheets that does not change frequently, you can improve the performance of your pages by specifying a very long cache expiration time, so that browsers do not need to check whether referenced files have changed. If you need to modify such a file and want to force browsers to reload it, give it a new URL: use a date-based path to the file, as in this example (/2004/07/21/style), or include a version string in the file name.
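A sketch of the long-expiry approach in Python. Instead of the date-based path used on this site, it puts a short content hash in the file name, a swapped-in variation of the version-string idea; versioned_name, long_cache_headers and the one-year lifetime are assumptions for illustration.

```python
import hashlib
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an assumed "very long" lifetime


def versioned_name(filename: str, content: bytes) -> str:
    """Insert a short content hash into a file name, e.g. style.<hash>.css.

    Any change to the content yields a new name, and therefore a new URL
    that no browser has cached yet.
    """
    stem, dot, ext = filename.rpartition(".")
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{stem}.{digest}.{ext}" if dot else f"{filename}.{digest}"


def long_cache_headers(now: float) -> dict:
    """Headers telling browsers to reuse the file for a year without asking."""
    return {
        "Cache-Control": f"max-age={ONE_YEAR}",
        # Expires is the older, HTTP/1.0-compatible way to say the same thing.
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
    }


print(versioned_name("style.css", b"body { color: black }"))
print(long_cache_headers(0))
```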

Gzip encoding and other compression schemes can greatly reduce the bandwidth required to host a web site, and can improve the perceived performance of your web pages for visitors using dial-up internet connections by reducing the size of the files to be downloaded. This benefit has drawbacks, however. Gzipped content must be completely downloaded before it can be parsed and additional requests can be made for CSS, JavaScript and image files. Depending on the size of your documents and the types of connections your visitors use, gzip encoding could result in poorer perceived performance for your web site.

Try to limit the number of distinct files containing CSS style sheets or JavaScript. These types of files have a large effect on how a web browser makes requests for the contents of your web page, and each additional one can slow down visitors using high-latency connections such as satellite links.
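To see how many style sheet and script files a page asks the browser to fetch, Python's standard html.parser can scan the markup. A minimal sketch; ResourceCounter and the sample page (including /print.css and /menu.js) are hypothetical.

```python
from html.parser import HTMLParser


class ResourceCounter(HTMLParser):
    """Collect the distinct style sheet and script URLs a page references."""

    def __init__(self):
        super().__init__()
        self.stylesheets = set()
        self.scripts = set()

    # HTMLParser routes XHTML empty elements such as <link ... /> here too,
    # via its default handle_startendtag.
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.stylesheets.add(attrs["href"])
        elif tag == "script" and attrs.get("src"):
            self.scripts.add(attrs["src"])


page = """<html xmlns="http://www.w3.org/1999/xhtml"><head>
<link rel="stylesheet" href="/2004/07/21/style" />
<link rel="stylesheet" href="/print.css" media="print" />
<link rel="icon" href="/favicon.png" />
<script src="/menu.js"></script>
</head><body></body></html>"""

counter = ResourceCounter()
counter.feed(page)
print(len(counter.stylesheets) + len(counter.scripts), "CSS/JavaScript files")
```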

Tip Summary
  • Use all lowercase URLs to ensure you do not have case-sensitivity problems.

  • Use forward slashes (/) in URLs.

  • Use escaped ampersands (&amp;) rather than raw ampersands (&) in XML and XHTML documents.

  • Minimize the number of references to different machine names to reduce the time required to perform DNS lookups when loading your content.

  • Make effective use of the browser's cache by using long expiration times and version-specific URLs for content, such as CSS style sheets, that does not change frequently.

  • Use compressed encodings such as gzip judiciously, making sure that, given the size of your documents and the types of internet connections your visitors use, they actually result in an improved experience.

  • Limit the number of distinct files referenced in your content to reduce the time required to check the validity of the browser's cache. Reducing the number of files can measurably help your visitors with high-latency connections.
