Architecture - Web Developer Boot Camp
- Basic Architecture
The underlying components of the web are:
- TCP/IP, which provides the network that connects computer programs on the Internet;
- DNS, which provides the ability to locate the network address of a computer from the computer's name;
- HTTP, which is the language used by web browsers to request resources from web servers and by web servers to respond to those requests;
- URLs, which identify the location of web pages and other resources to be retrieved from web servers.
- TCP/IP
TCP/IP provides the mechanism whereby programs on any two computers can communicate on the Internet. The important concepts for us are the IP address of a computer and the ports used by programs.
Computers on the Internet each have a unique address called the computer's IP address which is used to locate the computer. Since there may be more than one connection between two computers, another form of address called a port is used to distinguish different connections to the same computer.
If you consider the analogy where a Computer is like a Company and a Program is like an Employee of the Company, then the IP Address of a Computer is like the Company's Telephone number and the port is like the Employee's Telephone Extension.
- Example IP Address
192.168.0.1
- Example IP Address with Port
192.168.0.1:80
Many programs use a default port if none is specified. For example, Web browsers and servers use port 80 if it is not specified in the request.
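To make the address-and-port idea concrete, here is a minimal Python sketch that splits a string such as the example above into its IP address and port, falling back to port 80 when none is given. The function name `split_address` is my own invention for illustration, and the sketch only handles IPv4 addresses written in dotted form.

```python
import ipaddress

DEFAULT_HTTP_PORT = 80  # the default port for web traffic mentioned above


def split_address(text, default_port=DEFAULT_HTTP_PORT):
    """Split 'a.b.c.d' or 'a.b.c.d:port' into (IPv4Address, port).

    Hypothetical helper for illustration; IPv4 only.
    """
    host, sep, port_text = text.partition(":")
    address = ipaddress.IPv4Address(host)  # raises ValueError if malformed
    port = int(port_text) if sep else default_port
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return address, port


if __name__ == "__main__":
    print(split_address("192.168.0.1"))       # no port given, so 80 is used
    print(split_address("192.168.0.1:8080"))  # explicit port
```

In the telephone analogy, the first element of the returned pair is the company's number and the second is the employee's extension.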
- DNS
DNS is a service, running on special computers on the Internet, that translates computer names into IP addresses. DNS makes it possible for you to request a web page from www.mozilla.org rather than from 207.126.111.202.
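As a rough sketch of what a program does when it needs this translation, the following Python function asks the operating system's resolver for the IPv4 address of a name. The wrapper name `lookup` and its `resolver` parameter are assumptions of mine, added so the function can be exercised without a network connection; `socket.gethostbyname` is the standard library call that performs the real lookup.

```python
import socket


def lookup(name, resolver=socket.gethostbyname):
    """Return the IPv4 address for a host name as a dotted string.

    `resolver` is a hypothetical hook: by default the system resolver
    (and therefore DNS) is used, but any callable can be substituted.
    """
    return resolver(name)
```

Calling `lookup("www.mozilla.org")` on a connected machine returns whatever address DNS currently reports for that name, which may well differ from the address quoted above. A name that is already an IPv4 address is returned unchanged.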
- HTTP
HTTP is a protocol used to communicate requests and responses between client computers (those running web browsers and the like) and server computers (those running web servers). Each request specifies a method and URL which identifies a specific action the client requests the server perform on the resource identified by the URL.
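To show what "a method and URL" looks like on the wire, here is a small Python sketch that assembles the text of a minimal HTTP/1.1 request. The helper name `build_request` is hypothetical, and a real browser sends many more headers than this.

```python
def build_request(method, host, path="/", body=""):
    """Return the text of a minimal HTTP/1.1 request message.

    Illustrative only: real clients add many more headers.
    """
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Form data sent in a request body is described by these two headers.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(build_request("GET", "bclary.com"))
```

The first line of the message carries the method and the path part of the URL; the host name travels in the `Host` header.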
The most common HTTP methods are GET, which asks the server to return the resource in its response, and POST, which sends information to the server to be processed by the specified resource. When you visit a web page, your web browser probably used GET to retrieve it. When you submit a form on a web page, your web browser uses either the GET or the POST method, depending on how the author coded the form.
- URL
An absolute URL specifies a resource using the name or address of the machine where the resource is located and the full directory path to the resource on that machine. It has the form:
http://hostname-or-ip-address:port/path?querystring
- http://
http is the protocol scheme, which tells the client that it should use the HTTP protocol when sending the request. http is always followed by ://.
- hostname-or-ip-address
hostname-or-ip-address can be either the name of the machine (as registered with the DNS service) where the resource is located or the actual IP address of the machine. If the host name is used, it is not case-sensitive. For example, WWW.EXAMPLE.COM and www.example.com are equivalent host names.
- :port
:port specifies the port on the server which should handle your request. port is a number between 1 and 65535. If :port is missing from the URL, it is assumed to be :80.
- /path
/path specifies the location of the resource in the machine's web server directories. /path is case-sensitive since the web server may be hosted on a UNIX machine, which has case-sensitive paths and file names. The easiest way to make sure you do not have case-sensitivity issues in your URLs is to always write them in lower case.
If /path contains references to sub-directories, for example /2004/09/25/, the slashes must be forward slashes (/). Backward slashes (\) may work in browsers such as Internet Explorer but will not work in Mozilla.
- ?querystring
?querystring is typically used when the request is the result of submitting a form using the GET method. It contains a sequence of name=value pairs separated by a delimiter character. Many authors and web browsers use the ampersand (&) as the delimiter. Note that raw ampersands are illegal in XML and XHTML documents and should be coded as &amp;.
An example query string might look like ?firstname=jane&species=cat. The same query string should be coded as ?firstname=jane&amp;species=cat if it is to be used in an XML or XHTML document.
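The parts of a URL described above can be pulled apart with Python's standard `urllib.parse` module. The sketch below is only an illustration: the function names `describe` and `for_markup` are mine, the fallback to port 80 assumes an http URL, and the sample URL simply combines the examples used in this section.

```python
from html import escape
from urllib.parse import parse_qsl, urlsplit


def describe(url):
    """Break an absolute http URL into the parts discussed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,    # urlsplit lower-cases the host name
        "port": parts.port or 80,  # assumption: http, so default to 80
        "path": parts.path,        # case is preserved
        "query": parse_qsl(parts.query),
    }


def for_markup(url):
    """Escape a URL for use inside an XML or XHTML document."""
    return escape(url)


if __name__ == "__main__":
    url = "http://WWW.EXAMPLE.COM/2004/09/25/?firstname=jane&species=cat"
    print(describe(url))
    print(for_markup(url))
```

Note that the host name comes back in lower case while the path is left untouched, which mirrors the case-sensitivity rules given above.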
- Example HTTP session
The best way to see the components of the web work together is to look at the actual network traffic between a web browser and a web server. The following is an Ethereal log showing the network traffic from using Mozilla to request the web page http://bclary.com/. The paragraphs between the log excerpts are my comments about the traffic.
No.  Time      Source                  Destination             Protocol  Info
5    1.331408  192.168.1.100           net0116012.direcpc.com  DNS       Standard query A bclary.com
8    2.351255  net0116012.direcpc.com  192.168.1.100           DNS       Standard query response A 66.33.193.205
9    2.351655  192.168.1.100           bclary.com              TCP       1497 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
10   2.354552  bclary.com              192.168.1.100           TCP       http > 1497 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
11   2.354562  192.168.1.100           bclary.com              TCP       1497 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0
Before the page could even be requested, the network address of bclary.com had to be looked up and a TCP connection established. Note that on this satellite-based internet connection, the time required was on the order of one second. You can increase the page load performance of your content by minimizing the number of DNS lookups (the number of different machine names) required to load your content.
12  2.354632  192.168.1.100  bclary.com     HTTP  GET / HTTP/1.1
    GET / HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n
Mozilla requested the URL http://bclary.com/. Note how Mozilla identifies itself, and begins negotiating with the web server by sending preferences for Content Types, Language, Encoding, Character Sets, and TCP connection. Note also that Mozilla did not have a copy of the document already in its cache when this request was made.
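The preferences in the Accept header are expressed with q values between 0 and 1, where a missing q counts as 1. As an illustration, the following Python sketch orders the entries of such a header by preference. The function name `parse_accept` is my own, and the sketch ignores the finer points of HTTP content negotiation (for example, ranking more specific types above wildcards).

```python
# The Accept header Mozilla sent in the request above.
MOZILLA_ACCEPT = (
    "text/xml,application/xml,application/xhtml+xml,"
    "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
)


def parse_accept(header):
    """Return (media_type, q) pairs, most preferred first (simplified)."""
    prefs = []
    for item in header.split(","):
        media_type, *params = [piece.strip() for piece in item.split(";")]
        q = 1.0  # a missing q parameter means full preference
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for media_type, q in parse_accept(MOZILLA_ACCEPT):
        print(f"{q:.1f}  {media_type}")
```

Read this way, the header says that Mozilla would rather receive XML or XHTML than plain HTML, and will accept anything else only as a last resort.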
13  2.358182  bclary.com     192.168.1.100  TCP   http > 1497 [ACK] Seq=1 Ack=385 Win=4096 Len=0
16  4.010677  bclary.com     192.168.1.100  HTTP  HTTP/1.1 200 OK (application/xhtml+xml)
    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:17 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Last-Modified: Tue, 21 Sep 2004 06:41:26 GMT\r\n
    ETag: "4f2e5d-29dc-414fcd16"\r\n
    Accept-Ranges: bytes\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: application/xhtml+xml\r\n
    Content-Encoding: gzip\r\n
    Content-Length: 3456\r\n
    \r\n
17  4.011942  bclary.com     192.168.1.100  HTTP  Continuation
18  4.011966  192.168.1.100  bclary.com     TCP   1497 > http [ACK] Seq=385 Ack=2841 Win=17040 Len=0
19  4.036299  192.168.1.100  bclary.com     TCP   1498 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
20  4.039000  bclary.com     192.168.1.100  TCP   http > 1498 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
21  4.039023  192.168.1.100  bclary.com     TCP   1498 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0
bclary.com responded with the HTTP headers and the contents of the document. Note how the web server gave the current date and the last time the document was modified, indicated that the TCP connection will be kept alive, and responded with a Content Type for XHTML and a gzip-compressed version of the page. Note that since the retrieved document is compressed, Mozilla must wait for it to be fully downloaded before parsing the contents and issuing any additional requests for content which is referenced by the document.
22  4.039634  192.168.1.100  bclary.com     HTTP  GET /2004/07/21/style HTTP/1.1
    GET /2004/07/21/style HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/css,*/*;q=0.1\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Referer: http://bclary.com/\r\n
    \r\n
23  4.042358  bclary.com     192.168.1.100  TCP   http > 1498 [ACK] Seq=1 Ack=349 Win=4096 Len=0
24  4.077165  bclary.com     192.168.1.100  HTTP  Continuation
25  4.213129  192.168.1.100  bclary.com     TCP   1497 > http [ACK] Seq=385 Ack=3850 Win=16031 Len=0
28  6.213411  bclary.com     192.168.1.100  HTTP  HTTP/1.1 200 OK (text/css)
    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:20 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Content-Location: style.css\r\n
    Vary: negotiate\r\n
    TCN: choice\r\n
    Last-Modified: Wed, 21 Jul 2004 14:01:17 GMT\r\n
    ETag: "be598-3d7-40fe772d;41427405"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 983\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/css\r\n
    \r\n
Once the original document was downloaded and parsed, Mozilla was able to request the CSS style sheet which was referenced in the XHTML of the page. Mozilla had to wait for the full document to download before making the request because the document was compressed by the web server. This saved the bandwidth required to download the file, but forced Mozilla to wait to request the style sheet rather than parsing the document as it downloaded and immediately requesting the style sheet when it was referenced. Note that it took Mozilla almost 5 seconds to download the page and the referenced CSS file.
29  6.227674  192.168.1.100  bclary.com     HTTP  GET /favicon.png HTTP/1.1
    GET /favicon.png HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n
30  6.230413  bclary.com     192.168.1.100  TCP   http > 1497 [ACK] Seq=3850 Ack=700 Win=4096 Len=0
31  6.319948  192.168.1.100  bclary.com     TCP   1498 > http [ACK] Seq=349 Ack=1405 Win=15636 Len=0
33  8.199361  bclary.com     192.168.1.100  HTTP  HTTP/1.1 200 OK (image/png)
    HTTP/1.1 200 OK\r\n
    Date: Sun, 26 Sep 2004 03:44:22 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Last-Modified: Wed, 21 Jul 2004 23:05:50 GMT\r\n
    ETag: "4e5287-10bc-40fef6ce"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 4284\r\n
    Keep-Alive: timeout=15, max=99\r\n
    Connection: Keep-Alive\r\n
    Content-Type: image/png\r\n
    \r\n
34  8.200092  bclary.com     192.168.1.100  HTTP  Continuation
35  8.200117  192.168.1.100  bclary.com     TCP   1497 > http [ACK] Seq=700 Ack=6690 Win=17040 Len=0
36  8.266306  bclary.com     192.168.1.100  HTTP  Continuation
37  8.266708  bclary.com     192.168.1.100  HTTP  Continuation
38  8.266721  192.168.1.100  bclary.com     TCP   1497 > http [ACK] Seq=700 Ack=8490 Win=17040 Len=0
Once the original document and all of the referenced files such as the CSS style sheets were downloaded, Mozilla requested the favicon image which was referenced in the XHTML of the page. Note that it took Mozilla about two more seconds to request and download the favicon image (packets 29 through 37).
Web browsers cache files by temporarily keeping copies of downloaded files. If the file has not changed when a web page is revisited, the browser can use the cached copy, thus saving the time required to request and download a new copy as well as saving bandwidth for the web server. A detailed discussion of caching is beyond the scope of this document, but for more information I highly recommend Mark Nottingham's excellent Caching Tutorial for Web Authors and Webmasters.
To illustrate how caching works, I requested the page again. This time, Mozilla already had copies of the document, the CSS style sheet and the favicon image.
40  14.362606  192.168.1.100  bclary.com     HTTP  GET / HTTP/1.1
    GET / HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    If-Modified-Since: Tue, 21 Sep 2004 06:41:26 GMT\r\n
    If-None-Match: "4f2e5d-29dc-414fcd16"\r\n
    Cache-Control: max-age=0\r\n
    \r\n
Note the If-Modified-Since and If-None-Match headers. Mozilla is telling the web server that it already has a local copy of the document in its cache and is asking if the document has been modified since Mozilla obtained its copy.
41  14.365718  bclary.com     192.168.1.100  TCP   http > 1498 [ACK] Seq=1405 Ack=848 Win=4096 Len=0
42  17.149742  bclary.com     192.168.1.100  HTTP  HTTP/1.1 304 Not Modified
    HTTP/1.1 304 Not Modified\r\n
    Date: Sun, 26 Sep 2004 03:44:31 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Connection: Keep-Alive\r\n
    Keep-Alive: timeout=15, max=99\r\n
    ETag: "4f2e5d-29dc-414fcd16"\r\n
    \r\n
The web server responded that Mozilla did not need to download the document again since the document had not changed since it was first downloaded.
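The decision the web server made here can be sketched in a few lines of Python. This is a deliberately simplified model, not how Apache implements it: the function name `respond` is hypothetical, the entity tag comparison is an exact string match, and lists of tags, weak tags and the `*` wildcard are ignored.

```python
from email.utils import parsedate_to_datetime


def respond(request_headers, etag, last_modified):
    """Return (status, reason) for a GET of a resource.

    Simplified sketch: `etag` and `last_modified` are the validators the
    server currently holds for the resource.
    """
    client_etag = request_headers.get("If-None-Match")
    client_date = request_headers.get("If-Modified-Since")
    if client_etag is not None:
        # When an entity tag is sent, it takes precedence over the date.
        unchanged = client_etag == etag
    elif client_date is not None:
        unchanged = (parsedate_to_datetime(client_date)
                     >= parsedate_to_datetime(last_modified))
    else:
        unchanged = False  # no validators: always send the full document
    return (304, "Not Modified") if unchanged else (200, "OK")


if __name__ == "__main__":
    headers = {
        "If-Modified-Since": "Tue, 21 Sep 2004 06:41:26 GMT",
        "If-None-Match": '"4f2e5d-29dc-414fcd16"',
    }
    print(respond(headers, '"4f2e5d-29dc-414fcd16"',
                  "Tue, 21 Sep 2004 06:41:26 GMT"))
```

With the header values from the trace, the validators match and the sketch returns the same 304 status that bclary.com sent.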
43  17.173554  192.168.1.100  bclary.com     HTTP  GET /2004/07/21/style HTTP/1.1
    GET /2004/07/21/style HTTP/1.1\r\n
    Host: bclary.com\r\n
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
    Accept: text/css,*/*;q=0.1\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    Referer: http://bclary.com/\r\n
    If-Modified-Since: Wed, 21 Jul 2004 14:01:17 GMT\r\n
    If-None-Match: "be598-3d7-40fe772d;41427405"\r\n
    Cache-Control: max-age=0\r\n
    \r\n
44  17.176624  bclary.com     192.168.1.100  TCP   http > 1497 [ACK] Seq=8490 Ack=1170 Win=4096 Len=0
45  17.254806  192.168.1.100  bclary.com     TCP   1498 > http [ACK] Seq=848 Ack=1656 Win=17040 Len=0
46  19.321286  bclary.com     192.168.1.100  HTTP  HTTP/1.1 304 Not Modified
    HTTP/1.1 304 Not Modified\r\n
    Date: Sun, 26 Sep 2004 03:44:33 GMT\r\n
    Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
    Connection: Keep-Alive\r\n
    Keep-Alive: timeout=15, max=98\r\n
    ETag: "be598-3d7-40fe772d;41427405"\r\n
    Content-Location: style.css\r\n
    Vary: negotiate\r\n
    \r\n
47  19.461781  192.168.1.100  bclary.com     TCP   1497 > http [ACK] Seq=1170 Ack=8794 Win=16736 Len=0
Once Mozilla determined that it could use the cached copy of the document, it asked the web server to return the CSS style sheet if it had been modified. Again, the web server replied that the copy of the style sheet in Mozilla's cache was current and could be reused. It took Mozilla almost 5 seconds to determine that the copies of the document and associated files in its cache were still current. The long delay in this example is due to the extreme latency (the round-trip time for a request to travel from the client to the server and back) of a satellite internet connection. For example, my connection takes about 0.9 seconds to send a request to a server and receive its response.
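The durations quoted in my comments come straight from the Time column of the log. As a quick check, this Python sketch subtracts the relevant timestamps; the constant names are mine, and the packet numbers in the comments refer to the No. column of the log.

```python
# Timestamps (in seconds) copied from the Time column of the log above.
DNS_QUERY = 1.331408        # packet 5: DNS query for bclary.com
DNS_RESPONSE = 2.351255     # packet 8: DNS response
CSS_RESPONSE = 6.213411     # packet 28: 200 OK for the style sheet
RELOAD_REQUEST = 14.362606  # packet 40: conditional GET for the page
RELOAD_CSS_304 = 19.321286  # packet 46: 304 for the style sheet


def elapsed(start, end):
    """Seconds between two log timestamps, rounded to hundredths."""
    return round(end - start, 2)


if __name__ == "__main__":
    print("DNS lookup:       ", elapsed(DNS_QUERY, DNS_RESPONSE))
    print("page + CSS (new): ", elapsed(DNS_QUERY, CSS_RESPONSE))
    print("page + CSS (304s):", elapsed(RELOAD_REQUEST, RELOAD_CSS_304))
```

The DNS lookup alone accounts for roughly one second, and both the first load and the cache revalidation come in just under five seconds, even though the second visit transferred no document bodies at all.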
Note that for content such as style sheets that does not change frequently, you can increase the performance of your pages by specifying an extremely long cache expiration time so that browsers will not need to check whether referenced files have changed. If you need to modify a file and wish to force browsers to reload it, then use a date-based path to the file as in this example, or use a version string in the file name.
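One possible way to produce a version string is to derive it from the file's contents, so that the name changes exactly when the file does. The Python sketch below does this with a short hash; this is my own illustration rather than the scheme used on bclary.com (which uses date-based paths), and the function names, the eight-character hash length and the one-year lifetime are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path, content):
    """Insert a short content hash before the extension.

    For example, /css/style.css becomes /css/style.<hash>.css.
    Hypothetical naming scheme, chosen for illustration.
    """
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def long_lived_headers(days=365):
    """Response headers asking caches to keep a file for `days` days."""
    return {"Cache-Control": f"max-age={days * 24 * 60 * 60}"}


if __name__ == "__main__":
    css = b"body { color: black }"
    print(versioned_path("/css/style.css", css))
    print(long_lived_headers())
```

Because an edited style sheet gets a new name, browsers fetch it as a new resource, while unchanged files can be served from cache without any request to the server.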
Gzipped encoding and other compression schemes can greatly reduce the bandwidth required to host a web site, and can improve the perceived performance of your web pages for your visitors using dialup internet connections by reducing the size of the files to be downloaded. This benefit does have drawbacks, however. Gzipped content must be completely downloaded before it can be parsed and additional requests can be made for CSS, JavaScript and image files. Depending on the size of your documents and the types of connections your visitors use, Gzipped encoding could potentially result in poorer perceived performance for your web site.
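To get a feel for the bandwidth side of this trade-off, you can compress a document yourself and compare the sizes. The Python sketch below uses the standard `gzip` module; the function name `gzip_savings` and the sample markup are made up for illustration, and real pages will compress by different amounts depending on their content.

```python
import gzip


def gzip_savings(data):
    """Return (original_size, compressed_size) in bytes for `data`."""
    compressed = gzip.compress(data)
    # Round-trip check: decompressing must give back the original bytes.
    assert gzip.decompress(compressed) == data
    return len(data), len(compressed)


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; not a real page.
    sample = b"<p>Hello, web.</p>\n" * 200
    original, compressed = gzip_savings(sample)
    print(f"{original} bytes uncompressed, {compressed} bytes gzipped")
```

Whether the saving is worth it for your site depends on the document sizes and connection types discussed above, so it is worth measuring with your own content.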
Try to limit the number of distinct files containing CSS style sheets or JavaScript. These types of file have a large effect on how a web browser makes requests for the contents of your web page and can adversely affect visitors using high-latency connections such as satellites.
- Tip Summary
- Use all lowercase URLs to ensure you do not have case-sensitivity problems.
- Use forward slashes (/) in URLs.
- Use escaped ampersands (&amp;) rather than raw ampersands (&).
- Minimize the number of references to different machine names to reduce the time required to perform DNS lookups when loading your content.
- Make effective use of the browser's cache through the use of version-specific URLs for content such as CSS style sheets which do not change frequently.
- Judiciously use compressed encodings such as Gzip, making sure that the size of your documents and the types of internet connections of your visitors result in an improved experience.
- Limit the number of distinct files referenced in your content to reduce the time required to check the validity of the browser's cache. Reducing the number of files can measurably help your visitors with high-latency connections.
- Resources
- Architecture of the World Wide Web (W3C)
- TCP/IP (Wikipedia)
- DNS (Wikipedia)
- HTTP (Wikipedia)
- Caching Tutorial for Web Authors and Webmasters