Understanding at least a little of the architecture of the web is important in order to properly develop web content. The Internet consists of interconnected computers running programs which communicate with each other. The Web consists of client computers running Web browser programs which communicate with server computers running Web server programs.
The underlying components of the web are:
TCP/IP which provides the network which connects computer programs on the Internet;
DNS which provides the ability to locate the network address of a computer from the computer's name;
HTTP which is the language used by web browsers to request resources from web servers and by web servers to respond to the web browser's requests;
URLs which identify the location of web pages and other resources to be retrieved from web servers.
TCP/IP provides the mechanism whereby programs on any two computers can communicate on the Internet. The important concepts for us are the IP address of a computer and the ports used by programs.
Computers on the Internet each have a unique address called the computer's IP address which is used to locate the computer. Since there may be more than one connection between two computers, another form of address called a port is used to distinguish different connections to the same computer.
If you consider the analogy where a Computer is like a Company and a Program is like an Employee of the Company, then the IP Address of a Computer is like the Company's Telephone number and the port is like the Employee's Telephone Extension.
For example, 192.168.0.1 is an IP address, and 192.168.0.1:80 refers to port 80 at that address. Many programs use a default port if none is specified. For example, Web browsers and servers use port 80 if it is not specified in the request.
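The default-port rule can be sketched in a few lines of Python. This is only an illustration: the function name `split_host_port` is hypothetical, and the sketch assumes a dotted IP address or host name with at most one colon.

```python
DEFAULT_HTTP_PORT = 80  # default port for web browsers and servers, as described above


def split_host_port(address, default_port=DEFAULT_HTTP_PORT):
    """Split 'host:port' into (host, port), using default_port if no port is given."""
    host, separator, port_text = address.partition(":")
    if not separator:
        # No colon at all, so no port was specified: fall back to the default.
        return host, default_port
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port


print(split_host_port("192.168.0.1:80"))    # ('192.168.0.1', 80)
print(split_host_port("192.168.0.1"))       # ('192.168.0.1', 80)
print(split_host_port("192.168.0.1:8080"))  # ('192.168.0.1', 8080)
```

In the analogy above, the host is the company's telephone number and the port is the employee's extension. Leaving the extension out connects you to the default one.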
DNS is a service which runs on special computers on the Internet and translates computer names into IP addresses. DNS makes it possible for you to request a web page from www.mozilla.org rather than from 207.126.111.202.
HTTP is a protocol used to communicate requests and responses between client computers (those running web browsers and the like) and server computers (those running web servers). Each request specifies a method, which identifies the action the client asks the server to perform, and a URL, which identifies the resource the action applies to.
The most common HTTP methods are GET, which asks the server to return the resource in its response, and POST, which sends information to the server to be processed by the specified resource. When you visit a web page, your web browser most likely uses GET to retrieve it. When you submit a form on a web page, your web browser uses either the GET or the POST method, depending on how the author coded the form.
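To make the difference between the two methods concrete, the following Python sketch builds the text of a minimal GET request and a minimal POST request for the same form data. The function name `build_request`, the path `/search` and the host `www.example.com` are assumptions made for illustration. Real browsers send many more headers, as the network capture later in this document shows.

```python
def build_request(method, path, host, body=""):
    """Return the text of a minimal HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        # Form data sent with POST travels in the body of the request.
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    # A blank line separates the headers from the (optional) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


# With GET, the form data is carried in the URL's query string.
get_request = build_request("GET", "/search?firstname=jane&species=cat", "www.example.com")

# With POST, the same data is carried in the request body instead.
post_request = build_request("POST", "/search", "www.example.com",
                             body="firstname=jane&species=cat")

print(get_request)
print(post_request)
```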
An absolute URL specifies a resource using the name or address of the machine where the resource is located and the full directory path on the machine where the resource is located. It has the form:
http://hostname-or-ip-address:port/path?querystring
http://
http is the protocol scheme which tells the client that it should use the HTTP protocol when sending the request. http is always followed by ://.
hostname-or-ip-address
hostname-or-ip-address can be either the name of the machine (as registered with the DNS service) where the resource is located or the actual IP address of the machine. If the host name is used, it is not case-sensitive. For example, WWW.EXAMPLE.COM and www.example.com are equivalent host names.
:port
:port specifies the port on the server which should handle your request. port is a number between 1 and 65535. If the :port is missing from the URL, it is assumed to be :80.
/path
/path specifies the location of the resource in the machine's web server directories. /path is case-sensitive since the web server may be hosted on a UNIX machine which has case-sensitive paths and file names. The easiest way to make sure you do not have case-sensitivity issues in your URLs is to always write them in lower-case. If /path contains references to sub-directories, for example /2004/09/25/, the slashes must be forward slashes (/). Backward slashes (\) may work in browsers such as Internet Explorer but will not work in Mozilla.
?querystring
?querystring is typically used when the request is the result of submitting a form using the GET method. It contains a sequence of name=value pairs separated by a delimiter character. Many authors and web browsers use the ampersand (&) as the delimiter when creating query strings. Note that raw ampersands are illegal in XML and XHTML documents and should be coded as &amp;. An example query string might look like ?firstname=jane&species=cat. The same query string should be coded as ?firstname=jane&amp;species=cat if it is to be used in an XML or XHTML document.
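The components described above can be pulled apart with Python's standard `urllib.parse` module, and `html.escape` performs the ampersand escaping needed for XML and XHTML. The URL below is made up for illustration: it combines the example host name, path and query string from the text with an arbitrary port of 8080.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Illustrative URL assembled from the examples in the text (port 8080 is arbitrary).
url = "http://WWW.EXAMPLE.COM:8080/2004/09/25/?firstname=jane&species=cat"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com (host names are not case-sensitive)
print(parts.port)      # 8080
print(parts.path)      # /2004/09/25/
print(parts.query)     # firstname=jane&species=cat

# Split the query string into its name=value pairs.
print(parse_qs(parts.query))  # {'firstname': ['jane'], 'species': ['cat']}

# Escape the raw ampersand before placing the URL in an XML or XHTML document.
print(escape(url))
```

The last line prints the URL with `&` replaced by `&amp;`, which is the form that belongs inside an XHTML attribute such as `href`.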
The best way to see the components of the web work together is to look at the actual network traffic between a web browser and a web server. The following is an Ethereal log showing the network traffic from using Mozilla to request the web page http://bclary.com/. The paragraphs interleaved with the log are my comments about the traffic.
No. Time Source Destination Protocol Info
5 1.331408 192.168.1.100 net0116012.direcpc.com DNS Standard query A bclary.com
8 2.351255 net0116012.direcpc.com 192.168.1.100 DNS Standard query response A 66.33.193.205
9 2.351655 192.168.1.100 bclary.com TCP 1497 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
10 2.354552 bclary.com 192.168.1.100 TCP http > 1497 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
11 2.354562 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0
Before the page could even be requested, the network address of bclary.com had to be looked up and a TCP connection established. Note that on this satellite-based internet connection, the time required was on the order of one second. You can improve the page load performance of your content by minimizing the number of DNS lookups (the number of different machine names) required to load your content.
12 2.354632 192.168.1.100 bclary.com HTTP GET / HTTP/1.1
GET / HTTP/1.1\r\n
Host: bclary.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
\r\n
Mozilla requested the URL http://bclary.com/. Note how Mozilla identifies itself, and begins negotiating with the web server by sending preferences for Content Types, Language, Encoding, Character Sets, and TCP connection. Note also that Mozilla did not have a copy of the document already in its cache when this request was made.
13 2.358182 bclary.com 192.168.1.100 TCP http > 1497 [ACK] Seq=1 Ack=385 Win=4096 Len=0
16 4.010677 bclary.com 192.168.1.100 HTTP HTTP/1.1 200 OK (application/xhtml+xml)
HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2004 03:44:17 GMT\r\n
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
Last-Modified: Tue, 21 Sep 2004 06:41:26 GMT\r\n
ETag: "4f2e5d-29dc-414fcd16"\r\n
Accept-Ranges: bytes\r\n
Keep-Alive: timeout=15, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: application/xhtml+xml\r\n
Content-Encoding: gzip\r\n
Content-Length: 3456\r\n
\r\n
17 4.011942 bclary.com 192.168.1.100 HTTP Continuation
18 4.011966 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=385 Ack=2841 Win=17040 Len=0
19 4.036299 192.168.1.100 bclary.com TCP 1498 > http [SYN] Seq=0 Ack=0 Win=16384 Len=0 MSS=1460
20 4.039000 bclary.com 192.168.1.100 TCP http > 1498 [SYN, ACK] Seq=0 Ack=1 Win=4096 Len=0 MSS=1420
21 4.039023 192.168.1.100 bclary.com TCP 1498 > http [ACK] Seq=1 Ack=1 Win=17040 Len=0
bclary.com responded with the HTTP headers and the contents of the document. Note how the web server gave the current date and the last time the document was modified, indicated that the TCP connection will be kept alive, and responded with a Content-Type for XHTML and a gzip-compressed version of the page. Note that since the retrieved document is compressed, Mozilla must wait for it to be fully downloaded before parsing the contents and issuing any additional requests for content which is referenced by the document.
22 4.039634 192.168.1.100 bclary.com HTTP GET /2004/07/21/style HTTP/1.1
GET /2004/07/21/style HTTP/1.1\r\n
Host: bclary.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
Accept: text/css,*/*;q=0.1\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Referer: http://bclary.com/\r\n
\r\n
23 4.042358 bclary.com 192.168.1.100 TCP http > 1498 [ACK] Seq=1 Ack=349 Win=4096 Len=0
24 4.077165 bclary.com 192.168.1.100 HTTP Continuation
25 4.213129 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=385 Ack=3850 Win=16031 Len=0
28 6.213411 bclary.com 192.168.1.100 HTTP HTTP/1.1 200 OK (text/css)
HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2004 03:44:20 GMT\r\n
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
Content-Location: style.css\r\n
Vary: negotiate\r\n
TCN: choice\r\n
Last-Modified: Wed, 21 Jul 2004 14:01:17 GMT\r\n
ETag: "be598-3d7-40fe772d;41427405"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 983\r\n
Keep-Alive: timeout=15, max=100\r\n
Connection: Keep-Alive\r\n
Content-Type: text/css\r\n
\r\n
Once the original document was downloaded and parsed, Mozilla was able to request the CSS style sheet which was referenced in the XHTML of the page. Mozilla had to wait for the full document to download before making the request because the document was compressed by the web server. This saved the bandwidth required to download the file, but forced Mozilla to wait to request the style sheet rather than parsing the document as it downloaded and immediately requesting the style sheet when it was referenced. Note that it took Mozilla almost 5 seconds to download the page and the referenced CSS file.
29 6.227674 192.168.1.100 bclary.com HTTP GET /favicon.png HTTP/1.1
GET /favicon.png HTTP/1.1\r\n
Host: bclary.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
Accept: image/png,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
\r\n
30 6.230413 bclary.com 192.168.1.100 TCP http > 1497 [ACK] Seq=3850 Ack=700 Win=4096 Len=0
31 6.319948 192.168.1.100 bclary.com TCP 1498 > http [ACK] Seq=349 Ack=1405 Win=15636 Len=0
33 8.199361 bclary.com 192.168.1.100 HTTP HTTP/1.1 200 OK (image/png)
HTTP/1.1 200 OK\r\n
Date: Sun, 26 Sep 2004 03:44:22 GMT\r\n
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
Last-Modified: Wed, 21 Jul 2004 23:05:50 GMT\r\n
ETag: "4e5287-10bc-40fef6ce"\r\n
Accept-Ranges: bytes\r\n
Content-Length: 4284\r\n
Keep-Alive: timeout=15, max=99\r\n
Connection: Keep-Alive\r\n
Content-Type: image/png\r\n
\r\n
34 8.200092 bclary.com 192.168.1.100 HTTP Continuation
35 8.200117 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=700 Ack=6690 Win=17040 Len=0
36 8.266306 bclary.com 192.168.1.100 HTTP Continuation
37 8.266708 bclary.com 192.168.1.100 HTTP Continuation
38 8.266721 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=700 Ack=8490 Win=17040 Len=0
Once the original document and all of the referenced files such as the CSS style sheets were downloaded, Mozilla requested the favicon image which was referenced in the XHTML of the page. Note that it took Mozilla about two additional seconds to request and download the favicon image.
Web browsers cache files by temporarily keeping copies of downloaded files. If the file has not changed when a web page is revisited, the browser can use the cached copy, saving the time required to request and download a new copy as well as saving bandwidth for the web server. A detailed discussion of caching is beyond the scope of this document, but for more information I highly recommend Mark Nottingham's excellent Caching Tutorial for Web Authors and Webmasters.
To illustrate how caching works, I requested the page again. This time, Mozilla already had copies of the document, the CSS style sheet and the favicon image.
40 14.362606 192.168.1.100 bclary.com HTTP GET / HTTP/1.1
GET / HTTP/1.1\r\n
Host: bclary.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
If-Modified-Since: Tue, 21 Sep 2004 06:41:26 GMT\r\n
If-None-Match: "4f2e5d-29dc-414fcd16"\r\n
Cache-Control: max-age=0\r\n
\r\n
Note the If-Modified-Since and
If-None-Match headers. Mozilla is telling the web
server that it already has a local copy of the document in its
cache and is asking if the document has been modified since
Mozilla obtained its copy.
41 14.365718 bclary.com 192.168.1.100 TCP http > 1498 [ACK] Seq=1405 Ack=848 Win=4096 Len=0
42 17.149742 bclary.com 192.168.1.100 HTTP HTTP/1.1 304 Not Modified
HTTP/1.1 304 Not Modified\r\n
Date: Sun, 26 Sep 2004 03:44:31 GMT\r\n
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
Connection: Keep-Alive\r\n
Keep-Alive: timeout=15, max=99\r\n
ETag: "4f2e5d-29dc-414fcd16"\r\n
\r\n
The web server responded that Mozilla did not need to download the document again since the document had not changed since it was first downloaded.
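The decision the web server makes here can be sketched as a small Python function. This is a simplified illustration, not how Apache implements it. The function name `conditional_status` is hypothetical, the validators are compared as exact strings, and If-None-Match is given precedence over If-Modified-Since when both headers are present. Real servers compare dates rather than date strings and accept lists of entity tags.

```python
def conditional_status(request_headers, current_etag, current_last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # The entity tag is the stronger validator, so it decides on its own.
        return 304 if client_etag == current_etag else 200
    client_date = request_headers.get("If-Modified-Since")
    if client_date is not None and client_date == current_last_modified:
        return 304
    # No usable validator was sent: return the full document.
    return 200


# Header values copied from the second request for http://bclary.com/ above.
revisit_headers = {
    "If-Modified-Since": "Tue, 21 Sep 2004 06:41:26 GMT",
    "If-None-Match": '"4f2e5d-29dc-414fcd16"',
}
etag = '"4f2e5d-29dc-414fcd16"'
last_modified = "Tue, 21 Sep 2004 06:41:26 GMT"

print(conditional_status(revisit_headers, etag, last_modified))  # 304
print(conditional_status({}, etag, last_modified))               # 200
```

The second call corresponds to the very first visit, when Mozilla had nothing in its cache and therefore sent neither header.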
43 17.173554 192.168.1.100 bclary.com HTTP GET /2004/07/21/style HTTP/1.1
GET /2004/07/21/style HTTP/1.1\r\n
Host: bclary.com\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a4) Gecko/20040924\r\n
Accept: text/css,*/*;q=0.1\r\n
Accept-Language: en-us,en;q=0.5\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 300\r\n
Connection: keep-alive\r\n
Referer: http://bclary.com/\r\n
If-Modified-Since: Wed, 21 Jul 2004 14:01:17 GMT\r\n
If-None-Match: "be598-3d7-40fe772d;41427405"\r\n
Cache-Control: max-age=0\r\n
\r\n
44 17.176624 bclary.com 192.168.1.100 TCP http > 1497 [ACK] Seq=8490 Ack=1170 Win=4096 Len=0
45 17.254806 192.168.1.100 bclary.com TCP 1498 > http [ACK] Seq=848 Ack=1656 Win=17040 Len=0
46 19.321286 bclary.com 192.168.1.100 HTTP HTTP/1.1 304 Not Modified
HTTP/1.1 304 Not Modified\r\n
Date: Sun, 26 Sep 2004 03:44:33 GMT\r\n
Server: Apache/1.3.31 (Unix) DAV/1.0.3 mod_gzip/1.3.26.1a PHP/4.3.5 mod_ssl/2.8.19 OpenSSL/0.9.6c\r\n
Connection: Keep-Alive\r\n
Keep-Alive: timeout=15, max=98\r\n
ETag: "be598-3d7-40fe772d;41427405"\r\n
Content-Location: style.css\r\n
Vary: negotiate\r\n
\r\n
47 19.461781 192.168.1.100 bclary.com TCP 1497 > http [ACK] Seq=1170 Ack=8794 Win=16736 Len=0
Once Mozilla determined that it could use the cached copy of the document, it asked the web server to return the CSS style sheet if it had been modified. Again, the web server replied that the copy of the style sheet in Mozilla's cache was current and could be reused. It took Mozilla almost 5 seconds to determine that the copies of the document and associated files in its cache were still current. The long delay in this example is due to the extreme latency (the round trip time for a request to travel from the client to the server and back) of a satellite internet connection. For example, my connection takes about 0.9 seconds to send a single request to a server and receive the response.
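As a rough check on the timings quoted in the comments, the elapsed times can be recomputed from the Time column of the capture. The Python sketch below does only that arithmetic. The frame numbers and times are copied from the log, while the function name `elapsed` and the choice of which frames to compare are assumptions made for illustration.

```python
# Times in seconds, keyed by frame number, copied from the capture above.
frame_time = {
    5: 1.331408,    # DNS query for bclary.com
    11: 2.354562,   # final ACK of the first TCP handshake
    28: 6.213411,   # 200 OK response for the CSS style sheet
    40: 14.362606,  # first request of the second visit
    46: 19.321286,  # 304 Not Modified response for the CSS style sheet
}


def elapsed(first_frame, last_frame):
    """Seconds between two captured frames, rounded to milliseconds."""
    return round(frame_time[last_frame] - frame_time[first_frame], 3)


print("DNS lookup and TCP handshake:", elapsed(5, 11))             # 1.023
print("First visit, DNS query to CSS response:", elapsed(5, 28))   # 4.882
print("Second visit, first request to last 304:", elapsed(40, 46)) # 4.959
```

Checking an unchanged cache took about as long as the original download, which is why the number of round trips matters so much on a high-latency connection.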
Note that for content such as style sheets that do not change frequently, you can increase the performance of your pages by specifying a very long cache expiration time so that browsers will not need to check if referenced files have changed. If you need to modify a file and wish to force browsers to reload it, then use a date-based path to the file as in this example or use a version string in the file name.
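Both naming schemes can be generated mechanically. The Python sketch below shows one way to build a date-based path like the /2004/07/21/style path seen in the capture, and one way to put a version string in a file name. Both function names are hypothetical, and deriving the version string from a hash of the file's contents is an assumption of this sketch. Any string that changes whenever the file changes would serve.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def dated_path(name, modified):
    """Build a date-based URL path such as /2004/07/21/style."""
    return f"/{modified:%Y/%m/%d}/{name}"


def versioned_path(path, content):
    """Insert a short version string, derived from the content, before the extension."""
    p = PurePosixPath(path)
    version = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{version}{p.suffix}"))


print(dated_path("style", date(2004, 7, 21)))  # /2004/07/21/style

# Different content yields a different URL, so browsers fetch the new file
# even though the old URL was cached with a long expiration time.
print(versioned_path("/css/style.css", b"body { color: black; }"))
print(versioned_path("/css/style.css", b"body { color: navy; }"))
```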
Gzip encoding and other compression schemes can greatly reduce the bandwidth required to host a web site, and can improve the perceived performance of your web pages for visitors using dialup internet connections by reducing the size of the files to be downloaded. This benefit does have drawbacks, however. Gzipped content must be completely downloaded before it can be parsed and additional requests can be made for CSS, JavaScript and image files. Depending on the size of your documents and the types of connections your visitors use, gzip encoding could potentially result in poorer perceived performance for your web site.
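To get a feel for the bandwidth side of this trade-off, the Python sketch below compresses a block of markup with the standard `gzip` module and compares the sizes. The markup is invented and deliberately repetitive, so the ratio it shows is far better than a real page would achieve. Measure your own documents before deciding.

```python
import gzip

# Invented, highly repetitive markup standing in for a real document.
markup = ("<p>Understanding the architecture of the web is important.</p>\n" * 200).encode("utf-8")

compressed = gzip.compress(markup)

print("original size:  ", len(markup), "bytes")
print("compressed size:", len(compressed), "bytes")

# Decompressing restores the original bytes exactly.
restored = gzip.decompress(compressed)
print("round trip ok:", restored == markup)
```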
Try to limit the number of distinct files containing CSS style sheets or JavaScript. These types of file have a large effect on how a web browser makes requests for the contents of your web page and can adversely affect visitors using high-latency connections such as satellites.
Use all lowercase URLs to ensure you do not have case-sensitivity problems.
Use forward slashes (/) in URLs.
Use escaped ampersands (&amp;) rather than raw ampersands (&) in XML and XHTML documents.
Minimize the number of references to different machine names to reduce the time required to perform DNS lookups when loading your content.
Make effective use of the browser's cache through the use of version specific URLs for content such as CSS style sheets which do not change frequently.
Use compressed encodings such as gzip judiciously, making sure that the size of your documents and the types of internet connections of your visitors result in an improved experience.
Limit the number of distinct files referenced in your content to reduce the time required to check the validity of the browser's cache. Reducing the number of files can measurably help your visitors with high-latency connections.