Content Types - Web Developer Boot Camp
- File Types
-
The type of a file is determined by the application used to create it and is used by other applications when they process it. For example, an image editing application might create a file containing an image using the PNG format. An image viewer application can use the fact that the file contains a PNG image to read the image file into memory and to display the image on the screen.
Operating systems and application programs can use several methods to determine the type of a file stored on a local disk.
- File Extensions
-
Many operating systems and programs use a file's extension to record its type. In our example, the file containing the PNG image might be named example.png. A computer can use the file extension png to determine the appropriate program to use when editing or viewing the image. For example, an image editor like GIMP might be an appropriate choice for editing the PNG image, while a text editor would not. File extensions are most useful when there is a clear one-to-one association between file types and file extensions. Note, however, that not all operating systems use file extensions to determine file type. For example, the Macintosh OS stores additional information about a file's type separately from the file itself. Another problem with using file extensions to determine file type arises on the web, where files may be created dynamically by a server-side program whose name does not carry an extension appropriate to the type of file it generates.
- Magic Numbers
-
Some programs attempt to use magic numbers to determine the type of a file. A magic number is a sequence of bytes at the beginning of a file that can sometimes be used to determine the file's format, and hence its type. Using magic numbers to detect file types has two problems: the numbers are not guaranteed to identify a file's type uniquely, and the beginning of the file must be read before its type can be determined.
- Metadata
-
Some operating systems, such as Mac OS, store information about a file's type separately from the file.
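The first two methods above, file extensions and magic numbers, can be sketched in a few lines of Python. This is a minimal illustration, assuming Python's standard mimetypes module for the extension lookup and a small hand-picked table of well-known signatures for the magic numbers; the helper names type_from_extension and type_from_magic_number are hypothetical, and real detection code recognizes far more formats.

```python
import mimetypes

# A few well-known magic numbers (file signatures). This is a small
# illustrative subset; real detectors know many more formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def type_from_extension(filename):
    """Guess a content type from the file extension alone; None if unknown."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type


def type_from_magic_number(data):
    """Guess a content type from the file's leading bytes; None if unknown."""
    for signature, content_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return content_type
    return None


if __name__ == "__main__":
    jpeg_prefix = b"\xff\xd8\xff\xe0" + b"\x00" * 12  # fake JPEG header bytes
    print(type_from_extension("example.png"))   # image/png
    print(type_from_extension("photo.txt"))     # text/plain
    print(type_from_magic_number(jpeg_prefix))  # image/jpeg
    print(type_from_extension("render"))        # None (no extension)
```

Note that the two methods can disagree: a file named photo.txt whose first bytes are the JPEG signature is text/plain by its extension but image/jpeg by its magic number.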
- Content Types
-
Content Types generalize the concept of file types for use on the internet, in email and on the web, where file extensions and operating-system metadata are not always available. Mail clients and browsers use Content Types to choose the appropriate program for viewing various media types such as web pages, images, and multimedia.
The history of Content Types can be traced back to the beginning of email. Although RFC 822 defined the standard for text email messages, the need quickly arose to send messages containing more than just plain text. To allow the receiver of a message to determine how to process its non-text portions, the concept of Content Type was introduced in RFC 1049 and further refined in RFCs 2045, 2046, 2047, 2048, and 2049. HTTP borrowed the concept of Content Type in RFC 1945 (Hypertext Transfer Protocol -- HTTP/1.0) and RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1).
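Because HTTP reuses the MIME header syntax from these RFCs, a Content-Type value (a type/subtype pair followed by optional parameters such as charset) can be parsed with the same tools a mail library uses. The sketch below assumes Python's standard email package; the wrapper name parse_content_type is hypothetical.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type value into ("type/subtype", {param: value})."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() lowercases the media type and falls back to
    # "text/plain" if the value is not of the form type/subtype.
    media_type = msg.get_content_type()
    # get_params() lists the media type first, then each name=value
    # parameter (names lowercased, surrounding quotes removed).
    params = dict(msg.get_params()[1:])
    return media_type, params


if __name__ == "__main__":
    print(parse_content_type("text/html; charset=UTF-8"))
    # ('text/html', {'charset': 'UTF-8'})
    print(parse_content_type("IMAGE/PNG"))
    # ('image/png', {})
```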
As you can see in the example HTTP session in Web Developer Boot Camp - Architecture, Mozilla advertises the Content Types it can accept via the Accept header, and the web server uses the Content-Type header to tell Mozilla which type was actually sent. Browsers that implement the internet standards are required to respect the Content-Type headers sent by web servers. Mozilla uses Content-Type to determine how to display images such as GIF, PNG, or JPEG, whether to load the appropriate plugin for Flash, Real media, or QuickTime content, or whether to prompt the user to download unknown binary content.
- Importance of respecting Content-Type
-
It is important for web browsers, web servers, and web server administrators to respect Content Types for a variety of reasons:
- Security
-
Part of the rationale behind the standards for Content Types was to enable programs to distinguish between safe and unsafe content. Safe content types are those which, by definition, are not intended to cause programs to execute on the visitor's computer. Unsafe content types, such as executable programs, can cause a visitor's computer to run arbitrary code, potentially leading to infection by worms or viruses. By rigorously respecting the content type, it is possible to minimize the circumstances in which unsafe content is treated as safe.
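The decision Mozilla makes from the Content-Type header (display the content itself, hand it to a plugin, or prompt for a download) can be modeled as a function of the declared type alone. This is a hypothetical sketch, not Mozilla's implementation: the type sets and the choose_action name are illustrative assumptions. The function never receives the response body, so nothing inside the bytes can promote unknown content to something that gets displayed or run.

```python
# Hypothetical action tables: types rendered directly and types handed to a
# plugin. The entries are examples only, not a complete browser policy.
DISPLAY_INLINE = {"text/html", "text/plain", "image/gif", "image/png", "image/jpeg"}
PLUGIN_TYPES = {
    "application/x-shockwave-flash": "Shockwave Flash",
    "video/quicktime": "QuickTime Plug-in",
}


def choose_action(declared_content_type):
    """Decide what to do using only the server's declared Content-Type."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = declared_content_type.split(";", 1)[0].strip().lower()
    if media_type in DISPLAY_INLINE:
        return ("display", media_type)
    if media_type in PLUGIN_TYPES:
        return ("plugin", PLUGIN_TYPES[media_type])
    # Anything unknown (application/octet-stream, executables, ...) is
    # offered to the user as a download instead of being opened.
    return ("download", media_type)


if __name__ == "__main__":
    print(choose_action("image/png"))
    print(choose_action("text/html; charset=UTF-8"))
    print(choose_action("application/x-shockwave-flash"))
    print(choose_action("application/octet-stream"))
```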
- Web Server Administrator Control
-
Mozilla gives web server administrators the greatest control over how content is delivered to users by respecting the Content Type reported by the web server. For example, if a web server administrator wants an image file to be displayed in one context but downloaded to the visitor's disk in another, she can specify different content types to instruct Mozilla how to handle the file.
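To make the administrator's choice concrete, here is a minimal WSGI application using only Python's standard library. It serves the same image bytes as image/png under one URL and as application/octet-stream under another, so a browser that respects Content-Type would typically display the first and offer the second as a download. The URL paths, the placeholder LOGO_BYTES, and the fetch helper are hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# Placeholder bytes standing in for a real PNG file on disk (hypothetical).
LOGO_BYTES = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# Same bytes, two contexts (hypothetical URL layout):
#   /view/logo.png     -> image/png: the browser displays the image
#   /download/logo.png -> application/octet-stream: the browser offers a download
TYPES_BY_PATH = {
    "/view/logo.png": "image/png",
    "/download/logo.png": "application/octet-stream",
}


def app(environ, start_response):
    """Minimal WSGI app that picks the Content-Type from the request path."""
    content_type = TYPES_BY_PATH.get(environ.get("PATH_INFO", ""))
    if content_type is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    start_response("200 OK", [
        ("Content-Type", content_type),
        ("Content-Length", str(len(LOGO_BYTES))),
    ])
    return [LOGO_BYTES]


def fetch(path):
    """Call the app directly (no network) and return (status, type, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required WSGI keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"]["Content-Type"], body
```

For a quick local test in a browser, the same app can be served with wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever().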
- Internet Explorer and Content-Type
-
Internet Explorer does not, in general, respect the Content-Type returned by web servers. Instead, it extends the use of magic numbers and uses a variety of other strategies to detect the content type of files automatically. Microsoft probably felt the need to add content type detection because many web servers are improperly configured and do not report the correct content types for images or multimedia files. However, by doing so, Microsoft inadvertently made it more difficult for other browser makers to follow the standards and made the web less secure.
Internet Explorer's content type guessing causes problems for other browsers such as Mozilla, because inexperienced web server administrators can host improperly typed content without Internet Explorer users noticing. Many uninformed people have complained about Mozilla's lack of content type detection; however, one of the major reasons Mozilla has maintained its policy of respecting the reported content type is the security of its users. As a result, Mozilla has been immune to a number of content type related security vulnerabilities that have plagued Internet Explorer.
Internet Explorer 6 on Windows XP Service Pack 2 introduces a new setting under Tools->Internet Options->Security settings called "Open files based on content, not file extension". Choosing Disable makes Internet Explorer respect the Content Type, at least in some circumstances. Unfortunately, this setting is enabled by default. However, should a future exploit be discovered that takes advantage of Internet Explorer's content type guessing, you can probably expect an Internet Explorer update that disables it.
- Example Content Types
-
The following table illustrates the differences between Mozilla, Firefox, Opera 7.54, Internet Explorer 6, and Internet Explorer 6 on XP SP2 with "Open files based on content, not file extension" disabled, when dealing with different Content Types.
| File | Mozilla 1.6 | Mozilla 1.7, Firefox 1.0 | MSIE 6 | MSIE 6 XP SP2 | Opera 7.54 |
|---|---|---|---|---|---|
| text served as text/plain | displays text | displays text | displays text | displays text | displays text |
| html served as text/html | displays html | displays html | displays html | displays html | displays html |
| html served as text/plain | displays text | displays text | displays html | displays text | displays text |
| jpg served as image/jpg | displays image | displays image | displays image | displays image | displays image |
| jpg served as text/plain | displays text | prompts for download or Text helper application | displays image | displays text | displays image |

As you can see, Mozilla respects the Content Type of the files, although Mozilla 1.7 and Firefox 1.0 both prompt when a JPEG file is served as text/plain. This differs from earlier versions of Mozilla but is intended to help users safely work around misconfigured web servers that serve binary content with a Content Type of text/plain.
Internet Explorer and Opera both fail to respect the Content Type in some circumstances. To understand why this matters, consider the recent JPEG vulnerability. Suppose your company filtered all image/jpeg files from your email for security purposes but allowed text/plain files. If a virus writer sent an infected JPEG image disguised as text, Internet Explorer and Opera would both automatically display the image and could thus fall prey to the exploit.
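The JPEG scenario can be reduced to a small model that contrasts a client which respects the declared type with one that sniffs content. This is a simplified, hypothetical sketch: the gateway policy and the function names are illustrative assumptions, and it does not reproduce how Internet Explorer or Opera actually detect content.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files begin with these bytes
JPEG_TYPES = {"image/jpeg", "image/jpg"}

# Hypothetical mail-gateway policy from the scenario above: block JPEG
# attachments by their declared type and let plain text through.
BLOCKED_TYPES = JPEG_TYPES


def gateway_allows(declared_type):
    return declared_type.strip().lower() not in BLOCKED_TYPES


def strict_render(declared_type, body):
    """Choose a handler from the declared Content-Type only.

    body is accepted for symmetry with sniffing_render but never inspected.
    """
    declared_type = declared_type.strip().lower()
    if declared_type == "text/plain":
        return "text viewer"
    if declared_type in JPEG_TYPES:
        return "JPEG decoder"
    return "download prompt"


def sniffing_render(declared_type, body):
    """Simplified content sniffing: leading bytes override the declared type."""
    if body.startswith(JPEG_MAGIC):
        return "JPEG decoder"
    return strict_render(declared_type, body)


if __name__ == "__main__":
    payload = JPEG_MAGIC + b"\xe0 crafted image data"  # JPEG disguised as text
    declared = "text/plain"
    print(gateway_allows(declared))            # True: the filter lets it in
    print(strict_render(declared, payload))    # text viewer
    print(sniffing_render(declared, payload))  # JPEG decoder
```

Only the sniffing client hands the disguised bytes to the JPEG decoder, which is the component the exploit targets; the strict client shows them as text.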
- Native Mozilla Content Types
-
The following Content Types are defined in the Mozilla source. Not all types are supported natively without special build configurations. This table needs to be updated. See IANA MIME Media Types for more details on registered MIME types.
| Description | Content Type | File Extension |
|---|---|---|
| Apple File Data and Resource in single stream | application/applefile | |
| | application/compress | |
| gzip file | application/gzip | gz |
| | application/http-index-format | |
| Java Archive | application/java-archive | jar |
| Apple File in a BinHex4.0 file | application/mac-binhex40 | hqx |
| | application/marimba | |
| MathML | application/mathml+xml | mathml |
| Binary | application/octet-stream | bin,dms,lha,lzh,exe,class,so,dll |
| | application/oleobject | |
| Adobe PDF | application/pdf | |
| | application/pgp | |
| | application/pkcs7-mime | |
| | application/pkcs7-signature | p7s |
| Adobe PostScript | application/postscript | ai,eps,ps |
| | application/pre-encrypted | |
| RDF (Mozilla 1.8+) | application/rdf+xml | rdf |
| | application/uue | |
| | application/uuencode | |
| XUL | application/vnd.mozilla.xul+xml | xul |
| | application/x-compress | |
| | application/x-fortezza-ckl | |
| | application/x-fortezza-krl | |
| | application/x-gunzip | |
| gzip file | application/x-gzip | |
| | application/x-javascript-config | |
| JavaScript | application/x-javascript | js |
| | application/x-macbinary | |
| | application/x-marimba | |
| | application/x-netscape-revocation | |
| | application/x-ns-proxy-autoconfig | |
| | application/x-oleobject | |
| | application/x-pgp-message | |
| | application/x-pkcs7-crl | |
| | application/x-pkcs7-mime | |
| | application/x-pkcs7-signature | |
| | application/x-uue | |
| | application/x-uuencode | |
| | application/x-www-form-urlencoded | |
| | application/x-x509-ca-cert | |
| | application/x-x509-email-cert | |
| | application/x-x509-server-cert | |
| | application/x-x509-user-cert | |
| | application/x-xpinstall | xpi |
| XHTML | application/xhtml+xml | xhtml,xht |
| XSLT | application/xslt+xml | xslt |
| XML | application/xml | xml,xsl |
| Zip file | application/zip | zip |
| | audio/basic | au,snd |
| Bitmap image | image/bmp | bmp |
| GIF image | image/gif | gif |
| JPEG image | image/jpeg | jpeg,jpg,jpe |
| | image/pjpeg | |
| PNG image | image/png | png |
| SVG image | image/svg+xml | svg |
| TIFF image | image/tiff | tiff,tif |
| ICON | image/x-icon | |
| | image/x-jg | |
| | image/x-jng | |
| | image/x-portable-pixmap | ppm |
| | image/x-xbitmap | xbm |
| | image/x-xbm | xbm |
| | image/xbm | xbm |
| | message/external-body | |
| | message/news | |
| | message/rfc822 | |
| | multipart/alternative | |
| | multipart/appledouble | |
| | multipart/byteranges | |
| | multipart/digest | |
| | multipart/header-set | |
| | multipart/mixed | |
| | multipart/parallel | |
| | multipart/related | |
| | multipart/signed | |
| | multipart/x-mixed-replace | |
| | text/calendar | ics,ifb |
| CSS | text/css | css |
| | text/enriched | |
| HTML | text/html | html,htm |
| | text/jsss | |
| | text/mdl | |
| Text | text/plain | asc,txt |
| RDF | text/rdf | rdf |
| | text/richtext | rtx |
| | text/x-vcard | |
| XML | text/xml | xml |
| | video/mpeg | mpeg,mpg,mpe |
| | video/x-mng | mng |
- Common Plugin installed Content Types
-
The following Content Types are relatively common and are installed through the installation of the associated plugin.
| Description | Content Type | File Extension |
|---|---|---|
| Adobe Acrobat - Acrobat Portable Document Format | application/pdf | pdf |
| Adobe Acrobat - Acrobat Forms Data Format | application/vnd.fdf | fdf |
| Adobe Acrobat - XML Version of Acrobat Forms Data Format | application/vnd.adobe.xfdf | xfdf |
| Adobe Acrobat - Acrobat XML Data Package | application/vnd.adobe.xdp+xml | xdp |
| Adobe Acrobat - Adobe FormFlow99 Data File | application/vnd.adobe.xfd+xml | xfd |
| Adobe ESD Manager Plugin - Adobe Download Manager Version Files | application/aom-getversion | aomver |
| Adobe SVG Viewer - Scalable Vector Graphics | image/svg-xml | svg, svgz |
| Adobe SVG Viewer - Scalable Vector Graphics | image/svg+xml | svg, svgz |
| Adobe SVG Viewer - Scalable Vector Graphics | image/vnd.adobe.svg+xml | svg, svgz |
| Java Plug-in - Java Applet | application/x-java-applet | none |
| Java Plug-in - JavaBeans | application/x-java-bean | none |
| Java Plug-in - Java Virtual Machine for Netscape | application/x-java-vm | |
| MetaStream 3 Plugin - MetaStream Plugin File | application/x-mtx | mtx |
| Microsoft® DRM - Network Interface Plugin | application/x-drm | nip |
| Microsoft® DRM - Network Interface Plugin | application/x-drm-v2 | nip |
| QuickTime Plug-in - QuickTime Movie | video/quicktime | mov,qt |
| QuickTime Plug-in - QuickTime Image File | image/x-quicktime | qtif,qti |
| RealPlayer Version Plugin - RealPlayer Version Plugin | application/vnd.rn-realplayer-javascript | rpj |
| RealPlayer(tm) G2 LiveConnect-Enabled Plug-In (32-bit) - RealPlayer(tm) as Plug-in | audio/x-pn-realaudio-plugin | rpm |
| Shockwave Flash - Macromedia Flash movie | application/x-shockwave-flash | swf |
| Shockwave Flash - FutureSplash movie | application/futuresplash | spl |
| Shockwave for Director - Shockwave Movie | application/x-director | dir,dxr,dcr |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | application/asx | * |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | video/x-ms-asf-plugin | * |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | application/x-mplayer2 | * |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | video/x-ms-asf | asf,asx,* |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | video/x-ms-wm | wm,* |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | audio/x-ms-wma | wma,* |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | audio/x-ms-wax | wax,* |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | video/x-ms-wmv | wmv,* |
| Windows Media Player Plug-in Dynamic Link Library - Media Files | video/x-ms-wvx | wvx,* |
- Configuring Web Server Content Types
-
- Apache
-
The documentation for the mod_mime module in Apache 1.3 and Apache 2.0 describes how to use either the AddType directive or the mime.types file to define a Content Type in terms of a file's extension.
The directive

    AddType content-type file-extension

associates the content type content-type with the file extension file-extension. This directive can be specified in the main server configuration file, for virtual hosts or directories, or in .htaccess files in the web server's content directories. For example, the following specifies that all files with the file extension .html should be treated as text/html:

    AddType text/html .html
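The mime.types file can also be used to specify content types using a similar syntax (without the AddType keyword), with one content type per line followed by its extensions. In mime.types, the extensions are written without a leading dot:

    text/html html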
For performance and security reasons, it is best to specify the content type to file extension mapping either in the server configuration file or in the mime.types file, rather than in .htaccess files.
- IIS
-
IIS 6.0 does not serve content under a default Content-Type; each type to be served must be configured explicitly. See IIS 6.0 Does Not Serve Unknown MIME Types for details on how to configure IIS 6.0's Content-Types.
For IIS 5.0, see Configure MIME Mapping for details of how to specify Content-Types.
- Resources
-
- IANA MIME Media Types
- RFC 822 Standard for the Format of ARPA Internet Text Messages
- RFC 1049 A Content-Type Header Field for Internet Messages
- RFC 2045 Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
- RFC 2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
- RFC 2047 Multipurpose Internet Mail Extensions (MIME) Part Three: Message Header Extensions for Non-ASCII Text
- RFC 2048 Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures
- RFC 2049 Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples