Content Types - Web Developer Boot Camp

File Types

The type of a file is determined by the application used to create it and is used by other applications when they process it. For example, an image editing application might create a file containing an image using the PNG format. An image viewer application can use the fact that the file contains a PNG image to read the image file into memory and to display the image on the screen.

Operating systems and application programs can use several methods to determine the type of a file stored on a local disk.

File Extensions

Many operating systems and programs use the file extension of a file to record its type. In our example, the file containing the PNG image might be named example.png. A computer can use the file extension png to determine the appropriate program to use when editing or viewing the image. For example, an image editor like GIMP might be an appropriate choice to edit the PNG image, while a text editor would not. File extensions are most useful where there is a clear one-to-one association between file types and file extensions.

Note, however, that not all operating systems use file extensions to determine file type. For example, the Macintosh OS stores additional information concerning a file's type separately from the file itself. Another problem with using file extensions arises on the web, where content may be created dynamically by a server-side program whose name does not have an extension appropriate to the type of content it produces.
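
The extension-based lookup described above can be sketched with Python's standard mimetypes module. This is only an illustration, and the file names are made up; the point is that the lookup inspects the name and never the contents, so a name without a recognizable extension yields no answer.

```python
import mimetypes

def type_from_extension(filename):
    """Return the content type suggested by the file name alone, or None."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type

# A conventional name resolves; a dynamically generated resource
# with no extension (hypothetical path) does not.
png_type = type_from_extension("example.png")
script_type = type_from_extension("cgi-bin/getimage")
```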

Magic Numbers

Some programs attempt to use magic numbers to determine the type of a file. A magic number is a sequence of bytes at the beginning of a file which can (sometimes) be used to determine the format and hence type of a file. There are problems with using magic numbers to detect file types since these numbers are not guaranteed to uniquely identify a file's type and since the beginning of the file must be read before its type can be determined.
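
A minimal sketch of magic number detection follows. It assumes only the well-known leading signatures for PNG, GIF, and JPEG; the function name sniff_type is invented for this example. Note that it must read the start of the data before it can answer, and it returns None for anything it does not recognize.

```python
# Leading byte signatures for a few common image formats.
MAGIC_NUMBERS = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
)

def sniff_type(data):
    """Guess a content type from the first bytes of data, or return None."""
    for signature, content_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return content_type
    return None
```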

Metadata

Some operating systems, such as Mac OS, store information about a file's type separately from the file.

Content Types

Content Types are a generalization of the concept of file types for use over the internet in email and on the web where file extensions and operating system based metadata are not always available. Content Types are the means by which the appropriate programs are used to view various media types such as web pages, images, and multimedia.

The history of Content Types can be traced back to the beginning of email. Although RFC 822 defined the standard for text email messages, the need quickly arose to be able to send messages containing more than just plain text. In order to allow the receiver of a message to determine how to process the non-text portions of a message, the concept of Content Type was introduced in RFC 1049 and further refined in RFCs 2045, 2046, 2047, 2048, and 2049. HTTP borrowed the concepts of Content Type in RFC 1945 (Hypertext Transfer Protocol -- HTTP/1.0) and RFC 2616 (Hypertext Transfer Protocol -- HTTP/1.1).

As you can see in the example HTTP session in Web Developer Boot Camp - Architecture, Mozilla advertises the Content Types it can accept via the Accept header, and the web server responds with the Content-Type header to inform Mozilla which type was actually sent. Browsers which implement the internet standards are required to respect the Content-Type headers sent by web servers.

Content-Type is used by Mozilla to determine how to display images such as GIF, PNG, or JPEG, whether to load the appropriate plugin for Flash, Real media, or Quicktime content, or whether to prompt the user to download unknown binary content.
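
The decision described above can be sketched as a lookup on the declared Content-Type. The handling sets and the function names below are assumptions made for illustration, not Mozilla's actual logic; the header parsing uses Python's standard email.message module.

```python
from email.message import Message

# Hypothetical handling sets for this sketch; not Mozilla's real tables.
INLINE_TYPES = {"text/html", "text/plain", "image/gif", "image/png", "image/jpeg"}
PLUGIN_TYPES = {"application/x-shockwave-flash", "video/quicktime"}

def parse_content_type(header_value):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()

def choose_action(header_value):
    """Pick a handling strategy from the declared type only."""
    media_type, _charset = parse_content_type(header_value)
    if media_type in INLINE_TYPES:
        return "display"
    if media_type in PLUGIN_TYPES:
        return "plugin"
    # Unknown content falls through to a download prompt.
    return "download"
```

The content itself is never inspected here: the same bytes are handled differently depending only on the header the server sent.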

Importance of respecting Content-Type

It is important for web browsers, web servers, and web server administrators to respect Content Types for a variety of reasons:

Security

Part of the rationale behind the creation of the standards for Content Types was to enable programs to distinguish between safe and unsafe content. Safe content types are those which, by definition, are not intended to cause execution of programs on the visitor's computer. Unsafe content types, such as executable programs, can cause a visitor's computer to run arbitrary code, leading to potential infection by worms or viruses. By rigorously respecting the content type, it is possible to minimize circumstances where unsafe content is treated as safe.

Web Server Administrator Control

Mozilla provides web server administrators the greatest control over how content is delivered to users by respecting the Content Type reported by the web server. For example, if a web server administrator wishes an image file to be displayed in one context but downloaded to the visitor's disk in another, she can specify different content types to instruct Mozilla how to handle the file.
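
As a rough sketch of that kind of control, the small WSGI application below serves the same bytes under two different Content Types. The paths /view/logo.png and /download/logo.png, the placeholder data, and the choice of application/octet-stream for the download case are all assumptions made for this example.

```python
# Placeholder data: a PNG signature only, not a complete image.
IMAGE_BYTES = b"\x89PNG\r\n\x1a\n"

def app(environ, start_response):
    """Serve identical bytes with a Content-Type chosen by the request path."""
    path = environ.get("PATH_INFO", "")
    if path == "/view/logo.png":
        content_type = "image/png"                 # browser displays it
    elif path == "/download/logo.png":
        content_type = "application/octet-stream"  # browser offers a download
    else:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    headers = [
        ("Content-Type", content_type),
        ("Content-Length", str(len(IMAGE_BYTES))),
    ]
    start_response("200 OK", headers)
    return [IMAGE_BYTES]
```

To try it locally, the app can be passed to make_server from the standard wsgiref.simple_server module. A browser that respects Content-Type will treat the two URLs differently even though the bytes are the same.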

Internet Explorer and Content-Type

Internet Explorer does not, in general, respect the Content-Type returned by web servers. Instead, it expands on the use of magic numbers and applies a variety of strategies to detect the content type of files automatically. Microsoft probably felt the need to add content type detection because many web servers are improperly configured and do not report the correct content types for images or multimedia files. However, by doing so, Microsoft inadvertently made it more difficult for other browser makers to follow the standards and made the web less secure.

Content type guessing by Internet Explorer causes problems for other browsers such as Mozilla, since inexperienced web server administrators can host improperly typed content without affecting Internet Explorer users. Many uninformed people have complained about Mozilla's lack of content type detection. However, one of the major reasons Mozilla has maintained its policy of respecting the reported content type is the security of Mozilla's users. As a result, Mozilla has been immune to a number of content type related security vulnerabilities that have plagued Internet Explorer.

Internet Explorer 6 on Windows XP Service Pack 2 introduces a new setting under Tools->Internet Options->Security settings, "Open files based on content, not file extension", which you can set to Disable. Choosing Disable will make Internet Explorer respect the Content Type, at least in some circumstances. Unfortunately, this setting is not disabled by default. However, should a future exploit in Internet Explorer be discovered which takes advantage of Content Type guessing, you can probably expect an update to Internet Explorer which disables Content Type guessing.

Example Content Types

The following illustrates the differences between Mozilla, Firefox, Opera 7.54, Internet Explorer and Internet Explorer 6 XPSP2 with "Open based on content disabled" when dealing with different Content Types.

| File | Mozilla 1.6 | Mozilla 1.7, Firefox 1.0 | MSIE 6 | MSIE 6 XPSP2 | Opera 7.54 |
|---|---|---|---|---|---|
| text as (text/plain) | displays text | displays text | displays text | displays text | displays text |
| html as (text/html) | displays html | displays html | displays html | displays html | displays html |
| html as (text/plain) | displays text | displays text | displays html | displays text | displays text |
| jpg as (image/jpg) | displays image | displays image | displays image | displays image | displays image |
| jpg as (text/plain) | displays text | prompts for download or Text helper application | displays image | displays text | displays image |

As you can see, Mozilla respects the Content Type of the files although Mozilla 1.7 and Firefox 1.0 both prompt on the case where a JPEG file is served as text/plain. This is different from earlier versions of Mozilla but is intended to help safely work around misconfigured web servers which serve binary content with a Content Type of text/plain.

Internet Explorer and Opera both fail to respect the Content Types in some circumstances. To understand the importance of this, consider the recent JPEG vulnerability. Suppose your company filtered all image/jpg files from your email for security purposes, but allowed text/plain files. If a virus writer sent an infected JPEG image masked as text, Internet Explorer and Opera would both automatically display the image and thus potentially fall prey to the exploit.
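
The scenario above can be made concrete with a small check. It is a hypothetical helper, not part of any real mail filter: it flags data that begins with the JPEG signature but is declared as some other type, which is exactly the case that slips past a filter looking only at the declared type while a content-sniffing viewer still renders the image.

```python
JPEG_SIGNATURE = b"\xff\xd8\xff"
JPEG_LABELS = ("image/jpeg", "image/jpg", "image/pjpeg")

def is_disguised_jpeg(declared_type, data):
    """Return True when data starts like a JPEG but is not labeled as one."""
    looks_like_jpeg = data.startswith(JPEG_SIGNATURE)
    labeled_as_jpeg = declared_type.strip().lower() in JPEG_LABELS
    return looks_like_jpeg and not labeled_as_jpeg
```

The helper only reports the mismatch. A browser that respects the declared type would show such data as text, or prompt, rather than decode it as an image.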

Native Mozilla Content Types

The following Content Types are defined in the Mozilla source. Not all types are supported natively without special build configurations. This table needs to be updated. See IANA MIME Media Types for more details on registered MIME types.

| Description | Content Type | File Extension |
|---|---|---|
| Apple File Data and Resource in single stream | application/applefile | |
| | application/compress | |
| gzip file | application/gzip | gz |
| | application/http-index-format | |
| Java Archive | application/java-archive | jar |
| Apple File in a BinHex4.0 file | application/mac-binhex40 | hqx |
| | application/marimba | |
| MathML | application/mathml+xml | mathml |
| Binary | application/octet-stream | bin,dms,lha,lzh,exe,class,so,dll |
| | application/oleobject | |
| Adobe PDF | application/pdf | |
| | application/pgp | |
| | application/pkcs7-mime | |
| | application/pkcs7-signature | p7s |
| Adobe PostScript | application/postscript | ai,eps,ps |
| | application/pre-encrypted | |
| RDF (Mozilla 1.8+) | application/rdf+xml | rdf |
| | application/uue | |
| | application/uuencode | |
| XUL | application/vnd.mozilla.xul+xml | xul |
| | application/x-compress | |
| | application/x-fortezza-ckl | |
| | application/x-fortezza-krl | |
| | application/x-gunzip | |
| gzip file | application/x-gzip | |
| | application/x-javascript-config | |
| JavaScript | application/x-javascript | js |
| | application/x-macbinary | |
| | application/x-marimba | |
| | application/x-netscape-revocation | |
| | application/x-ns-proxy-autoconfig | |
| | application/x-oleobject | |
| | application/x-pgp-message | |
| | application/x-pkcs7-crl | |
| | application/x-pkcs7-mime | |
| | application/x-pkcs7-signature | |
| | application/x-uue | |
| | application/x-uuencode | |
| | application/x-www-form-urlencoded | |
| | application/x-x509-ca-cert | |
| | application/x-x509-email-cert | |
| | application/x-x509-server-cert | |
| | application/x-x509-user-cert | |
| | application/x-xpinstall | xpi |
| XHTML | application/xhtml+xml | xhtml,xht |
| XSLT | application/xslt+xml | xslt |
| XML | application/xml | xml,xsl |
| Zip file | application/zip | zip |
| | audio/basic | au,snd |
| Bitmap image | image/bmp | bmp |
| GIF image | image/gif | gif |
| JPEG image | image/jpeg | jpeg,jpg,jpe |
| | image/pjpeg | |
| PNG image | image/png | png |
| SVG image | image/svg+xml | svg |
| TIFF image | image/tiff | tiff,tif |
| ICON | image/x-icon | |
| | image/x-jg | |
| | image/x-jng | |
| | image/x-portable-pixmap | ppm |
| | image/x-xbitmap | xbm |
| | image/x-xbm | xbm |
| | image/xbm | xbm |
| | message/external-body | |
| | message/news | |
| | message/rfc822 | |
| | multipart/alternative | |
| | multipart/appledouble | |
| | multipart/byteranges | |
| | multipart/digest | |
| | multipart/header-set | |
| | multipart/mixed | |
| | multipart/parallel | |
| | multipart/related | |
| | multipart/signed | |
| | multipart/x-mixed-replace | |
| | text/calendar | ics,ifb |
| CSS | text/css | css |
| | text/enriched | |
| HTML | text/html | html,htm |
| | text/jsss | |
| | text/mdl | |
| Text | text/plain | asc,txt |
| RDF | text/rdf | rdf |
| | text/richtext | rtx |
| | text/x-vcard | |
| XML | text/xml | xml |
| | video/mpeg | mpeg,mpg,mpe |
| | video/x-mng | mng |

Common Plugin installed Content Types

The following Content Types are relatively common and are installed through the installation of the associated plugin.

| Description | Content Type | File Extension |
|---|---|---|
| Adobe Acrobat-Acrobat Portable Document Format | application/pdf | pdf |
| Adobe Acrobat-Acrobat Forms Data Format | application/vnd.fdf | fdf |
| Adobe Acrobat-XML Version of Acrobat Forms Data Format | application/vnd.adobe.xfdf | xfdf |
| Adobe Acrobat-Acrobat XML Data Package | application/vnd.adobe.xdp+xml | xdp |
| Adobe Acrobat-Adobe FormFlow99 Data File | application/vnd.adobe.xfd+xml | xfd |
| Adobe ESD Manager Plugin-Adobe Download Manager Version Files | application/aom-getversion | aomver |
| Adobe SVG Viewer-Scalable Vector Graphics | image/svg-xml | svg, svgz |
| Adobe SVG Viewer-Scalable Vector Graphics | image/svg+xml | svg, svgz |
| Adobe SVG Viewer-Scalable Vector Graphics | image/vnd.adobe.svg+xml | svg, svgz |
| Java Plug-in-Java Applet | application/x-java-applet | none |
| Java Plug-in-JavaBeans | application/x-java-bean | none |
| Java Plug-in-Java Virtual Machine for Netscape | application/x-java-vm | |
| MetaStream 3 Plugin-MetaStream Plugin File | application/x-mtx | mtx |
| Microsoft® DRM-Network Interface Plugin | application/x-drm | nip |
| Microsoft® DRM-Network Interface Plugin | application/x-drm-v2 | nip |
| QuickTime Plug-in-QuickTime Movie | video/quicktime | mov,qt |
| QuickTime Plug-in-QuickTime Image File | image/x-quicktime | qtif,qti |
| RealPlayer Version Plugin-RealPlayer Version Plugin | application/vnd.rn-realplayer-javascript | rpj |
| RealPlayer(tm) G2 LiveConnect-Enabled Plug-In (32-bit)-RealPlayer(tm) as Plug-in | audio/x-pn-realaudio-plugin | rpm |
| Shockwave Flash-Macromedia Flash movie | application/x-shockwave-flash | swf |
| Shockwave Flash-FutureSplash movie | application/futuresplash | spl |
| Shockwave for Director-Shockwave Movie | application/x-director | dir,dxr,dcr |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | application/asx | * |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | video/x-ms-asf-plugin | * |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | application/x-mplayer2 | * |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | video/x-ms-asf | asf,asx,* |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | video/x-ms-wm | wm,* |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | audio/x-ms-wma | wma,* |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | audio/x-ms-wax | wax,* |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | video/x-ms-wmv | wmv,* |
| Windows Media Player Plug-in Dynamic Link Library-Media Files | video/x-ms-wvx | wvx,* |

Configuring Web Server Content Types

Apache

The Apache documentation for the mod_mime module (Apache 1.3 and Apache 2.0) describes how to use either the AddType directive or the mime.types file to define a Content Type in terms of a file's extension.

The AddType content-type file-extension directive associates the content type content-type with the file extension file-extension. This directive can be specified in the server configuration file for the server, for virtual hosts, directories or in the .htaccess file in the web server's content directories. For example, the following specifies that all files with file extension .html should be treated as text/html.

AddType text/html  .html

The mime.types file can also be used to specify content types using a similar syntax (without the AddType), with one content type per line. In mime.types, the extensions are conventionally listed without the leading dot.

text/html html

For performance and security reasons, it is best to specify the content-type to file extension mapping in either the server configuration file or the mime.types file, rather than in .htaccess files.
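
To illustrate how a mime.types style mapping turns into an extension lookup, here is a minimal sketch of a parser. It is not Apache's implementation; the function name is invented, and tolerating a leading dot on extensions is a convenience of this sketch rather than documented behavior.

```python
def parse_mime_types(text):
    """Build an {extension: content type} mapping from mime.types style text."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the content type; the rest are extensions.
        content_type, *extensions = line.split()
        for ext in extensions:
            mapping[ext.lstrip(".").lower()] = content_type
    return mapping

sample = """
# content type    extensions
text/html         html htm
image/png         png
"""
table = parse_mime_types(sample)
```

A line that names a content type but no extensions adds nothing to the mapping, so files of that type would need to be identified some other way.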

IIS

IIS 6.0 does not allow serving of content using a default Content-Type and must be configured for each type which is to be served. See IIS 6.0 Does Not Serve Unknown MIME Types for details on how to configure IIS 6.0's Content-Types.

For IIS 5.0, see Configure MIME Mapping for details of how to specify Content-Types.

Resources
  • IANA MIME Media Types
  • RFC 822 Standard for the Format of ARPA Internet Text Messages
  • RFC 1049 A Content-Type Header Field for Internet Messages
  • RFC 2045 Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
  • RFC 2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
  • RFC 2047 Multipurpose Internet Mail Extensions (MIME) Part Three: Message Header Extensions for Non-ASCII Text
  • RFC 2048 Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures
  • RFC 2049 Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples