HTML 4 or XHTML 1?

Posted on June 20, 2007

html
xhtml

The debate over whether to use an HTML 4 DOCTYPE or the newer XHTML 1 continues, even inside my own head. Here are some of the pros and cons of each.

HTML 4.01

Causes a validation error with the closing slash in <link />. Technically, self-closing tags shouldn’t even be used, but browsers will usually forgive such errors.
Can create parts of the page dynamically with JavaScript while the document is still loading.
Can use named character entities (e.g., &nbsp;) other than the four predefined ones: &lt;, &gt;, &amp;, and &quot;.
Can use document.write() and the .innerHTML property with JavaScript (technically .innerHTML is non-standard even in HTML).
Script and style elements cannot have their contents hidden from legacy browsers.

XHTML 1.0

XHTML is based on XML and therefore extensible and designed for future applications.
Requires closing tags (well-formedness) – discourages sloppy coding.
Since it is based on XML, XHTML can be extended with new meaning like XML can.
The standard says that XHTML documents should be delivered with a MIME type of application/xhtml+xml, but unfortunately no version of IE (including 7) understands that type. The only way to get around it is to send the document as text/html, which is technically wrong. Luckily, this can easily be switched when browsers do support application/xhtml+xml (see this article about content negotiation).
FCKEditor writes XHTML-valid code.
XHTML can be parsed and read by XML agents and parsers.
ALA uses XHTML for their DOCTYPE.
Requires inline script and style content to be escaped or wrapped in CDATA sections.
Element type selectors in CSS are case sensitive for XHTML, but not for HTML.
XHTML may not understand most named character entities (such as &nbsp;) because the only official entities are &quot;, &amp;, &lt;, &gt;, and &apos;. Note that HTML is actually the same way (except no &apos;), but browsers have the entities hard-coded into their engines.
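One way to sidestep the entity problem is to emit numeric character references, which work in both HTML and XML. Here is a minimal sketch in JavaScript. The function name and the tiny lookup table are my own illustration, not part of any library, and a real table would need many more entries.

```javascript
// Hypothetical lookup table: a few common named entities and their code points.
const NAMED_ENTITIES = {
  nbsp: 160,
  copy: 169,
  eacute: 233,
  mdash: 8212,
};

// The five entities predefined by XML; these are safe to leave alone.
const XML_PREDEFINED = new Set(["lt", "gt", "amp", "quot", "apos"]);

// Replace known named entities with numeric character references.
// XML-predefined names and names missing from the table are left untouched.
function toNumericEntities(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    if (XML_PREDEFINED.has(name)) return match;
    // Own-property check so names like "constructor" never hit the prototype.
    if (!Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)) return match;
    return `&#${NAMED_ENTITIES[name]};`;
  });
}

console.log(toNumericEntities("Caf&eacute;&nbsp;&amp;&nbsp;bar"));
// prints "Caf&#233;&#160;&amp;&#160;bar"
```

Running generated markup through something like this before it is served means the page does not depend on the parser knowing the named entities.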

Conclusions

XHTML is the more forward-looking of the two, plus it creates well-formed, parsable XML documents. IE doesn’t support pure XHTML yet, but the XHTML standard allows us to “devolve” the document into regular HTML 4 using content type negotiation. IE and older browsers should still display the document in a “safe” manner as long as the HTML Compatibility Guidelines (Appendix C of the XHTML 1.0 specification) are followed.

However, it’s possible that rendering differs too much between IE (which gets HTML) and other browsers (which get XHTML) for it to be worth supporting both content types simultaneously. Also, content type negotiation must be done on the server (through PHP in my case), which means the top of each page must carry the negotiation code.
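The negotiation logic itself is small. Below is a rough sketch of the decision, written in JavaScript purely to illustrate it (on this site it would live in PHP). The function names are hypothetical, and the rule shown (serve XHTML only when the client lists it explicitly and does not rank text/html higher) is one reasonable policy, not the only one.

```javascript
// Parse an HTTP Accept header into a Map of media type -> q value.
function parseAccept(header) {
  const prefs = new Map();
  for (const part of (header || "").split(",")) {
    const [type, ...params] = part.split(";").map((s) => s.trim());
    if (!type) continue;
    let q = 1; // q defaults to 1 when it is omitted
    for (const p of params) {
      const m = /^q\s*=\s*([0-9.]+)$/i.exec(p);
      if (m) q = parseFloat(m[1]);
    }
    prefs.set(type.toLowerCase(), q);
  }
  return prefs;
}

// Choose application/xhtml+xml only when the client names it explicitly
// with q > 0 and does not prefer text/html; otherwise fall back to text/html.
function chooseContentType(acceptHeader) {
  const prefs = parseAccept(acceptHeader);
  const xhtml = prefs.get("application/xhtml+xml") || 0;
  const html = prefs.get("text/html") || 0;
  return xhtml > 0 && xhtml >= html ? "application/xhtml+xml" : "text/html";
}

console.log(chooseContentType("application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"));
// prints "application/xhtml+xml"
console.log(chooseContentType("*/*"));
// prints "text/html"
```

A browser that only sends `*/*`, as IE does, falls through to text/html. Whatever the server language, the response should also carry a `Vary: Accept` header so caches keep the two variants apart.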

So:

If Internet Explorer is not the majority browser among a site’s target audience, or the site is for internal use only, or it requires an XML base (for parsing, maybe), then use XHTML 1.0 Strict.
Otherwise, the vast majority of sites should be written in HTML 4.01 Strict.

If we use HTML 4.01 Strict, we should write the HTML as much like XHTML as possible: lowercase tags and attributes, quoted attribute values, closing all elements that can be closed. And HTML4 is not going to disappear–it will be supported far into the future.

Other XHTML tips:

Don’t use <script> and <style> definitions in the <head>. It’s too much of a pain. Link to external files instead.
Don’t use the <?xml … ?> header with XHTML documents.
Always use UTF-8 with XHTML documents.
Include a space before the slash in self-closing tags so that older browsers are not confused (e.g., <br /> rather than <br/>).
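For the last tip, existing markup can be patched up mechanically. This is a deliberately naive sketch in JavaScript. The function name is made up, and the regex touches every `/>` it finds, including any inside script content or text, so treat it as an illustration and not a real serializer.

```javascript
// Insert a space before "/>" wherever the slash directly follows
// a non-whitespace character, e.g. <br/> becomes <br />.
function spaceSelfClosingTags(markup) {
  return markup.replace(/([^\s<])\/>/g, "$1 />");
}

console.log(spaceSelfClosingTags('<br/><img src="a.png"/><hr />'));
// prints '<br /><img src="a.png" /><hr />'
```

Tags that already have the space are left as they are, so it is safe to run more than once.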
