The following text is an article I contributed to Philweavers.net yesterday. Unfortunately, the formatting got a bit screwed up, so I’m republishing it here in case anyone wants to read it in a slightly more legible form.
I came across an interesting nugget of information a few days ago while trawling the W3C website for specs on their XHTML Transitional variant, one so wonderfully obtuse that I decided I had to write a little something about it. Sayeth the W3C, by way of defining XHTML:
"The Extensible HyperText Markup Language (XHTML) is a family of current and future document types and modules that reproduce, subset, and extend HTML, reformulated in XML. XHTML Family document types are all XML-based, and ultimately are designed to work in conjunction with XML-based user agents. XHTML is the successor of HTML, and a series of specifications has been developed for XHTML."
Riiight. Well, I guess we can all go home now, because the technobabble definition above clearly answers all our questions … unless of course, you were looking for a slightly more "practical" description.
This is what I hope to shed some light on over the course of this post. At the end of it, I also hope to be able to convince a few of you to study the XHTML spec on your own, as I really do believe that it will soon become the de facto standard by which we build our web pages.
Before we delve into XHTML, we need to first clarify our personal, working definitions of HTML. Nearly every designer I’ve talked to has more or less defined HTML thus: "It’s the language that tells the browser what the designer wants the page to look like." If you agree with this definition, cheer up. You’re one amongst thousands.
The problem, of course, is that this definition is absolutely, 100% wrong.
It’s this fundamental misunderstanding of what HTML is for that is largely responsible for all of the wrongly-coded websites currently online. So what is HTML really? The simplest definition I could come up with is this: "It’s the language that tells the browser what the content it’s displaying _is_."
Now, because that is a very simple definition, I will need to explain further, and I’ll do so by talking about specific HTML tags, which I’m sure all of you are quite familiar with. Let’s take the text-related tags, H1, H2, H3 and P as our first examples. If you think about what these tags mean, you’ll realize that they have little to do with what the content actually looks like. These tags are really more concerned with what the content within them is. (Contrary to popular belief, the "H" tags don’t define the size of the text so much as its level of importance, H1 being the most important or "main" title, H2 being the next most important, and so on. It just so happens that our visual idea of importance is directly related to size, so browsers display H1 bigger than H2, which is bigger than H3, and so on.) P, meanwhile, defines a given body of text as being grouped together as a paragraph. Notice that neither the H tags nor the P tag directly say anything about what they are supposed to look like. (By "look like," I mean what font, color, or exact size they are supposed to be.)
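To make that concrete, here is a minimal sketch of markup that only describes what each piece of content is. The headings and text are placeholders I made up for illustration; nothing in it mentions a font, a color or a size.

```html
<!-- h1: the main title of the page -->
<h1>Placeholder main title</h1>

<!-- h2: a section title, one level less important than the h1 -->
<h2>Placeholder section title</h2>

<!-- p: a body of text grouped together as a paragraph -->
<p>Placeholder paragraph text for this section.</p>

<!-- h3: a sub-section title, less important again -->
<h3>Placeholder sub-section title</h3>
<p>Another placeholder paragraph.</p>
```

A browser will still show the h1 larger than the h2, but that is only its default rendering, not something the markup asked for.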
In fact, if you look at the HTML vocabulary, you’ll see that the prevailing trend is to define the meaning of the content, and not its appearance. (There are a handful of exceptions to this though, which include the FONT, B, I and U tags among others. More on this later.) How then do we, as designers, actually define how we want our content to look? Well, that’s where CSS comes in.
Now, CSS as a discussion topic is enough to fill several books, so I won’t go into the gory details here. We all know that we use CSS to access various design features that aren’t available in HTML, but the way I usually see it implemented, it is treated as more of an add-on to HTML than anything else. The thing is, CSS is the correct way to apply layout and aesthetics to your pages. Essentially, we use HTML to define the various parts of the content, and we use CSS to design what those parts look like. For example, you could have a long chunk of text, all separated properly with P tags. That would be HTML’s job. You would then apply styles to those paragraphs with CSS, by defining, say, the font-family, the margin, the line-height or any number of other properties.
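The paragraph example might look something like the sketch below. The specific values (the Georgia font, the 1em margin, the 1.5 line-height) are arbitrary choices of mine for illustration, and in a real page the style rules would normally live in the document head or in a separate CSS file.

```html
<!-- CSS's job: decide what every paragraph looks like -->
<style type="text/css">
  p {
    font-family: Georgia, serif; /* typeface, with a generic fallback */
    margin: 0 0 1em 0;           /* one line of space below each paragraph */
    line-height: 1.5;            /* roomier spacing between lines */
  }
</style>

<!-- HTML's job: say that these chunks of text are paragraphs -->
<p>First placeholder paragraph of a long chunk of text.</p>
<p>Second placeholder paragraph of the same text.</p>
```

Notice that the paragraphs themselves carry no styling at all; changing the one rule restyles every paragraph at once.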
This is why it is conceptually wrong to use the FONT, B, I or U tags, because they stipulate what the content within them should look like, when this is clearly not what HTML was intended to do. Most standards-aware designers use STRONG instead of B, and EM (emphasis) instead of I, because these alternative tags adhere more to the guiding principles of semantic markup. The thinking is that although there could be many ways to design what STRONG or EM(phasis) looks like, there is only one way to design B or I. Meanwhile, tags like TABLE or body attributes like BGCOLOR and BACKGROUND are usually incorrectly used as a means of creating graphically-rich layouts, when the correct method is to use DIVs and then define things like color and background-image through CSS instead. (TABLE is meant for making tables, and nothing else. We were never meant to use it as our primary layout tool, and yet so many people still do.)
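Here is a rough before-and-after sketch of the difference. The class name "warning", the id "sidebar", the colors and the file name pattern.gif are all hypothetical, invented purely for this example.

```html
<!-- Presentational: the markup itself dictates the look -->
<p><font color="red"><b>Placeholder warning text</b></font></p>

<!-- Semantic: the markup says what the content is, and CSS decides the look -->
<style type="text/css">
  strong.warning {
    color: red;
    font-weight: bold;
  }
  #sidebar {
    background-color: #eeeeee;             /* replaces the BGCOLOR attribute */
    background-image: url("pattern.gif");  /* replaces the BACKGROUND attribute */
  }
</style>

<p><strong class="warning">Placeholder warning text</strong></p>
<div id="sidebar">
  <p>Placeholder sidebar content, laid out with a div rather than a table.</p>
</div>
```

If you later decide that warnings should be orange and italic instead, only the CSS rule changes; the STRONG tag still means exactly what it meant before.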
Ok, assuming you’ve read this far without violently disagreeing with me, the question on your mind will probably be, why do I need to go through all the trouble of doing this, when the old methods work fine? That’s a very valid question, and I’ll admit that there are times I skimp on the proper HTML usage as well in an effort to save time. The answer to that question, coincidentally, brings me back to the original topic of this post, i.e., XHTML.
XHTML is the successor of HTML, and aims to properly enforce the principles mentioned above. The problem with current HTML is that it is very quirky, with various browsers having different interpretations, and the language itself having weird inconsistencies (the aforementioned FONT, B, I and U tags are the most obvious). XHTML is the refined version of HTML, and its strictest variation fixes many of the mistakes in HTML 4.0. The reasoning is that the stricter you are, the more likely it is that browser manufacturers will follow the guidelines you set down.
There are some very compelling reasons to use XHTML, which I’ll go through very quickly now:
- This one’s the most obvious: when you properly separate content from design, you make your pages very flexible. You could easily change the layout of your entire website, by replacing one or two CSS files. And if you change your mind about small things, such as the color you chose for your links, you only need to change one line in one file, and the entire site is instantly revised.
- Repurposing your web pages for other mediums, like cell-phone screens, PDAs, or even print, becomes a lot easier because none of the tags you use are specific to the desktop-based web browser. Again, it’s simply a matter of switching out CSS files.
- Designing in XHTML allows your page to be more accessible to people with disabilities like low- or no-vision, because screen-readers can interpret the pages more easily.
- Your code is cleaner, easier to read and generally loads faster because it avoids redundancy in tag attributes.
- If your code identifies its doctype as XHTML Transitional or XHTML Strict, modern browsers switch out of their error-tolerant "quirks" mode. The commonly cited benefit is that they render it faster because they don’t have to go through your code line-by-line and look for errors that need fixing.
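Several of the points above can be seen in one small page skeleton. This is only a sketch: the file names screen.css and print.css and the page text are hypothetical, while the doctype and namespace lines are the standard ones for XHTML 1.0 Strict.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Placeholder page title</title>
  <!-- One stylesheet per medium: swap these files to redesign or repurpose the site -->
  <link rel="stylesheet" type="text/css" href="screen.css" media="screen" />
  <link rel="stylesheet" type="text/css" href="print.css" media="print" />
</head>
<body>
  <h1>Placeholder main title</h1>
  <p>Placeholder paragraph. Nothing in this body is specific to one kind of device.</p>
</body>
</html>
```

The body never changes from one medium to the next. The browser picks the stylesheet that matches the current medium, so a site-wide change such as a new link color would be a one-line edit inside screen.css.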
The important thing to remember, I think, is that writing XHTML code is no more difficult than writing HTML. You will definitely need to wean yourself off WYSIWYG editors in order to do this properly, but you should have done that a long time ago anyway if you wanted to specialize in web design. The problem with WYSIWYG editors, you see, is that they (ironically) put a significant amount of emphasis on designing the look of your HTML documents, and you often end up with lots and lots of inappropriately-used code like FONT and BR and non-breaking spaces as a result. Rather than cleaning these up later, it’d be better to simply dump your WYSIWYG editor completely and just code by hand. The beauty of proper XHTML usage is that the amount of code you actually write is significantly decreased, with most of your "design" time spent tweaking and adjusting a single style-sheet that will affect your entire site.
As you can imagine, there are some rules as to how to write proper XHTML, and although it’s very similar to regular HTML, you will have to keep these things in mind. Here are a few of them, to give you an idea:
- All tagnames must be in lowercase, e.g., <strong> and not <STRONG>
- All tags must be properly nested and closed, e.g., <a href="#blah"><strong>sample text</strong></a> or <img src="filename.gif" width="100" height="100" />
- All attribute values must be contained within quotes
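Put together, the three rules look something like the sketch below. The counter-examples in the comments, the alt text and the wrapping P tags are my own additions for illustration (XHTML also expects every img to carry an alt attribute).

```html
<!-- Rule 1: lowercase tag names.
     Not valid XHTML: <STRONG>sample text</STRONG> -->
<p><strong>sample text</strong></p>

<!-- Rule 2: nest and close everything. Tags close in the reverse order they
     were opened, and empty elements such as img and br close themselves. -->
<p><a href="#blah"><strong>sample text</strong></a></p>
<p><img src="filename.gif" width="100" height="100" alt="Placeholder description" /><br /></p>

<!-- Rule 3: quote every attribute value, numbers included.
     Not valid XHTML: <img src=filename.gif width=100 height=100 /> -->
```

Running a page through a validator such as the W3C's is the easiest way to catch slips in any of these.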
There may be some of you who are turned off by the finicky nature of XHTML and the tedium of coding by hand, but these are very necessary skills that will eventually become second nature to you given enough practice. Look at it this way: you will not find many professional architects or engineers who shun math in their day-to-day work, so it’s only natural that you, as a professional web designer, understand the code you are using inside and out. Obscuring it with a WYSIWYG editor will only keep you from improving your skill to its fullest potential.
Over the next few weeks I’m going to try to write some practical XHTML/CSS usage examples, so you can get a better idea of just how easy it actually is. Unfortunately, many of my older sites are still written in the clunky table style, but I have been forcing myself to build every new website in the proper XHTML style. So, ’til next time, keep your tags lowercase and your attributes properly quoted!