XHTML

From WhyNotWiki

XHTML Comments

[Caveats (category)]: You can't use dashes (--) in your comment!

Timothy L. Warner (<2005-12-01). Mother Tongue Annoyances » Difference Between Em Dash and En Dash (http://www.mtannoyances.com/?p=218). Retrieved on 2007-05-11 11:18.


Many people who attempt to put comments into their HTML code -- text intended for the reference of the developer but not to be displayed by the browser -- get the syntax wrong, which can cause some browsers to mistakenly display part of the comment, or even worse, consider big chunks of the rest of the document to be a comment and ignore them. This widespread confusion about comment syntax is understandable given that HTML comment syntax is a rather inscrutable outgrowth of SGML (Standard Generalized Markup Language), a language devised for the formatting of government documents which formed the basis of HTML. Maybe some of the same people who devised the 1040 tax forms also created the syntax rules for SGML, and hence HTML.

While the popular conception is that comments open with <!-- and end with -->, this isn't quite completely accurate. Actually, comments start and end with "--", as in "-- This is a comment --", but such comments can only occur within the proper SGML context, which happens to be a block starting with <! and ending with >. This ends up producing the commonly observed comment syntax, but it requires the additional condition that you shouldn't have "--" occur in the middle of the comment, because that would mark the end of the comment. Actually, you can use "--" if it's followed with another "--", since multiple comments are allowed. So the following is legal:

<!-- This is a comment -- -- and here's more -->

But since you don't want to have to be so careful in counting your pairs of dashes, it's better not to include any double dashes anywhere in a comment line, so you can be sure the proper syntax is followed. This can get to be a bit of a problem in commenting out JavaScript code (recommended to hide it from older browsers), since "--" is frequently encountered as a decrement operator.

You'll have to use your best judgment in such cases about whether to rewrite your JavaScript code to avoid this operator, or live with a malformed "comment" that probably won't crash the common browsers which are used to dealing with such bad syntax anyway. Just try not to use comments like <!--------------> with so many dashes you're likely to lose count.

2007-06-19 15:54

My comment:

Actually, if it's JavaScript you're trying to comment out, there are much better ways, such as the /* */ multi-line JavaScript comment or enclosing the code in if (false) { }.

2005-10-13. HTML Comments - Anne’s Weblog (http://annevankesteren.nl/2005/10/comments). Retrieved on 2007-05-11 11:18.


...

A comment start is <!. A comment end is >. No variations. In between you have dashes. Ugh. Dashes are --. Inside a pair of matching dashes a comment end must be treated as a literal and therefore Acid 2 works as it does. Fun, not? [...] Per current specifications dashes have to be directly adjacent to a comment start; no whitespace in between. Browsers support whitespace in between. [...] Anyway, what happens with <!-- -- --> <!-- -->? When the parser reaches the end there is a problem. Dashes are missing. The last > is treated as part of the comment. The browser needs to reparse this. And now it gets interesting. The first < is to be treated as a literal, which makes it essentially, and things that follow it, a text node. That leaves us with a text node <!-- -- --> and a comment containing a space.

That is about it. I hope you find it as confusing as I do. I do get it though. And it sort of makes sense.

...

HTML and SGML comments (http://www.howtocreate.co.uk/SGMLComments.html) [3/3 stars]. Retrieved on 2007-05-11 11:18.


This article is now being made obsolete

Due to the problems pointed out by this article, SGML comments have been removed from Acid 2, and future HTML versions will not require SGML comments. Browsers that have implemented them are now expected to remove their support for SGML comments, for all HTML versions.

...

Enter the double dash

Suddenly it is not so easy any more. You see, browsers were wrong. HTML was created as a subset of SGML, and SGML dictates a more complicated view of comments. Browsers all ignored SGML comments though, and stuck with the comment format they had always used. This was a sensible approach, in my opinion.

For a little while, Opera experimented with "correct" comment handling, and found that predictably, no Web page authors were aware of it, resulting in a lot of broken sites. So Opera changed back to using the format that everyone understood. Then Mozilla decided to implement them "properly" as well. It was implemented only in strict mode, but that did not stop it causing problems. Then the Acid 2 test came along, and for debatable reasons, they decided to include SGML comments in it.

It would have been better to rewrite the HTML standard to reflect the reality of what authors were using, but no. So browser vendors are now forced to implement SGML comments, or risk embarrassment, even though they will cause Web pages to break. Why will they break?

To put it simply, the double dash at the start and end of the comment do not start and end the comment. Double dash indicates a change in what the comment is allowed to contain. The first -- starts the comment, and tells the browser that the comment is allowed to contain > characters without ending the comment. The second -- does not end the comment. It tells the browser that if it encounters a > character, it must then end the comment. If another -- is added, then it goes back to allowing the > characters:

<!-- this can contain > characters -- this can not, so the comment ends here>

Each time a double dash is encountered, it changes the format between allowing, and not allowing the > characters to be inside the comment:

<!-- this can contain > characters -- this can not -- this can contain > characters -- this can not, so the comment ends here>

That example is not actually valid HTML, since the last part (between the last -- and the closing >) is not allowed to contain anything except whitespace. However, the SGML parsing rules will cause it to behave as described, even if there are some other non-whitespace characters in there:

<!-- this can contain > characters -- this can not -- this can contain > characters -->
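The toggling rule described above can be sketched in Ruby. This is a simplified model for illustration only, not a real SGML parser: the method name sgml_comment_end is made up, and the only things it looks at are "--" pairs and the closing >.

```ruby
# Simplified model of SGML comment parsing (hypothetical helper).
# Returns the index just past the '>' that closes the declaration
# starting at `start`, or nil if the declaration is never closed.
def sgml_comment_end(html, start = 0)
  return nil unless html[start, 2] == '<!'

  pos = start + 2
  inside = false               # true while between a pair of "--"
  while pos < html.length
    if html[pos, 2] == '--'
      inside = !inside         # every "--" toggles the state
      pos += 2
    elsif html[pos] == '>' && !inside
      return pos + 1           # '>' outside a "--" pair ends the comment
    else
      pos += 1                 # any other character, or '>' inside a pair
    end
  end
  nil
end

p sgml_comment_end('<!-- a > b -->')        # => 14
p sgml_comment_end('<!-- x--y --> shown?')  # => nil
```

In the second call the stray "--" leaves the parser "inside" when it reaches the intended -->, so under these rules the comment never closes and would swallow whatever follows.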

Note, XML (and therefore also XHTML when served using an XML based content-type) took the sensible step of making it not valid to have -- inside a comment. As a result, trying to use it should result in a parsing error. Because of this, XML and XHTML do not have the SGML comment problem. In practice, I have never seen any real need for SGML comments, so I favour the XML approach. Note that XHTML, if served using the text/html content-type, will be treated as HTML, so the SGML comment parsing rules will be applied.
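The XML rule can be written as a pattern. The sketch below follows the Comment production of the XML 1.0 specification, under which a comment body may not contain "--" and may not end with a dash. The constant and method names are hypothetical, and the check ignores XML's restrictions on control characters.

```ruby
# '<!--', then any run of non-dash characters or single dashes that are
# each followed by a non-dash character, then '-->'.
XML_COMMENT = /\A<!--(?:[^-]|-[^-])*-->\z/

# Hypothetical helper: true if `text` is a single well-formed XML comment.
def well_formed_xml_comment?(text)
  XML_COMMENT.match?(text)
end

p well_formed_xml_comment?('<!-- a - b -->')   # => true
p well_formed_xml_comment?('<!-- a -- b -->')  # => false
p well_formed_xml_comment?('<!-- a --->')      # => false
```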

When you might encounter this problem

Let's say you want your web application to print some state information in a comment on every page, so that problems are easier to debug while you are browsing the site. For example, you want to dump everything in the visitor's session. Let's also say that you deploy this into production with the debug code included: output that is supposed to be invisible on the page and discoverable only if someone does View Source.

Well, that data could easily contain a '--' in the middle of it. If it did, then under the SGML rules described above the first > after that '--' would end the comment early, and everything after it that you thought was "inside your comment" would be displayed on the screen, which could look really ugly to anyone using your application.

Solution

In Ruby, you could implement a method like this:

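class String
  # Replace every dash so the result can never contain "--".
  # Assumption: '&#45;' (the numeric character reference for '-')
  # is used as the replacement text.
  def safe_in_comment
    gsub('-', '&#45;')
  end
end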

Then, in your views, you can safely put dumps in comments like this:

<!--
Session = <%= session.inspect.safe_in_comment %>
-->
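A variation on the same idea that does not reopen String, as a sketch. The module and method names are hypothetical, and replacing each dash with the numeric character reference &#45; is just one possible choice (View Source will show the reference rather than a dash).

```ruby
# Hypothetical helper; not part of Rails or any library.
module CommentSafe
  module_function

  # Replace every '-' so the result can never contain "--".
  def escape(text)
    text.to_s.gsub('-', '&#45;')
  end

  # Wrap arbitrary text in a comment whose only "--" are the delimiters.
  def wrap(text)
    "<!--\n#{escape(text)}\n-->"
  end
end

# Made-up sample data standing in for a session dump.
dump = { 'note' => 'a -- b', 'id' => 'x-y' }.inspect
comment = CommentSafe.wrap(dump)
puts comment
```

Because every single dash is replaced, there is no need to count pairs or to worry about runs of three or more dashes.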


Escaping in XHTML documents / HTML entities

Lists

HTML Entities (http://www.danshort.com/HTMLentities/) [2/3 stars]

Tools to convert from raw HTML to escaped HTML

accessify.com's tool

http://accessify.com/tools-and-wizards/developer-tools/quick-escape/

Original input (please view wiki source):

<nowiki>

<pre class="showimportantbits"><code><span class="moreimp"><!--</span>
<script type="text/javascript">
for( var i = 10; i > 0; i<span class="moreimp">--</span> ) {
        if( myar[i].status <span class="moreimp">></span> 3 ) {
                ntlp++;
        }
}
</script>
--></code>

</nowiki>

Output direct from conversion script (please view wiki source):



<pre class=\"showimportantbits\"><code><span class=\"moreimp\"><!--</span>
<script type=\"text/javascript\">
for( var i = 10; i > 0; i<span class=\"moreimp\">--</span> ) {
        if( myar[i].status <span class=\"moreimp\">></span> 3 ) {
                ntlp++;
        }
}
</script>
--></code></pre>


Which, when used as the contents of a pre tag in MediaWiki, at least, is rendered as this (please view wiki source):

<nowiki>

<pre class=\"showimportantbits\"><code><span class=\"moreimp\"><!--</span>
<script type=\"text/javascript\">
for( var i = 10; i > 0; i<span class=\"moreimp\">--</span> ) {
        if( myar[i].status <span class=\"moreimp\">></span> 3 ) {
                ntlp++;
        }
}
</script>
--></code>

</nowiki>

Desired output (see http://www.howtocreate.co.uk/SGMLComments.html) (don't view source -- is rendered correctly by MediaWiki):

<!--
<script type="text/javascript">
for( var i = 10; i > 0; i-- ) {
        if( myar[i].status > 3 ) {
                ntlp++;
        }
}
</script>
-->


Problems:

  • Adds backslashes (\) when it doesn't need to. I expected it to convert " to &quot;, but it converted it to \&quot; instead.
    • I don't know whether this is acceptable in "normal XHTML", but when I put the output in a pre tag on my wiki page (MediaWiki), the backslashes were visible, and I don't want them to be.
  • The &gt; in my original source (yes, it was "already escaped") ended up "unescaped" after passing through this filter, resulting in a bare >. I expected, and wanted, it to be "double escaped": &amp;gt;
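For comparison, the behaviour expected above (quotes become &quot; with no backslash, and an already-escaped &gt; becomes &amp;gt;) is what Ruby's standard CGI.escapeHTML does. A minimal sketch; the sample input is made up for illustration.

```ruby
require 'cgi'

# Made-up sample: markup that already contains an escaped "&gt;".
raw = '<span class="moreimp">&gt;</span>'

# escapeHTML replaces &, <, >, " and ' with character references,
# so the existing "&gt;" is escaped a second time.
escaped = CGI.escapeHTML(raw)
puts escaped
# prints: &lt;span class=&quot;moreimp&quot;&gt;&amp;gt;&lt;/span&gt;
```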



Conventions

Is it better to wrap something in a block level element or put a <br /> tag at the end of the line?

For example, a set of radio buttons that you want displayed one per line.

In the short term, if nothing else, it's usually more convenient to just put a br at the end of each line. That is less work than setting up an unordered list, for example.

It would probably be more semantic and better structured, however, to wrap each item in its own element rather than using a tag as a separator. (One of the slides on [1] talked about how <p> is more semantic than <br/>...)

Aliases: HTML

Retrieved from "http://whynotwiki.com/XHTML"