Comic Update: Conversation Sans Semantics

September 21, 2009

Today’s comic features Jeremy Keith, HTML5 “Doctor” Mike Robinson and the squirrel having an innocent conversation about Thai food and emails going where they don’t belong, while the poor Google-bot attempts to understand who is speaking without semantic guidance. I should warn you, a specific body part’s medical term is used a few times. All in good taste, mind you.

The reason that these two fine England-dwelling individuals join the squirrel in the strip is that each of them also had a slight issue with something that I found distasteful over the week: HTML5 documentation giving guidance for using non-semantic markup as a solution for marking conversations in HTML. The markup in question for a short time suggested using the b tag to note a speaker, with the text of the speech being in p tags. A short bit of criticism later and that was dropped, but as you can see here, there are still no replacement suggestions for a semantic solution.
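
To make that concrete, here is a minimal sketch of what the short-lived b-and-p suggestion would look like in a transcript. The speakers’ lines are invented for illustration; they are not taken from the spec or from the comic.

```html
<!-- Speaker name in b, speech in p. The b element only marks text as
     stylistically offset, so nothing here says "this is a speaker". -->
<p><b>Squirrel</b> Anyone up for Thai food?</p>
<p><b>Jeremy</b> Only if my email ends up where it belongs.</p>
```

A person reading the rendered page can work out who is talking from the bold names. A machine gets some bold text followed by more text, which is about where the Google-bot finds itself in the strip.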

Look. It’s 2009. We’re working on HTML5. We know that semantic-free markup (or semantically-confused markup) is something best avoided when possible. A conversation is one of the basic methods of human communication. I’m going to guess 99.999% of all people have at least one conversation daily. At least a portion of these end up on the web. Is there any reason to assume that we wouldn’t want to make this data more accessible for machines and screen-readers to understand?

The proposed dialog element has apparently gone the way of the dodo. I don’t know if this is good or bad. But I’d like some sort of method to mark up conversation that isn’t arbitrary and devoid of meaning. And, contrary to the opinion put forth in this W3C mailing list email, I’m going to believe that my opinion on this matter is valid despite my tendency to draw squirrels. Ever since making the commitment to providing transcripts of the comics I create, I’m invested in having some method to mark up conversation. I’m also in the camp that prefers that markup to make sense.
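
For anyone who missed it, the dialog element mentioned above wrapped a conversation and borrowed dt and dd for its children: dt for the speaker, dd for what was said. The following is only a rough sketch of that shape with invented lines, not a quote from any draft.

```html
<!-- The dropped dialog proposal, roughly: dt names the speaker,
     dd holds the speech. Lines are made up for illustration. -->
<dialog>
  <dt>Squirrel</dt>
  <dd>Anyone up for Thai food?</dd>
  <dt>Jeremy</dt>
  <dd>Only if my email ends up where it belongs.</dd>
</dialog>
```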

I don’t know all the pros and cons, but I like the proposal put forth by the HTML5 Super Friends in their list of concerns: let’s use cite and q, or at the very least do some research to see how well that one works out. It makes sense, it’s simple, and we don’t have to invent new elements. I for one am going to start using them going forward until something that makes more sense comes along.
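
As a rough sketch of how I’d expect cite and q to work in a comic transcript: the nesting below is an assumption on my part rather than anything the proposal spells out, and the lines are invented for illustration.

```html
<!-- cite names the speaker, q wraps what they said.
     The arrangement inside each p is an assumption, not a spec example. -->
<p><cite>Squirrel</cite>: <q>Anyone up for Thai food?</q></p>
<p><cite>Jeremy</cite>: <q>Only if my email ends up where it belongs.</q></p>
```

One thing to keep in mind is that browsers typically add quotation marks around q by default, so a transcript marked up this way may need a line of CSS if the quotes aren’t wanted.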

But enough with suggesting semantic-free elements for markup. We’ve already got div and span; I don’t really see the need for b and i to keep rearing their ugly heads.


Category: Comic

9 Responses to “Comic Update: Conversation Sans Semantics”

  1. [...] a few seconds ago from web [...]

  2. The <dialog> element was well-intentioned but limiting it to <dt> and <dd> children was a mistake (for a variety of reasons, not the least of which is affecting the current approach to <details> and <figure>, which Tab Atkins Jr. documents).

    That said, good luck getting the WG to allow <cite> in any context that doesn’t fit with “title of work.” The arguments against expanding its usefulness have gotten increasingly absurd, which suggests they’re beyond listening to reason.

  3. @Erik – Yeah, in the past day or so of catching up on “dt”-related messages on the W3C archives I got the feeling that “good riddance” was the right response to dialog’s departure. But I’m holding out that somehow cite-support will rally.

  4. Kyle,

    Using the cite element to refer to speakers (not just “works”) has been accepted good semantic (X)HTML practice since at least early 2005. E.g. from one of my presentations:

    Since that time, numerous web designers/developers have used the HTML4 cite element in this way, and thus it makes sense to keep it.

    The best approach to encourage this change in HTML5 is perhaps two-fold:

    1. Find and document more uses of the cite element to refer to speakers.
    2. Document your opinion.

    I’ve created a wiki page on the WHATWG wiki to handle both of these. Please take a look and add your real-world use links and opinions accordingly:



  5. @Tantek – Thank you for the links and information! I will definitely be visiting the WHATWG wiki in the near future to add to that.

  6. [...] distinguishing features and fashion sense to turn me into a character in his web comic, CSSquirrel. The result is, as ever, [...]

  7. I frankly felt that dialog was stupid, but after a bit of thought I realized there are some valid points: HTML is for the web, a lot of conversation takes place on the web in text, and it’s not uncommon to publish that conversation in real-time or in transcripts. Not to mention a variety of other places dialog appears, like scripts and comics. There are really plenty of use cases to demonstrate the need for a way to associate a person with a statement, semantically mark up dialog, etc.

    I don’t necessarily think that a dialog tag and dt/dd are the best way to do it. I kind of felt dialog/dt/dd was picked for presentation, not semantics — the default styling lends itself well to dialog. But that should be the least consideration. The only other reason might be that dt/dd encapsulate a name and a statement well. But that has nothing to do with semantics either.

    I’m also not 100% sure about inventing other new tags as children for a dialog tag either, though I’d personally be willing to entertain ideas. If a common format could be agreed upon that didn’t seem to have any egregious blind spots (like cite has), I think the spec and HTML semantics would be improved.

    Which reminds me of the reasons I thought XHTML was going to be the bomb. The core elements should be semantic but generic because to have truly semantic HTML would require a huge library of tags, many of which have very limited use cases. The extensible part…but that’s water under the bridge, isn’t it?

    I did not realize dialog and cite were such sensitive issues until last night/this morning. I’ve been reading through the HTML5 spec and periodically running up against things that seem counter-intuitive to me.

    Small things, really, that I think could be easily fixed. So I made a resolution to blog about each of them (often by writing I clarify matters to myself). While I was writing I was googling for other stuff on the subject and came up with a lot of links, including some hilarious discussions in mailing lists.

    It makes me a bit nervous, because I frankly can’t see any reasonable basis for the argument that titles should be the only valid content of the cite tag, but I didn’t initially realize that I was treading on sacred ground.

    I think we need to break this stuff down into its constituent parts before we charge headlong into solutions. But I’m not a person who deals well with conversations of the tone I’ve seen in the mailing lists. So you won’t see me on there either; I lack the courage.

  8. Ah, excellent, didn’t know about the page on the whatwg wiki for collecting examples of cite-for-names. I’ve added my employer’s website, since it’s been my practice for years to mark up testimonials with a blockquote/cite pair.

  9. The timed-text authoring format that is about a decade too late would have been a good place to sort this out.

    The work began back in the dark ages, around 2000. Many people believed that people who were likely to use semantics worth believing were more likely to have markup quality that could be relied on as an indicator. It seems that such superstitions have been banished along with XHTML2, a preference for politeness in what were then called “the professions”, and other such relics.

    It does seem that conversation is a common idiom in many documents (you know, books, comics, and such), and while HTML 5 might be moving from its origins in a way to mark up text there could yet be some value in retaining or improving its capacity to do so.

    We shall see how things pan out.