Comic Update: Who Really Is the Wizard of HTML5?

June 22, 2009

Today’s comic portrays my misgivings about HTML5 through the lens of L. Frank Baum, imagining a world where Chris Wilson, Manu Sporny and John Foliot were my companions through a standards-creating journey roadblocked by a guy in a purple coat with a big curtain.

Let’s review the facts.

Ian Hickson, editor of the HTML5 spec and top dog of the WHAT WG, is an employee of Google. He also adheres to a policy when dealing with people that can be summed up as: Deny, Delay, Too Late.

It can be argued that HTML5 is an important upgrade to one of the most vital technologies of the 21st century. Billions of people are using the Internet to facilitate communication and business, share their culture, access otherwise censored information when living under harsh regimes, and so forth. Most of the sites they use for these purposes are built in some fashion upon HTML.

At the currently accelerating rate of content creation, it’s safe to say that billions of pages will be built with HTML5. How these pages are designed, and how they’ll meet the needs of people both in the present and in the future, rests upon how this standard is outlined. Everything from preserving the portability of microdata, to ensuring accessibility for web users with special needs, to finding ways to share media without the hassle of brand-specific plugin wars (anyone seen a Flash site on an iPhone yet?) is determined by this effort.

So why is it that the person who is the center of this process is allowed to be a man who rejects consensus, actively denies issues (based on his own admitted policy) and replaces expert advice in important areas like accessibility with analysis of data from the Google Index and number parsing? Numbers that we cannot have a third party confirm because every request to do just this is ignored?

There is no doubt in my mind that Ian is brilliant. However, no man, no matter how brilliant, should be allowed to be so influential on a spec when he is bringing all this baggage to the table with him.

The biggest problem for me is as follows: Google. Ian’s work is highly influenced by data harvested by Google. I am positive Google has some spectacular views of the web, resulting in some highly accurate views of the current state of the Internet. I’m also sure that this doesn’t matter one bit if we have to take their word for it, because we can’t view it ourselves.

Most people search the web through Google. I get mail through Google, site analytics through Google, news through Google, and sometimes even browse with a browser made by Google. It’s impossible to throw a rock at the Internet and not somehow hit Google. It’s to the point where even the US government is getting a bit itchy and considering taking antitrust actions against them.

I don’t want to sound paranoid, but perhaps we shouldn’t craft HTML5 solely on Google’s say-so. If the data-harvesting Ian performs can’t be independently verified, then perhaps we shouldn’t accept it as fact. It’s just not prudent. We definitely shouldn’t use it as a substitute for actual experts in discussions like accessibility (which I spoke about last week). If Ian can’t accept that limitation or provide access to the raw data, then we need to consider whether a conflict of interest exists and whether he should remain as the editor. With him doing such a poor job of playing well with others (whether they be individuals, experts, or other W3C working groups) while relying on private information from his employer, how can he be expected to create an HTML5 that meets not just his needs, or Google’s needs, but everyone’s needs?

I’m not convinced he can.


Category: Comic

28 Responses to “Comic Update: Who Really Is the Wizard of HTML5?”

  1. Google is indeed behind the curtain on many things. I, as you are, am increasingly wary of the amount of power they wield online, but in this case they’ve made us incredibly happy by supporting and indexing RDFa markup, joining Yahoo who led adoption.

    Note that Google’s support and implementation of RDFa happened without the RDFa Task Force even knowing about it. It happened without a central figure and it happened organically. This is the way the Web has grown over the past two decades and should be the way it continues to grow. No single group or person should have a veto on who can participate or what can happen on the Web.

    The way that Ian runs WHATWG has produced some great work (the video/audio tags and Web Workers) as well as some fairly hideous changes (microdata, no extensibility, bad accessibility). More importantly, it has alienated many very good people that were responsible for bringing the web to where it is today. There is a deep distrust of WHATWG and the standards community is somewhat fractured at the moment. Good people are now quietly refusing to work with Ian and WHATWG – which is so very bad for the Web. Sam Ruby has done a great bit to quell this acrimonious environment and we’ve been working with him to try and move the semantic web portion of HTML forward.

    The RDFa task force has listened to the HTML4 and HTML5 community and collected a set of RDFa in HTML issues. Note that RDFa is only defined for use in XHTML right now, but people are using it in HTML4 and HTML5 anyway. We are actively working on resolving these issues and producing a document that specifies a unified RDFa syntax that is largely backwards compatible with the current version of RDFa and which works in HTML4 and HTML5 documents. We are compelled to do this because of the large-scale adoption we’re seeing. Google and Yahoo are the biggest companies to adopt the technology, but there are hundreds of smaller web shops and companies that use it as well.

    Ian has made it pretty clear that RDFa (in its current form) isn’t welcome in his version of HTML5, preferring an approach he created dubbed microdata. The RDFa Task Force has no choice but to produce a version of RDFa for HTML since one of the co-chairs of HTML Working Group, Sam Ruby, has asked us to address the issues of RDFa in HTML. We’re also under pressure to do this quickly (and correctly) because people are inserting RDFa into HTML documents without there being a spec to do so. The RDFa community is a victim of its own success and we’re stuck between a number of groups (XHTML2, HTMLWG, WHATWG, browser manufacturers) that are at odds with one another.

    Which brings us back to a point made earlier in this post – one of distributed innovation on the Web.

    The RDFa community is going to continue to work on RDFa in HTML with or without Ian or WHATWG’s approval. We would much rather work with everybody involved as colleagues, in a friendly, productive atmosphere. We started the year out trying to do exactly that, only to have it fall apart after being stone-walled by Ian and friends. We will produce a version of RDFa for HTML and float it back to the HTML WG and WHATWG, and if it’s rejected again, we will continue undaunted.

    Ian has said that he’s not the end-all-be-all of HTML5 and on that statement he’s right. Technologies that are useful will be adopted because of the open/remix/reuse nature of the Web. RDFa is proving to be useful. Only time will tell if it’s one of those fundamental pillars of the next generation Internet, but it’s off to a great start and we don’t plan on stopping any time soon.

    Viva la Revolución Semántica!

  2. [...] to see the full comic. The related blog entry fleshes out the basic complaint some more: Why is it that the person who is the center of this [...]

  3. Not to mention: Google has a great view of the current internet, as you mentioned, but what the HTML5 group is trying to do is create the future internet. Google’s data is near-useless, except to verify some specific things.

  4. I think data collected via any means should never be taken at face value – there are huge risks of unreasonable bias and of bugs. When it’s collected secretly with no visibility into the source or the methodology, it’s impossible for anyone else to examine the biases, and so nobody can ever put much faith in the data.

    But much of the data used when discussing HTML5 has been independently verified or collected – http://philip.html5.org/data/ has an (unorganised) collection of data derived from publicly-available sources (mostly http://dmoz.org/ and http://dotnetdotcom.org/). When it’s looking at similar data as Hixie’s secret Google collections, the results have always been similar, so there’s been no reason to believe there are significant errors.

    In the context of accessibility, there is e.g. http://philip.html5.org/data/table-summary-values-dotbot.html showing how it’s commonly used amongst that collection of pages, which has been referenced in discussions of how commonly the summary attribute has unhelpful values; I don’t remember Hixie quoting any private data about that issue.

  5. Many of the results that Hixie has got from looking at the Google index have been independently verified by others, for example using the dotbot data, Opera’s MAMA data, or by independent, smaller, crawls. For some of the most controversial topics in the design of HTML 5 the size of the population used to analyse existing markup is almost irrelevant because the individual documents have to be interpreted by humans anyway; therefore you need to work with a small set of documents that have the “right” selection bias rather than the whole population.

    Do you actually have any examples of results based on Google data that cannot be reproduced with other datasets?

  6. Google is so evil that they even managed to get me to do what I’m doing now for the several years of HTML5’s development that came before they employed me!

  7. I can not believe you missed the chance of featuring Sam Ruby as Dorothy’s Ruby slippers.

  8. Some people may be interested to know that the SVG WG added support in SVG Tiny 1.2 for the attributes used by RDFa and Microformats. We recognize that various kinds of metadata are really important to an SVG file, and that authors and developers and implementers are going to develop their own ways of using the provided attributes according to their needs, so we deliberately did not define, dictate, or overly constrain how the attribute can be used, though we did give a couple of examples of use and referenced both RDFa and Microformats.

    We are also working with the W3C Validator folks to make sure that this syntax is given an okay when validating, and hopefully to provide tips on best practices. To many people, the results of the Validator, and not necessarily the wording in the spec, are what serves as a guide to them.

    So, RDFa could be used in SVG content inline in HTML, if desired.

  9. I really like this site although I am slightly disappointed to find out that it is run by a dude called Kyle and not actually by a very clever squirrel.

  10. Philip, then Ian needs to use these external data sources from now on, rather than falling back on Google’s index. It would solve any concerns about Google’s overt influence.

    Well, no, not enough. There need to be more editors to ensure an unbiased result. And more input into what shows up in the spec. The current HTML5 working effort is flawed in this regard.

    As for the results, basing any decision about HTML5 on collections of data and existing usage is flawed, because the results are applied inconsistently, and erratically. Philip your data shows that table summary is used incorrectly. Well, your results also show that HTML tables are also used incorrectly. But tables remain, while summary is challenged.

    What happened in HTML5 was a stronger clarification of HTML tables. Why the same couldn’t be applied to summary is anyone’s guess.

    So the data is meaningless. It’s really nothing more than an excuse and a justification for decisions based more on personal opinion than any kind of “best practice”. But it makes folks feel good; oh look, I have all this _data_. I must be right, because I have all this _data_. See my _data_.

    Doug, you’re probably also aware that the accessibility folks are also looking at the use of SVG as a fallback method if the HTML5 specification continues its current course. In my opinion, it’s a real bastardization of SVG, though. But, as was mentioned with RDFa, when you hit walls, you have to use whatever works.

    Sad about the walls, though. Sad that so many groups are excluded from any real input into what’s added into the spec.

  11. To follow up on Shelley’s point regarding raw _data_: on June 3rd of this year, the WAI Protocols and Formats Working Group announced that a consensus decision had been reached regarding the summary attribute for tables. They stated: “We request the table summary tag be restored in HTML 5 as per previous communications”, and as part of their justification/response noted: “We reject the argument that summary should be removed from the HTML specification because it is not implemented on most web sites. We note that accessibility is poorly supported on most web sites. The wider web is not an example of good practice.”
    [ http://lists.w3.org/Archives/Public/www-archive/2009Jun/0026.html ]

    Will the HTML5 WG accept this consensus position and re-instate the @summary attribute to tables in HTML5? Or will they continue to ignore consensus recommendations from other Chartered W3C Working Groups because they “know better”? Time will tell I suppose…

  12. @Michael – I completely agree. The present snapshot can be informative, but isn’t very predictive.

    @Philip & Jgraham – Shelley’s point that these external data sources should be used, then, rather than Google’s closed-to-outsiders data. Firstly, it helps take the wind out of any Google influence concerns that may arise (such as mine). Secondly, it provides other concerned parties access to that info to help win them over (assuming it’s regarding an issue where the data is capable of justifying a decision, which I frequently think isn’t the case).

    @Ian – I knew it! Does their wickedness have no boundaries?

    @David – Which reminds me, I have no slippers on the squirrel! That needs fixing pronto. Although I think fitting Sam down there might be messy.

    @Doug – That is very awesome. I need to read up on that more.

    @Joshue – Perhaps the Squirrel invented Kyle Weems as a pseudonym to cover its trail…

    @Shelley & John – I think you two touch on some important parts of the issue. Shelley, I agree that at the very least, using outside, accessible data sets as primary sources (rather than Google’s private data) helps with appearances. I also agree, though, that in many cases the data doesn’t help determine the merits of a given issue at all. John, I think the issue you highlighted helps illustrate the biggest problem here. If the HTML5 WG is incapable or unwilling to work with the other Working Groups on important issues like accessibility, then something drastic needs to be done to ensure HTML5 actually represents the needs and issues of all users, not just a few privileged users.

  13. Thank you so much for this article! I wondered all the time a) how one person could gain so much power in the working group, b) why it is so hard to gain consensus there, and c) why W3C management doesn’t replace this person with someone with better social skills. The answer to all three is: Google. Your article opened my eyes: of course reaching a consensus on HTML5 is a pain! What else do you expect from a company that is so obsessed by data driven processes as to test 41 shades of blue?!

  14. And the WhatWG has been addressing your concerns in their usual mature, open manner:

    http://krijnhoetmer.nl/irc-logs/whatwg/20090623

    Not surprising, though. After all, this is the group that is the poster child for the term, “passive aggressive”.

  15. Philip said “I don’t remember Hixie quoting any private data about that issue.”

    Shelley said “Philip, then Ian needs to use these external data sources from now on, rather than falling back on Google’s index. It would solve any concerns about Google’s overt influence.”

    Kyle said “@Philip & Jgraham – Shelley’s point that these external data sources should be used, then, rather than Google’s closed-to-outsiders data.”

    It’s interesting that you complain about Hixie using Google’s index when he has only been looking at other people’s data.

  16. Zcorpan, I was responding to Philip’s complete comment, not one sentence. “But much of the data used when discussing HTML5 has been independently verified or collected…”

    As for Ian’s use of Google data, I suggest you refer to the WhatWG blog entry on longdesc.

  17. @Shelley – I can’t say I’m surprised at the direction of the IRC conversation, considering the content of my post. They’re not going to suddenly think I’m correct and they’re insane.

    @Philip, Zcorpan, and anyone else asking the same question – http://blog.whatwg.org/the-longdesc-lottery is a link to a post (brought up in this blog by Mattur last week during the discussion on accessibility). That data (harvested from Google) was specifically used (in lieu of the advice of accessibility experts who’ve worked with blind users) to support their opinion on the longdesc attribute. As Michael stated above, all the indexing does is prove current usage (which, as John said, in the case of accessibility is unsurprisingly low because accessibility is poorly supported by most web authors), not future trends, or the validity of certain techniques.

  18. And as Laura Carlson linked in the public-html mailing list, the page listing all of the issues about @summary also references Ian’s use of Google data:

    http://esw.w3.org/topic/HTML/SummaryForTABLE#head-46654c194f99ed943eb26573a6506818962826ee

  19. What it comes down to is this: a number of technologists are crunching data to support their hypotheses and/or to justify how they choose to implement (or not implement) certain features. As most of us know, you can make numbers (and ‘raw data’) say anything with the right kind of spin, and spin-meisters such as Mark Pilgrim (the author of the longdesc-lottery blog posting) are very good at doing just that. (In fact, if you read the entire IRC log at http://krijnhoetmer.nl/irc-logs/whatwg/20090623#l-187 you will note that Henri Sivonen asks where Mark’s propaganda postings have been of late).

    According to the publicly available data then, Mark Pilgrim recommends the use of @summary in tables [ http://diveintoaccessibility.org/day_20_providing_a_summary_for_tables.html ] and Ian Hickson indeed *advocates* using the @longdesc attribute [ http://www.hixie.ch/advocacy/alttext ] – yes, data mining can be fun, but as now ‘proven’, can be used to spin whatever story you wish.

    (It will be interesting to see what the IRC log does with this…)

  20. About longdesc: noted.

    “And as Laura Carlson linked in the public-html mailing list, the page listing all of the issues about @summary also references Ian’s use of Google data:”

    That page just references a general-purpose data mining Hixie did in 2005. It does not follow that Hixie used that data in decisions about summary="".

    I have not seen Hixie reference any Google data when discussing summary="".

  21. I haven’t referenced any internal data in years, actually; pretty much ever since Philip` and others started doing public research.

  22. John: Any reported data should be subjected to critical thought and discussion, and not accepted blindly, but nor should it be rejected blindly. If someone thinks the data itself is flawed, they should argue why and suggest ways of getting better data, but in the meantime it seems to be the best data we’ve got and it allows us to make more informed decisions than if we had no data.

    “Mark Pilgrim recommends the use of @summary in tables” – it’s easy to demonstrate that claim is flawed, by pointing at http://krijnhoetmer.nl/irc-logs/whatwg/20090604#l-706 (“my opinion on accessible markup has changed since i wrote "dive into accessibility"” … “the only effort has been in defining markup (usually, accessibility-specific markup) that purports to solve certain problems, but there’s no followup to determine if those particular solutions actually DO solve those particular problems”).

    As far as I’m aware (though I haven’t looked at it closely myself), the data shows that a random user viewing a random web page trying to read summary attributes will waste their time (getting an unhelpful value (“Layout Table”), getting garbage (“Ripple.AutoProg.Unit3.MakeTree2.SetCategory”, “pid5673054”)) much more often than they’ll get something really useful. That’s important information for us to know, since it tells us there’s scope for improving overall accessibility across the web if we can encourage authors to use different markup that will have a higher proportion of useful content (perhaps by telling authors to make the text visible to all users, so they’re much less likely to stick garbage in there) without greatly reducing the absolute amount of useful content. There’s lots of room for interpretation and judgement calls in deciding what would be the optimal markup for the spec to suggest, but the data itself seems valid regardless of any opinions or spin, and any decision should take it into account.

  23. philip taylor wrote:
    “As far as I’m aware (though I haven’t looked at it closely myself), the data shows that a random user viewing a random web page trying to read summary attributes will waste their time (getting an unhelpful value (“Layout Table”), getting garbage (“Ripple.AutoProg.Unit3.MakeTree2.SetCategory”, “pid5673054”)) much more often than they’ll get something really useful.”

    The vast majority of bogus values will not be heard by screen reader users that support summary, as they are on layout tables, and screen readers that support summary also filter out layout tables. So when a user does hear a summary announced it will most probably be useful, although this will be a rare occurrence.

  24. Steve: Ah, I forgot about that. That seems like a good example of how the data could (and should) be improved, by filtering out layout tables similarly to how screen readers do it and then seeing what’s left over. (I haven’t been keeping track of all the list discussions – has anyone pointed to a description/implementation of an algorithm that would be similar to modern screen readers? It’d be nice to have that implemented in Java so it can be run over a wide set of pages to see what overall difference it makes in practice.)

  25. @Philip Taylor: Overall, I am in agreement with what you are saying – I deliberately sought out ‘conclusive proof’ that @summary and @longdesc should remain in HTML5, as two of the main spokespersons for the WHAT WG wrote as such, simply to prove that raw data alone is not that useful. For what it’s worth, my feelings regarding longdesc have not changed much, although I recognize that it is both misunderstood and under-utilized (and I did not need to crawl Google’s index to arrive at that conclusion); perhaps with the aria-describedby attribute we will have a better (or second) opportunity to get the education part right. I hope. Meanwhile, I remain with the belief that we should continue to support an accessibility attribute until such time as we can prove that its replacement is in fact replacing the older attribute… like all languages, it should just fade from usage, rather than be decreed abolished.

    One of the key things you stated however was: “There’s lots of room for interpretation and judgment calls in deciding what would be the optimal markup for the spec to suggest…” and it is here that I and others take umbrage.

    The interpretation of the data (and reasons for, as well as possible solutions to) did not happen as a dialog between end users & subject matter experts and the specification authors, but instead was decided upon by a small group of people who emerge with a “solution” that we must simply accept. This is fundamentally wrong, and is the source of my continued disdain for the entire process. We are expected to take the ‘expert’ interpretation of the success or failure of accessibility features simply on the say-so of data interpretation by a few select people. When a larger collection of subject matter experts review the situation(s), the data available to them, and then discuss it in an open and transparent way – AND THEN ARRIVE ON A CONSENSUS POSITION – only to have it derisively dismissed on the IRC channel within minutes of the recommendation being publicly released… No, there is a real problem here that is quite obvious to even those who have arrived late to the discussion, or are simply interested by-standers watching this process.

    For data to be properly interpreted requires a certain amount of subject matter experience and expertise. And quite frankly, while Ian and Anne and Lachlan and Mark and Maciej and Henri and yes you (to name but a few of the more outspoken contributors) have considerable knowledge and experience with web technologies, when it comes to web accessibility you are not yet experts – oh sure, you might have opinions (who doesn’t), but often those opinions are not fully based upon real experience. And that’s OK, nobody can be an expert at everything and web accessibility is complex, nuanced and often subjective. But make no mistake about it, as those folks that vocally set aside (or dismiss) the expert opinion of recognized practitioners, advocates, experts and end users, they are, in effect, claiming to be more knowledgeable than those actual experts. This is the real root of the disagreement – who is better placed to interpret the ‘data’ and make recommendations?

  26. Power corrupts: even those who are able to separate their personal opinions can be part of a biased process:

    http://masinter.blogspot.com/2009/05/structural-bias-standards-and-elsewhere.html

  27. [...] for the browser community to move towards HTML 5. To the extent where some are accusing them of manipulating the standards process to get what they want in. With HTML 5, multimedia capabilities would be built right into the [...]

  28. @Kyle: “If the HTML5 WG is incapable or unwilling to work with the other Working Groups on important issues like accessibility”

    Nonono, you mean to say if Ian Hickson is incapable or unwilling. After all, he is the sole dictator over the HTML5 specification, so if he doesn’t want something, good luck trying to convince him. Ridiculous.