Simple and Readable Text Markup Languages vs. Rich Web Text Editing

David A. Wheeler

2011-05-19 (originally May 3, 2005)

Trend and Counter-Trend

There's a new(?) trend showing that everything old sometimes really is new again: simple, highly readable markup languages. It has an "opponent" that is also not exactly new: web text editors. Yet both seem to be coming to the fore at the same time, which is quite interesting.

Lightweight Markup Languages

In some situations, typical document formats (such as OpenDocument, Word .doc format, or even editing HTML directly) simply don't work well. Examples include massive collaboration over the Internet (such as with Wikipedia or other Wiki sites) and creating relatively simple or short documents (such as blog posts). Although existing markup languages like DocBook, HTML/XHTML, LaTeX, and nroff/man all work, they're often complicated to write and read. You could use SGML or XML to create your own markup language, but that doesn't really address the need for simplicity. None of these work very well if you expect to have many users who don't really understand computers deeply (HTML comes closest, but complicated HTML documents become unreadable in a hurry).

Lightweight markup languages have resurged to meet these needs. These are markup languages that are really easy to read and write, which can then be automatically translated to other formats. Sadly, there are lots of slightly different ones. Capable examples include:

  1. AsciiDoc looks very reasonable if you want to create documents or websites; it can generate HTML, XML, DocBook, PDF, and man pages (DocBook can in turn generate other formats; AsciiDoc can also generate the obsolete LinuxDoc format). This is no trivial capability; it can handle cross-links, tables, and so on. Technically, AsciiDoc processing requires an implementation to look ahead to the next line to understand the current line of text; some find this annoying, but if it makes the language easy to read, I think that's quite reasonable.
  2. Markdown
  3. reStructuredText
  4. Wikipedia's markup language (supported by MediaWiki) has grown a lot of capabilities (to support creating an encyclopedia), yet it's still easy to use (and thus is a really capable example of this). There are a vast number of users of this notation, but setting up a processor for it isn't so easy.
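
To make the "look ahead to the next line" point concrete, here is a minimal Python sketch of a toy translator that recognizes underlined titles. It is only loosely modeled on AsciiDoc's two-line title style and is not how AsciiDoc itself is implemented; the function names, the mapping of underline characters to HTML heading levels, and the length tolerance are all assumptions made for illustration.

```python
import html

# Assumed mapping from underline character to HTML heading level.
LEVELS = {"=": 1, "-": 2, "~": 3, "^": 4, "+": 5}


def is_underline(line: str, title: str) -> bool:
    """True if `line` is a run of one underline character about as long as `title`."""
    s = line.rstrip()
    return (
        len(s) >= 2
        and s[0] in LEVELS
        and s == s[0] * len(s)
        and abs(len(s) - len(title.rstrip())) <= 2  # assumed tolerance
    )


def to_html(text: str) -> str:
    """Translate underlined titles and plain lines into HTML."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        # The lookahead: a line is a title only if the NEXT line underlines it.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.strip() and is_underline(nxt, line):
            level = LEVELS[nxt[0]]
            out.append(f"<h{level}>{html.escape(line.strip())}</h{level}>")
            i += 2  # consume the title and its underline
        elif line.strip():
            out.append(f"<p>{html.escape(line.strip())}</p>")
            i += 1
        else:
            i += 1  # skip blank lines
    return "\n".join(out)


if __name__ == "__main__":
    print(to_html("My Title\n========\n\nSome text.\n"))
```

The source text stays readable as plain text (a title simply looks underlined), and the cost is that the translator cannot classify a line until it has seen the one after it.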

The various Wiki languages, such as MoinMoin's, etc., are also examples of this. But there are a lot of different ones, all incompatible. Here's some text on StructuredText, ReStructuredText, and WikiText. Many Wiki languages use CamelCase to create links, unfortunately; a lot of people (including me) find that convention ugly and awkward (MediaWiki dumped CamelCase years ago; MediaWiki internal links look like this: [[MediaWiki link]]). Most Wiki languages are too limiting for wider use.
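
The difference between the two link conventions can be sketched in a few lines of Python. This is an illustration only, not the code of any real Wiki engine: the regular expressions, the function names, and the "/wiki/" URL prefix are assumptions, and text outside the links is passed through without HTML escaping.

```python
import html
import re

# CamelCase convention: two or more capitalized word parts run together.
CAMEL = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")
# Bracket convention: [[Some page name]] (no pipe or nested brackets handled).
BRACKET = re.compile(r"\[\[([^\[\]|]+)\]\]")


def link(target: str) -> str:
    """Build an HTML link; the /wiki/ prefix is a hypothetical URL scheme."""
    name = target.strip()
    href = html.escape(name.replace(" ", "_"), quote=True)
    return f'<a href="/wiki/{href}">{html.escape(name)}</a>'


def camelcase_links(text: str) -> str:
    """Turn every CamelCase word into a link, whether intended or not."""
    return CAMEL.sub(lambda m: link(m.group(0)), text)


def bracket_links(text: str) -> str:
    """Turn only explicitly bracketed names into links."""
    return BRACKET.sub(lambda m: link(m.group(1)), text)


if __name__ == "__main__":
    print(camelcase_links("MediaWiki dumped it"))
    print(bracket_links("See [[MediaWiki link]]."))
```

Under the CamelCase rule, an ordinary product name such as MediaWiki becomes a link by accident, and a page title has to be squeezed into one run-together word. The bracket form links only what the writer marked and allows spaces in the name.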

No doubt there are others. One I learned about recently is Markdown. Markdown is a notation for simply writing text and generating HTML or XHTML; it seems to be focused on helping bloggers.

Anyway, it's an interesting trend! In the long term, I hope that there will eventually be a merger and a "standard" simple markup language (or at least a standard subset that works in many places). Wikipedia's markup is so widely used that it might be a good basis for that standard. AsciiDoc is similar in some ways, but in a few ways it's radically different; on the other hand, it's highly configurable, so it could probably be adapted to match. There is a completely different alternative, though: rich web text editing.

Rich Web Text Editing

Of course, nobody says you have to look at that markup directly. If people didn't have to install a separate program, and could instead just edit text directly in their web browser, that wouldn't be so bad. Rich web text editing tools do exactly that: they let users edit files from inside the web browser, without having to "install" a program. In almost all cases, what's being edited is HTML or XML, but the user doesn't need to see the HTML/XML directly.

Examples of such tools include Kupu (here's an O'ReillyNet review of Kupu), FCKeditor (an unfortunate name; FCK are the programmer's initials), and TinyMCE.

There's also Walter; this is much less functional, but it has an interesting technical approach. Most alternatives depend on a new feature called "caret browsing", while Walter doesn't; it reads key events and the DOM interface, which means it should work on older browsers. But it doesn't work as well with newer browsers, so it's not clear that this will go anywhere.

A long time ago I publicly proposed that browsers should extend the TEXTAREA tag, so that TEXTAREA type="html" would enable automatic editing of HTML text. I still think that would be a good idea! And it'd be upward-compatible, too. But these tools will let people work in a nicer way until browsers have this capability built in.

It's a Big World

It's a big world, and I believe there's room for both of these approaches as well as the more traditional approaches. In fact, both simple/readable text markup languages and rich web text editing typically depend on more traditional approaches. Simple and readable text markup languages seem to be uniformly implemented by internally translating to a more traditional language, and rich web text editing internally uses a traditional markup language as well. If either approach might help you, I hope the information above helps you find out more.


Feel free to see my home page at http://www.dwheeler.com.