by Jim Brain (brain@mail.msen.com)
Note: Due to my recent relocation with my family, I am behind on the development of the HTML viewer for the Commodore system. Therefore, this article will not focus on the actual viewer. With development less than 50% complete, the modules are subject to change, and describing them now would only confuse matters.
HTML. This simplistic acronym, unknown to most people before 1993, now
forms the heart of discussions. Its status is secured, as employers ask
how much "experience" one has with it, and resumes commonly include it.
A quick tally in any technical magazine reveals hundreds of references
to it, and trips to the bookstore yield mountains of titles referring to
it.
Most Commodore owners have a few questions about this acronym. First, what is it? Second, why should I care about it? In this series of articles, I will try to answer both questions to your satisfaction.
To answer the first question, let's step back to explain the World Wide Web (WWW). This explanation is not designed to replace more thorough treatments of the subject. In 1991, while working as a researcher at the CERN laboratory in Switzerland, Tim Berners-Lee developed a hypertext information retrieval system that allowed researchers at the lab to design informative online "presentations" of their work. In each presentation, a researcher could reference a document or presentation located elsewhere on the lab-wide network of computers. This reference was "live", meaning that a person could select it from the document and immediately view the referenced document. Thus, a matrix of related documents was created to interconnect the researchers' work.
In an effort to offer the researchers great latitude in presenting their
works while retaining some standard in layout, Berners-Lee found simple
ASCII text an inadequate presentation method. Clearly, a document
formatting procedure, or "markup language" was needed. However,
Berners-Lee found that popular document markup languages did not support
the concept of referencing, or "linking" between documents in a standard
and non-proprietary way. After looking past popular approaches like
Windows help files, troff, TeX, and Rich Text Format, Berners-Lee found
a standardized markup language that would support links and provide
flexibility in creating documents, yet retain some semblance of
commonality. The language was the Standard Generalized Markup Language
(SGML).
SGML itself was derived from an IBM-specific markup language called
Generalized Markup Language (GML). After some minor changes, the IBM
GML specification became standardized. SGML, though, represents more
than a simple formatting schema. SGML allows one to create multiple
derived markup languages off the SGML base, and a suitable program can
interpret each derived language independently. Thus, HTML functions as
a derivation of SGML.
Berners-Lee created the original specification for HTML while working on
the WWW framework. Since mid-1993, when the first graphical HTML
viewer arrived from the University of Illinois, the HTML specification
has been revised and updated at least four times, but remains an
SGML-derived language.
HTML, like most formatting or document markup languages, allows the
document creator to insert special labels, or "tags" into the document,
which the language processor can parse. The language processor then
converts these tags into the special formatting options they represent.
In a simplistic markup language, one might place an asterisk "*" next to
any word to be highlighted. As the "marked up" document is read and
parsed by the language processor, the resulting output would highlight
each word preceded by an asterisk. The asterisk itself would be
stripped from the resulting display, as it does not form part of the
document itself. In much the same way, HTML allows creators to insert
HTML tags into the document being formatted. An HTML display system
(commonly called an HTML viewer if the document is local or an HTML
browser if the document can be accessed from a remote location) then
parses the tags and renders the presentation of the document on a
suitable display.
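The asterisk example above can be sketched in a few lines. This is only an illustration, written in Python for readability and not necessarily the language of any viewer; the function name render_asterisk_markup is made up for this example, and "highlighting" is assumed to mean upper-casing the word.

```python
def render_asterisk_markup(text):
    """Highlight every word preceded by an asterisk and strip the asterisk.

    Assumption: highlighting is shown by upper-casing the word. A real
    display system might use reverse video or another color instead.
    """
    rendered = []
    for word in text.split():
        if word.startswith("*") and len(word) > 1:
            # The asterisk is markup, not content, so it is dropped.
            rendered.append(word[1:].upper())
        else:
            rendered.append(word)
    return " ".join(rendered)


print(render_asterisk_markup("Load the *viewer before the *document"))
# prints: Load the VIEWER before the DOCUMENT
```

The processor reads the marked-up text, acts on the markup, and removes the markup from what is displayed. An HTML display system does the same thing with a larger set of tags.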
Most HTML tags come in pairs: for each "open" tag, there is a corresponding "close" tag. All tags are simple ASCII words or letters preceded by a less-than "<" character, and followed by a greater-than ">" character. A simple tag is "HTML", which tells the browser that the document to follow is marked up in HTML. This tag takes the form:
<HTML>
Since tags are not case sensitive, <html> can be used as well. This tag
is the HTML open tag, and it has a corresponding close tag. In HTML, a
close tag is formed by inserting a slash "/" character after the less-
than character and before the tag name. Thus, </HTML> would form the
close HTML tag.
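To make the open and close forms concrete, here is a minimal sketch of how a program might split a document into tags and text. It is written in Python purely for illustration; the function name tokenize and the token names "open", "close", and "text" are assumptions of this example. Anything after the tag name inside the brackets is ignored here.

```python
def tokenize(source):
    """Split marked-up text into (kind, value) pairs.

    kind is "open", "close", or "text". Tag names are lower-cased
    because tags are not case sensitive.
    """
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos] == "<":
            end = source.find(">", pos)
            if end == -1:
                # No closing ">", so treat the remainder as plain text.
                tokens.append(("text", source[pos:]))
                break
            inner = source[pos + 1:end].strip()
            kind = "open"
            if inner.startswith("/"):
                # A slash right after "<" marks a close tag.
                kind = "close"
                inner = inner[1:]
            words = inner.split()
            name = words[0].lower() if words else ""
            tokens.append((kind, name))
            pos = end + 1
        else:
            end = source.find("<", pos)
            if end == -1:
                end = len(source)
            tokens.append(("text", source[pos:end]))
            pos = end
    return tokens


print(tokenize("<HTML>Hello</html>"))
# prints: [('open', 'html'), ('text', 'Hello'), ('close', 'html')]
```

Note that <HTML> and </html> produce the same tag name, which is what lets the two forms be matched up regardless of case.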
Some tags require additional information. This information is included after the tag name and before the greater-than character. Such tags include IMG, which instructs the HTML display system to load and display a graphics element at the present location. Since the location and name of the graphics element are needed, they are included as an "attribute" in the tag. To display a photo called jim.gif, I would include:
<IMG SRC=jim.gif>
in my document. Notice the space between the tag name and the attribute
name. That space is necessary.
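A short sketch shows why that space matters. The Python below is illustrative only; parse_tag is a hypothetical helper, and it assumes attribute values contain no spaces, which a complete parser could not assume.

```python
def parse_tag(tag):
    """Return (name, attributes) for a tag such as '<IMG SRC=jim.gif>'.

    Assumption: attribute values contain no spaces, so splitting the
    tag on whitespace is enough for this sketch.
    """
    inner = tag.strip()[1:-1]          # drop the "<" and ">"
    parts = inner.split()
    if not parts:
        return "", {}
    name = parts[0].lower()            # tag names are not case sensitive
    attributes = {}
    for part in parts[1:]:
        key, _, value = part.partition("=")
        attributes[key.lower()] = value.strip('"')
    return name, attributes


print(parse_tag("<IMG SRC=jim.gif>"))
# prints: ('img', {'src': 'jim.gif'})
print(parse_tag("<IMGSRC=jim.gif>"))
# prints: ('imgsrc=jim.gif', {})
```

Without the space, the parser has no way to tell where the tag name ends and the attribute begins, so the whole string is taken as an unknown tag name.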
IMG has no corresponding close tag: since IMG doesn't turn something on
that must be turned off, there is nothing to close. That forms the
basis for using closing tags. Opening tags that "turn on" a formatting
style require closing tags. Opening tags that do not "turn on" a
formatting style, such as <br> and <hr>, have no closing tag, and for
<p> the closing tag is optional. Of course, exceptions exist, but
you'll rarely go wrong marking up with this rule in mind.
The following tags are considered basic since they implement either the
essential or often used formatting options available in HTML. Each
opening tag is listed in its HTML form, and a description of the tag is
given:
Tag        Description
<html>     Begins an HTML document
<head>     Specifies the heading information (title, etc.)
<body>     Specifies the body of the document (information)
<p>        Inserts a paragraph
<hX>       Renders the following text in heading size X, where
           1 <= X <= 6. H1 is largest, while H6 is smallest
<br>       Line break
<title>    Specifies the title of the document
<hr>       Horizontal rule (line across document)
<strong>   Emphasizes text strongly (typically rendered as bold text)
<em>       Emphasizes text (typically rendered as italics)
Remember, these are but a few of the possible tags.
On the WWW, HTML documents are referred to as "pages", and each page is constructed as a simple ASCII or ISO 8859-1 (superset of ASCII) text file. No preprocessing is necessary. This makes creating documents as easy as editing a text document. HTML files are typically given the file extension ".html", and IBM PC computers running MS-DOS typically shorten this to ".htm" due to DOS limitations. However, the former extension is more correct. Although fancy HTML generation applications exist, most people on all platforms simply create pages using a text editor. Since Commodore owners can usually find a text editor, Commodore enthusiasts can create pages just as easily as anyone. Additionally, the WWW and HTML encourage writers to create small pages, and break up large documents into linked pages of smaller sizes. Typically, HTML documents are less than 10 kilobytes in length. At that size, even an expanded VIC-20 can create full size HTML pages.
Let's create our first document. Edit a file called template.html and
place the following text inside it:
<html>
<head>
<title>This is an HTML title</title>
</head>
<body>
<h1>This is an example of Heading 1</h1>
This is a paragraph.
<p>
This is another paragraph.
<br>
I want you to see this next sentence.
<strong>Therefore, I am strongly emphasizing it</strong>.
Now we are back to normal.
This sentence is below the last in the source, but will appear following it when displayed.
</body>
</html>
Notice which tags require closing. Also, notice how <HEAD> and <BODY> are used in the document. Now look at the two final sentences in the above example. The sentences appear on different lines in the document, but HTML specifies that all carriage returns will be translated into spaces. It further specifies that if multiple spaces exist in a file, they will be reduced to a single space. Thus, using spaces as alignment aids will not work in HTML. Likewise, using linefeeds and carriage returns to specify alignment will also fail. If a new line is necessary, use <p>, which will leave a blank line, or <br>, which starts a new line.
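The two whitespace rules just described can be sketched in one line of Python. This is an illustration only; collapse_whitespace is a made-up name, and the sketch also trims leading and trailing whitespace, which is a simplification of this example.

```python
def collapse_whitespace(text):
    """Turn carriage returns and linefeeds into spaces, then reduce
    every run of spaces to a single space.

    str.split() with no argument splits on any run of whitespace, so
    rejoining the pieces with one space applies both rules at once.
    """
    return " ".join(text.split())


source = "Now we are back to normal.\nThis   sentence is below the last."
print(collapse_whitespace(source))
# prints: Now we are back to normal. This sentence is below the last.
```

The line break and the extra spaces in the source both vanish, which is why only <p> and <br> can force a new line.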
Now for the second question: why should a Commodore owner care about HTML? This is an interesting question, and I hope you agree with my answer. Many claim that HTML is useless to the Commodore owner since the Commodore can't display HTML. I am not even sure that is true (I've heard of simple HTML viewer programs for the 128), but it doesn't matter. Commodore owners who access the Internet from a "shell" account can access the World Wide Web via the "Lynx" text browser. Since the WWW is constructed of HTML pages, those Commodore owners can indeed view HTML files while online. Many Commodore enthusiasts possess useful information. Putting that information on the Internet via HTML and WWW makes it widely available to other Commodore and non-Commodore computer owners. Why worry about the latter? You'd be surprised how many former Commodore owners are coming back into the fold after viewing some Commodore HTML pages. The information on those pages triggers fond memories. Many fire off messages inquiring about purchasing a new or used CBM machine after seeing these pages.
To the naysayers, I submit that there is nothing PC-centric in the HTML standard. If an HTML viewer doesn't yet exist, it has nothing to do with the computer system. As HTML was created to allow successful operation over many different computer systems and graphics capabilities, HTML encourages usage on computer systems like the Commodore, where there are limitations in display size and resolution.
In fact, the Commodore community should embrace HTML as a markup
language, for it represents a standard way to effectively mark up
documentation for viewing on a variety of computer systems. Using HTML
opens up a whole set of possibilities for easily created, standardized
documentation publication.
Disk magazines, like _LOADSTAR_, _DRIVEN_, _VISION_, and _COMMODORE CEE_, could produce issues that contain more layout information than now offered. Since the viewer would now be standardized, these publications could possibly forgo the distribution of the viewer software and offer more content in the extra space on disk. A side benefit is the ability for Commodore users to read each issue on any platform. Possibly you'll never need to read LOADSTAR 128 Quarterly on an IBM PC, but what about reading it on a 64, while your sole 128 does something else? Moving to HTML would shift a disk magazine's focus and concern from the presentation, which would become standard, to content, which is why Commodore owners read such magazines anyway. How many times has otherwise great information been presented badly in a disk magazine? Use of HTML could help alleviate that problem. Publishing a disk magazine is time consuming because not only must editors work on the articles themselves, they must also write the software that presents the articles to the viewer. Using HTML and a pre-written browser would allow editors to spend more time on laying out and editing articles.
Disk magazines aren't the only winners here. Have you ever wanted to create a small publication? The use of HTML and a third-party HTML viewer makes it easy for you to do so. Just as it does for the editors of bigger publications, HTML allows you to concentrate on presenting your information without worrying about writing the presenter software. Now, obviously not everyone should publish their own magazine, but how about help files, information disks, software documentation, club newsletters, etc.? These publications can all benefit from this technology.
These are but a few of the benefits of switching to HTML for document
layout. Other uses include upward compatible support. Using HTML
allows the Commodore 128 user to view documents created for the 64 in 80
columns by 50 rows. C128D owners can take advantage of their 64kB video
RAM even when viewing documents created on 16kB video RAM C128s.
Publishers would no longer be constrained by lowest common denominator
support. They can now include whatever they want and be assured that
the presentation will look fine on all platforms. When users upgrade
their machines, they can immediately utilize those new features without
requesting a new version of the publication. Also, for software, even
though the software itself might differ by machine, the online
documentation need be written only once. As well, never forget that
marking up in HTML makes migrating your documents to the Internet and
the WWW a snap!
Obviously, before Commodore users can reap the benefits of HTML, we must create both an HTML generator and a viewer. The generator is easy, as HTML files are simply ASCII text. So, we are left to design and implement an HTML viewer. The following conditions should be met:
Although we intend to develop a viewer that supports the above, our initial development will operate on a much smaller scale. The first revision of this viewer will operate on the stock machine and will contain support for the basic HTML tags as outlined above. Our design will allow us to extend the capabilities to encompass our goals.
I am not very good at drawing execution flows, and the native format of this magazine doesn't lend itself well to them, anyway. Therefore, I will simply describe the execution flow.
The viewer will start by asking the user for a document to access. If the file does not exist, an error is printed and the user is asked again. If the file exists, the viewer will begin reading it. If a tag is found, the tag should be acted upon. If text is loaded, it should be displayed on the screen using the current markup controls unless the control information is incomplete. In this case, the text should be stored for later display. The file should be parsed in this way, until the end is found. Then, the system will wait for either the user to select a link or type in a new document to view.
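The flow just described can be sketched as follows. This is a rough Python illustration under several assumptions, not the viewer's actual design: the names render and run_viewer are invented here, only <br>, <p>, and <strong> are acted upon, strong emphasis is shown as upper case, link selection is left out, and a blank answer ends the session.

```python
import re

# A tag is "<...>"; anything else up to the next "<" is text.
PIECES = re.compile(r"<[^>]*>|[^<]+")


def render(source, show):
    """Walk the document once, acting on tags and displaying text."""
    bold = False
    pending = []        # text collected for the current output line

    def flush():
        # Carriage returns and runs of spaces collapse to one space.
        text = " ".join("".join(pending).split())
        pending.clear()
        if text:
            show(text)

    for piece in PIECES.findall(source):
        if not piece.startswith("<"):
            pending.append(piece.upper() if bold else piece)
            continue
        words = piece[1:-1].split()
        name = words[0].lower() if words else ""
        if name == "br":
            flush()
        elif name == "p":
            flush()
            show("")    # a paragraph leaves a blank line
        elif name == "strong":
            bold = True
        elif name == "/strong":
            bold = False
        # Every other tag is ignored in this sketch.
    flush()


def run_viewer(ask=input, show=print):
    """Ask for a document, report errors, and ask again."""
    while True:
        name = ask("Document to view (blank to quit): ")
        if not name:
            break
        try:
            # Pages are ASCII or ISO 8859-1 text files.
            with open(name, encoding="latin-1") as handle:
                source = handle.read()
        except OSError:
            show("Error: cannot open " + name)
            continue
        render(source, show)
```

Calling run_viewer() starts the prompt loop. Passing ask and show as parameters is a choice of this sketch; it keeps the parsing separate from the keyboard and screen, so the same loop could drive a different display.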
Most of the time, text can be displayed as soon as it is received. However, there are exceptions. Some tags, like the <TABLE> tag, which creates a table on the screen, require that all the data in the table be known before the table cell information can be calculated. In cases like these, we must store the data and wait for the </table> tag.
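Here is a minimal sketch of that store-and-wait behavior, written in Python for illustration. The names render_tables and layout are hypothetical, rows and cells are assumed to be marked with the standard <tr> and <td> tags, and the " | " column separator is just one possible way to draw a table on a text screen.

```python
import re

TOKENS = re.compile(r"<[^>]*>|[^<]+")


def layout(rows):
    """Pad every cell to the width of the widest cell in its column."""
    cells = [[" ".join(cell.split()) for cell in row] for row in rows]
    columns = max((len(row) for row in cells), default=0)
    widths = [max((len(row[i]) for row in cells if i < len(row)), default=0)
              for i in range(columns)]
    return [" | ".join(cell.ljust(widths[i])
                       for i, cell in enumerate(row)).rstrip()
            for row in cells]


def render_tables(source):
    """Return display lines, holding table data until </table> arrives."""
    lines = []
    rows = None                     # None means "not inside a table"
    for token in TOKENS.findall(source):
        if token.startswith("<"):
            words = token[1:-1].split()
            name = words[0].lower() if words else ""
            if name == "table":
                rows = []           # start storing instead of displaying
            elif name == "tr" and rows is not None:
                rows.append([])
            elif name == "td" and rows:
                rows[-1].append("")
            elif name == "/table" and rows is not None:
                # All cells are now known, so widths can be calculated.
                lines.extend(layout(rows))
                rows = None
        elif rows is None:
            text = " ".join(token.split())
            if text:
                lines.append(text)  # ordinary text is shown right away
        elif rows and rows[-1]:
            rows[-1][-1] += token   # table text is stored for later
    return lines


page = "Before<table><tr><td>Tag<td>Meaning<tr><td>br<td>Line break</table>After"
for line in render_tables(page):
    print(line)
```

The first column cannot be padded correctly until the second row has been read, which is exactly why the table must be held back until its close tag.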
The above flow explanation ignores some subtleties like carriage return stripping and multiple space reduction. Those are left out because at least one tag, the <PRE> tag (preformatted text), overrides those rules. <PRE> text is displayed in a monospaced font exactly as it is prepared in the document. Text is not wrapped, and spaces are not reduced. So, we will treat carriage return stripping and space reduction as formatting options that are normally turned on.
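Treating whitespace reduction as an option that <PRE> switches off might look like the Python sketch below. It is illustrative only; normalize is an invented name, all tags other than <pre> and </pre> are simply dropped, and font selection is outside its scope.

```python
import re

TOKENS = re.compile(r"<[^>]*>|[^<]+")


def normalize(source):
    """Collapse whitespace everywhere except between <pre> and </pre>."""
    collapse = True                 # the option is normally turned on
    output = []
    for token in TOKENS.findall(source):
        if token.startswith("<"):
            words = token[1:-1].split()
            name = words[0].lower() if words else ""
            if name == "pre":
                collapse = False    # <PRE> overrides the usual rules
            elif name == "/pre":
                collapse = True
        elif collapse:
            # Carriage returns and runs of spaces become one space.
            output.append(re.sub(r"\s+", " ", token))
        else:
            output.append(token)    # preformatted text is kept as-is
    return "".join(output)


print(repr(normalize("a  b<pre>c  d</pre>")))
# prints: 'a bc  d'
```

Keeping the rule as a single flag means any future tag that needs raw text can reuse the same switch.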
I regret that we haven't gotten very far in the development process with this installment, but we'll make up for lost time in the next installment. One thing that I would like to encourage from readers is comments and suggestions. Do you see a problem with some of the above information? Do you have a better way to parse some of the information? Do you see limitations in the data structures? Since we haven't delved into some of these aspects yet, do you have some ideas of your own? I can guarantee that I'm ready to discuss them with you; however, I can't read your mind. I think it's important that this project be completed, as it forms the core of a successful WWW browser, and I see everyone wanting to know when one will be available. I am less concerned that my name appear on the finished product. In fact, I think a product that draws on the talent of the entire Commodore community would most likely exceed the quality a single individual can afford a piece of software. So, fire up those assemblers and put on those thinking caps.
Copyright © 1992 - 1997 Commodore Hacking
Commodore, CBM, its respective computer system names, and the CBM logo are either registered trademarks or trademarks of ESCOM GmbH or VISCorp in the United States and/or other countries. Neither ESCOM nor VISCorp endorse or are affiliated with Commodore Hacking.
Commodore Hacking is published by:
Brain Innovations, Inc.
Last Updated: 1997-03-11 by Jim Brain