Programming Assignment 3

DOM Tree

In this assignment you will implement an HTML Document Object Model (DOM) Tree.
Browsers create DOM trees for web pages, to facilitate the manipulation of HTML entities by client-side code such as Javascript.

Worth 100 points

Due Tue, Aug 6, 11:00 PM (NO GRACE PERIOD)





Summary

You will write an application to build a Document Object Model (DOM) for a given HTML file, and process it with a set of given input commands that will transform the tree.


Document Object Model

The Document Object Model (DOM) is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. (This is quoted from the W3C Document Object Model specification web page.)

When you access a web page, the browser builds a DOM tree structure for the HTML content in the page.

Consider the following HTML example:

<html>
<body>
<center>The <em>quick</em> brown fox</center>
</body>
</html>
which is rendered by a browser as:
The quick brown fox
The DOM tree for this HTML content would look like this:
         
         html
          |
         body
          |
        center
          |
    --------------
    |     |      |     
   The    em   brown fox
          |     
        quick   
Note: There is a space character after "The", and before "brown" so the corresponding nodes of the tree actually store "The " and " brown fox".

You can see the DOM tree for any HTML document in the browser. In Chrome, go to View --> Developer --> Inspect Elements. In Firefox, go to Tools -> Web Developer -> Inspector.

Open the sample.html file for the above example in your browser, and look at its DOM. You will notice that under the root html node, in addition to the body node, there is also a head node (with nothing under it), which corresponds to the head tag. The head tag is not used in the example above, and is automatically supplied by the browser. The HTML documents in this assignment will not have a head tag. You will also notice the trailing space character included in "The " and the leading space character included in " brown fox".

Web developers can build a "dynamic" HTML page by embedding, say, Javascript code in the page, that can manipulate the DOM tree. For example, such code can show or hide individual columns in an HTML table upon user request.

In the example above, you could have Javascript code within the page make calls to the DOM tree, to replace the em tag with the b tag, and remove the center tag, for the following result:

         
         html
          |
         body
          |
    --------------
    |     |      |     
  The     b      brown fox
          |     
        quick   
The browser keeps track, and renders the tree onto the web page like this:
The quick brown fox

Bring up try.html in your browser. This page has javascript code embedded in HTML, that replaces em with b or vice versa when you click a button. It does this by changing the DOM tree representation of the page.

You can view the web page's HTML source using the keyboard shortcut option-command-u (Mac) or ctrl-u (Windows)


Restricted HTML for Assignment

For this assignment, you will work with the following restricted set of HTML tags:

htmltop level
bodysecond level
pparagraph
ememphasis (italics)
bboldface
tabletable
trrow (within table)
tdcolumn (within tr)
olordered (numbered) list
ulunnumbered list
lilist item (within ol or ul)

Moreover, the format of the HTML document itself (supplied through an input file) will be restricted to the following:

Examples

Following are some sample HTML pages. Click on a link to see the page, and view the page source to see the underlying HTML:

ex1.htmlp, em, and table tags
ex2.htmlol and ul tags
ex3.htmlNested ol and ul tags


DOM Tree Structure

Since the nodes in the DOM tree have varying numbers of children, the structure is built using linked lists in which each node has three fields: the tag (which can be an HTML tag or plain text), the first child (which is null if the tag is plain text), and the sibling, which is a pointer to the next sibling.

As an example, consider the following input HTML:

<html>
<body>
The 
<b>
quick
</b>
 brown fox
</body>
</html>
(The third line is actually "The ", i.e. trailing blank space, and the "brown fox" line has a leading space before "brown".)

The DOM tree implementation for this HTML would look like this:

    ----------
    |html| |\|
    ------|---
          |
          V
       ----------
       |body| |\|
       ------|---
             |
             V
          ----------   -------   ----------------
          |The |\|-|-> |b| |-|-> | brown fox|\|\|
          ----------   ---|---   ----------------
                          |
                          V
                       -----------
                       |quick|\|\|
                       -----------

Note: Tree nodes containing tags do NOT include angle brackets with the tags.
So, if a node stores the tag b, it stores the string "b", NOT the string "<em>".
Also, closing tags (e.g. "/em") are NOT be stored in the tree.
If your tree violates these requirements, it will be considered incorrect!


Implementation

Download the attached domtree_project.zip file to your computer. DO NOT unzip it. Instead, follow the instructions on the Eclipse page under the section "Importing a Zipped Project into Eclipse" to get the entire project into your Eclipse workspace.

You will see a project called DOM Tree with the following classes:

There are also a number of sample test files directly under the project folder (see the Examples section that follows.)

Note: You are not required to use the included Stack class. But if you do use a stack, you MUST use this class (not your own).

You will implement the following methods in the Tree class:

Examples of Transformations

Here are examples of applying transformations to some HTML pages, and the resulting HTML (look inside the source files as well):

InputTransformationResult
ex1.htmlreplace em with b ex1tr1.html
ex1.htmldelete em ex1tr2.html
ex2.htmldelete ol ex2tr1.html
ex3.htmldelete ol ex3tr1.html
ex1.htmlboldface 2 ex1tr3.html
ex2.htmladd em around item ex2tr2.html

Rules of implementation


Running the Program

The DOM class in the apps package is the application driver. When you run it, it asks for an input html file name. It creates a scanner for this file, which it then passes to the Tree constructor. Then it calls the Tree build method to build the DOM tree.

Following this, it goes into a loop, and presents a menu of actions from which you can pick whether you print the tree structure, or the HTML generated from the tree, or do one of the transformations.

Here's a sample run with the tree built out of the ex1.html input file:

Enter HTML file name => ex1.html

Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => p

     html
      |----body
            |----p
                  |----A line of non-tagged text.
            |----p
                  |----A
                  |----em
                        |----new
                  |----paragraph.
            |----table
                  |----tr
                        |----td
                              |----em
                                    |----R1C1
                        |----td
                              |----R1C2
                  |----tr
                        |----td
                              |----R2C1
                        |----td
                              |----R2C2

NOTE: The tree structure printed via the print() method as above is the test for correctness.

Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => h

<html>
<body>
<p>
A line of non-tagged text.
</p>
<p>
A
<em>
new
</em>
paragraph.
</p>
<table>
<tr>
<td>
<em>
R1C1
</em>
</td>
<td>
R1C2
</td>
</tr>
<tr>
<td>
R2C1
</td>
<td>
R2C2
</td>
</tr>
</table>
</body>
</html>

Choose action: (p)rint Tree, (h)tml, (r)eplace tag, (b)oldface row, (d)elete tag, (a)dd tag, or (q)uit? => q
CAUTION: The HTML generated off the tree using the getHTML() method as above does NOT by itself guarantee correctness of the tree (the same HTML can be generated off different trees.) But after verifying with the print()ed tree structure, you can use the HTML to see the result in a browser page.

Submission

Submit your Tree.java file ONLY.

(If you submit a .class or .zip, or any other file, you will get no credit even if you have a working version on your computer. We will only grade whatever is in Sakai.)


Grading

All input files will be correctly formatted, so you don't have to do any error checking for input format.

Every transformation that is applied will be on a legitimate tree structure, where the transformation should work without failure.

Your build method will be graded first. After that, all other methods will be graded using our correct build implementation.

Each of the non-build methods will be graded using a tree built from scratch. This means every test case is treated separately, so that it does not depend on any previous test case's transformation of the tree.

Since every method you will implement results in building or modifying the DOM tree structure, we will grade by examing the tree structure that results at the end of each of your methods. Starting with root instance field, the grading process will traverse your tree structure and compare it with the expected structure. (As stated earlier in the Running the Program section, we will NOT be looking at the HTML printout of the tree.)

To check the tree structure as you go, you may either use the print() method, or you may want to use the Eclipse debugger to navigate through the tree nodes. There is comprehensive information on how to use the debugger in Eclipse under Help -> Help Contents -> Java development user guide -> Tasks -> Running and Debugging, and Help -> Help Contents -> Java development user guide -> References -> Views -> Variables View. Essentially, you need to know how to set breakpoints and examine variable values when the program stops at a breakpoint. (And of course, there are YouTube videos on using the debugger in Eclipse.)