Utilities for HTML & XHTML Revalidation

Description: Some PostScript utilities for HTML and XHTML Revalidation. Dramatically speeds up ampersand repairs, "search and destroy" substitution for modern compatibility, multi page scripting, and JavaScript issues. Adaptable to modifying most any uncompressed textfile in any language.


Author: Don Lancaster (Fellow)
Domain: High Tech | Category: Displays | Subcategory: webmastering
Short URL: https://www.wesrch.com/electronics/pdfEL1NABLMIIMHH


Some PostScript Utilities for HTML and XHTML Revalidation
Don Lancaster
Synergetics, Box 809, Thatcher, AZ 85552
Copyright © 2009, pub 11/09 as GuruGram #102
http://www.tinaja.com
don@tinaja.com
(928) 428-4073

One of the ruder surprises of the web is that different browsers tend to display in different manners. Some may allow unique custom features, while others choke on them. As the web has aged and newer and better standards have emerged, the rules have gotten more and more strict. In particular, HTML 4.0 and XHTML now demand that...

   Most commands are now case sensitive.
   Most data must be quote bracketed.
   Most commands must be lower case.
   "LOWSRC" is no longer permitted.
   "alt=" on images is now mandatory.
   Text ampersands must be in "&amp;" format.
   has largely supplanted .
   Some commands (such as ) must now self-delimit.
   Id's have largely replaced names.
   Id's and names have to start with a letter.
   Delimiting spaces are now often mandatory.
   JavaScript interpretation is now stricter.

A very useful validator can be found here. I was rudely surprised to find my older web pages generating thousands and even tens of thousands of errors per page. After manually correcting a few pages, I decided that most of the revalidation and verification could easily be handled by some hand-written PostScript utilities.

As we have seen countless times in the past, PostScript excels as a General Purpose Computing Language when its unique features can be properly exploited. In particular, PostScript is especially adept at modifying most any uncompressed text based disk file written in virtually any other computer language.

In this GuruGram, we will explore a few ways that PostScript can dramatically speed up and simplify reverification of older website content to newer html and xhtml standards. Much more on our PostScript utilities appears here.

Repairing Ampersands
An ampersand is used as an "escape" character in both HTML and XHTML. Ferinstance &nbsp; creates a nonbreaking space, while &gt; gives you a "greater than" closing caret text character not to be used as a command delimiter. When a lone ampersand was found in an earlier browser, it was guessed to be a printing character. But such guesses are not permitted in current HTML or XHTML.
All printing ampersands must now be shown in their &amp; format.

Repairing ampersands gets ugly in a hurry. Lone ampersands are quite common in URLs such as eBay Listings or Acme Mapper locations, among many others. But only those ampersands that are not followed by a semicolon within a few characters should get corrected. To make matters worse, there is an insidious bug in DreamWeaver that may change all of your ampersands back the way they were hours after you fixed them! If you must use DreamWeaver to change ampersands, ALWAYS close your file immediately afterward and NEVER click on the refresh button.

Instead, a simple and versatile utility can be created in PostScript that opens any file, inspects each ampersand to make sure there is no semicolon following in the next few characters, and then alters only those that need changing. One example program is FIXAMPS1.PSL. It simply reads an HTML or XHTML file one character at a time and tests to see if a correction is needed. The high level code looks something like this...

/correctampersands {
   /readfilename fileheader infilename mergestr store    % input path
   /readfile readfilename (r) file store                 % open for reading
   /writefilename fileheader outfilename mergestr store  % output path
   /writefile writefilename (w+) file store              % open for writing
   0 1 10000000 {                      % repeat uses only the 10000000 count
      readfile (x) readstring not {exit} if    % one character; quit at end of file
      /curchar exch store
      writefile curchar writestring            % copy the character through
      testforampersand                         % add amp; after a lone ampersand
   } repeat
   readfile closefile
   writefile closefile
} store

... while the substitution utility is...

/testforampersand {
   curchar (&) eq {
      readfile bytesavailable 8 gt {             % leave the last few bytes alone
         /curposn readfile fileposition store    % remember the current position
         readfile (x) readstring pop (;) eq      % look ahead seven characters
         readfile (x) readstring pop (;) eq or
         readfile (x) readstring pop (;) eq or
         readfile (x) readstring pop (;) eq or
         readfile (x) readstring pop (;) eq or
         readfile (x) readstring pop (;) eq or
         readfile (x) readstring pop (;) eq or
         not {writefile (amp;) writestring} if   % no semicolon seen, so add amp;
         readfile curposn setfileposition        % rewind to just past the ampersand
      } if
   } if
} store

If the present character is an ampersand and if none of the next few characters are a semicolon, then a new amp; is written between the existing ampersand and the continuing text. One known bug is that ampersands inside a Visual Basic script internal to an .asp file will need separate attention as they must not be changed. Our Banner Rotator uses a VB script line of pattern = a(0) & a(1) ... & a(8) which must remain intact. Such exceptions are very rare and easily dealt with.
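For readers who want to check the lookahead rule outside of a PostScript interpreter, here is a minimal Python sketch of the same test. It is not the FIXAMPS1.PSL code; the function name fix_ampersands and the lookahead parameter are assumptions made up for this illustration. The window of seven characters and the "more than eight bytes left" guard are taken from the testforampersand listing above.

```python
def fix_ampersands(text: str, lookahead: int = 7) -> str:
    """Insert 'amp;' after each '&' that has no ';' within the next `lookahead` characters.

    Illustrative restatement of the lookahead rule; names are hypothetical.
    """
    out = []
    for i, ch in enumerate(text):
        out.append(ch)  # every character is copied through unchanged
        if ch != "&":
            continue
        remaining = len(text) - (i + 1)
        # Mirror the PostScript guard: ampersands too near the end are left alone.
        if remaining <= lookahead + 1:
            continue
        window = text[i + 1 : i + 1 + lookahead]
        if ";" not in window:
            out.append("amp;")  # lone ampersand becomes &amp;
    return "".join(out)


if __name__ == "__main__":
    print(fix_ampersands("http://example.com/item?id=5&cat=books&sort=price"))
```

Ampersands that already begin a short entity such as &nbsp; or &amp; pass through untouched, because the semicolon falls inside the window. As with the PostScript version, script lines that use a bare ampersand as an operator would still need separate handling.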

"Search and Destroy" Phrase Substitution
A different approach can be used to try and repair the majority of earlier HTML errors: big chunks of the code are bulk scanned for problem phrases. Should a problem phrase be found, it can be replaced with corrected code. We might call any earlier phrase Wuz and the replacement phrase Wilby.

Now, some Wuz phrases will be generic and common to most all early HTML. Such as converting a to a . Others will be specific to your web page style. Such as changing a color=$FFCC99 to a color="FFCC99" . Such substitutions will be useful only if this particular color is of importance in your page layouts.

It is important to decide how much generic and how much specific code you wish to correct. In general, taking out most of the errors with an automated routine will greatly simplify and speed up your revalidation. But try for a perfect repair and you will end up spending much more time coding and testing than you would have spent fixing the problems by hand in the first place.
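The Wuz and Wilby idea can be tried out in a few lines of Python. This is only a sketch of the general technique, not the author's PostScript utility; the function name substitute_phrases and the SAMPLE_PAIRS list are hypothetical. The color pair comes from the text above, while the line break pair is an assumed example of a generic fix.

```python
from typing import Sequence, Tuple


def substitute_phrases(text: str, pairs: Sequence[Tuple[str, str]]) -> str:
    """Replace every occurrence of each wuz phrase with its wilby phrase, in list order."""
    for wuz, wilby in pairs:
        if wuz:  # an empty wuz would match everywhere, so skip it
            text = text.replace(wuz, wilby)
    return text


# Hypothetical (wuz, wilby) pairs for illustration only.
SAMPLE_PAIRS = [
    ("<BR>", "<br />"),                    # assumed generic fix
    ("color=$FFCC99", 'color="FFCC99"'),   # page-specific fix quoted in the text
]

if __name__ == "__main__":
    print(substitute_phrases("<FONT color=$FFCC99>hi<BR>", SAMPLE_PAIRS))
```

Because the pairs are applied in order, a wilby produced by one pair can be matched by a later wuz. Keeping the list short and testing it on a copy of a page is one way to keep such interactions from causing surprises.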

AUTOVAL1.PSL is an example of a PostScript phrase substituter. It can be used on most any uncompressed text file in most any language, but clearly excels at html and xhtml repair.

At present, the repairs take place on sequential 40K strings. These are long enough for surprisingly fast operation but still stay within a 64K limit should your wilby strings add to your file size. An srpairs scripting data file is first created listing your possible wuz and wilby substitutions. This is an array of form

   [ [(wuz1)(wilby1)] ... [(wuzn)(wilbyn)] ]

The wuz and wilbys can be a mix of your generic and specific code. Such as this partial example...
/srpairs [ [ ()() ] [ ()() ] [ ()()] [ ()()] [ ()() ] [ ()() ] [ (