This is a semi-literate document for the program semlit.pl and its shell wrapper semlit.sh. This program is used to create semi-literate documentation (of which this document is an example). Semi-literate documents like this are intended to explain the internals of a program, which is different from user documentation. If you are an end user, you probably want the user document.
Copyright 2012, 2015 Steven Ford http://geeky-boy.com and licensed "public domain" style under:
To the extent possible under law, the contributors to this project
have waived all copyright and related or neighboring rights to
this work. This work is published from: United States. The project
home is https://github.com/fordsfords/semlit/tree/gh-pages.
To contact me, Steve Ford, project owner, you can find my email
address at http://geeky-boy.com.
Can't see it? Keep looking.
This document describes the internals of the semlit.pl
program, and is intended to be read by a programmer who wants to
understand, maintain, and perhaps reuse the code. Before
reading this documentation, you are expected to have a good
user-level understanding of SEMLIT. Please be familiar with the user
document before starting this, unless you are
just getting a feel for what SEMLIT documentation is like.
The main program is written in Perl. The reader is assumed to
have at least entry-level
knowledge of Perl. Some of the more-advanced Perl constructs
are explained for the benefit of the novice.
There are two program source files:
Here are the high-level steps performed by the main semlit
program:
Let's declare some global variables to hold values for program
options, and set their default values:
00047 my $o_help; # -h
00048 my $o_fs = ","; # -f
00049 my $o_delim = "="; # -d
00050 my $o_initialsource = "blank.html"; # -i
00051 my @o_incdirs = ("."); # GetOptions will append additional dirs for each "-I".
00052 $tabstop = 4; # defined and used by Text::Tabs - see "expand()" function
00053
00054 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
00055 if (defined($o_help)) {
00056 help(); # if -h had a value, it would be in $opt_h
00057 }
Note the use of the Perl built-in function GetOptions. This function
does the work of parsing the command-line and setting new values
into the global option variables. See the Perl
documentation if you are not familiar with the construct.
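As a minimal sketch of how GetOptions fills scalar and array destinations, the following parses a hypothetical command line from an array (using GetOptionsFromArray so the example is repeatable). The option values and the file name are made up for illustration; only the "d=s" and "I=s" specifications come from the program.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Defaults, overwritten only when the option appears on the command line.
my $delim   = "=";
my @incdirs = (".");

# Hypothetical command line, parsed from an array instead of @ARGV.
my @args = ("-d", "#", "-I", "lib", "-I", "common", "doc.sldoc");

GetOptionsFromArray(\@args,
    "d=s" => \$delim,      # string option stored in a scalar
    "I=s" => \@incdirs,    # repeatable option: each value is pushed onto the array
) or die "Error in GetOptions\n";

print "delim=$delim\n";
print "incdirs=@incdirs\n";
print "remaining=@args\n";
```

Anything that is not an option (here, the sldoc file name) is left behind in the argument array, which is why the main program can look at @ARGV afterwards.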
The program assumes that there is a single master sldoc
file supplied on the command line. So, early on we get that file
name:
00059 if (scalar(@ARGV) != 1) {
00060 usage("Error, .sldoc file missing");
00061 }
00062 $main_doc_filename = $ARGV[0];
00063 if ( ! -r "$main_doc_filename" ) {
00064 usage("Error, could not read '$main_doc_filename'");
00065 }
Note the use of the Perl construct scalar(@array) to
determine the number of elements in the array. There are shorter
ways to do this (in terms of keystrokes), but I prefer the
explicit construct because it is easier to spot. Also note the use of
-r file to test if
the file exists and is readable.
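Both constructs can be tried in isolation. This is only a sketch: the temporary file and the "missing" file name are hypothetical stand-ins for a real sldoc file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @args  = ("semlit.sldoc");
my $count = scalar(@args);        # number of elements in the array

# A throwaway file that certainly exists and is readable.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);

# A name assumed not to exist in the current directory.
my $missing_path = "no-such-file-" . $$ . ".sldoc";

my $tmp_readable     = (-r $tmp_path)     ? 1 : 0;
my $missing_readable = (-r $missing_path) ? 1 : 0;

print "count=$count tmp=$tmp_readable missing=$missing_readable\n";
```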
Once the file name is determined, we now process that input file:
00072 # Main loop; read each line in doc file
00073
00074 my $doc_html_str = process_doc_file($main_doc_filename);
Note that the function process_doc_file()
returns the html output for the documentation.
An sldoc file might insert the same block of source code
multiple times. The source html will link to the first doc
reference to it. How does the reader find the other references? At
the end of an inserted source block, links will be inserted to
adjacent references within the doc.
However, the sldoc file is processed sequentially. When a
reference (insert command) is seen, it is not yet known
if there will be subsequent references. So, a fixup step is added
after the sldoc file is fully processed. That fixup step
looks at each source block that has multiple references and adds
the appropriate links (next/prev).
Since this step needs a count of the number of references, we
need code in the semlit_cmd()
function to do the counting:
00223 my $num_refs = 1;
00224 my $block_ref_name = $block_name;
00225 if (defined($block_numrefs{$block_name})) {
00226 $num_refs = $block_numrefs{$block_name} + 1;
00227 $block_ref_name = $block_name . "_ref_$num_refs";
00228 }
00229 $block_numrefs{$block_name} = $num_refs;
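The block command initializes $block_numrefs{$block_name}
to zero, so each insert increments the count and builds the
reference name from it ("_ref_1" for the first insert, "_ref_2" for
the second, and so on).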
Back to main and the fix-up step:
00076 # fix up multiple source references
00077 foreach my $blockname (keys(%block_numrefs)) {
00078 if ($block_numrefs{$blockname} > 1) {
00079 # First ref points to next and last
00080 my $refnum = 1;
00081 my $this_block = $blockname . "_ref_" . ($refnum);
00082 my $first_block = $this_block;
00083 my $last_block = $blockname . "_ref_" . $block_numrefs{$blockname};
00084 my $next_block = $blockname . "_ref_" . ($refnum + 1);
00085 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a> <a href="#$last_block">last ref<\/a><\/pre>/s;
00086
00087 # Middle refs point to previous and next
00088 my $prev_block = $this_block;
00089 for ($refnum = 2; $refnum <= $block_numrefs{$blockname} - 1; $refnum ++) {
00090 # middle refs point to prev and next
00091 $this_block = $blockname . "_ref_" . ($refnum);
00092 $next_block = $blockname . "_ref_" . ($refnum + 1);
00093 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a> <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00094 $prev_block = $this_block;
00095 }
00096
00097 # last ref points to first and previous
00098 $this_block = $blockname . "_ref_" . ($refnum);
00099 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$first_block">first ref<\/a> <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00100 }
00101 }
The
first reference has no "prev", so a link to the last reference is
included. The last reference has no "next", so a link to the first
reference is included. (The source html always links to
the first reference.)
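The link rules described above can be condensed into a small function. This is a sketch, not the program's code: nav_links() is a hypothetical helper that returns the links for reference number $i out of $n, using the same "_ref_N" anchor naming.

```perl
use strict;
use warnings;

# Return the navigation links for reference number $i (1-based) out of $n
# references to the block named $name.
sub nav_links {
    my ($name, $i, $n) = @_;
    return "" if $n < 2;                      # single reference: nothing to link

    my $first = $name . "_ref_1";
    my $last  = $name . "_ref_" . $n;
    my $prev  = $name . "_ref_" . ($i - 1);
    my $next  = $name . "_ref_" . ($i + 1);

    if ($i == 1) {                            # first: no "prev", offer "last"
        return qq(<a href="#$next">next ref</a> <a href="#$last">last ref</a>);
    }
    if ($i == $n) {                           # last: no "next", offer "first"
        return qq(<a href="#$first">first ref</a> <a href="#$prev">prev ref</a>);
    }
    return qq(<a href="#$next">next ref</a> <a href="#$prev">prev ref</a>);
}

print nav_links("getopts", $_, 3), "\n" for (1 .. 3);
```

The real program applies the same rules by substituting the "endblock" html comment in the finished document string, because the total count is not known until the whole sldoc file has been processed.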
A bit earlier in the program, the output file was opened:
00067 # open main doc file
00068
00069 $doc_html_filename = basename($main_doc_filename) . ".html"; # strip directory
00070 open($doc_html_outfd, ">", $doc_html_filename) || die "Error, could not open htmlfile '$doc_html_filename'";
Then, after the input file is
processed, the returned html output is written to the output file:
00103 # write doc html file
00104
00105 print $doc_html_outfd "$doc_html_str\n";
00106 close($doc_html_outfd);
00107
00108 # Create frameset page
00109
00110 my $index_o_file;
00111 open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112 print $index_o_file <<__EOF__;
00113 <html><head></head>
00114 <frameset cols="50%,*">
00115 <frame src="$doc_html_filename" name="doc">
00116 <frame src="$o_initialsource" name="src">
00117 </frameset>
00118 </html>
00119 __EOF__
00120 close($index_o_file);
00121
00122 # Create blank page for initial source frame
00123
00124 my $blank_o_file;
00125 open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126 print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127 close($blank_o_file);
00128
00129 # All done.
00130 exit($exit_status);
The variable $exit_status is initialized to zero in main
and is incremented each time an error is reported. Thus, if no
errors are reported, the program exits with success (0).
There are a couple of miscellaneous support files needed by the html
documentation. Since the documentation is best viewed with html
frames, the first file is the frameset:
00108 # Create frameset page
00109
00110 my $index_o_file;
00111 open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112 print $index_o_file <<__EOF__;
00113 <html><head></head>
00114 <frameset cols="50%,*">
00115 <frame src="$doc_html_filename" name="doc">
00116 <frame src="$o_initialsource" name="src">
00117 </frameset>
00118 </html>
00119 __EOF__
00120 close($index_o_file);
Note the use of Perl's "here document" <<__EOF__ ... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
Contrast this with the next code fragment.
By default, when the documentation is first brought up, there is no
source to display in the source frame. So we need an almost blank page:
00122 # Create blank page for initial source frame
00123
00124 my $blank_o_file;
00125 open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126 print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127 close($blank_o_file);
I didn't use a "here" document for this, but
I could have.
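To make the contrast concrete, here is a small sketch of the two styles side by side. The file name and the html fragments are illustrative only.

```perl
use strict;
use warnings;

my $doc_html_filename = "semlit.sldoc.html";   # hypothetical output name

# Here document: multi-line text with variables interpolated, as in a
# double-quoted string. The terminator must stand alone on its line.
my $frameset = <<__EOF__;
<frameset cols="50%,*">
  <frame src="$doc_html_filename" name="doc">
</frameset>
__EOF__

# The same kind of output built from an ordinary double-quoted string;
# embedded quotes must be escaped and newlines written explicitly.
my $blank = "<html><body class=\"blank\">Nothing yet.</body></html>\n";

print $frameset, $blank;
```

The here document tends to be easier to read once the text spans several lines or contains many double quotes.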
That initial blank page can be overridden with the "-i" command-line option, or with the "initialsource" semlit command.
The process_doc_file()
function is called from the main program:
00072 # Main loop; read each line in doc file
00073
00074 my $doc_html_str = process_doc_file($main_doc_filename);
It is also called (recursively) from the semlit command include.
The function starts out with:
00136 sub process_doc_file {
00137 my ($doc_filename) = @_;
Since this function can be called recursively, we save any
existing file name and line number (they are restored before
return).
The main function is to open an sldoc file, read the file
and process the semlit commands, and return the documentation html
output. Note that the sldoc input file is already in html
form; once the file is read, all that remains to be done is to
process the semlit commands. In the degenerate simple case of an sldoc
file containing no
semlit commands at all, the output html file will be
exactly equal to the sldoc input file. (An unlikely case
since it is the semlit commands which provide the value of
semi-literate documentation.)
Here are the high-level steps it performs:
In an earlier version, the program read the sldoc input
file one line at a time. The line was tested to determine if it
was a semlit command. If so, the command was executed and the line
dropped. This approach worked fine when I used a simple text
editor to create the html sldoc file. But editing html
with a simple text editor is painful, error-prone, and
inefficient. I really wanted to use an html editor. But html
editors generally do not make it easy to control how the content
is arranged on lines of the physical html file. I discovered that
multiple semlit commands can be packed onto a single line, and
even split across lines.
Fortunately, Perl provides powerful constructs which make this
kind of processing easy. Instead of looking at the file
line-by-line, the entire file can be read into a single string
variable, and regular expression matching can be used to find the
commands, one at a time. You will see this below.
The semlit program supports having libraries of sldoc
files for standard boilerplate. I chose to model this after the C
compiler: the programmer simply specifies the name of the file,
and one or more search directories can be specified on the command
line with the -I
option. These directories are set up in the main program thus:
00051 my @o_incdirs = ("."); # GetOptions will append additional dirs for each "-I".
...
00054 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
The GetOptions()
function parses as many -I options as the user supplies and leaves
the @o_incdirs array
set with the directories (including ".").
Back in the process_doc_file() function, the file is opened as
follows:
00140 # open source file, using one or more search directories
00141
00142 my $incdir;
00143 my $open_success = 0;
00144 foreach $incdir (@o_incdirs) {
00145 if (open($doc_infd, "<", "$incdir/$doc_filename")) {
00146 $open_success = 1;
00147 last; # break out of foreach
00148 }
00149 }
00150 if (! $open_success) {
00151 err("could not open doc file '$doc_filename', skipping");
00152 return;
00153 }
It loops through the array
of include directories until the file can be opened. Note that
almost the same code exists in process_src_file().
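Since the same search loop appears in two functions, it could be factored into a helper. The following is a sketch under that assumption: open_from_dirs() is a hypothetical function, not part of semlit.pl, and the temporary directory stands in for a directory given with "-I".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Try each directory in order; return the open handle and the directory
# that matched, or an empty list if the file is not found anywhere.
sub open_from_dirs {
    my ($filename, @dirs) = @_;
    foreach my $dir (@dirs) {
        if (open(my $fh, "<", "$dir/$filename")) {
            return ($fh, $dir);
        }
    }
    return;
}

# Demonstration with a throwaway directory acting as a library directory.
my $libdir = tempdir(CLEANUP => 1);
open(my $out, ">", "$libdir/semlit_demo_boilerplate.sldoc")
    or die "cannot create demo file: $!";
print $out "<p>standard header</p>\n";
close($out);

my ($fh, $found_in) = open_from_dirs("semlit_demo_boilerplate.sldoc", ".", $libdir);
my $first_line = <$fh>;
close($fh);
print "found in search directory: $found_in\n";
```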
Reading the whole file into memory is easy in Perl:
00155 # Read entire file into memory
00156
00157 my @doctexts = <$doc_infd>;
00158 close($doc_infd);
00159 chomp(@doctexts); # remove line delims from every line
00160 my $num_lines = scalar(@doctexts); # count lines in file
00161 my $doctext = join("\n", @doctexts) . "\n"; # combine as a single string
00162 $doctext =~ s/\r//gs; # remove carriage returns, if any
The chomp()
function removes line delimiters, which, depending on platform,
might be something other than linefeeds, and the join()
combines the lines into one long string, and inserts linefeeds as
line endings. Then, carriage returns, if any, are removed. This
unifies different platforms. (Similar
code can be found in process_src_file().)
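The effect of these steps on a file with Windows-style line endings can be seen in a short sketch. The input is an in-memory string opened as a file, so no real file is needed; the content is made up.

```perl
use strict;
use warnings;

# Input with CR-LF line endings, as a Windows editor might produce.
my $raw = "one\r\ntwo\r\nthree\n";

open(my $fh, "<", \$raw) or die "cannot open in-memory file: $!";
my @lines = <$fh>;                 # read every line at once
close($fh);

chomp(@lines);                     # strip the trailing newline from each line
my $num_lines = scalar(@lines);    # count lines in the file
my $text = join("\n", @lines) . "\n";
$text =~ s/\r//g;                  # drop any carriage returns left behind

print "lines=$num_lines\n";
```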
The basic algorithm is to find the first semlit command in the file, execute it,
and then replace that semlit command with the results of the
command. Note that it is possible that the replacement text
contains semlit commands. So, each time a semlit command is
processed, the entire file must be re-scanned from the beginning.
Instead of trying to process the file line-by-line, the loop
executes until there are no more commands to execute:
00167 # process semlit commands
00168 while ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
00169 my $cmd = $1; # text of command (minus standard stuff)
The variable $o_delim contains the
semlit command delimiter and defaults to "=". The variable $o_fs contains the semlit
command field separator and defaults to ",".
Note that the delimiter will certainly be used in the code
without being associated with a semlit command, so the mere
presence of the delimiter does not flag the start of a semlit
command. The word "semlit"
must immediately follow the delimiter ("=semlit"), and that must be followed by a
field separator (",").
Also note that the match pattern captures the text between that
field separator and the ending delimiter. That represents the
semlit command keyword and parameters.
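The scan-and-replace loop can be sketched on a tiny document. Two things here are assumptions of the sketch rather than the program's code: the delimiter and separator are passed through quotemeta() so that a character such as "|" would not be read as a regex metacharacter, and the command is replaced by a simple placeholder instead of calling semlit_cmd().

```perl
use strict;
use warnings;

my ($o_delim, $o_fs) = ("=", ",");
# quotemeta() is an addition in this sketch, not taken from semlit.pl.
my ($d, $f) = (quotemeta($o_delim), quotemeta($o_fs));

my $doctext = "<p>Intro</p>\n<p>=semlit,include,intro.sldoc=</p>\n<p>x = y + 1</p>\n";
my @commands;

while ($doctext =~ /$d\s*semlit\s*$f\s*([^$d]+)$d/is) {
    my $cmd    = $1;                           # keyword and parameters
    my $prefix = substr($doctext, 0, $-[0]);   # text before the match
    my $suffix = substr($doctext, $+[0]);      # text after the match
    push(@commands, $cmd);
    # Stand-in for semlit_cmd(): replace the command with a placeholder.
    $doctext = $prefix . "[$cmd]" . $suffix;
}

print $doctext;
```

Note that the bare "=" in "x = y + 1" is left alone, because it is not followed by the word "semlit".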
The next lines capture the file contents before and after the
semlit command:
00170 my $prefix = $PREMATCH; # text preceding the command
00171 my $suffix = $POSTMATCH; # text after the command
Note the use of the $PREMATCH and $POSTMATCH built-in
variables. See the Perl
documentation if you are not familiar with this construct.
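These names are made available by the English module, which gives readable aliases to Perl's punctuation variables. A minimal sketch, with a made-up input string:

```perl
use strict;
use warnings;
use English;   # provides $PREMATCH, $MATCH and $POSTMATCH

my $doctext = "before =semlit,insert,blk= after";
my ($prefix, $suffix) = ("", "");

if ($doctext =~ /=semlit,([^=]+)=/) {
    # Text to the left and right of whatever the pattern matched.
    ($prefix, $suffix) = ($PREMATCH, $POSTMATCH);
}

print "prefix='$prefix' suffix='$suffix'\n";
```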
If errors are detected in the source files, it is helpful to
print error messages which include the line number. This is
challenging in this function because we are not processing on a
line-by-line basis. It is further complicated by the fact that as
commands are replaced by the returned text from command execution,
the number of lines in the $doctext
changes over time. However, since the commands are processed in
order, we can calculate the line number by looking at the number
of lines following the
command, and subtracting it from the total number:
00173 # calculate line number containing the start of this semlit command
00174 $cur_file_linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;
As you know from the Perl
documentation, the split()
function will by default not create an empty entry at the end
(after the final '\n').
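The arithmetic is easy to check on a four-line example with the command on line 2. The text is invented for the sketch, and the suffix is taken with substr() and the @+ array rather than $POSTMATCH.

```perl
use strict;
use warnings;

my $doctext   = "line 1\nline 2 =semlit,insert,blk= tail\nline 3\nline 4\n";
my $num_lines = 4;

my $linenum = 0;
if ($doctext =~ /=semlit,[^=]+=/) {
    my $suffix = substr($doctext, $+[0]);        # " tail\nline 3\nline 4\n"
    my @rest   = split("\n", $suffix);           # (" tail", "line 3", "line 4")
    $linenum   = $num_lines - scalar(@rest) + 1; # 4 - 3 + 1
}

print "command starts on line $linenum\n";
```

The remainder of the command's own line counts as one entry in the split result, which is why one is added back.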
Now execute the command and capture the replacement text:
00176 my $repl = semlit_cmd($cmd);
The semlit_cmd()
function returns formatted html.
Replacing the matched semlit command with the resulting text is
straightforward:
00178 # Commands are removed, and often replaced with some result
00179 $doctext = $prefix . $repl . $suffix;
00180 } # while
This is where the
number of lines in $doctext
can change.
Finally:
00184 return $doctext;
00185 } # process_doc_file
For the most part, Perl takes care of allowing functions to be
recursive. However, we do have some global variables for error
reporting, which adds an extra challenge:
00029 my $cur_file_name = "";
00030 my $cur_file_linenum = 0;
By keeping the file name and line
number as global variables, any function can report a
user-friendly error message with a minimum of fuss.
Once the initial steps are completed, the process_doc_file()
function is ready to start handling commands. But before it does,
we want to save those global variables:
00164 my ($save_doc_filename, $save_doc_linenum) = ($cur_file_name, $cur_file_linenum);
00165 ($cur_file_name, $cur_file_linenum) = ($doc_filename, 0);
Then the command loop executes. When that is done, just before
returning, we want to restore the global variables:
00182 ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
You will see very similar code in the
process_src_file()
function.
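The save-and-restore pattern can be exercised with a toy recursive function. This is a sketch: process_file() and the file names are hypothetical, and the "processing" is reduced to counting one line and recording the current context.

```perl
use strict;
use warnings;

my ($cur_file_name, $cur_file_linenum) = ("", 0);
my @log;   # records "file:line" as seen by an imaginary err() call

sub process_file {
    my ($name, @includes) = @_;

    # Save the caller's context, then install our own.
    my ($save_name, $save_linenum) = ($cur_file_name, $cur_file_linenum);
    ($cur_file_name, $cur_file_linenum) = ($name, 0);

    $cur_file_linenum++;
    push(@log, "${cur_file_name}:${cur_file_linenum}");
    process_file($_) foreach @includes;             # recursive calls
    push(@log, "${cur_file_name}:${cur_file_linenum}");  # our context is intact

    # Restore the caller's context just before returning.
    ($cur_file_name, $cur_file_linenum) = ($save_name, $save_linenum);
    return;
}

process_file("main.sldoc", "inc.sldoc");
print join(",", @log), "\n";
```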
The process_doc_file() and process_src_file()
functions read input files, find semlit commands, and call semlit_cmd() to execute
them. The delimiters, "semlit",
and the first field separator are stripped, so that the passed-in
command string starts with the command name.
The tabstop command
simply updates the $tabstop
global variable:
00192 # semlit tabstop - doc: source tab expansion
00193 if ($cmd =~ /^tabstop\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00194 if ($1 =~ /^\d+$/) {
00195 $tabstop = $1; # used by Text::Tabs
00196 return "";
00197 } else {
00198 err("Tabstop value '$1' must be numeric");
00199 return "";
00200 }
00201 }
The $tabstop variable is used
directly by the built-in Text::Tabs
Perl module:
00020 use Text::Tabs;
Note that the $tabstop variable can
also be set on the command line:
00054 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
So, where does this actually get used? Right here:
00373 $iline = expand($iline); # expand tabs according to $tabstop.
(In the function process_src_file().)
The expand()
function is part of the Text::Tabs
module.
The tabstop command
does not return any text (returns "" - empty string).
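To see how $tabstop influences expand(), here is a minimal sketch with a made-up input string:

```perl
use strict;
use warnings;
use Text::Tabs;   # exports expand(), unexpand() and $tabstop

# With a tab stop every 4 columns, "a" is padded out to column 4.
$tabstop = 4;
my $four = expand("a\tb");

# With a tab stop every 8 columns, the same tab pads out to column 8.
$tabstop = 8;
my $eight = expand("a\tb");

print "[$four]\n[$eight]\n";
```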
The srcfile command
is used to scan an slsrc file:
00203 # semlit srcfile - doc: read and process source file
00204 elsif ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00205 return process_src_file($1, $2);
00206 }
During the scan, any semlit commands found in the slsrc
file are processed (usually block/endblock commands), and
the source output files are created (both html and src).
The returned text is a link to the plaintext output src file. The author of the sldoc file expects this and positions the srcfile command such that a link to the src file makes sense. For example, near the top of this sldoc file, the srcfile commands are arranged in a bullet list. Each srcfile command is followed by the hint "(right-click and save)" followed by a short description of the file.
The initialsource command
is used to specify a file for initial display in the source frame:
00208 # semlit initialsource - doc: set initial source frame
00209 elsif ($cmd =~ /^initialsource\s*$o_fs\s*([^\s$o_fs]+)\s*/i) {
00210 $o_initialsource = $1;
00211 return "";
00212 }
The file must be the final file name, ending with ".html".
There is no returned text.
The include command
is used to scan an sldoc file:
00214 # semlit include - doc: read and process doc file
00215 elsif ($cmd =~ /^include\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00216 return process_doc_file($1);
00217 }
Note that this represents a recursive call to process_doc_file().
During the scan, the included file is processed the same way as
the master sldoc file.
The returned text is simply the processed html of the included
file.
The insert command is used to insert into the document
output html file a named block of source lines (from slsrc
files):
00219 # semlit insert - doc: insert a source block
00220 elsif ($cmd =~ /^insert\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00221 my $block_name = $1;
00222 if (exists($srcblocks{$block_name})) {
00223 my $num_refs = 1;
00224 my $block_ref_name = $block_name;
00225 if (defined($block_numrefs{$block_name})) {
00226 $num_refs = $block_numrefs{$block_name} + 1;
00227 $block_ref_name = $block_name . "_ref_$num_refs";
00228 }
00229 $block_numrefs{$block_name} = $num_refs;
00230
00231 my $block_str = $srcblocks{$block_name};
00232 return <<__EOF__;
00233 <a name="$block_ref_name" id="$block_ref_name"><\/a>
00234 <small><pre>
00235 $block_str
00236 <\/pre><!-- endblock $block_ref_name --></small>\n
00237 __EOF__
00238 } else {
00239 err("attempt to insert block named '$block_name' but block not defined");
00240 return "";
00241 }
00242 }
Note the use of Perl's "here
document" <<__EOF__
... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
Also note that the source blocks are
stored in the %srcblocks
hash by the process_src_file() function.
The endblock html comment becomes important during the final fix up step; for blocks inserted
multiple times, the comment is replaced with links to other
references.
The returned text is the processed html of the source block.
The block command essentially tells the process_src_file() function
to start storing source lines into a named block:
00244 # semlit block - src: start a named block of source
00245 elsif ($cmd =~ /^block\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00246 my $block_name = $1;
00247 if (defined($srcblocks{$block_name})) {
00248 err("block '$block_name' already defined");
00249 return "";
00250 }
00251 $srcblocks{$block_name} = "";
00252 $block_numrefs{$block_name} = 0;
00253 $active_srcblocks{$block_name} = $cur_file_linenum;
00254
00255 $global_src_buffer = "<span name=\"$block_name\" id=\"$block_name\"><\/span>";
00256 return "";
00257 }
Since it is possible for source lines to be
included in multiple named blocks, the global hash %active_srcblocks
is used to indicate which named blocks are accumulating.
The endblock command essentially tells the process_src_file() function
to stop storing source lines into the named block:
00259 # semlit endblock - src: end a named block of source
00260 elsif ($cmd =~ /^endblock\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00261 my $block_name = $1;
00262 if (exists($active_srcblocks{$block_name})) {
00263 delete($active_srcblocks{$block_name});
00264 $srcblocks{$block_name} =~ s/\n$//s;
00265 return "";
00266 } else {
00267 err("found endblock for '$block_name', which is not active");
00268 return "";
00269 }
00270 }
The block name needs to be supplied
because a nested block does not need to be fully-contained by the
outer block. I.e. if multiple blocks are active, the endblock
does not necessarily end the most-recently opened block.
The tooltip command creates a keyword in a block of text
that can be hovered over for additional information.
00272 # semlit tooltip - create hover over text for a phrase
00273 elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00274 my $text_source = $1;
00275 my $text_link = $2;
00276 my $contents = file_get_contents($text_source);
00277 return <<__EOF__;
00278 <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00279 __EOF__
00280 }
The tooltip data is loaded from a
file on the local server, and the keyword is supplied as the last
parameter. Note: the keyword cannot contain any spaces.
The process_src_file() function is called from the srcfile command. It reads
and processes an slsrc file (program source code). Part of
the processing is to generate two output files: a source html
file, and a src file, stripped of its semlit commands and
ready for compilation.
This function has conceptual similarities with the process_doc_file() function,
but there are important differences. The biggest difference is the
overall approach to processing the file. Here, we process the slsrc
file line-at-a-time. A line must contain either program source
code, or a single semlit command (although that command might be
enclosed in comment delimiters). Instead of replacing the command
with returned content, the line containing the command is simply
discarded.
Here are the high-level steps performed:
Almost the same code exists in process_doc_file(). Here we
are opening an slsrc file which might be in any of the
directories in the array @o_incdirs:
00299 # open source file, using one or more search directories
00300 my $incdir;
00301 my $open_success = 0;
00302 foreach $incdir (@o_incdirs) {
00303 if (open($slsrc_infd, "<", "$incdir/$src_filename")) {
00304 $open_success = 1;
00305 last; # break out of foreach
00306 }
00307 }
00308 if (! $open_success) {
00309 err("could not open src file '$src_filename', skipping");
00310 return "";
00311 }
The output source html file is opened and some initial
content is written:
00313 # create and write initial content to html-ified source file
00314 if (! open($src_html_outfd, ">", "$src_filename.html")) {
00315 err("could not open output source html file '$src_filename.html', skipping");
00316 close($slsrc_infd);
00317 return "";
00318 }
00319 print $src_html_outfd <<__EOF__;
00320 <!DOCTYPE html><html><head><title>$plain_src_filename</title>
00321 <link rel="stylesheet" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css">
00322 <script src="//code.jquery.com/jquery-1.10.2.js"></script>
00323 <script src="//code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
00324 <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/styles/default.min.css">
00325 <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/highlight.min.js"></script>
00326 <script>
00327 \$(function() {
00328 \$( document ).tooltip();
00329 });
00330 </script>
00331 <style>
00332 #code {background-color:#ffffff;};
00333 </style>
00334 </head>
00335 <body><h1>$plain_src_filename</h1>
00336 <script>hljs.initHighlightingOnLoad();</script>
00337 <small><pre><code id="code"><table border=0 cellpadding=0 cellspacing=0><tr>
00338 __EOF__
In addition to
the initial html source, it also contains a link to the plain text
src file which can be downloaded.
The output src file is opened:
00340 # Create plaintext source file (without semlit commands)
00341 if (! open($src_outfd, ">", "$plain_src_filename")) {
00342 err("could not open output src '$plain_src_filename', skipping");
00343 close($slsrc_infd);
00344 close($src_html_outfd);
00345 return "";
00346 }
Read the slsrc file, line-by-line:
00355 my $iline;
00356 while (defined($iline = <$slsrc_infd>)) {
00357 chomp($iline); # remove line delim
00358 $iline .= "\n"; # add newline
00359 $iline =~ s/\r//gs; # remove carriage returns, if any
00360 $cur_file_linenum ++;
The chomp() function removes the
line delimiter, which, depending on platform, might be something
other than a linefeed, and the next line adds a newline to be the
line delimiter. Then, carriage returns, if any, are removed. This
unifies different platforms. (Similar
code can be found in process_doc_file().)
The slsrc file contains semlit commands, usually block and endblock. Find and process
them:
00362 # check for semlit commands
00363 if ($iline =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
00364 semlit_cmd($1);
00365 # discard command line
00366 }
If the slsrc line does not contain a semlit command, then
it is normal source code.
00367 else {
00368 $src_linenum ++; # don't count semlit command lines
Note that
source code lines are counted separately from input slsrc
file lines ($cur_file_linenum counted above). The input
line count ($cur_file_linenum) includes semlit command
lines and is used when printing error messages by err(), while the source code
line count ($src_linenum) does not include semlit command
lines and is used as the user-visible line number in the output
source html file.
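The two counters can be illustrated with a handful of source lines. This sketch hard-codes the default delimiter and field separator and simply skips command lines; the input lines are invented.

```perl
use strict;
use warnings;

my @input = (
    "int x;\n",
    "/* =semlit,block,b1= */\n",
    "x = 1;\n",
    "/* =semlit,endblock,b1= */\n",
    "return x;\n",
);

my ($file_linenum, $src_linenum) = (0, 0);
my @out;

foreach my $line (@input) {
    $file_linenum++;                       # every input line, for error messages
    if ($line =~ /=\s*semlit\s*,\s*([^=]+)=/i) {
        next;                              # command line: discard, do not count
    }
    $src_linenum++;                        # only real source lines
    push(@out, sprintf("%05d  %s", $src_linenum, $line));
}

print @out;
print "input lines: $file_linenum, source lines: $src_linenum\n";
```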
Write the plaintext source line to the src file:
00370 print $src_outfd $iline;
In advance of writing the line to the output source html
file, expand tabs and convert the special characters '&',
'<' and '>' to their html forms:
00372 # fix up source for html rendering (tab expansion, special char encoding)
00373 $iline = expand($iline); # expand tabs according to $tabstop.
00374 $iline =~ s/\&/\&amp;/g; $iline =~ s/</\&lt;/g; $iline =~ s/>/\&gt;/g;
Note that the expand()
function is part of the Text::Tabs Perl module and uses
the $tabstop global variable to control how it expands.
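The special-character conversion can be wrapped in a small function to show why the order matters. html_escape() is a hypothetical name used only in this sketch.

```perl
use strict;
use warnings;

# Convert the three characters that html treats specially.
# "&" must be handled first; otherwise the "&" introduced by
# "&lt;" and "&gt;" would itself be converted again.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

print html_escape("if (a < b && c > d)"), "\n";
```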
Check to see if this source line is inside one or more block/endblock
constructs:
00376 # if we are in at least one block, link the source to the earliest block's first doc reference
00377 if (scalar(keys(%active_srcblocks)) > 0) {
There is at least one named block active. Assuming there might be
more than one, find the active block which was most-recently
opened (is at the highest-numbered input line):
00378 # descending sort so that element 0 is largest
00379 my @active_blocks = sort { $active_srcblocks{$b} <=> $active_srcblocks{$a} } keys(%active_srcblocks);
This sort construct orders the keys
by descending content of the %active_srcblocks
hash. Thus, $active_blocks[0] is the name of that
most-recently opened block. This is used to construct the link
back to the doc:
00380 my $targ = $active_blocks[0] . "_ref_1";
00381 $src_lines_td .= sprintf("<a href=\"$doc_html_filename#$targ\" target=\"doc\">%05d<\/a>\n", $src_linenum);
00382 if ($global_src_buffer) {
00383 $src_content_td .= sprintf("%s %s", $global_src_buffer, $iline);
00384 $global_src_buffer = "";
00385 }
00386 else {
00387 $src_content_td .= sprintf(" %s", $iline);
00388 }
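The descending sort used to pick the most-recently opened block can be tried in isolation. This sketch uses the numeric comparison operator "<=>", since the hash values are line numbers; a string comparison ("cmp") would rank line 9 above line 12. The block names and line numbers are made up.

```perl
use strict;
use warnings;

# Block name => input line number where the block was opened.
my %active = (outer => 9, inner => 12, middle => 10);

# Numeric descending sort: the most-recently opened block comes first.
my @by_number = sort { $active{$b} <=> $active{$a} } keys(%active);

# String descending sort, for contrast: "9" sorts above "12" and "10".
my @by_string = sort { $active{$b} cmp $active{$a} } keys(%active);

print "numeric: @by_number\n";
print "string:  @by_string\n";
```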
For each active named block/endblock
construct, add the source code to the %srcblocks hash:
00390 # for each open source block on this line of source, link the doc block to that source block
00391 foreach my $block_name (keys(%active_srcblocks)) {
00392 my $a = sprintf("<a href=\"$cur_file_name.html#$block_name\" target=\"src\">%05d<\/a> %s", $src_linenum, $iline);
00393 $srcblocks{$block_name} .= $a;
00394 }
This is used by the insert semlit command to
insert the source block into the output doc html file.
For source lines which are not contained in a block/endblock
construct, no doc link is needed when writing to the output source
html file:
00395 } else {
00396 # no active blocks
00397 my $a = sprintf("%05d\n", $src_linenum);
00398 my $c = sprintf(" %s", $iline);
00399 $src_lines_td .= $a;
00400 $src_content_td .= $c;
00401 }
Close the files:
00418 close($slsrc_infd);
00419 close($src_outfd);
00420
00421 print $src_html_outfd "</tr></table></code>\n";
00422 print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00423 print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00424 print $src_html_outfd "</pre></small></body></html>\n";
00425 close($src_html_outfd);
Note the many newlines
printed. This is so that clicking on a line number which is near the
end of the source file will still position that line of source at the
top of the screen.
Also, if the user
accidentally started one or more named blocks but did not end
them, print errors and force them ended:
00427 # if the source file started a block but reached eof without ending it, end it here.
00428 foreach (keys(%active_srcblocks)) {
00429 err("block named '$_' started but not ended");
00430 semlit_cmd("endblock$o_fs$_"); # end it for the user
00431 }
This is done by calling the semlit_cmd() function,
passing it the endblock
command (as if the slsrc file had contained it).
But before we return, restore the $cur_file_name and $cur_file_linenum
variables to their previous state. Then return.
00433 # the semlit.srcfile command writes a link to the plaintext source file
00434 ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
00435 return "<a href=\"$plain_src_filename\">$plain_src_filename</a>";
Since the call to process_src_file()
came from semlit_cmd() executing the srcfile
command, and that command is used in the sldoc file, the
thing we return is a link to the plaintext output src
file.
First, let's declare a couple of globals that will be used for
helping the user:
00025 my $tool = "semlit.pl";
00026 my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The usage() function also uses
this.) In the main code, the "-h" option calls help()
to print more extensive help:
00470 sub help {
00471 my($err_str) = @_;
00472
00473 if (defined $err_str) {
00474 print "$tool: $err_str\n\n";
00475 }
00476 print <<__EOF__;
00477 Usage: $usage_str
00478 Where:
00479 -h - print help screen
00480 -d delim - delimiter character at start and end of a semlit command.
00481 (default to '=')
00482 -f fs - field separator character within a semlit command.
00483 (default to ',')
00484 -i initialsource - file name for initial source frame.
00485 (default to "blank.html") Also, initialsource semlit command.
00486 -I dir - directory to find files for 'srcfile' and 'include' commands.
00487 (default to ".") The "-I dir" option can be repeated.
00488 -t tabstop - convert tabs to "tabstop" spaces.
00489 (default to '4')
00490 files - zero or more input files. If omitted, inputs from stdin.
00491
00492 __EOF__
00493
00494 exit($exit_status);
00495 } # help
Note the use
of Perl's "here document" <<__EOF__
... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
This is a simple routine to read the contents of a file for the purpose
of filling in the tooltips:
00458 sub file_get_contents{
00459 my ($text_file) = @_;
00460 open FILE, $text_file or die $!;
00461 flock FILE, 1 or die $!; # wait for lock
00462 seek(FILE, 0, 0); # move pointer to beginning
00463 my $slurp = do{local $/; <FILE>};
00464 flock FILE, 8; # release the lock
00465 close(FILE);
00466
00467 return $slurp;
00468 } # file_get_contents
When there is an obvious user error in the invocation of the
semlit program, the usage() function is
called.
For example:
00054 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
Note the use of the
logical OR construct (||).
Because of Perl's C-like short-circuit
evaluation, the right-hand expression (function call to usage()) is only executed
if the left-hand side (function call to GetOptions()) returns false. I.e. usage is
called if GetOptions()
fails. Non-Perl programmers might be tempted to use a simple if/then construct, but
the logical OR construct is such a common Perl idiom that the wise
reader will learn it.
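The Bourne shell used by the wrapper script (described later) has the same short-circuit idiom, so it can be tried at a shell prompt. This is only a sketch of the idiom, not code from semlit; the names complain and try are hypothetical, with complain standing in for usage():

```shell
#!/bin/sh
# complain: hypothetical handler, standing in for usage().
complain() {
    echo "handler called: $1"
}

# try: run a command; the right-hand side of || runs only if it fails.
try() {
    "$@" || complain "'$*' failed"
}

try true     # left side succeeds, so complain is skipped
try false    # left side fails, so complain runs
```

Only the second call prints anything, because the handler is evaluated only when the command on the left returns failure.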
In other cases, an error is discovered in one of the input files,
sldoc and/or slsrc. In those cases, the err() function is called.
First, let's declare a couple of globals that will be used for
helping the user:
00025 my $tool = "semlit.pl";
00026 my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The help() function also uses
this.) The usage() function allows an optional error
message to be passed in, which is printed before the usage string:
00447 sub usage {
00448 my($err_str) = @_;
00449
00450 if (defined $err_str) {
00451 print STDERR "$tool: $err_str\n\n";
00452 }
00453 print STDERR "Usage: $usage_str\n\n";
00454 $exit_status ++;
00455 exit($exit_status);
00456 } # usage
The function err() is a programmer convenience which prints an
error message, along with the file name and line number where the
error was discovered.
00439 sub err {
00440 my ($msg) = @_;
00441
00442 print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
00443 $exit_status ++;
00444 } # err
It also increments
$exit_status, which starts out at zero (success):
00043 my $exit_status = 0; # assume success
and is used when exiting the program:
00129 # All done.
00130 exit($exit_status);
Thus, a non-zero (failure) exit status also
indicates how many errors there were.
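A calling script can therefore treat the status both as a pass/fail flag and as a rough error count. The sketch below uses a hypothetical stand-in, fake_tool, in place of a real semlit run. One caveat, which is a property of Unix rather than of semlit: the exit status is truncated to 8 bits, so very large error counts wrap around.

```shell
#!/bin/sh
# fake_tool: hypothetical stand-in for a semlit run that found $1 errors.
# The subshell makes "exit" end only the stand-in, not the caller.
fake_tool() {
    (exit "$1")
}

# report: run the stand-in and describe its exit status.
report() {
    status=0
    fake_tool "$1" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed with $status error(s)"
    fi
}

report 0
report 3
```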
As mentioned above, the first line of semlit.pl is a
fairly common shebang:
00004 #!/usr/local/bin/perl -w
This Unix construct, combined with setting the executable bit on
the file, is intended to allow the tool to be run by simply typing
the file name as a command (assuming that the PATH environment variable
is set up right). However, it does require that the physical
location of the Perl interpreter be encoded directly in the file.
Unfortunately, different Unix systems install Perl in different
places - sometimes under /usr/local, sometimes in /bin,
sometimes under /opt.
I vaguely remember that there is a clever way to re-code that
line such that the shell will search for the Perl interpreter in
the PATH environment variable. But I don't remember how
to do it.
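The technique being recalled is probably the env shebang, in which /usr/bin/env looks up the interpreter in PATH. This is a sketch of that approach, not something the semlit sources do. It assumes env is installed in /usr/bin, which is common but not universal. Options such as -w cannot be passed portably on an env shebang line, so the sketch enables warnings with "use warnings" instead. The function name write_env_script is hypothetical:

```shell
#!/bin/sh
# write_env_script: create a small Perl script at path $1 whose
# shebang asks env to find perl via PATH, then make it executable.
write_env_script() {
    cat > "$1" <<'__EOF__'
#!/usr/bin/env perl
use warnings;   # replaces -w, which an env shebang cannot pass portably
print "found perl via PATH\n";
__EOF__
    chmod +x "$1"
}
```

After running write_env_script ./hello.pl (hello.pl being an arbitrary name), the script can be started as ./hello.pl on any system where perl is somewhere in PATH.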
One way to run semlit.pl without having to set the Perl
interpreter's full location is:
perl SemLitPath/semlit.pl ...
But even that is somewhat unsatisfying for an experienced Unix
user. Lazy as we are, we would prefer to let PATH do the work of
finding the program, so we just enter:
semlit
...
One common way to accomplish this is with a wrapper shell script
which encapsulates any annoying details of running the tool. Hence
the semlit file.
The first line of semlit is the standard shebang:
00004 #!/bin/sh
But this time it references the universally-respected location
for a Bourne-compatible
shell. No chance that this won't work on some flavor of
Unix.
Next save the initial working directory (so that we can get back
to it).
00008 IWD=`pwd` # remember initial working directory
I establish the convention that the wrapper script must
be in the same directory as the perl script. So, the next thing to
do is figure out which directory contains the wrapper which is
running:
00010 # Find dir where tool is stored (useful for finding related files)
00011 TOOLDIR=`dirname $0`
00012 # Make sure TOOLDIR is a full path name (not relative)
00013 cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
This might look more complicated than it needs to be. Isn't the
first line enough?
00011 TOOLDIR=`dirname $0`
No, it isn't. Suppose you are in your home directory, and you
have placed the semlit files in $HOME/bin. Then let's
say you execute the program by entering:
bin/semlit
...
That is perfectly legal, and dirname $0 will, not
surprisingly, return bin, a relative path. But I want a
fully-qualified path, so I include the three commands:
00013 cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
You simply cd to that potentially-relative location and use pwd to get the full path. Then cd back to the initial working directory. This is easier than trying to parse all the possible return values for dirname.
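The same cd-and-pwd trick can be written so that it tolerates spaces in directory names (by quoting) and needs no cd back (by doing the cd in a subshell). This is a minimal sketch; the helper name abs_dir is hypothetical and is not part of the semlit wrapper:

```shell
#!/bin/sh
# abs_dir: print the absolute path of a (possibly relative) directory.
abs_dir() {
    # The parentheses start a subshell, so the caller's working
    # directory is left untouched.
    (cd "$1" && pwd)
}

# In a wrapper script this might be used as:
#   TOOLDIR=$(abs_dir "$(dirname "$0")")
```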
All that remains to be done is to run the perl interpreter as a
command (such that PATH is used):
00015 perl $TOOLDIR/semlit.pl $*
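One caveat with an unquoted $*: the shell re-splits the arguments on whitespace, so a file name containing a space would reach semlit.pl as two arguments. The sketch below contrasts it with the quoted "$@" form, which forwards each argument unchanged. The function names are hypothetical and exist only for this illustration:

```shell
#!/bin/sh
# count_args: print how many arguments it received.
count_args() {
    echo $#
}

# forward_star: forward arguments the way an unquoted $* does.
forward_star() {
    count_args $*
}

# forward_at: forward arguments with "$@", one word per original argument.
forward_at() {
    count_args "$@"
}

forward_star "my doc.sldoc" other.sldoc   # prints 3: the space splits the first name
forward_at   "my doc.sldoc" other.sldoc   # prints 2: both names arrive intact
```

If file names with spaces matter, the final line of the wrapper could be written as perl "$TOOLDIR/semlit.pl" "$@" instead.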