This is a semi-literate document for the program semlit.pl and its shell wrapper semlit.sh. This program is used to create semi-literate documentation (of which this document is an example). Semi-literate documents like this are intended to explain the internals of a program, which is different from user documentation. If you are an end user, you probably want the user document.
Copyright 2012, 2015 Steven Ford http://geeky-boy.com and licensed "public domain" style under:
To the extent possible under law, Steven Ford has
waived all copyright and related or neighboring rights to
this work. This work is published from: United States. The project
home is https://github.com/fordsfords/semlit/tree/gh-pages.
To contact me, Steve Ford, project owner, you can find my email
address at http://geeky-boy.com.
Can't see it? Keep looking.
This document describes the internals of the semlit.pl
program, and is intended to be read by a programmer who wants to
understand, maintain, and perhaps reuse the code. Before
reading this documentation, you are expected to have a good
user-level understanding of SEMLIT. Please be familiar with the user
document before starting this, unless you are
just getting a feel for what SEMLIT documentation is like.
The main program is written in Perl. The reader is assumed to
have at least entry-level
knowledge of Perl. Some of the more-advanced Perl constructs
are explained for the benefit of the novice.
There are two program source files:
Here are the high-level steps performed by the main semlit
program:
Let's declare some global variables to hold values for program
options, and set their default values:
00048 my $o_help; # -h
00049 my $o_fs = ","; # -f
00050 my $o_delim = "="; # -d
00051 my $o_initialsource = "blank.html"; # -i
00052 my @o_incdirs = ("."); # GetOptions will append additional dirs for each "-I".
00053 $tabstop = 4; # defined and used by Text::Tabs - see "expand()" function
00054
00055 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
00056 if (defined($o_help)) {
00057 help(); # if -h had a value, it would be in $opt_h
00058 }
Note the use of the Perl built-in function GetOptions. This function
does the work of parsing the command-line and setting new values
into the global option variables. See the Perl
documentation if you are not familiar with the construct.
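As a minimal sketch of how this kind of option parsing behaves, the following uses GetOptionsFromArray (from the same Getopt::Long module) so that it can run against a made-up argument list instead of the real command line. The argument values and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical command line, used only for illustration.
my @args = ("-d", "~", "-I", "lib", "-I", "inc", "-t", "8", "doc.sldoc");

my $delim   = "=";      # default for -d
my @incdirs = (".");    # each -I appends to this list
my $tabs    = 4;        # default for -t

GetOptionsFromArray(\@args,
    "d=s" => \$delim,       # string option stored in a scalar
    "I=s" => \@incdirs,     # string option pushed onto an array
    "t=i" => \$tabs,        # integer option
) or die "Error in GetOptions\n";

# Anything that is not an option (here, the file name) is left in @args.
print "delim=$delim tabs=$tabs incdirs=@incdirs rest=@args\n";
```

Note that an array destination keeps its initial contents, which is how "." stays at the front of the search list.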
The program assumes that there is a single master sldoc
file supplied on the command line. So, early on we get that file
name:
00060 if (scalar(@ARGV) != 1) {
00061 usage("Error, .sldoc file missing");
00062 }
00063 $main_doc_filename = $ARGV[0];
00064 if ( ! -r "$main_doc_filename" ) {
00065 usage("Error, could not read '$main_doc_filename'");
00066 }
Note the use of the Perl construct scalar(@array) to
determine the number of elements in the array. There are shorter
ways to do this (in terms of keystrokes), but I prefer the
explicit construct. Makes it easier to spot. Also note the use of
-r file to test if
the file exists and is readable.
Once the file name is determined, we now process that input file:
00073 # Main loop; read each line in doc file
00074
00075 my $doc_html_str = process_doc_file($main_doc_filename);
Note that the function process_doc_file()
returns the html output for the documentation.
An sldoc file might insert the same block of source code
multiple times. The source html will link to the first doc
reference to it. How does the reader find the other references? At
the end of an inserted source block, links will be inserted to
adjacent references within the doc.
However, the sldoc file is processed sequentially. When a
reference (insert command) is seen, it is not yet known
if there will be subsequent references. So, a fixup step is added
after the sldoc file is fully processed. That fixup step
looks at each source block that has multiple references and adds
the appropriate links (next/prev).
Since this step needs a count of the number of references, we
need code in the semlit_cmd()
function to do the counting:
00224 my $num_refs = 1;
00225 my $block_ref_name = $block_name;
00226 if (defined($block_numrefs{$block_name})) {
00227 $num_refs = $block_numrefs{$block_name} + 1;
00228 $block_ref_name = $block_name . "_ref_$num_refs";
00229 }
00230 $block_numrefs{$block_name} = $num_refs;
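Because the block command initializes $block_numrefs{$block_name}
to zero, the entry is already defined when the first insert is seen.
Every reference, including the first, therefore gets a name of the
form blockname_ref_N, and the hash holds the number of inserts seen
so far for this block.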
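The counting can be tried in isolation with a small sketch. The helper name next_ref_name() is hypothetical (the real program does this inline in semlit_cmd()), and the sketch treats a missing count as zero rather than relying on the block command to initialize it.

```perl
use strict;
use warnings;

my %block_numrefs;    # block name => number of inserts seen so far

# Hypothetical helper: return the anchor name for the next insert of a block.
sub next_ref_name {
    my ($block_name) = @_;
    my $num_refs = ($block_numrefs{$block_name} // 0) + 1;
    $block_numrefs{$block_name} = $num_refs;
    return $block_name . "_ref_$num_refs";
}

my $first  = next_ref_name("main_loop");    # "main_loop_ref_1"
my $second = next_ref_name("main_loop");    # "main_loop_ref_2"
```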
Back to main and the fix-up step:
00077 # fix up multiple source references
00078 foreach my $blockname (keys(%block_numrefs)) {
00079 if ($block_numrefs{$blockname} > 1) {
00080 # First ref points to next and last
00081 my $refnum = 1;
00082 my $this_block = $blockname . "_ref_" . ($refnum);
00083 my $first_block = $this_block;
00084 my $last_block = $blockname . "_ref_" . $block_numrefs{$blockname};
00085 my $next_block = $blockname . "_ref_" . ($refnum + 1);
00086 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a> <a href="#$last_block">last ref<\/a><\/pre>/s;
00087
00088 # Middle refs point to previous and next
00089 my $prev_block = $this_block;
00090 for ($refnum = 2; $refnum <= $block_numrefs{$blockname} - 1; $refnum ++) {
00091 # middle refs point to prev and next
00092 $this_block = $blockname . "_ref_" . ($refnum);
00093 $next_block = $blockname . "_ref_" . ($refnum + 1);
00094 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a> <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00095 $prev_block = $this_block;
00096 }
00097
00098 # last ref points to first and previous
00099 $this_block = $blockname . "_ref_" . ($refnum);
00100 $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$first_block">first ref<\/a> <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00101 }
00102 }
The
first reference has no "prev", so a link to the last reference is
included. The last reference has no "next", so a link to the first
reference is included. (The source html always links to
the first reference.)
A bit earlier in the program, the output file was opened:
00068 # open main doc file
00069
00070 $doc_html_filename = basename($main_doc_filename) . ".html"; # strip directory
00071 open($doc_html_outfd, ">", $doc_html_filename) || die "Error, could not open htmlfile '$doc_html_filename'";
Then, after the input file is
processed, the returned html output is written to the output file:
00104 # write doc html file
00105
00106 print $doc_html_outfd "$doc_html_str\n";
00107 close($doc_html_outfd);
00108
00109 # Create frameset page
00110
00111 my $index_o_file;
00112 open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00113 print $index_o_file <<__EOF__;
00114 <html><head></head>
00115 <frameset cols="50%,*">
00116 <frame src="$doc_html_filename" name="doc">
00117 <frame src="$o_initialsource" name="src">
00118 </frameset>
00119 </html>
00120 __EOF__
00121 close($index_o_file);
00122
00123 # Create blank page for initial source frame
00124
00125 my $blank_o_file;
00126 open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00127 print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00128 close($blank_o_file);
00129
00130 # All done.
00131 exit($exit_status);
The variable $exit_status is initialized to zero in main
and is incremented each time an error is reported. Thus, if no
errors are reported, the program exits with success (0).
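The err() function itself is not shown in this section, so here is a minimal sketch of the idea. The message format is an assumption; the point is only that every reported error bumps the exit status.

```perl
use strict;
use warnings;

my $exit_status = 0;
my ($cur_file_name, $cur_file_linenum) = ("example.sldoc", 12);

# Hypothetical err(): report a message with file and line, count the error.
sub err {
    my ($msg) = @_;
    print STDERR "Error, ${cur_file_name}:${cur_file_linenum}, $msg\n";
    $exit_status++;
    return;
}

err("block 'x' already defined");
err("Tabstop value 'abc' must be numeric");
# $exit_status is now 2; exit($exit_status) would report failure to the shell.
```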
There are a couple of misc support files needed by the html
documentation. Since the documentation is best viewed with html
frames, the first file is the frameset:
00109 # Create frameset page
00110
00111 my $index_o_file;
00112 open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00113 print $index_o_file <<__EOF__;
00114 <html><head></head>
00115 <frameset cols="50%,*">
00116 <frame src="$doc_html_filename" name="doc">
00117 <frame src="$o_initialsource" name="src">
00118 </frameset>
00119 </html>
00120 __EOF__
00121 close($index_o_file);
Note the use of Perl's "here document" <<__EOF__ ... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
Contrast this with the next code fragment.
By default, when the documentation is first brought up, there is no
source to display in the source frame. So we need an almost blank page:
00123 # Create blank page for initial source frame
00124
00125 my $blank_o_file;
00126 open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00127 print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00128 close($blank_o_file);
I didn't use a "here" document for this, but
I could have.
That initial blank page can be overridden with the "-i" command-line option, or with the "initialsource" semlit command.
The process_doc_file()
function is called from the main program:
00073 # Main loop; read each line in doc file
00074
00075 my $doc_html_str = process_doc_file($main_doc_filename);
It is also called (recursively) from the semlit command include.
The function starts out with:
00137 sub process_doc_file {
00138 my ($doc_filename) = @_;
Since this function can be called recursively, we save any
existing file name and line number (they are restored before
return).
The main function is to open an sldoc file, read the file
and process the semlit commands, and return the documentation html
output. Note that the sldoc input file is already in html
form; once the file is read, all that remains to be done is to
process the semlit commands. In the degenerate simple case of an sldoc
file containing no
semlit commands at all, the output html file will be
exactly equal to the sldoc input file. (An unlikely case
since it is the semlit commands which provide the value of
semi-literate documentation.)
Here are the high-level steps it performs:
In an earlier version, the program read the sldoc input
file one line at a time. The line was tested to determine if it
was a semlit command. If so, the command was executed and the line
dropped. This approach worked fine when I used a simple text
editor to create the html sldoc file. But editing html
with a simple text editor is painful, error-prone, and
inefficient. I really wanted to use an html editor. But html
editors generally do not make it easy to control how the content
is arranged on lines of the physical html file. I discovered that
an html editor can pack multiple semlit commands onto a single line,
or even split one command across lines.
Fortunately, Perl provides powerful constructs which make this
kind of processing easy. Instead of looking at the file
line-by-line, the entire file can be read into a single string
variable, and regular expression matching can be used to find the
commands, one at a time. You will see this below.
The semlit program supports having libraries of sldoc
files for standard boilerplate. I chose to model this after the C
compiler: the programmer simply specifies the name of the file,
and one or more search directories can be specified on the command
line with the -I
option. These directories are set up in the main program thus:
00052 my @o_incdirs = ("."); # GetOptions will append additional dirs for each "-I".
...
00055 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
The GetOptions()
function will parse as many -I options as the user supplies and leave
the @o_incdirs array
set with the directories (including ".").
Back in the process_doc_file() function, the file is opened as
follows:
00141 # open source file, using one or more search directories
00142
00143 my $incdir;
00144 my $open_success = 0;
00145 foreach $incdir (@o_incdirs) {
00146 if (open($doc_infd, "<", "$incdir/$doc_filename")) {
00147 $open_success = 1;
00148 last; # break out of foreach
00149 }
00150 }
00151 if (! $open_success) {
00152 err("could not open doc file '$doc_filename', skipping");
00153 return;
00154 }
It loops through the array
of include directories until the file can be opened. Note that
almost the same code exists in process_src_file().
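Since the same search loop appears twice, it could be factored into a helper. The following is only a sketch of that idea; open_in_dirs() is a hypothetical name and is not part of semlit.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: try each directory in turn and return an open read
# handle for the first place the file is found, or undef if it is nowhere.
sub open_in_dirs {
    my ($filename, @dirs) = @_;
    foreach my $dir (@dirs) {
        if (open(my $fh, "<", "$dir/$filename")) {
            return $fh;    # first match wins, like "last" in the loop above
        }
    }
    return;                # not found in any directory
}
```

A caller would then test the result with defined() and report the error itself, much as the code above does with $open_success.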
Perl makes this easy:
00156 # Read entire file into memory
00157
00158 my @doctexts = <$doc_infd>;
00159 close($doc_infd);
00160 chomp(@doctexts); # remove line delims from every line
00161 my $num_lines = scalar(@doctexts); # count lines in file
00162 my $doctext = join("\n", @doctexts) . "\n"; # combine as a single string
00163 $doctext =~ s/\r//gs; # remove carriage returns, if any
The chomp()
function removes line delimiters, which, depending on platform,
might be something other than linefeeds, and the join()
combines the lines into one long string, and inserts linefeeds as
line endings. Then, carriage returns, if any, are removed. This
unifies different platforms. (Similar
code can be found in process_src_file().)
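To see the effect of that normalization, here is a small sketch that reads from an in-memory file with mixed line endings. The function name slurp_normalized() and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Read all lines from a handle and return (text, line_count), with every
# line ending normalized to a single "\n".
sub slurp_normalized {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp(@lines);                          # strip the trailing "\n" from each line
    my $text = join("\n", @lines) . "\n";   # one string, "\n" after every line
    $text =~ s/\r//g;                       # drop carriage returns left by CRLF files
    return ($text, scalar(@lines));
}

# In-memory file with mixed line endings and no final newline.
my $data = "one\r\ntwo\nthree";
open(my $fh, "<", \$data) or die "cannot open in-memory file";
my ($text, $count) = slurp_normalized($fh);   # ("one\ntwo\nthree\n", 3)
close($fh);
```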
The basic algorithm is to find the first semlit command in the file, execute it,
and then replace that semlit command with the results of the
command. Note that it is possible that the replacement text
contains semlit commands. So, each time a semlit command is
processed, the entire file must be re-scanned from the beginning.
Instead of trying to process the file line-by-line, the loop
executes until there are no more commands to execute:
00168 # process semlit commands
00169 while ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
00170 my $cmd = $1; # text of command (minus standard stuff)
The variable $o_delim contains the
semlit command delimiter and defaults to "=". The variable $o_fs contains the semlit
command field separator and defaults to ",".
Note that the delimiter will certainly appear in the file
without being associated with a semlit command, so the mere
presence of the delimiter does not flag the start of a semlit
command. The word "semlit"
must follow the delimiter ("=semlit"; whitespace between them is tolerated), and that must be followed by a
field separator (",").
Also note that the match pattern captures the text between that
field separator and the ending delimiter. That represents the
semlit command keyword and parameters.
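Here is the same pattern applied to a small made-up piece of sldoc text. The sample text is an assumption; the pattern and the default delimiter and separator are the ones shown above.

```perl
use strict;
use warnings;

my $o_delim = "=";   # command delimiter (default)
my $o_fs    = ",";   # field separator (default)

# Sample sldoc text; the first "=" belongs to ordinary html, not a command.
my $doctext = qq{<p class="x">Intro</p>\n=semlit,insert,main_loop=\n<p>More</p>\n};

my $cmd;
if ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
    $cmd = $1;       # "insert,main_loop"
}
```

One thing to keep in mind: the option values are interpolated into the pattern as-is, so a delimiter that is also a regular expression metacharacter would presumably need escaping.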
The next lines capture the file contents before and after the
semlit command:
00171 my $prefix = $PREMATCH; # text preceding the command
00172 my $suffix = $POSTMATCH; # text after the command
Note the use of the $PREMATCH and $POSTMATCH built-in
variables. See the Perl
documentation if you are not familiar with this construct.
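The long names come from the English module, which has to be loaded for them to work. A minimal sketch, with a made-up input string:

```perl
use strict;
use warnings;
use English;   # provides $PREMATCH, $MATCH and $POSTMATCH

my $text = "before =semlit,include,intro.sldoc= after";
my ($prefix, $suffix);
if ($text =~ /=\s*semlit\s*,\s*([^=]+)=/) {
    $prefix = $PREMATCH;    # "before "
    $suffix = $POSTMATCH;   # " after"
}
```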
If errors are detected in the source files, it is helpful to
print error messages which include the line number. This is
challenging in this function because we are not processing on a
line-by-line basis. It is further complicated by the fact that as
commands are replaced by the returned text from command execution,
the number of lines in the $doctext
changes over time. However, since the commands are processed in
order, we can calculate the line number by looking at the number
of lines following the
command, and subtracting it from the total number:
00174 # calculate line number containing the start of this semlit command
00175 $cur_file_linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;
As you know from the Perl
documentation, the split()
function will by default not create an empty entry at the end
(after the final '\n').
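A worked example of the arithmetic, using a made-up five-line file whose command sits on line 3:

```perl
use strict;
use warnings;
use English;

my $doctext   = "line 1\nline 2\n=semlit,insert,blk=\nline 4\nline 5\n";
my $num_lines = 5;                       # counted when the file was read

my $linenum;
if ($doctext =~ /=semlit,[^=]+=/) {
    my @rest = split("\n", $POSTMATCH);  # ("", "line 4", "line 5")
    $linenum = $num_lines - scalar(@rest) + 1;   # 5 - 3 + 1 = 3
}
```

The empty first element is the remainder of the command's own line, which is why the "+ 1" is needed.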
Now execute the command and capture the replacement text:
00177 my $repl = semlit_cmd($cmd);
The semlit_cmd()
function returns formatted html.
Replacing the matched semlit command with the resulting text is
straightforward:
00179 # Commands are removed, and often replaced with some result
00180 $doctext = $prefix . $repl . $suffix;
00181 } # while
This is where the
number of lines in $doctext
can change.
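The whole find, execute, replace and re-scan cycle can be sketched with a toy command handler. The commands "upper" and "twice" are invented for this illustration and are not semlit commands; "twice" is there to show replacement text that itself contains a command.

```perl
use strict;
use warnings;
use English;

# Toy command handler, for illustration only.
sub toy_cmd {
    my ($cmd) = @_;
    return uc($1)                 if $cmd =~ /^upper,(\w+)$/;
    return "=semlit,upper,$1= $1" if $cmd =~ /^twice,(\w+)$/;
    return "";                    # unknown commands are simply removed
}

sub expand_commands {
    my ($doctext) = @_;
    # Each pass finds the first remaining command, so commands produced by
    # an earlier replacement are picked up on a later pass.
    while ($doctext =~ /=\s*semlit\s*,\s*([^=]+)=/is) {
        my ($prefix, $suffix) = ($PREMATCH, $POSTMATCH);
        my $repl = toy_cmd($1);
        $doctext = $prefix . $repl . $suffix;
    }
    return $doctext;
}

my $result = expand_commands("a =semlit,twice,hi= b");   # "a HI hi b"
```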
Finally:
00185 return $doctext;
00186 } # process_doc_file
For the most part, Perl takes care of allowing functions to be
recursive. However, we do have some global variables for error
reporting which add an extra challenge:
00030 my $cur_file_name = "";
00031 my $cur_file_linenum = 0;
By keeping the file name and line
number as global variables, any function can report a
user-friendly error message with a minimum of fuss.
Once the initial steps are completed, the process_doc_file()
function is ready to start handling commands. But before it does,
we want to save those global variables:
00165 my ($save_doc_filename, $save_doc_linenum) = ($cur_file_name, $cur_file_linenum);
00166 ($cur_file_name, $cur_file_linenum) = ($doc_filename, 0);
Then the command loop executes. When that is done, just before
returning, we want to restore the global variables:
00183 ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
You will see very similar code in the
process_src_file()
function.
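The save and restore pattern can be watched in action with a stand-in for the recursive function. Everything here (the process() function, the file names, the @trace array) is hypothetical and only mimics the shape of process_doc_file().

```perl
use strict;
use warnings;

my ($cur_file_name, $cur_file_linenum) = ("", 0);
my @trace;   # what an error message would have named after each include

# Hypothetical stand-in: %$includes maps a file name to the files it includes.
sub process {
    my ($name, $includes) = @_;
    my ($save_name, $save_linenum) = ($cur_file_name, $cur_file_linenum);
    ($cur_file_name, $cur_file_linenum) = ($name, 0);

    foreach my $child (@{ $includes->{$name} || [] }) {
        $cur_file_linenum++;
        process($child, $includes);      # recursive call changes the globals...
        push(@trace, "${cur_file_name}:${cur_file_linenum} after $child");
    }

    # ...and this puts them back for the caller
    ($cur_file_name, $cur_file_linenum) = ($save_name, $save_linenum);
    return;
}

process("main.sldoc", {
    "main.sldoc" => ["a.sldoc", "b.sldoc"],
    "a.sldoc"    => ["c.sldoc"],
});
```

After each nested call returns, the trace entry names the including file and its own line count again, which is what a later error message needs.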
The process_doc_file() and process_src_file()
functions read input files, find semlit commands, and call semlit_cmd() to execute
them. The delimiters, "semlit",
and the first field separator are stripped, so that the passed-in
command string starts with the command name.
The tabstop command
simply updates the $tabstop
global variable:
00193 # semlit tabstop - doc: source tab expansion
00194 if ($cmd =~ /^tabstop\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00195 if ($1 =~ /^\d+$/) {
00196 $tabstop = $1; # used by Text::Tabs
00197 return "";
00198 } else {
00199 err("Tabstop value '$1' must be numeric");
00200 return "";
00201 }
00202 }
The $tabstop variable is used
directly by the built-in Text::Tabs
Perl module:
00021 use Text::Tabs;
Note that the $tabstop variable can
also be set on the command line:
00055 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
So, where does this actually get used? Right here:
00374 $iline = expand($iline); # expand tabs according to $tabstop.
(In the function process_src_file().)
The expand()
function is part of the Text::Tabs
module.
The tabstop command
does not return any text (returns "" - empty string).
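For readers who have not used Text::Tabs, here is a small sketch of how $tabstop changes what expand() returns. The input string is made up.

```perl
use strict;
use warnings;
use Text::Tabs;     # exports expand(), unexpand() and $tabstop

$tabstop = 4;
my $four  = expand("a\tb");    # "a" + 3 spaces + "b": the tab advances to column 4

$tabstop = 8;
my $eight = expand("a\tb");    # "a" + 7 spaces + "b": the tab advances to column 8
```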
The srcfile command
is used to scan an slsrc file:
00204 # semlit srcfile - doc: read and process source file
00205 elsif ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00206 return process_src_file($1, $2);
00207 }
During the scan, any semlit commands found in the slsrc
file are processed (usually block/endblock commands), and
the source output files are created (both html and src).
The returned text is a link to the plaintext output src file. The author of the sldoc file expects this and positions the srcfile command such that a link to the src file makes sense. For example, near the top of this sldoc file, the srcfile commands are arranged in a bullet list. Each srcfile command is followed by the hint "(right-click and save)" followed by a short description of the file.
The initialsource command
is used to specify a file for initial display in the source frame:
00209 # semlit initialsource - doc: set initial source frame
00210 elsif ($cmd =~ /^initialsource\s*$o_fs\s*([^\s$o_fs]+)\s*/i) {
00211 $o_initialsource = $1;
00212 return "";
00213 }
The file must be the final file name, ending with ".html".
There is no returned text.
The include command
is used to scan an sldoc file:
00215 # semlit include - doc: read and process doc file
00216 elsif ($cmd =~ /^include\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00217 return process_doc_file($1);
00218 }
Note that this represents a recursive call to process_doc_file().
During the scan, the included file is processed the same way as
the master sldoc file.
The returned text is simply the processed html of the included
file.
The insert command is used to insert into the document
output html file a named block of source lines (from slsrc
files):
00220 # semlit insert - doc: insert a source block
00221 elsif ($cmd =~ /^insert\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00222 my $block_name = $1;
00223 if (exists($srcblocks{$block_name})) {
00224 my $num_refs = 1;
00225 my $block_ref_name = $block_name;
00226 if (defined($block_numrefs{$block_name})) {
00227 $num_refs = $block_numrefs{$block_name} + 1;
00228 $block_ref_name = $block_name . "_ref_$num_refs";
00229 }
00230 $block_numrefs{$block_name} = $num_refs;
00231
00232 my $block_str = $srcblocks{$block_name};
00233 return <<__EOF__;
00234 <a name="$block_ref_name" id="$block_ref_name"><\/a>
00235 <small><pre>
00236 $block_str
00237 <\/pre><!-- endblock $block_ref_name --></small>\n
00238 __EOF__
00239 } else {
00240 err("attempt to insert block named '$block_name' but block not defined");
00241 return "";
00242 }
00243 }
Note the use of Perl's "here
document" <<__EOF__
... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
Also note that the source blocks are
stored in the %srcblocks
hash by the process_src_file() function.
The endblock html comment becomes important during the final fix up step; for blocks inserted
multiple times, the comment is replaced with links to other
references.
The returned text is the processed html of the source block.
The block command essentially tells the process_src_file() function
to start storing source lines into a named block:
00245 # semlit block - src: start a named block of source
00246 elsif ($cmd =~ /^block\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00247 my $block_name = $1;
00248 if (defined($srcblocks{$block_name})) {
00249 err("block '$block_name' already defined");
00250 return "";
00251 }
00252 $srcblocks{$block_name} = "";
00253 $block_numrefs{$block_name} = 0;
00254 $active_srcblocks{$block_name} = $cur_file_linenum;
00255
00256 $global_src_buffer = "<span name=\"$block_name\" id=\"$block_name\"><\/span>";
00257 return "";
00258 }
Since it is possible for source lines to be
included in multiple named blocks, the global hash %active_srcblocks
is used to indicate which named blocks are accumulating.
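To illustrate overlapping blocks, the sketch below walks a made-up sequence of block commands and source lines. The @input list stands in for an slsrc file; the two hashes play the same roles as in the program, but the loop is a simplification.

```perl
use strict;
use warnings;

my (%srcblocks, %active_srcblocks);

# Hypothetical input: "outer" ends before "inner", so the blocks overlap
# without one being contained in the other.
my @input = (
    [block    => "outer"], [src => "line A"],
    [block    => "inner"], [src => "line B"],
    [endblock => "outer"], [src => "line C"],
    [endblock => "inner"], [src => "line D"],
);

my $linenum = 0;
foreach my $item (@input) {
    my ($kind, $value) = @$item;
    $linenum++;
    if ($kind eq "block") {
        $srcblocks{$value} = "";
        $active_srcblocks{$value} = $linenum;   # remember where it was opened
    } elsif ($kind eq "endblock") {
        delete($active_srcblocks{$value});
    } else {
        # a source line is appended to every block that is currently active
        $srcblocks{$_} .= "$value\n" foreach keys(%active_srcblocks);
    }
}
# outer holds lines A and B; inner holds lines B and C; line D is in neither.
```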
The endblock command essentially tells the process_src_file() function
to stop storing source lines into the named block:
00260 # semlit endblock - src: end a named block of source
00261 elsif ($cmd =~ /^endblock\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00262 my $block_name = $1;
00263 if (exists($active_srcblocks{$block_name})) {
00264 delete($active_srcblocks{$block_name});
00265 $srcblocks{$block_name} =~ s/\n$//s;
00266 return "";
00267 } else {
00268 err("found endblock for '$block_name', which is not active");
00269 return "";
00270 }
00271 }
The block name needs to be supplied
because a nested block does not need to be fully-contained by the
outer block. I.e. if multiple blocks are active, the endblock
does not necessarily end the most-recently opened block.
The tooltip command creates a keyword in a block of text
that can be hovered over for additional information.
00273 # semlit tooltip - create hover over text for a phrase
00274 elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00275 my $text_source = $1;
00276 my $text_link = $2;
00277 my $contents = file_get_contents($text_source);
00278 return <<__EOF__;
00279 <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00280 __EOF__
00281 }
The tooltip data is loaded from a
file on the local server, and the keyword is supplied as the last
parameter. Note: the keyword cannot contain any spaces.
The process_src_file() function is called from the srcfile command. It reads
and processes an slsrc file (program source code). Part of
the processing is to generate two output files: a source html
file, and a src file, stripped of its semlit commands and
ready for compilation.
This function has conceptual similarities with the process_doc_file() function,
but there are important differences. The biggest difference is the
overall approach to processing the file. Here, we process the slsrc
file line-at-a-time. A line must contain either program source
code, or a single semlit command (although that command might be
enclosed in comment delimiters). Instead of replacing the command
with returned content, the line containing the command is simply
discarded.
Here are the high-level steps performed:
Almost the same code exists in process_doc_file(). Here we
are opening an slsrc file which might be in any of the
directories in the array @o_incdirs:
00300 # open source file, using one or more search directories
00301 my $incdir;
00302 my $open_success = 0;
00303 foreach $incdir (@o_incdirs) {
00304 if (open($slsrc_infd, "<", "$incdir/$src_filename")) {
00305 $open_success = 1;
00306 last; # break out of foreach
00307 }
00308 }
00309 if (! $open_success) {
00310 err("could not open src file '$src_filename', skipping");
00311 return "";
00312 }
The output source html file is opened and some initial
content is written:
00314 # create and write initial content to html-ified source file
00315 if (! open($src_html_outfd, ">", "$src_filename.html")) {
00316 err("could not open output source html file '$src_filename.html', skipping");
00317 close($slsrc_infd);
00318 return "";
00319 }
00320 print $src_html_outfd <<__EOF__;
00321 <!DOCTYPE html><html><head><title>$plain_src_filename</title>
00322 <link rel="stylesheet" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css">
00323 <script src="//code.jquery.com/jquery-1.10.2.js"></script>
00324 <script src="//code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
00325 <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/styles/default.min.css">
00326 <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/highlight.min.js"></script>
00327 <script>
00328 \$(function() {
00329 \$( document ).tooltip();
00330 });
00331 </script>
00332 <style>
00333 #code {background-color:#ffffff;};
00334 </style>
00335 </head>
00336 <body><h1>$plain_src_filename</h1>
00337 <script>hljs.initHighlightingOnLoad();</script>
00338 <small><pre><code id="code"><table border=0 cellpadding=0 cellspacing=0><tr>
00339 __EOF__
In addition to
the initial html source, it also contains a link to the plain text
src file which can be downloaded.
The output src file is opened:
00341 # Create plaintext source file (without semlit commands)
00342 if (! open($src_outfd, ">", "$plain_src_filename")) {
00343 err("could not open output src '$plain_src_filename', skipping");
00344 close($slsrc_infd);
00345 close($src_html_outfd);
00346 return "";
00347 }
Read the slsrc file, line-by-line:
00356 my $iline;
00357 while (defined($iline = <$slsrc_infd>)) {
00358 chomp($iline); # remove line delim
00359 $iline .= "\n"; # add newline
00360 $iline =~ s/\r//gs; # remove carriage returns, if any
00361 $cur_file_linenum ++;
The chomp() function removes the
line delimiter, which, depending on platform, might be something
other than a linefeed, and the next line adds a newline to be the
line delimiter. Then, carriage returns, if any, are removed. This
unifies different platforms. (Similar
code can be found in process_doc_file().)
The slsrc file contains semlit commands, usually block and endblock. Find and process
them:
00363 # check for semlit commands
00364 if ($iline =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
00365 semlit_cmd($1);
00366 # discard command line
00367 }
If the slsrc line does not contain a semlit command, then
it is normal source code.
00368 else {
00369 $src_linenum ++; # don't count semlit command lines
Note that
source code lines are counted separately from input slsrc
file lines ($cur_file_linenum counted above). The input
line count ($cur_file_linenum) includes semlit command
lines and is used when printing error messages by err(), while the source code
line count ($src_linenum) does not include semlit command
lines and is used as the user-visible line number in the output
source html file.
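The two counters can be seen side by side in a small sketch. The sample slsrc lines are made up, and the loop is a simplification that only counts lines and drops command lines.

```perl
use strict;
use warnings;

# Hypothetical slsrc content: three source lines and two semlit command lines.
my @slsrc = (
    "int x;",
    "/* =semlit,block,b1= */",
    "x = 1;",
    "/* =semlit,endblock,b1= */",
    "return x;",
);

my ($cur_file_linenum, $src_linenum) = (0, 0);
my @numbered;
foreach my $iline (@slsrc) {
    $cur_file_linenum++;                           # every input line counts here
    next if $iline =~ /=\s*semlit\s*,\s*([^=]+)=/i;   # command lines are discarded
    $src_linenum++;                                # only real source lines count here
    push(@numbered, sprintf("%05d %s", $src_linenum, $iline));
}
# $cur_file_linenum is 5, $src_linenum is 3
```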
Write the plaintext source line to the src file:
00371 print $src_outfd $iline;
In advance of writing the line to the output source html
file, expand tabs and convert the special characters '&',
'<' and '>' to their html forms:
00373 # fix up source for html rendering (tab expansion, special char encoding)
00374 $iline = expand($iline); # expand tabs according to $tabstop.
00375 $iline =~ s/\&/\&amp;/g; $iline =~ s/</\&lt;/g; $iline =~ s/>/\&gt;/g;
Note that the expand()
function is part of the Text::Tabs Perl module and uses
the $tabstop global variable to control how it expands.
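As a sketch of the character conversion on its own (html_escape() is a hypothetical helper name, not a function in semlit.pl):

```perl
use strict;
use warnings;

# Escape the three characters that html treats specially. The ampersand
# must be handled first, or the "&" in "&lt;" would be escaped again.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

my $escaped = html_escape("if (a < b && c > d)");
# $escaped is "if (a &lt; b &amp;&amp; c &gt; d)"
```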
Check to see if this source line is inside one or more block/endblock
constructs:
00377 # if we are in at least one block, link the source to the earliest block's first doc reference
00378 if (scalar(keys(%active_srcblocks)) > 0) {
There is at least one named block active. Assuming there might be
more than one, find the active block which was most-recently
opened (is at the highest-numbered input line):
00379 # descending sort so that element 0 is largest
00380 my @active_blocks = sort { $active_srcblocks{$b} <=> $active_srcblocks{$a} } keys(%active_srcblocks);
This sort construct orders the keys
by descending content of the %active_srcblocks
hash. Thus, $active_blocks[0] is the name of that
most-recently opened block. This is used to construct the link
back to the doc:
00381 my $targ = $active_blocks[0] . "_ref_1";
00382 $src_lines_td .= sprintf("<a href=\"$doc_html_filename#$targ\" target=\"doc\">%05d<\/a>\n", $src_linenum);
00383 if ($global_src_buffer) {
00384 $src_content_td .= sprintf("%s %s", $global_src_buffer, $iline);
00385 $global_src_buffer = "";
00386 }
00387 else {
00388 $src_content_td .= sprintf(" %s", $iline);
00389 }
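A small sketch of sorting hash keys by descending value, for readers new to the sort block syntax. Because the values are line numbers, it compares numeric (<=>) and string (cmp) ordering, which differ once the numbers have different digit counts. The block names and line numbers are made up.

```perl
use strict;
use warnings;

# block name => input line number where the block was opened (hypothetical)
my %active_srcblocks = (outer => 9, inner => 12);

# Numeric descending: the block opened on the highest-numbered line comes first.
my @by_number = sort { $active_srcblocks{$b} <=> $active_srcblocks{$a} }
                keys(%active_srcblocks);    # ("inner", "outer")

# String descending, for comparison: "9" sorts after "12".
my @by_string = sort { $active_srcblocks{$b} cmp $active_srcblocks{$a} }
                keys(%active_srcblocks);    # ("outer", "inner")
```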
For each active named block/endblock
construct, add the source code to the %srcblocks hash:
00391 # for each open source block on this line of source, link the doc block to that source block
00392 foreach my $block_name (keys(%active_srcblocks)) {
00393 my $a = sprintf("<a href=\"$cur_file_name.html#$block_name\" target=\"src\">%05d<\/a> %s", $src_linenum, $iline);
00394 $srcblocks{$block_name} .= $a;
00395 }
This is used by the insert semlit command to
insert the source block into the output doc html file.
For source lines which are not contained in a block/endblock
construct, no doc link is needed when writing to the output source
html file:
00396 } else {
00397 # no active blocks
00398 my $a = sprintf("%05d\n", $src_linenum);
00399 my $c = sprintf(" %s", $iline);
00400 $src_lines_td .= $a;
00401 $src_content_td .= $c;
00402 }
Close the files:
00419 close($slsrc_infd);
00420 close($src_outfd);
00421
00422 print $src_html_outfd "</tr></table></code>\n";
00423 print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00424 print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00425 print $src_html_outfd "</pre></small></body></html>\n";
00426 close($src_html_outfd);
Note the many newlines
printed. This is so that clicking on a line number which is near the
end of the source file will still position that line of source at the
top of the screen.
Also, if the user
accidentally started one or more named blocks but did not end
them, print errors and force them ended:
00428 # if the source file started a block but reached eof without ending it, end it here.
00429 foreach (keys(%active_srcblocks)) {
00430 err("block named '$_' started but not ended");
00431 semlit_cmd("endblock$o_fs$_"); # end it for the user
00432 }
This is done by calling the semlit_cmd() function,
passing it the endblock
command (as if the slsrc file had contained it).
But before we return, restore the $cur_file_name and $cur_file_linenum
variables to their previous state. Then return.
00434 # the semlit.srcfile command writes a link to the plaintext source file
00435 ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
00436 return "<a href=\"$plain_src_filename\">$plain_src_filename</a>";
Since the call to process_src_file()
came from semlit_cmd() executing the srcfile
command, and that command is used in the sldoc file, the
thing we return is a link to the plaintext output src
file.
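The save-and-restore pattern around a nested file can be sketched on its own. This is an illustration under assumptions, not the real process_src_file(): the function process_nested_file(), the file names, and the ".txt" suffix on the link are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical globals mirroring the ones named in the text.
my ($cur_file_name, $cur_file_linenum) = ("manual.sldoc", 42);
my @seen;   # records which file each line was attributed to

sub process_nested_file {
    my ($nested_name, @lines) = @_;

    # Save the caller's position before switching files.
    my ($save_name, $save_linenum) = ($cur_file_name, $cur_file_linenum);
    ($cur_file_name, $cur_file_linenum) = ($nested_name, 0);

    foreach my $line (@lines) {
        $cur_file_linenum++;
        push @seen, "$cur_file_name:$cur_file_linenum";
    }

    # Restore, so later error messages point at the document again.
    ($cur_file_name, $cur_file_linenum) = ($save_name, $save_linenum);
    return "<a href=\"$nested_name.txt\">$nested_name.txt</a>";
}

my $link = process_nested_file("demo.c", "int x;\n", "x = 1;\n");
print "$link\n";
print "back at $cur_file_name:$cur_file_linenum\n";
```

While the nested file is being read, an error report would name that file and line; after the return, reports point at the document position again.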
First, let's declare a couple of globals that will be used for
helping the user:
00026 my $tool = "semlit.pl";
00027 my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The usage() function also uses
this.) In the main code, the "-h" option calls help()
to print more extensive help:
00471 sub help {
00472 my($err_str) = @_;
00473
00474 if (defined $err_str) {
00475 print "$tool: $err_str\n\n";
00476 }
00477 print <<__EOF__;
00478 Usage: $usage_str
00479 Where:
00480 -h - print help screen
00481 -d delim - delimiter character at start and end of a semlit command.
00482 (default to '=')
00483 -f fs - field separator character within a semlit command.
00484 (default to ',')
00485 -i initialsource - file name for initial source frame.
00486 (default to "blank.html") Also, initialsource semlit command.
00487 -I dir - directory to find files for 'srcfile' and 'include' commands.
00488 (default to ".") The "-I dir" option can be repeated.
00489 -t tabstop - convert tabs to "tabstop" spaces.
00490 (default to '4')
00491 files - zero or more input files. If omitted, inputs from stdin.
00492
00493 __EOF__
00494
00495 exit($exit_status);
00496 } # help
Note the use
of Perl's "here document" construct, <<__EOF__
... __EOF__. See Wikipedia
and the Perl
documentation if you are not familiar with this construct.
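For readers new to the construct, here is a small stand-alone sketch. The variable names and the text are made up for illustration. With a bare terminator such as __EOF__ the body is interpolated like a double-quoted string, and with a single-quoted terminator it is taken literally.

```perl
use strict;
use warnings;

my $tool = "demo.pl";   # hypothetical tool name

# Bare terminator: variables in the body are interpolated.
my $interpolated = <<__EOF__;
Usage: $tool [-h]
Where:
    -h - print help screen
__EOF__

# Single-quoted terminator: the body is kept exactly as written.
my $literal = <<'__EOF__';
Usage: $tool [-h]
__EOF__

print $interpolated;
print $literal;
```

The terminator must appear alone on its line, with no leading whitespace, or Perl will not recognize the end of the text.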
This is a simple routine to read the contents of a file for the purpose
of filling in the tooltips:
00459 sub file_get_contents{
00460 my ($text_file) = @_;
00461 open FILE, $text_file or die $!;
00462 flock FILE, 1 or die $!; # wait for lock
00463 seek(FILE, 0, 0); # move pointer to beginning
00464 my $slurp = do{local $/; <FILE>};
00465 flock FILE, 8; # release the lock
00466 close(FILE);
00467
00468 return $slurp;
00469 } # file_get_contents
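For comparison, the same slurp idea can be written with a lexical filehandle, three-argument open, and the symbolic lock constants from the Fcntl module. This is an alternative sketch, not the code in semlit.pl; the name slurp_file() and the temporary file are assumptions made for the example. LOCK_SH is the symbolic name for the shared lock that the listing above requests with the number 1.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    flock($fh, LOCK_SH)      or die "cannot lock $path: $!";
    # Undefining $/ locally makes <$fh> return the rest of the file at once.
    my $content = do { local $/; <$fh> };
    close($fh);   # closing the handle also releases the lock
    return $content;
}

# Usage: write a scratch file, then read it back.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "line one\nline two\n";
close($tmp_fh);

my $text = slurp_file($tmp_name);
print $text;
```

The "local $/" is confined to the do block, so the normal line-at-a-time behavior of the read operator is restored as soon as the block ends.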
When there is an obvious user error in the invocation of the
semlit program, the usage() function is
called.
For example:
00055 GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
Note the use of the
logical OR construct (||).
Because of Perl's C-like short-circuit
evaluation, the right-hand expression (function call to usage()) is only executed
if the left-hand side (function call to GetOptions()) returns false. I.e. usage is
called if GetOptions()
fails. Non-Perl programmers might be tempted to use a simple if/then construct, but
the logical OR construct is such a common Perl idiom that the wise
reader will learn it.
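A tiny stand-alone sketch shows the short-circuit behavior. The functions parse_ok(), parse_bad(), and complain() are hypothetical stand-ins for GetOptions() and usage(); here complain() only records that it was called, so the script keeps running.

```perl
use strict;
use warnings;

my @calls;   # records which complaints were actually made

sub parse_ok  { return 1 }                          # pretend parsing succeeded
sub parse_bad { return 0 }                          # pretend parsing failed
sub complain  { push @calls, $_[0]; return 1 }      # stand-in for usage()

parse_ok()  || complain("first");    # left side true: complain() is skipped
parse_bad() || complain("second");   # left side false: complain() runs

# The same logic spelled out with an explicit if:
if (! parse_bad()) {
    complain("third");
}

print "complaints: @calls\n";
```

Only "second" and "third" are recorded, which shows that the right-hand side runs only when the left-hand side is false.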
In other cases, an error is discovered in one of the input files,
sldoc and/or slsrc. In those cases, the err() function is called.
First, let's declare a couple of globals that will be used for
helping the user:
00026 my $tool = "semlit.pl";
00027 my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The help() function also uses
this.) The usage() function allows an optional error
message to be passed in, which is printed before the usage string:
00448 sub usage {
00449 my($err_str) = @_;
00450
00451 if (defined $err_str) {
00452 print STDERR "$tool: $err_str\n\n";
00453 }
00454 print STDERR "Usage: $usage_str\n\n";
00455 $exit_status ++;
00456 exit($exit_status);
00457 } # usage
The function err() is a programmer convenience which prints an
error message, along with the file name and line number where the
error was discovered.
00440 sub err {
00441 my ($msg) = @_;
00442
00443 print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
00444 $exit_status ++;
00445 } # err
It also increments
$exit_status, which starts out at zero (success):
00044 my $exit_status = 0; # assume success
and is used when exiting the program:
00130 # All done.
00131 exit($exit_status);
Thus, a non-zero (failure) exit status also
indicates how many errors there were.
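The counting scheme can be sketched as follows. This is an illustration, not the code from semlit.pl: the messages are made up, and the clamp_status() helper is an assumption added for the example. On Unix only the low 8 bits of the exit status reach the parent process, so a count of exactly 256 would be reported as 0; the helper shows one possible guard against that.

```perl
use strict;
use warnings;

my $exit_status = 0;   # assume success

# Report a problem and count it.
sub err {
    my ($msg) = @_;
    print STDERR "Error: $msg\n";
    $exit_status++;
}

# Hypothetical helper: keep the status inside the 0..255 range.
sub clamp_status {
    my ($count) = @_;
    return $count > 255 ? 255 : $count;
}

err("first problem");
err("second problem");

my $final = clamp_status($exit_status);
print "would exit with status $final\n";
# A real tool would finish with: exit($final);
```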
As mentioned above, the first line of semlit.pl is a
fairly common shebang:
00004 #!/usr/local/bin/perl -w
This Unix construct, combined with setting the executable bit on
the file, is intended to allow the tool to be run by simply typing
the file name as a command (assuming that the PATH environment variable
is set up right). However, it does require that the physical
location of the Perl interpreter be encoded directly in the file.
Unfortunately, different Unix systems install Perl in different
places - sometimes under /usr/local, sometimes in /bin,
sometimes under /opt.
I vaguely remember that there is a clever way to re-code that
line such that the shell will search for the Perl interpreter in
the PATH environment variable. But I don't remember how
to do it.
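One widely used approach, offered here as a sketch and not as what semlit.pl does, is to name the env utility in the shebang and let it search PATH for perl. This assumes env is installed at /usr/bin/env, which is common but not guaranteed. Because some systems pass everything after the interpreter path as a single argument, the -w switch is replaced by "use warnings" instead of being added to the shebang line.

```perl
#!/usr/bin/env perl
# env looks up "perl" in PATH, so no interpreter location is hard-coded.
use strict;
use warnings;   # takes the place of the -w switch

# $^X holds the path of the interpreter that is actually running.
my $interpreter = $^X;
print "running under $interpreter\n";
```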
One way to run semlit.pl without having to set the Perl
interpreter's full location is:
perl SemLitPath/semlit.pl ...
But even that is somewhat unsatisfying for an experienced Unix
user. Lazy as we are, we would prefer to let PATH do the work of
finding the program, so we just enter:
semlit ...
One common way to accomplish this is with a wrapper shell script
which encapsulates any annoying details of running the tool. Hence
the semlit file.
The first line of semlit is the standard shebang:
00004 #!/bin/sh
But this time it references the universally-respected location
for a Bourne-compatible
shell. No chance that this won't work on some flavor of
Unix.
Next, save the initial working directory (so that we can get back
to it):
00008 IWD=`pwd` # remember initial working directory
I establish the convention that the wrapper script must
be in the same directory as the Perl script. So, the next thing to
do is figure out which directory contains the wrapper which is
running:
00010 # Find dir where tool is stored (useful for finding related files)
00011 TOOLDIR=`dirname $0`
00012 # Make sure TOOLDIR is a full path name (not relative)
00013 cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
This might look more complicated than it needs to be. Isn't the
first line enough?
00011 TOOLDIR=`dirname $0`
No, it isn't. Suppose you are in your home directory, and you
have placed the semlit files in $HOME/bin. Then let's
say you execute the program by entering:
bin/semlit ...
That is perfectly legal, and dirname $0 will, not
surprisingly, return bin, a relative path. But I want a
fully-qualified path, so I include the three commands:
00013 cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
You simply cd to that potentially-relative location and use pwd to get the full path. Then cd back to the initial working directory. This is easier than trying to parse all the possible return values for dirname.
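For comparison only, the same "turn a possibly relative tool directory into a full path" idea can be sketched in Perl with the standard Cwd and File::Basename modules. The wrapper itself is a shell script and does not use this; the function name tool_dir() is hypothetical. One difference to keep in mind is that abs_path() also resolves symbolic links.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Return the absolute directory containing the given script path.
sub tool_dir {
    my ($script_path) = @_;
    # dirname() may return a relative path such as "bin" or ".";
    # abs_path() converts it to a full path.
    return abs_path(dirname($script_path));
}

my $tooldir = tool_dir($0);
print "tool directory: $tooldir\n";
```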
All that remains to be done is to run the Perl interpreter as a
command (such that PATH is used):
00015 perl $TOOLDIR/semlit.pl $*