semlit Program Description

This is a semi-literate document for the program semlit.pl and its shell wrapper semlit.sh.  This program is used to create semi-literate documentation (of which this document is an example).  Semi-literate documents like this are intended to explain the internals of a program, which is different from user documentation.  If you are an end user, you probably want the user document.

Copyright 2012, 2015 Steven Ford http://geeky-boy.com and licensed "public domain" style under:

CC0
To the extent possible under law, Steven Ford has waived all copyright and related or neighboring rights to this work. This work is published from: United States.  The project home is https://github.com/fordsfords/semlit/tree/gh-pages.  To contact me, Steve Ford, project owner, you can find my email address at http://geeky-boy.com.  Can't see it?  Keep looking.



  1. semlit Program Description
    1. Introduction
    2. Program explanation: semlit.pl
      1. Main
        1. Process command line
        2. Read and process the input sldoc file
        3. Fix up multiple source block references
        4. Write out the main documentation html file and exit
        5. Create a few html output files
      2. Function: process_doc_file()
        1. Open sldoc input file
        2. Read the file into memory
        3. For each semlit command contained in that file
        4. Execute the command
        5. Replace the command with the html returned by the command execution and loop
        6. Return the processed html
        7. Recursion
      3. Function: semlit_cmd()
        1. Command: tabstop
        2. Command: srcfile
        3. Command: initialsource
        4. Command: include
        5. Command: insert
        6. Command: block
        7. Command: endblock
        8. Command: tooltip
      4. Function: process_src_file()
        1. Open the slsrc input file
        2. Open the output source html file
        3. Open the output src file
        4. For each input line in the slsrc file:
        5. If it is a semlit command, process it
        6. Else it is a source line:
        7. Write the line to the src file
        8. Html-ify the line
        9. If there are active source blocks accumulating:
        10. Create doc link and write source html file
        11. Add the source line to all of the active blocks
        12. Else no active block, write source html file without link
        13. Close files and wrap up
        14. Return html link to the src file
      5. Function: help()
      6. Function: file_get_contents()
      7. Error handling
        1. Function: usage()
        2. Function: err()
    3. Wrapper Shell Script

Introduction

This document describes the internals of the semlit.pl program, and is intended to be read by a programmer who wants to understand, maintain, and perhaps reuse the code.  Before reading this documentation, you are expected to have a good user-level understanding of SEMLIT.  Please be familiar with the user document before starting this, unless you are just getting a feel for what SEMLIT documentation is like.

The main program is written in Perl. The reader is assumed to have at least entry-level knowledge of Perl. Some of the more-advanced Perl constructs are explained for the benefit of the novice.

There are two program source files: semlit.pl (the main Perl program) and semlit.sh (its shell wrapper).

Program explanation: semlit.pl


Main

Here are the high-level steps performed by the main semlit program:

  1. Process command line.
  2. Read and process the input sldoc file. This has the side effect of reading the slsrc files and generating the src files.
  3. Fix up multiple source block references.
  4. Write out the main documentation html file and exit.
  5. Create a few html output files.

Process command line

Let's declare some global variables to hold values for program options, and set their default values:

00048  my $o_help;        # -h
00049  my $o_fs = ",";    # -f
00050  my $o_delim = "="; # -d
00051  my $o_initialsource = "blank.html";  # -i
00052  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
00053  $tabstop = 4;  # defined and used by Text::Tabs - see "expand()" function
00054  
00055  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
00056  if (defined($o_help)) {
00057      help();  # if -h had a value, it would be in $o_help
00058  }
Note the use of the GetOptions function, which comes from Perl's standard Getopt::Long module. This function does the work of parsing the command line and setting new values into the global option variables. See the Perl documentation if you are not familiar with the construct.
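
As a minimal sketch of how this kind of option parsing behaves (not code from semlit.pl), the following uses GetOptionsFromArray from Getopt::Long so that a hypothetical argument list can stand in for @ARGV. The variable names and the sample arguments are assumptions made for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical argument list standing in for @ARGV.
my @args = ("-d", "~", "-I", "lib", "-I", "boilerplate", "-t", "8", "main.sldoc");

my $delim   = "=";      # default, overridden by -d
my @incdirs = (".");    # each -I appends another directory
my $tabs    = 4;        # default, overridden by -t

GetOptionsFromArray(\@args,
    "d=s" => \$delim,
    "I=s" => \@incdirs,
    "t=i" => \$tabs,
) or die "Error in GetOptions\n";

# Parsed options are removed from @args; only the file name remains.
print "delim=$delim tabs=$tabs incdirs=@incdirs remaining=@args\n";
```

Because an array reference is supplied for -I, each occurrence of the option appends to the list rather than replacing it, which is why the default "." stays at the front.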

Read and process the input sldoc file

The program assumes that there is a single master sldoc file supplied on the command line. So, early on we get that file name:

00060  if (scalar(@ARGV) != 1) {
00061      usage("Error, .sldoc file missing");
00062  }
00063  $main_doc_filename = $ARGV[0];
00064  if ( ! -r "$main_doc_filename" ) {
00065      usage("Error, could not read '$main_doc_filename'");
00066  }
Note the use of the Perl construct scalar(@array) to determine the number of elements in the array. There are shorter ways to do this (in terms of keystrokes), but I prefer the explicit construct because it is easier to spot. Also note the use of the -r file test to check that the file exists and is readable.

Once the file name is determined, we now process that input file:

00073  # Main loop; read each line in doc file
00074  
00075  my $doc_html_str = process_doc_file($main_doc_filename);
Note that the function process_doc_file() returns the html output for the documentation.

Fix up multiple source block references

An sldoc file might insert the same block of source code multiple times. The source html will link to the first doc reference to it. How does the reader find the other references? At the end of an inserted source block, links will be inserted to adjacent references within the doc.

However, the sldoc file is processed sequentially. When a reference (insert command) is seen, it is not yet known if there will be subsequent references. So, a fixup step is added after the sldoc file is fully processed. That fixup step looks at each source block that has multiple references and adds the appropriate links (next/prev).

Since this step needs a count of the number of references, we need code in the semlit_cmd() function to do the counting:

00224              my $num_refs = 1;
00225              my $block_ref_name = $block_name;
00226              if (defined($block_numrefs{$block_name})) {
00227                  $num_refs = $block_numrefs{$block_name} + 1;
00228                  $block_ref_name = $block_name . "_ref_$num_refs";
00229              }
00230              $block_numrefs{$block_name} = $num_refs;
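Note that the block command initializes $block_numrefs{$block_name} to 0 when the block is defined, so by the time an insert runs the count normally already exists. Each insert increments the count, and the anchor for reference number n is named with the block name followed by "_ref_" and n. These are the names that the fix-up step looks for.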

Back to main and the fix-up step:

00077  # fix up multiple source references
00078  foreach my $blockname (keys(%block_numrefs)) {
00079      if ($block_numrefs{$blockname} > 1) {
00080          # First ref points to next and last
00081          my $refnum = 1;
00082          my $this_block = $blockname . "_ref_" . ($refnum);
00083          my $first_block = $this_block;
00084          my $last_block = $blockname . "_ref_" . $block_numrefs{$blockname};
00085          my $next_block = $blockname . "_ref_" . ($refnum + 1);
00086          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$last_block">last ref<\/a><\/pre>/s;
00087  
00088          # Middle refs point to previous and next
00089          my $prev_block = $this_block;
00090          for ($refnum = 2; $refnum <= $block_numrefs{$blockname} - 1; $refnum ++) {
00091              # middle refs point to prev and next
00092              $this_block = $blockname . "_ref_" . ($refnum);
00093              $next_block = $blockname . "_ref_" . ($refnum + 1);
00094              $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00095              $prev_block = $this_block;
00096          }
00097  
00098          # last ref points to first and previous
00099          $this_block = $blockname . "_ref_" . ($refnum);
00100          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$first_block">first ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00101      }
00102  }
The first reference has no "prev", so a link to the last reference is included. The last reference has no "next", so a link to the first reference is included. (The source html always links to the first reference.)
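
The wrap-around rule can be sketched compactly as follows. This is an illustration rather than the code from semlit.pl: it folds the three cases into one loop, and the block name "blk", the sample html, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical doc html containing three references to a block named "blk".
my $html = join("", map { "<pre>code</pre><!-- endblock blk_ref_$_ -->\n" } 1 .. 3);
my $count = 3;

for my $n (1 .. $count) {
    my $this = "blk_ref_$n";
    # Wrap around at the ends: the last ref has no next, the first has no prev.
    my ($fwd_label, $fwd)   = $n < $count ? ("next ref", $n + 1) : ("first ref", 1);
    my ($back_label, $back) = $n > 1      ? ("prev ref", $n - 1) : ("last ref", $count);
    my $links = qq{<a href="#blk_ref_$fwd">$fwd_label</a>  <a href="#blk_ref_$back">$back_label</a>};
    # Replace the endblock marker comment with the navigation links.
    $html =~ s/<\/pre><!-- endblock \Q$this\E -->/$links<\/pre>/;
}
print $html;
```

After the loop, no endblock marker comments remain for this block, and each reference ends with two links.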

Write out the main documentation html file and exit

A bit earlier in the program, the output file was opened:

00068  # open main doc file
00069  
00070  $doc_html_filename = basename($main_doc_filename) . ".html";  # strip directory
00071  open($doc_html_outfd, ">", $doc_html_filename) || die "Error, could not open htmlfile '$doc_html_filename'";
Then, after the input file is processed, the returned html output is written to the output file:
00104  # write doc html file
00105  
00106  print $doc_html_outfd "$doc_html_str\n";
00107  close($doc_html_outfd);
00108  
00109  # Create frameset page
00110  
00111  my $index_o_file;
00112  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00113  print $index_o_file <<__EOF__;
00114  <html><head></head>
00115  <frameset cols="50%,*">
00116  <frame src="$doc_html_filename" name="doc">
00117  <frame src="$o_initialsource" name="src">
00118  </frameset>
00119  </html>
00120  __EOF__
00121  close($index_o_file);
00122  
00123  # Create blank page for initial source frame
00124  
00125  my $blank_o_file;
00126  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00127  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00128  close($blank_o_file);
00129  
00130  # All done.
00131  exit($exit_status);
The variable $exit_status is initialized to zero in main and is incremented each time an error is reported. Thus, if no errors are reported, the program exits with success (0).

Create a few html output files

There are a couple of miscellaneous support files needed by the html documentation. Since the documentation is best viewed with html frames, the first file is the frameset:

00109  # Create frameset page
00110  
00111  my $index_o_file;
00112  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00113  print $index_o_file <<__EOF__;
00114  <html><head></head>
00115  <frameset cols="50%,*">
00116  <frame src="$doc_html_filename" name="doc">
00117  <frame src="$o_initialsource" name="src">
00118  </frameset>
00119  </html>
00120  __EOF__
00121  close($index_o_file);
Note the use of Perl's "here document" (<<__EOF__ ... __EOF__). See Wikipedia and the Perl documentation if you are not familiar with this construct. Contrast this with the next code fragment.
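
For readers new to here documents, this small sketch shows the two behaviors the program relies on: variables are interpolated, and a literal dollar sign must be escaped. The file name and html used here are illustrative assumptions.

```perl
use strict;
use warnings;

my $doc = "semlit.sldoc.html";   # hypothetical file name

# A bare terminator (__EOF__) behaves like a double-quoted string:
# $doc is interpolated, while \$ produces a literal dollar sign.
my $page = <<__EOF__;
<frame src="$doc" name="doc">
<script>\$(function() {});</script>
__EOF__

print $page;
```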

By default, when the documentation is first brought up, there is no source to display in the source frame. So we need an almost blank page:

00123  # Create blank page for initial source frame
00124  
00125  my $blank_o_file;
00126  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00127  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00128  close($blank_o_file);
I didn't use a "here" document for this, but I could have.

That initial blank page can be overridden with the "-i" command-line option, or with the "initialsource" semlit command.

Function: process_doc_file()

The process_doc_file() function is called from the main program:

00073  # Main loop; read each line in doc file
00074  
00075  my $doc_html_str = process_doc_file($main_doc_filename);
It is also called (recursively) from the semlit command include.

The function starts out with:

00137  sub process_doc_file {
00138      my ($doc_filename) = @_;
Since this function can be called recursively, we save any existing file name and line number (they are restored before return).

The main function is to open an sldoc file, read the file and process the semlit commands, and return the documentation html output. Note that the sldoc input file is already in html form; once the file is read, all that remains to be done is to process the semlit commands. In the degenerate simple case of an sldoc file containing no semlit commands at all, the output html file will be exactly equal to the sldoc input file. (An unlikely case since it is the semlit commands which provide the value of semi-literate documentation.)

Here are the high-level steps it performs:

  1. Open sldoc input file.
  2. Read the file into memory.
  3. For each semlit command contained in that file:
    1. Execute the command.
    2. Replace the command with the html returned by the command execution and loop.
  4. Return the processed html.

In an earlier version, the program read the sldoc input file one line at a time. The line was tested to determine if it was a semlit command. If so, the line was executed and the line dropped. This approach worked fine when I used a simple text editor to create the html sldoc file. But editing html with a simple text editor is painful, error-prone, and inefficient. I really wanted to use an html editor. But html editors generally do not make it easy to control how the content is arranged on lines of the physical html file. I discovered that multiple semlit commands can be packed onto a single line, and even split across lines.

Fortunately, Perl provides powerful constructs which make this kind of processing easy. Instead of looking at the file line-by-line, the entire file can be read into a single string variable, and regular expression matching can be used to find the commands, one at a time. You will see this below.

Open sldoc input file

The semlit program supports having libraries of sldoc files for standard boilerplate. I chose to model this after the C compiler: the programmer simply specifies the name of the file, and one or more search directories can be specified on the command line with the -I option. These directories are set up in the main program thus:

00052  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
...
00055  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
The GetOptions() function will parse as many -I options as the user supplies and leave the @o_incdirs array set with the directories (including ".").

Back in the process_doc_file() function, the file is opened as follows:

00141      # open source file, using one or more search directories
00142  
00143      my $incdir;
00144      my $open_success = 0;
00145      foreach $incdir (@o_incdirs) {
00146          if (open($doc_infd, "<", "$incdir/$doc_filename")) {
00147              $open_success = 1;
00148              last;  # break out of foreach
00149          }
00150      }
00151      if (! $open_success) {
00152          err("could not open doc file '$doc_filename', skipping");
00153          return;
00154      }
It loops through the array of include directories until the file can be opened. Note that almost the same code exists in process_src_file().
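
Since nearly the same search loop appears twice, the idea can be sketched as a small helper. This is not how semlit.pl is organized; the helper name open_in_dirs() and the test file are hypothetical, and a temporary directory is used so the sketch is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return a read handle for the first directory
# in which the file can be opened, or undef if none works.
sub open_in_dirs {
    my ($filename, @dirs) = @_;
    foreach my $dir (@dirs) {
        if (open(my $fd, "<", "$dir/$filename")) {
            return $fd;
        }
    }
    return undef;
}

# Set up a throw-away "library" directory holding one boilerplate file.
my $libdir = tempdir(CLEANUP => 1);
open(my $out, ">", "$libdir/semlit_demo_header.sldoc") or die "cannot create test file: $!";
print $out "<h1>boilerplate</h1>\n";
close($out);

# The first directory does not exist, so the search falls through to $libdir.
my $fd = open_in_dirs("semlit_demo_header.sldoc", "$libdir/missing", $libdir);
my $first_line = defined($fd) ? <$fd> : "(not found)";
print $first_line;
```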

Read the file into memory

Perl makes this easy:

00156      # Read entire file into memory
00157  
00158      my @doctexts = <$doc_infd>;
00159      close($doc_infd);
00160      chomp(@doctexts);  # remove line delims from every line
00161      my $num_lines = scalar(@doctexts);  # count lines in file
00162      my $doctext = join("\n", @doctexts) . "\n";  # combine as a single string
00163      $doctext =~ s/\r//gs;  # remove carriage returns, if any
The chomp() function removes line delimiters, which, depending on platform, might be something other than linefeeds, and the join() combines the lines into one long string, and inserts linefeeds as line endings. Then, carriage returns, if any, are removed. This unifies different platforms. (Similar code can be found in process_src_file().)
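
A small sketch of the normalization, using an in-memory list of hypothetical lines with mixed endings instead of a file:

```perl
use strict;
use warnings;

# Hypothetical input lines: two with CR-LF endings, one with a bare LF.
my @lines = ("one\r\n", "two\n", "three\r\n");

chomp(@lines);                           # strips the trailing "\n" from every element
my $text = join("\n", @lines) . "\n";    # rebuild as one string with "\n" endings
$text =~ s/\r//g;                        # drop any carriage returns left behind

print $text;
```

With the default input record separator, chomp() removes only the trailing "\n", which is why the separate substitution is needed to get rid of the "\r" characters.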

For each semlit command contained in that file

The basic algorithm is to find the first semlit command in the file, execute it, and then replace that semlit command with the results of the command. Note that it is possible that the replacement text contains semlit commands. So, each time a semlit command is processed, the entire file must be re-scanned from the beginning. Instead of trying to process the file line-by-line, the loop executes until there are no more commands to execute:

00168      # process semlit commands
00169      while ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
00170          my $cmd = $1;  # text of command (minus standard stuff)
The variable $o_delim contains the semlit command delimiter and defaults to "=". The variable $o_fs contains the semlit command field separator and defaults to ",".

Note that the delimiter will certainly be used in the file without being associated with a semlit command, so the mere presence of the delimiter does not flag the start of a semlit command. The word "semlit" must follow the delimiter ("=semlit"), and that must be followed by a field separator (","); the pattern tolerates optional whitespace between these parts. Also note that the match pattern captures the text between that field separator and the ending delimiter. That represents the semlit command keyword and parameters.

The next lines capture the file contents before and after the semlit command:

00171          my $prefix = $PREMATCH;  # text preceding the command
00172          my $suffix = $POSTMATCH;  # text after the command
Note the use of the $PREMATCH and $POSTMATCH variables, which are the readable names that Perl's English module gives to the built-in match variables $` and $'. See the Perl documentation if you are not familiar with this construct.

If errors are detected in the source files, it is helpful to print error messages which include the line number. This is challenging in this function because we are not processing on a line-by-line basis. It is further complicated by the fact that as commands are replaced by the returned text from command execution, the number of lines in the $doctext changes over time. However, since the commands are processed in order, we can calculate the line number by looking at the number of lines following the command, and subtracting it from the total number:

00174          # calculate line number containing the start of this semlit command
00175          $cur_file_linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;
As described in the Perl documentation, the split() function by default discards trailing empty fields, so it does not create an empty entry at the end (after the final '\n').
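
Here is a worked example of that arithmetic on a hypothetical four-line document whose command sits on line 2. The sample text is an assumption, and substr() with the match-end offset is used in place of $POSTMATCH to keep the sketch short.

```perl
use strict;
use warnings;

my $doctext = "line 1\nline 2 =semlit,insert,blk= tail\nline 3\nline 4\n";

# Total number of lines; split() discards the trailing empty field.
my $num_lines = scalar(my @all = split("\n", $doctext));

$doctext =~ /=semlit,[^=]+=/ or die "no command found";
my $suffix = substr($doctext, $+[0]);    # everything after the matched command

# The suffix holds " tail", "line 3", "line 4": three entries.
my $linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;

print "command starts on line $linenum\n";
```

The remainder of the command's own line counts as one entry in the suffix, which is what the "+ 1" compensates for.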

Execute the command

Now execute the command and capture the replacement text:

00177          my $repl = semlit_cmd($cmd);
The semlit_cmd() function returns formatted html.

Replace the command with the html returned by the command execution and loop

Replacing the matched semlit command with the resulting text is straightforward:

00179          # Commands are removed, and often replaced with some result
00180          $doctext = $prefix . $repl . $suffix;
00181      }  # while
This is where the number of lines in $doctext can change.
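
Putting the find, execute, and replace steps together, the loop can be sketched in miniature as follows. The run_cmd() function is a hypothetical stand-in for semlit_cmd(), and the sample text is invented; note that one command is deliberately split across two lines.

```perl
use strict;
use warnings;
use English;   # provides $PREMATCH and $POSTMATCH

my ($delim, $fs) = ("=", ",");

# Hypothetical stand-in for semlit_cmd(): returns replacement html.
sub run_cmd {
    my ($cmd) = @_;
    return "<b>" . uc($cmd) . "</b>";
}

my $doctext = "<p>a = b</p>\n=semlit,one=\n<p>x</p>=semlit,\ntwo=\n";

# Rescan from the start each time until no command remains.
while ($doctext =~ /$delim\s*semlit\s*$fs\s*([^$delim]+)$delim/is) {
    my ($cmd, $prefix, $suffix) = ($1, $PREMATCH, $POSTMATCH);
    $doctext = $prefix . run_cmd($cmd) . $suffix;
}
print $doctext;
```

The stray "=" in "a = b" is left alone because it is not followed by the word "semlit".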

Return the processed html

Finally:

00185      return $doctext;
00186  }  # process_doc_file

Recursion

For the most part, Perl takes care of allowing functions to be recursive. However, we do have some global variables for error reporting which add an extra challenge:

00030  my $cur_file_name = "";
00031  my $cur_file_linenum = 0;
By keeping the file name and line number as global variables, any function can report a user-friendly error message with a minimum of fuss.

Once the initial steps are completed, the process_doc_file() function is ready to start handling commands. But before it does, we want to save those global variables:

00165      my ($save_doc_filename, $save_doc_linenum) = ($cur_file_name, $cur_file_linenum);
00166      ($cur_file_name, $cur_file_linenum) = ($doc_filename, 0);
Then the command loop executes. When that is done, just before returning, we want to restore the global variables:
00183      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
You will see very similar code in the process_src_file() function.
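
The save and restore pattern can be demonstrated with a toy recursive function. Everything here other than the two global variable names is hypothetical: the %includes and %cmd_line tables simply simulate a main file that includes one other file.

```perl
use strict;
use warnings;

my ($cur_file_name, $cur_file_linenum) = ("", 0);
my @log;

# Hypothetical data: which files each file includes, and the line of its command.
my %includes = ("main.sldoc" => ["part.sldoc"], "part.sldoc" => []);
my %cmd_line = ("main.sldoc" => 12, "part.sldoc" => 3);

sub process {
    my ($name) = @_;
    # Save the caller's context, then install our own.
    my ($save_name, $save_linenum) = ($cur_file_name, $cur_file_linenum);
    ($cur_file_name, $cur_file_linenum) = ($name, 0);

    $cur_file_linenum = $cmd_line{$name};          # pretend a command was found here
    process($_) foreach @{ $includes{$name} };     # recursion overwrites the globals...
    push(@log, "$cur_file_name:$cur_file_linenum"); # ...but they are restored on return

    # Restore the caller's context before returning.
    ($cur_file_name, $cur_file_linenum) = ($save_name, $save_linenum);
}

process("main.sldoc");
print "$_\n" foreach @log;
```

An error reported by the outer call after the include returns would therefore still name main.sldoc and its own line number.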

Function: semlit_cmd()

The process_doc_file() and process_src_file() functions read input files, find semlit commands, and call semlit_cmd() to execute them. The delimiters, "semlit", and the first field separator are stripped, so that the passed-in command string starts with the command name.

Command: tabstop

The tabstop command simply updates the $tabstop global variable:

00193      # semlit tabstop - doc: source tab expansion
00194      if ($cmd =~ /^tabstop\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00195          if ($1 =~ /^\d+$/) {
00196              $tabstop = $1;  # used by Text::Tabs
00197              return "";
00198          } else {
00199              err("Tabstop value '$1' must be numeric");
00200              return "";
00201          }
00202      }
The $tabstop variable is used directly by the built-in Text::Tabs Perl module:
00021  use Text::Tabs;
Note that the $tabstop variable can also be set on the command line:
00055  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
So, where does this actually get used? Right here:
00374              $iline = expand($iline);  # expand tabs according to $tabstop.
(In the function process_src_file().) The expand() function is part of the Text::Tabs module.

The tabstop command does not return any text (returns "" - empty string).
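
To see the effect of $tabstop on expand(), here is a minimal sketch using the Text::Tabs module directly; the sample string is arbitrary.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand() and the $tabstop variable

$tabstop = 4;
my $four = expand("a\tb");     # tab advances to column 4

$tabstop = 8;
my $eight = expand("a\tb");    # tab advances to column 8

print length($four), " ", length($eight), "\n";
```

The tab is replaced by however many spaces are needed to reach the next tab stop, so the same input expands to different widths.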

Command: srcfile

The srcfile command is used to scan an slsrc file:

00204      # semlit srcfile - doc: read and process source file
00205      elsif ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00206          return process_src_file($1, $2);
00207      }
During the scan, any semlit commands found in the slsrc file are processed (usually block/endblock commands), and the source output files are created (both html and src).

The returned text is a link to the plaintext output src file. The author of the sldoc file expects this and positions the srcfile command such that a link to the src file makes sense. For example, near the top of this sldoc file, the srcfile commands are arranged in a bullet list. Each srcfile command is followed by the hint "(right-click and save)" followed by a short description of the file.
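
The command patterns all follow the same shape, so one sketch covers them. This applies the srcfile pattern to a hypothetical command string (the file names demo.slsrc and demo.pl are invented) to show how the two parameters are captured and how whitespace around the separators is tolerated.

```perl
use strict;
use warnings;

my $o_fs = ",";
my $cmd = "srcfile , demo.slsrc , demo.pl";   # hypothetical command text

my ($slsrc_name, $plain_name);
# Each parameter is a run of characters that are neither whitespace
# nor the field separator.
if ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
    ($slsrc_name, $plain_name) = ($1, $2);
}
print "input=$slsrc_name output=$plain_name\n";
```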

Command: initialsource

The initialsource command is used to specify a file for initial display in the source frame:

00209      # semlit initialsource - doc: set initial source frame
00210      elsif ($cmd =~ /^initialsource\s*$o_fs\s*([^\s$o_fs]+)\s*/i) {
00211          $o_initialsource = $1;
00212          return "";
00213      }
The file must be the final file name, ending with ".html".

There is no returned text.

Command: include

The include command is used to scan an sldoc file:

00215      # semlit include - doc: read and process doc file
00216      elsif ($cmd =~ /^include\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00217          return process_doc_file($1);
00218      }
Note that this represents a recursive call to process_doc_file(). During the scan, the included file is processed the same way as the master sldoc file.

The returned text is simply the processed html of the included file.

Command: insert

The insert command is used to insert into the document output html file a named block of source lines (from slsrc files):

00220      # semlit insert - doc: insert a source block
00221      elsif ($cmd =~ /^insert\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00222          my $block_name = $1;
00223          if (exists($srcblocks{$block_name})) {
00224              my $num_refs = 1;
00225              my $block_ref_name = $block_name;
00226              if (defined($block_numrefs{$block_name})) {
00227                  $num_refs = $block_numrefs{$block_name} + 1;
00228                  $block_ref_name = $block_name . "_ref_$num_refs";
00229              }
00230              $block_numrefs{$block_name} = $num_refs;
00231  
00232              my $block_str = $srcblocks{$block_name};
00233              return <<__EOF__;
00234  <a name="$block_ref_name" id="$block_ref_name"><\/a>
00235  <small><pre>
00236  $block_str
00237  <\/pre><!-- endblock $block_ref_name --></small>\n
00238  __EOF__
00239          } else {
00240              err("attempt to insert block named '$block_name' but block not defined");
00241              return "";
00242          }
00243      }
Note the use of Perl's "here document" (<<__EOF__ ... __EOF__). See Wikipedia and the Perl documentation if you are not familiar with this construct. Also note that the source blocks are stored in the %srcblocks hash by the process_src_file() function.

The endblock html comment becomes important during the final fix up step; for blocks inserted multiple times, the comment is replaced with links to other references.

The returned text is the processed html of the source block.

Command: block

The block command essentially tells the process_src_file() function to start storing source lines into a named block:

00245      # semlit block - src: start a named block of source
00246      elsif ($cmd =~ /^block\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00247          my $block_name = $1;
00248          if (defined($srcblocks{$block_name})) {
00249              err("block '$block_name' already defined");
00250              return "";
00251          }
00252          $srcblocks{$block_name} = "";
00253          $block_numrefs{$block_name} = 0;
00254          $active_srcblocks{$block_name} = $cur_file_linenum;
00255          
00256          $global_src_buffer = "<span name=\"$block_name\" id=\"$block_name\"><\/span>";
00257          return "";
00258      }
Since it is possible for source lines to be included in multiple named blocks, the global hash %active_srcblocks is used to indicate which named blocks are accumulating.

Command: endblock

The endblock command essentially tells the process_src_file() function to stop storing source lines into the named block:

00260      # semlit endblock - src: end a named block of source
00261      elsif ($cmd =~ /^endblock\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00262          my $block_name = $1;
00263          if (exists($active_srcblocks{$block_name})) {
00264              delete($active_srcblocks{$block_name});
00265              $srcblocks{$block_name} =~ s/\n$//s;
00266              return "";
00267          } else {
00268              err("found endblock for '$block_name', which is not active");
00269              return "";
00270          }
00271      }
The block name needs to be supplied because a nested block does not need to be fully-contained by the outer block. I.e. if multiple blocks are active, the endblock does not necessarily end the most-recently opened block.
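
The following sketch shows two overlapping (not nested) blocks being accumulated. It uses a simplified, hypothetical command syntax ("block a", "endblock a") instead of real semlit commands, and the hash names mirror the document's %srcblocks and %active_srcblocks only loosely.

```perl
use strict;
use warnings;

my (%srcblocks, %active);

# Hypothetical input: block "a" ends while block "b" is still open.
my @input = ("block a", "line 1", "block b", "line 2",
             "endblock a", "line 3", "endblock b");

foreach my $item (@input) {
    if ($item =~ /^block (\S+)$/) {
        $srcblocks{$1} = "";
        $active{$1} = 1;
    } elsif ($item =~ /^endblock (\S+)$/) {
        delete($active{$1});    # ends the named block, not the newest one
    } else {
        # A source line is appended to every block that is currently active.
        $srcblocks{$_} .= "$item\n" foreach keys(%active);
    }
}
print "a:\n$srcblocks{a}b:\n$srcblocks{b}";
```

Here "line 2" lands in both blocks, which is only possible because endblock names the block it closes.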

Command: tooltip

The tooltip command creates a keyword in a block of text that can be hovered over for additional information.

00273      # semlit tooltip - create hover over text for a phrase
00274      elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00275          my $text_source = $1;
00276          my $text_link = $2;
00277          my $contents = file_get_contents($text_source);
00278          return <<__EOF__;
00279  <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00280  __EOF__
00281      }
The tooltip data is loaded from a file on the local server, and the keyword is supplied as the last parameter. Note: the keyword cannot contain any spaces.

Function: process_src_file()

The process_src_file() function is called from the srcfile command. It reads and processes an slsrc file (program source code). Part of the processing is to generate two output files: a source html file, and a src file, stripped of its semlit commands and ready for compilation.

This function has conceptual similarities with the process_doc_file() function, but there are important differences. The biggest difference is the overall approach to processing the file. Here, we process the slsrc file line-at-a-time. A line must contain either program source code, or a single semlit command (although that command might be enclosed in comment delimiters). Instead of replacing the command with returned content, the line containing the command is simply discarded.

Here are the high-level steps performed:

  1. Open the slsrc input file.
  2. Open the output source html file.
  3. Open the output src file.
  4. For each input line in the slsrc file:
    1. If it is a semlit command, process it.
    2. Else it is a source line:
      1. Write the line to the src file.
      2. Html-ify the line.
      3. If there are active source blocks accumulating:
        1. Create doc link and write source html file.
        2. Add the source line to all of the active blocks.
      4. Else no active block, write source html file without link.
  5. Close files and wrap up.
  6. Return html link to the src file.

Open the slsrc input file

Almost the same code exists in process_doc_file(). Here we are opening an slsrc file which might be in any of the directories in the array @o_incdirs:

00300      # open source file, using one or more search directories
00301      my $incdir;
00302      my $open_success = 0;
00303      foreach $incdir (@o_incdirs) {
00304          if (open($slsrc_infd, "<", "$incdir/$src_filename")) {
00305              $open_success = 1;
00306              last;  # break out of foreach
00307          }
00308      }
00309      if (! $open_success) {
00310          err("could not open src file '$src_filename', skipping");
00311          return "";
00312      }

Open the output source html file

The output source html file is opened and some initial content is written:

00314      # create and write initial content to html-ified source file
00315      if (! open($src_html_outfd, ">", "$src_filename.html")) {
00316          err("could not open output source html file '$src_filename.html', skipping");
00317          close($slsrc_infd);
00318          return "";
00319      }
00320      print $src_html_outfd <<__EOF__;
00321  <!DOCTYPE html><html><head><title>$plain_src_filename</title>
00322  <link rel="stylesheet" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css">
00323  <script src="//code.jquery.com/jquery-1.10.2.js"></script>
00324  <script src="//code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
00325  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/styles/default.min.css">
00326  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/highlight.min.js"></script>
00327  <script>
00328    \$(function() {
00329      \$( document ).tooltip();
00330    });
00331  </script>
00332  <style>
00333  #code {background-color:#ffffff;};
00334  </style>
00335  </head>
00336  <body><h1>$plain_src_filename</h1>
00337  <script>hljs.initHighlightingOnLoad();</script>
00338  <small><pre><code id="code"><table border=0 cellpadding=0 cellspacing=0><tr>
00339  __EOF__
In addition to the initial html source, it also contains a link to the plain text src file which can be downloaded.

Open the output src file

The output src file is opened:

00341      # Create plaintext source file (without semlit commands)
00342      if (! open($src_outfd, ">", "$plain_src_filename")) {
00343          err("could not open output src '$plain_src_filename', skipping");
00344          close($slsrc_infd);
00345          close($src_html_outfd);
00346          return "";
00347      }

For each input line in the slsrc file:

Read the slsrc file, line-by-line:

00356      my $iline;
00357      while (defined($iline = <$slsrc_infd>)) {
00358          chomp($iline);  # remove line delim
00359          $iline .= "\n";  # add newline
00360          $iline =~ s/\r//gs;  # remove carriage returns, if any
00361          $cur_file_linenum ++;
The chomp() function removes the trailing line delimiter (the current input record separator, $/), and the next statement appends a newline, so that every line, including a final line that has no delimiter, ends with exactly one newline. Then carriage returns, if any, are removed. This unifies input from different platforms. (Similar code can be found in process_doc_file().)
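
The effect of those three statements can be seen in isolation. The following is a minimal sketch; normalize_line() is a hypothetical name used only for illustration, and the statements inside it are the ones from the listing.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize one input line so that it ends in
# exactly one "\n" and contains no carriage returns.
sub normalize_line {
    my ($line) = @_;
    chomp($line);        # remove the trailing record separator, if present
    $line .= "\n";       # always terminate with a single newline
    $line =~ s/\r//gs;   # drop carriage returns (e.g. from CRLF files)
    return $line;
}
```

With this sketch, a line read as "abc\r\n", a line read as "abc\n", and a final line read as "abc" should all come out as "abc\n".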

If it is a semlit command, process it

The slsrc file contains semlit commands, usually block and endblock. Find and process them:

00363          # check for semlit commands
00364          if ($iline =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
00365              semlit_cmd($1);
00366              # discard command line
00367          }
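
To see what the pattern captures, here is a minimal sketch that applies the same regular expression outside of the read loop. It assumes the default delimiter '=' and field separator ',' given in the help text; extract_semlit_cmd() is a hypothetical name, not a function in semlit.pl.

```perl
use strict;
use warnings;

# Assumed defaults, taken from the help text: '=' delimits a command
# and ',' separates its fields.
my $o_delim = "=";
my $o_fs = ",";

# Hypothetical helper: return the command text if the line contains a
# semlit command, or undef if it is an ordinary source line.
sub extract_semlit_cmd {
    my ($line) = @_;
    if ($line =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
        return $1;  # e.g. "block,init"
    }
    return;
}
```

Because the delimiter and field separator are interpolated directly into the pattern, a character that is special in regular expressions (such as '|') would probably need to be escaped, for example with quotemeta(), before this sketch could use it.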

Else it is a source line:

If the slsrc line does not contain a semlit command, then it is normal source code.

00368          else {
00369              $src_linenum ++;  # don't count semlit command lines
Note that source code lines are counted separately from input slsrc file lines ($cur_file_linenum, counted above). The input line count ($cur_file_linenum) includes semlit command lines and is used by err() when printing error messages, while the source code line count ($src_linenum) does not include semlit command lines and is used as the user-visible line number in the output source html file.

Write the line to the src file

Write the plaintext source line to the src file:

00371              print $src_outfd $iline;

Html-ify the line

Before writing the line to the output source html file, expand tabs and convert the special characters '&', '<' and '>' to their html entities:

00373              # fix up source for html rendering (tab expansion, special char encoding)
00374              $iline = expand($iline);  # expand tabs according to $tabstop.
00375              $iline =~ s/\&/\&amp;/g;  $iline =~ s/</\&lt;/g;  $iline =~ s/>/\&gt;/g;
Note that the expand() function is part of the Text::Tabs Perl module and uses the $tabstop global variable to control how it expands.
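
The two steps can be tried on a single line. This is a minimal sketch, assuming a tab stop of 4 (the documented default for -t); htmlify_line() is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use Text::Tabs;   # exports expand() and $tabstop

$tabstop = 4;     # assumed value, matching the documented default for -t

# Hypothetical helper: expand tabs, then encode the three characters
# that are special in html.
sub htmlify_line {
    my ($line) = @_;
    $line = expand($line);
    # '&' is encoded first; doing it later would re-encode the
    # ampersands that the '<' and '>' substitutions introduce.
    $line =~ s/&/&amp;/g;
    $line =~ s/</&lt;/g;
    $line =~ s/>/&gt;/g;
    return $line;
}
```

The order of the substitutions matters: the ampersand has to be handled before the angle brackets, which is also the order used in the listing.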

If there are active source blocks accumulating:

Check to see if this source line is inside one or more block/endblock constructs:

00377              # if we are in at least one block, link the source to the earliest block's first doc reference
00378              if (scalar(keys(%active_srcblocks)) > 0) {

Create doc link and write source html file

There is at least one named block active. Assuming there might be more than one, find the active block which was most-recently opened (is at the highest-numbered input line):

00379                  # descending sort so that element 0 is largest
00380                  my @active_blocks = sort { $active_srcblocks{$b} cmp $active_srcblocks{$a} } keys(%active_srcblocks);
This sort construct orders the keys by descending content of the %active_srcblocks hash. Thus, $active_blocks[0] is the name of that most-recently opened block. This is used to construct the link back to the doc:
00381                  my $targ = $active_blocks[0] . "_ref_1";
00382                  $src_lines_td .= sprintf("<a href=\"$doc_html_filename#$targ\" target=\"doc\">%05d<\/a>\n", $src_linenum);
00383                  if ($global_src_buffer) {
00384                      $src_content_td .= sprintf("%s  %s", $global_src_buffer, $iline);
00385                      $global_src_buffer = "";
00386                  }
00387                  else {
00388                      $src_content_td .= sprintf("  %s", $iline);
00389                  }
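
The descending sort used above to pick the most-recently opened block can be tried on its own. This is a minimal sketch under an assumption: that the hash values are input line numbers, as the text describes. The block names and numbers are invented. The listing compares with cmp (string comparison); the sketch shows both cmp and the numeric operator <=>, since the two can order numbers differently when the values do not all have the same number of digits.

```perl
use strict;
use warnings;

# Assumed shape: block name => input line number where the block was opened.
my %active_srcblocks = (setup => 9, loop => 10, cleanup => 100);

# Numeric descending sort: element 0 is the block opened at the highest line.
my @by_number = sort { $active_srcblocks{$b} <=> $active_srcblocks{$a} }
                keys(%active_srcblocks);

# String descending sort, for comparison: "9" sorts after "10" and "100".
my @by_string = sort { $active_srcblocks{$b} cmp $active_srcblocks{$a} }
                keys(%active_srcblocks);

print "numeric: @by_number\n";
print "string:  @by_string\n";
```

If the values really are line numbers, the numeric form is probably the safer way to find the most-recently opened block in a long source file.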

Add the source line to all of the active blocks

For each active named block/endblock construct, add the source code to the %srcblocks hash:

00391                  # for each open source block on this line of source, link the doc block to that source block
00392                  foreach my $block_name (keys(%active_srcblocks)) {
00393                      my $a = sprintf("<a href=\"$cur_file_name.html#$block_name\" target=\"src\">%05d<\/a>  %s", $src_linenum, $iline);
00394                      $srcblocks{$block_name} .= $a;
00395                  }
This is used by the insert semlit command to insert the source block into the output doc html file.

Else no active block, write source html file without link

For source lines which are not contained in a block/endblock construct, no doc link is needed when writing to the output source html file:

00396              } else {
00397                  # no active blocks
00398                  my $a = sprintf("%05d\n", $src_linenum);
00399                  my $c = sprintf("  %s", $iline);
00400                  $src_lines_td .= $a;
00401                  $src_content_td .= $c;
00402              }

Close files and wrap up

Close the files:

00419      close($slsrc_infd);
00420      close($src_outfd);
00421  
00422      print $src_html_outfd "</tr></table></code>\n";
00423      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00424      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00425      print $src_html_outfd "</pre></small></body></html>\n";
00426      close($src_html_outfd);
Note the many newlines printed. This is so that clicking on a line number which is near the end of the source file will still position that line of source at the top of the screen.

Also, if the user accidentally started one or more named blocks but did not end them, print errors and force them ended:

00428      # if the source file started a block but reached eof without ending it, end it here.
00429      foreach (keys(%active_srcblocks)) {
00430          err("block named '$_' started but not ended");
00431          semlit_cmd("endblock$o_fs$_");  # end it for the user
00432      }
This is done by calling the semlit_cmd() function and passing it the endblock command, as if the slsrc file had contained it.

Return html link to the src file

But before we return, restore the $cur_file_name and $cur_file_linenum variables to their previous state. Then return.

00434      # the semlit.srcfile command writes a link to the plaintext source file
00435      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
00436      return "<a href=\"$plain_src_filename\">$plain_src_filename</a>";
Since the call to process_src_file() came from semlit_cmd() executing the srcfile command, and that command is used in the sldoc file, the thing we return is a link to the plaintext output src file.

Function: help()

First, let's declare a couple of globals that will be used for helping the user:

00026  my $tool = "semlit.pl";
00027  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The usage() function also uses this.) In the main code, the "-h" option calls help() to print more extensive help:
00471  sub help {
00472      my($err_str) = @_;
00473  
00474      if (defined $err_str) {
00475          print "$tool: $err_str\n\n";
00476      }
00477      print <<__EOF__;
00478  Usage: $usage_str
00479  Where:
00480      -h - print help screen
00481      -d delim - delimiter character at start and end of a semlit command.
00482              (default to '=')
00483      -f fs - field separator character within a semlit command.
00484              (default to ',')
00485      -i initialsource - file name for initial source frame.
00486              (default to "blank.html")  Also, initialsource semlit command.
00487      -I dir - directory to find files for 'srcfile' and 'include' commands.
00488              (default to ".")  The "-I dir" option can be repeated.
00489      -t tabstop - convert tabs to "tabstop" spaces.
00490              (default to '4')
00491      files - zero or more input files.  If omitted, inputs from stdin.
00492  
00493  __EOF__
00494  
00495      exit($exit_status);
00496  }  # help
Note the use of Perl's "here document" construct, <<__EOF__ ... __EOF__. See Wikipedia and the Perl documentation if you are not familiar with it.
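
For readers who have not met the construct, here is a minimal sketch of a here document that interpolates a variable. The file name is hypothetical. It also shows why the html header written by process_src_file() spells the jQuery calls as \$(...): an unquoted terminator makes the text behave like a double-quoted string, so a literal dollar sign has to be escaped.

```perl
use strict;
use warnings;

my $title = "example.c";   # hypothetical file name

# An unquoted terminator (__EOF__) makes the here document interpolate
# like a double-quoted string, so a literal '$' must be written as '\$'.
my $html = <<__EOF__;
<title>$title</title>
<script>\$(function() {});</script>
__EOF__

print $html;
```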

Function: file_get_contents()

This is a simple routine to read the contents of a file for the purpose of filling in the tooltips:

00459  sub file_get_contents{
00460      my ($text_file) = @_;
00461      open FILE, $text_file or die $!;
00462      flock FILE, 1 or die $!;        # wait for lock
00463      seek(FILE, 0, 0);       # move pointer to beginning
00464      my $slurp = do{local $/; <FILE>};
00465      flock FILE, 8;          # release the lock
00466      close(FILE);
00467  
00468      return $slurp;
00469  } # file_get_contents
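
The same slurp idiom can also be written with a lexical filehandle and the three-argument form of open(), which is the style used elsewhere in semlit.pl. The following is a minimal sketch, not a replacement taken from the program: slurp_file() is a hypothetical name, and the sketch leaves out the flock() calls on the assumption that no other process is writing the file.

```perl
use strict;
use warnings;

# Hypothetical variant of file_get_contents(): same slurp idiom, but
# with a lexical filehandle and three-argument open(). No file locking.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "could not open '$path': $!";
    my $content = do { local $/; <$fh> };  # undefined $/ reads the whole file
    close($fh);
    return $content;
}
```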

Error handling

When there is an obvious user error in the invocation of the semlit program, the usage() function is called.

For example:

00055  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
Note the use of the logical OR construct (||). Because of Perl's C-like short-circuit evaluation, the right-hand expression (the call to usage()) is only executed if the left-hand side (the call to GetOptions()) returns false; i.e., usage() is called if GetOptions() fails. Non-Perl programmers might be tempted to use a simple if/then construct, but the logical OR construct is such a common Perl idiom that the wise reader will learn it.
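
The idiom can be tried without touching the real command line. This is a minimal sketch, not code from semlit.pl: parse_args() is a hypothetical helper, it handles only two of the options, and it uses GetOptionsFromArray() from Getopt::Long so that the argument list can be passed in explicitly.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list and return a hash ref of
# options, or undef if parsing failed.
sub parse_args {
    my @args = @_;
    my %opt = (tabstop => 4);
    my $ok = 1;
    # The right-hand side runs only when GetOptionsFromArray() returns false.
    GetOptionsFromArray(\@args, "h" => \$opt{help}, "t=i" => \$opt{tabstop})
        || ($ok = 0);
    return $ok ? \%opt : undef;
}
```

Calling parse_args("-t", "8") should return the options, while an unknown option such as "-x" makes Getopt::Long print a warning and return false, which triggers the right-hand side.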

In other cases, an error is discovered in one of the input files, sldoc and/or slsrc. In those cases, the err() function is called.

Function: usage()

First, let's declare a couple of globals that will be used for helping the user:

00026  my $tool = "semlit.pl";
00027  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The help() function also uses this.) The usage() function allows an optional error message to be passed in, which is printed before the usage string:
00448  sub usage {
00449      my($err_str) = @_;
00450  
00451      if (defined $err_str) {
00452          print STDERR "$tool: $err_str\n\n";
00453      }
00454      print STDERR "Usage: $usage_str\n\n";
00455      $exit_status ++;
00456      exit($exit_status);
00457  }  # usage

Function: err()

The function err() is a programmer convenience which prints an error message, along with the file name and line number where the error was discovered.

00440  sub err {
00441      my ($msg) = @_;
00442  
00443      print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
00444      $exit_status ++;
00445  }  # err
It also increments $exit_status, which starts out at zero (success):
00044  my $exit_status = 0;  # assume success
and is used when exiting the program:
00130  # All done.
00131  exit($exit_status);
Thus, a non-zero (failure) exit status also indicates how many errors there were.
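
The counting behavior can be tried in a few lines. This is a minimal sketch with invented file name and line number values. The capped_status() function is a hypothetical addition that is not in semlit.pl; it reflects the fact that on Unix-like systems only the low 8 bits of the exit status reach the parent process, so a very large error count would otherwise wrap around.

```perl
use strict;
use warnings;

my $exit_status = 0;  # assume success
my ($cur_file_name, $cur_file_linenum) = ("example.sldoc", 12);  # hypothetical values

sub err {
    my ($msg) = @_;
    print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
    $exit_status++;
}

# Hypothetical addition (not in semlit.pl): cap the count so that it
# still fits in the 8-bit exit status of Unix-like systems.
sub capped_status {
    return $exit_status > 255 ? 255 : $exit_status;
}

err("first problem");
err("second problem");
print "would exit with status ", capped_status(), "\n";
```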

Program Explanation: semlit.sh

As mentioned above, the first line of semlit.pl is a fairly common shebang:

00004  #!/usr/local/bin/perl -w

This Unix construct, combined with setting the executable bit on the file, is intended to allow the tool to be run by simply typing the file name as a command (assuming that the PATH environment variable is set up right). However, it does require that the physical location of the Perl interpreter be encoded directly in the file. Unfortunately, different Unix systems install Perl in different places - sometimes under /usr/local, sometimes in /bin, sometimes under /opt.

I vaguely remember that there is a clever way to re-code that line such that the shell will search for the Perl interpreter in the PATH environment variable. But I don't remember how to do it.
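
For reference, one widely used idiom for this is to name env as the interpreter and let it search PATH for perl. The following is a minimal sketch, not the actual header of semlit.pl. It assumes that env is installed as /usr/bin/env, which is common but not guaranteed. Because many systems pass everything after the interpreter name as a single argument, the -w switch is replaced here by the warnings pragma.

```perl
#!/usr/bin/env perl
# env searches PATH for "perl", so no interpreter location is hard-coded.
use strict;
use warnings;   # stands in for the -w switch on the shebang line

# $^X holds the path of the perl binary that is actually running.
my $interpreter = $^X;
print "running under $interpreter\n";
```

This trades one hard-coded path (perl) for another (env), so a wrapper script can still be the more portable choice.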

One way to run semlit.pl without having to set the Perl interpreter's full location is:
    perl SemLitPath/semlit.pl ...

But even that is somewhat unsatisfying for an experienced Unix user. Lazy as we are, we would prefer to let PATH do the work of finding the program, so we just enter:
    semlit ...

One common way to accomplish this is with a wrapper shell script which encapsulates any annoying details of running the tool. Hence the semlit file.

The first line of semlit is the standard shebang:

00004  #!/bin/sh

But this time it references the universally-respected location for a Bourne-compatible shell. No chance that this won't work on some flavor of Unix.

Next, save the initial working directory (so that we can get back to it):

00008  IWD=`pwd`                    # remember initial working directory

I establish the convention that the wrapper script must be in the same directory as the Perl script. So, the next thing to do is figure out which directory contains the wrapper which is running:

00010  # Find dir where tool is stored (useful for finding related files)
00011  TOOLDIR=`dirname $0`
00012  # Make sure TOOLDIR is a full path name (not relative)
00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD

This might look more complicated than it needs to be. Isn't the first line enough?

00011  TOOLDIR=`dirname $0`

No, it isn't. Suppose you are in your home directory, and you have placed the semlit files in $HOME/bin. Then let's say you execute the program by entering:
    bin/semlit ...

That is perfectly legal, and dirname $0 will, not surprisingly, return bin, a relative path. But I want a fully-qualified path, so I include the three commands:

00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD

You simply cd to that potentially-relative location and use pwd to get the full path. Then cd back to the initial working directory. This is easier than trying to parse all the possible return values for dirname.
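
The same relative-to-absolute technique can be sketched in Perl, the language of the tool itself. This is only an illustration of the idea, not part of the wrapper: tool_dir() is a hypothetical name, and abs_path() resolves symbolic links, so its answer may differ from what cd followed by pwd reports when links are involved.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);

# Hypothetical helper: return the absolute directory containing a script,
# given the possibly-relative path by which it was invoked.
sub tool_dir {
    my ($script_path) = @_;
    return abs_path(dirname($script_path));
}

print tool_dir($0), "\n";
```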

All that remains to be done is to run the perl interpreter as a command (such that PATH is used):

00015  perl $TOOLDIR/semlit.pl $*