This is a semi-literate document for the program semlit.pl and its shell wrapper semlit.sh. This program is used to create semi-literate documentation (of which this document is an example). Semi-literate documents like this are intended to explain the internals of a program, which is different from user documentation. If you are an end user, you probably want the user document.
Copyright 2012, 2015 Steven Ford http://geeky-boy.com and licensed "public domain" style under:
   
  
      To the extent possible under law, the contributors to this project
      have waived all copyright and related or neighboring rights to
      this work. This work is published from:  United States.  The project
      home is https://github.com/fordsfords/semlit/tree/gh-pages. 
      To contact me, Steve Ford, project owner, you can find my email
      address at http://geeky-boy.com. 
      Can't see it?  Keep looking.
This document describes the internals of the semlit.pl
      program, and is intended to be read by a programmer who wants to
      understand, maintain, and perhaps reuse the code.  Before
      reading this documentation, you are expected to have a good
      user-level understanding of SEMLIT.  Please be familiar with the
        user document before starting this, unless you are
      just getting a feel for what SEMLIT documentation is like.
    
The main program is written in Perl. The reader is assumed to
      have at least entry-level
        knowledge of Perl. Some of the more-advanced Perl constructs
      are explained for the benefit of the novice.
    
There are two program source files:
    
Here are the high-level steps performed by the main semlit
      program:
    
Let's declare some global variables to hold values for program
      options, and set their default values: 
00047  my $o_help;        # -h
00048  my $o_fs = ",";    # -f
00049  my $o_delim = "="; # -d
00050  my $o_initialsource = "blank.html";  # -i
00051  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
00052  $tabstop = 4;  # defined and used by Text::Tabs - see "expand()" function
00053  
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
00055  if (defined($o_help)) {
00056      help();  # if -h had a value, it would be in $opt_h
00057  }
      Note the use of the GetOptions function from Perl's Getopt::Long module. This function
      does the work of parsing the command-line and setting new values
      into the global option variables. See the Perl
        documentation if you are not familiar with the construct.
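As a minimal sketch of how GetOptions fills scalar and array targets, the following uses GetOptionsFromArray (also exported by Getopt::Long) so that the argument list can be passed explicitly. The parse_args name, the %opt hash, and the sample arguments are illustrative assumptions, not part of semlit.pl.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list much as the main program does.
sub parse_args {
    my @args = @_;
    my %opt = (delim => "=", fs => ",", incdirs => ["."], tabstop => 4);
    GetOptionsFromArray(\@args,
        "d=s" => \$opt{delim},
        "f=s" => \$opt{fs},
        "I=s" => $opt{incdirs},    # array ref: each -I appends a directory
        "t=i" => \$opt{tabstop},
    ) or die "Error in GetOptions\n";
    return (\%opt, \@args);        # @args now holds the non-option arguments
}

my ($opt, $rest) = parse_args("-I", "lib", "-I", "inc", "-t", "8", "doc.sldoc");
print join(":", @{ $opt->{incdirs} }), " tabstop=$opt->{tabstop} file=$rest->[0]\n";
```

Because the array target already contains ".", the current directory stays first in the search order and each -I value is appended after it.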
    
The program assumes that there is a single master sldoc
      file supplied on the command line. So, early on we get that file
      name: 
00059  if (scalar(@ARGV) != 1) {
00060      usage("Error, .sldoc file missing");
00061  }
00062  $main_doc_filename = $ARGV[0];
00063  if ( ! -r "$main_doc_filename" ) {
00064      usage("Error, could not read '$main_doc_filename'");
00065  }
 Note the use of the Perl construct scalar(@array) to
      determine the number of elements in the array. There are shorter
      ways to do this (in terms of keystrokes), but I prefer the
      explicit construct because it is easier to spot. Also note the use of
      -r file to test if
      the file exists and is readable.
    
Once the file name is determined, we now process that input file:
      
00072  # Main loop; read each line in doc file
00073  
00074  my $doc_html_str = process_doc_file($main_doc_filename);
 Note that the function process_doc_file()
      returns the html output for the documentation.
    
An sldoc file might insert the same block of source code
      multiple times. The source html will link to the first doc
      reference to it. How does the reader find the other references? At
      the end of an inserted source block, links will be inserted to
      adjacent references within the doc.
    
However, the sldoc file is processed sequentially. When a
      reference (insert command) is seen, it is not yet known
      if there will be subsequent references. So, a fixup step is added
      after the sldoc file is fully processed. That fixup step
      looks at each source block that has multiple references and adds
      the appropriate links (next/prev).
    
Since this step needs a count of the number of references, we
      need code in the semlit_cmd()
      function to do the counting: 
00223              my $num_refs = 1;
00224              my $block_ref_name = $block_name;
00225              if (defined($block_numrefs{$block_name})) {
00226                  $num_refs = $block_numrefs{$block_name} + 1;
00227                  $block_ref_name = $block_name . "_ref_$num_refs";
00228              }
00229              $block_numrefs{$block_name} = $num_refs;
 If $block_numrefs{$block_name}
      already exists, then this is not the first insert for
      this block.
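The counting can be sketched in isolation as follows. This is a simplified illustration, assuming the count starts at zero when a block is first defined, so that every reference gets a numbered name; next_ref_name is a hypothetical helper, not a function in semlit.pl.

```perl
use strict;
use warnings;

my %block_numrefs;   # block name => number of inserts seen so far

# Hypothetical helper: return the anchor name for the next insert of a block.
sub next_ref_name {
    my ($block_name) = @_;
    my $num_refs = ($block_numrefs{$block_name} // 0) + 1;
    $block_numrefs{$block_name} = $num_refs;
    return $block_name . "_ref_$num_refs";
}

# Three inserts of the same block get three distinct anchor names.
print next_ref_name("main_loop"), "\n" for 1 .. 3;
```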
    
Back to main and the fix-up step: 
00076  # fix up multiple source references
00077  foreach my $blockname (keys(%block_numrefs)) {
00078      if ($block_numrefs{$blockname} > 1) {
00079          # First ref points to next and last
00080          my $refnum = 1;
00081          my $this_block = $blockname . "_ref_" . ($refnum);
00082          my $first_block = $this_block;
00083          my $last_block = $blockname . "_ref_" . $block_numrefs{$blockname};
00084          my $next_block = $blockname . "_ref_" . ($refnum + 1);
00085          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$last_block">last ref<\/a><\/pre>/s;
00086  
00087          # Middle refs point to previous and next
00088          my $prev_block = $this_block;
00089          for ($refnum = 2; $refnum <= $block_numrefs{$blockname} - 1; $refnum ++) {
00090              # middle refs point to prev and next
00091              $this_block = $blockname . "_ref_" . ($refnum);
00092              $next_block = $blockname . "_ref_" . ($refnum + 1);
00093              $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00094              $prev_block = $this_block;
00095          }
00096  
00097          # last ref points to first and previous
00098          $this_block = $blockname . "_ref_" . ($refnum);
00099          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$first_block">first ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00100      }
00101  }
 The
      first reference has no "prev", so a link to the last reference is
      included. The last reference has no "next", so a link to the first
      reference is included. (The source html always links to
      the first reference.)
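The first/middle/last rules can be condensed into one loop. The sketch below is an illustrative restructuring, not the code in semlit.pl; add_ref_links is a hypothetical name, and the block name is quoted with \Q...\E on the assumption that it might contain regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the endblock marker of each reference with
# navigation links, following the first/middle/last rules described above.
sub add_ref_links {
    my ($html, $blockname, $count) = @_;
    return $html if $count < 2;            # single reference: nothing to link
    for my $n (1 .. $count) {
        my $this  = "${blockname}_ref_$n";
        my $next  = "${blockname}_ref_" . ($n + 1);
        my $prev  = "${blockname}_ref_" . ($n - 1);
        my $first = "${blockname}_ref_1";
        my $last  = "${blockname}_ref_$count";
        my $links =
              $n == 1      ? qq{<a href="#$next">next ref</a>  <a href="#$last">last ref</a>}
            : $n == $count ? qq{<a href="#$first">first ref</a>  <a href="#$prev">prev ref</a>}
            :                qq{<a href="#$next">next ref</a>  <a href="#$prev">prev ref</a>};
        $html =~ s{</pre><!-- endblock \Q$this\E -->}{$links</pre>};
    }
    return $html;
}
```

Blocks with a single reference are returned unchanged, so their endblock comment simply stays in the html as an invisible comment.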
    
A bit earlier in the program, the output file was opened:
      
00067  # open main doc file
00068  
00069  $doc_html_filename = basename($main_doc_filename) . ".html";  # strip directory
00070  open($doc_html_outfd, ">", $doc_html_filename) || die "Error, could not open htmlfile '$doc_html_filename'";
 Then, after the input file is
      processed, the returned html output is written to the output file:
      
00103  # write doc html file
00104  
00105  print $doc_html_outfd "$doc_html_str\n";
00106  close($doc_html_outfd);
00107  
00108  # Create frameset page
00109  
00110  my $index_o_file;
00111  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112  print $index_o_file <<__EOF__;
00113  <html><head></head>
00114  <frameset cols="50%,*">
00115  <frame src="$doc_html_filename" name="doc">
00116  <frame src="$o_initialsource" name="src">
00117  </frameset>
00118  </html>
00119  __EOF__
00120  close($index_o_file);
00121  
00122  # Create blank page for initial source frame
00123  
00124  my $blank_o_file;
00125  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127  close($blank_o_file);
00128  
00129  # All done.
00130  exit($exit_status);
 The variable $exit_status is initialized to zero in main
      and is incremented each time an error is reported. Thus, if no
      errors are reported, the program exits with success (0).
    
There are a couple of miscellaneous support files needed by the html
      documentation. Since the documentation is best viewed with html
      frames, the first file is the frameset: 
00108  # Create frameset page
00109  
00110  my $index_o_file;
00111  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112  print $index_o_file <<__EOF__;
00113  <html><head></head>
00114  <frameset cols="50%,*">
00115  <frame src="$doc_html_filename" name="doc">
00116  <frame src="$o_initialsource" name="src">
00117  </frameset>
00118  </html>
00119  __EOF__
00120  close($index_o_file);
      Note the use of Perl's "here document" <<__EOF__ ... __EOF__. See Wikipedia
      and the Perl
        documentation if you are not familiar with this construct.
      Contrast this with the next code fragment.
    
By default, when the documentation is first brought up, there is no
      source to display in the source frame. So we need an almost blank page:
      
00122  # Create blank page for initial source frame
00123  
00124  my $blank_o_file;
00125  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127  close($blank_o_file);
 I didn't use a "here" document for this, but
      I could have. 
That initial blank page can be overridden with the "-i" command-line option, or with the "initialsource" semlit command.
The process_doc_file()
      function is called from the main program: 
00072  # Main loop; read each line in doc file
00073  
00074  my $doc_html_str = process_doc_file($main_doc_filename);
      It is also called (recursively) from the semlit command include.
    
The function starts out with: 
00136  sub process_doc_file {
00137      my ($doc_filename) = @_;
      Since this function can be called recursively, we save any
      existing file name and line number (they are restored before
      return).
    
The function's main job is to open an sldoc file, read the file
      and process the semlit commands, and return the documentation html
      output. Note that the sldoc input file is already in html
      form; once the file is read, all that remains to be done is to
      process the semlit commands. In the degenerate simple case of an sldoc
      file containing no
      semlit commands at all, the output html file will be
      exactly equal to the sldoc input file. (An unlikely case
      since it is the semlit commands which provide the value of
      semi-literate documentation.)
    
Here are the high-level steps it performs:
    
In an earlier version, the program read the sldoc input
      file one line at a time. The line was tested to determine if it
      was a semlit command. If so, the command was executed and the line
      dropped. This approach worked fine when I used a simple text
      editor to create the html sldoc file. But editing html
      with a simple text editor is painful, error-prone, and
      inefficient. I really wanted to use an html editor. But html
      editors generally do not make it easy to control how the content
      is arranged on lines of the physical html file. I discovered that
      an html editor may pack multiple semlit commands onto a single line,
      or even split a command across lines.
    
Fortunately, Perl provides powerful constructs which make this
      kind of processing easy. Instead of looking at the file
      line-by-line, the entire file can be read into a single string
      variable, and regular expression matching can be used to find the
      commands, one at a time. You will see this below.
    
The semlit program supports having libraries of sldoc
      files for standard boilerplate. I chose to model this after the C
      compiler: the programmer simply specifies the name of the file,
      and one or more search directories can be specified on the command
      line with the -I
      option. These directories are set up in the main program thus:
      
00051  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
 ... 
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
      The GetOptions()
      function will parse out as many -I options as the user supplies and leave
      the @o_incdirs array
      set with the directories (including ".").
    
Back in the process_doc_file() function, the file is opened as
      follows: 
00140      # open source file, using one or more search directories
00141  
00142      my $incdir;
00143      my $open_success = 0;
00144      foreach $incdir (@o_incdirs) {
00145          if (open($doc_infd, "<", "$incdir/$doc_filename")) {
00146              $open_success = 1;
00147              last;  # break out of foreach
00148          }
00149      }
00150      if (! $open_success) {
00151          err("could not open doc file '$doc_filename', skipping");
00152          return;
00153      }
 It loops through the array
      of include directories until the file can be opened. Note that
      almost the same code exists in process_src_file().
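Since the same search is needed for both sldoc and slsrc files, it could be factored into one helper. The following is a hedged sketch of that idea; open_in_dirs is a hypothetical name and does not exist in semlit.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: try each directory in order and return an open
# filehandle for the first file that can be opened, or undef if none can.
sub open_in_dirs {
    my ($filename, @dirs) = @_;
    foreach my $dir (@dirs) {
        if (open(my $fh, "<", "$dir/$filename")) {
            return $fh;
        }
    }
    return;   # undef in scalar context: caller reports the error
}
```

A caller would use it as `my $fh = open_in_dirs($doc_filename, @o_incdirs)` and report an error when the result is undefined.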
    
Perl makes reading the entire file into memory easy: 
00155      # Read entire file into memory
00156  
00157      my @doctexts = <$doc_infd>;
00158      close($doc_infd);
00159      chomp(@doctexts);  # remove line delims from every line
00160      my $num_lines = scalar(@doctexts);  # count lines in file
00161      my $doctext = join("\n", @doctexts) . "\n";  # combine as a single string
00162      $doctext =~ s/\r//gs;  # remove carriage returns, if any
 The chomp()
      function removes line delimiters, which, depending on platform,
      might be something other than linefeeds, and the join()
      combines the lines into one long string, and inserts linefeeds as
      line endings. Then, carriage returns, if any, are removed. This
      unifies different platforms. (Similar
        code can be found in process_src_file().)
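The normalization can be tried on its own with a small helper. This is a minimal sketch; normalize_text is a hypothetical name, and it takes the lines as arguments rather than reading them from a file.

```perl
use strict;
use warnings;

# Hypothetical helper: join lines into one string with "\n" line endings,
# dropping any carriage returns left over from CRLF input.
sub normalize_text {
    my @lines = @_;
    chomp(@lines);                          # strip the platform line delimiter
    my $text = join("\n", @lines) . "\n";   # one string, linefeed-terminated
    $text =~ s/\r//g;                       # remove stray carriage returns
    return $text;
}
```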
    
The basic algorithm is to find the first semlit command in the file, execute it,
      and then replace that semlit command with the results of the
      command. Note that it is possible that the replacement text
      contains semlit commands. So, each time a semlit command is
      processed, the entire file must be re-scanned from the beginning.
      Instead of trying to process the file line-by-line, the loop
      executes until there are no more commands to execute:
      
00167      # process semlit commands
00168      while ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
00169          my $cmd = $1;  # text of command (minus standard stuff)
 The variable $o_delim contains the
      semlit command delimiter and defaults to "=". The variable $o_fs contains the semlit
      command field separator and defaults to ",".
    
Note that the delimiter will certainly be used in the code
      without being associated with a semlit command, so the mere
      presence of the delimiter does not flag the start of a semlit
      command. The word "semlit"
      must follow the delimiter ("=semlit", with optional whitespace between them), and that must be followed by a
      field separator (",").
      Also note that the match pattern captures the text between that
      field separator and the ending delimiter. That represents the
      semlit command keyword and parameters.
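The find, execute, and replace cycle can be sketched independently of the real command set. In this illustration expand_commands and the handler callback are hypothetical, and the match offsets in @- and @+ are used to splice in the result instead of the $PREMATCH and $POSTMATCH variables that semlit.pl uses.

```perl
use strict;
use warnings;

# Hypothetical driver: repeatedly find the first command, run it through a
# caller-supplied handler, and splice the result back into the text.
sub expand_commands {
    my ($text, $handler, $delim, $fs) = @_;
    $delim //= "=";
    $fs    //= ",";
    my ($d, $f) = (quotemeta($delim), quotemeta($fs));
    while ($text =~ /$d\s*semlit\s*$f\s*([^$d]+)$d/is) {
        my $cmd = $1;                          # copy before any other regex runs
        my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
        my $repl = $handler->($cmd);
        substr($text, $start, $end - $start) = $repl;
    }
    return $text;
}

my $out = expand_commands("<p>=semlit,insert,main_loop=</p>\n",
                          sub { "[" . $_[0] . "]" });
print $out;
```

Because the search restarts from the beginning of the text on every iteration, commands contained in a replacement are picked up automatically. A handler that keeps returning a command would loop forever, so the sketch assumes handlers eventually return command-free text.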
    
The next lines capture the file contents before and after the
      semlit command: 
00170          my $prefix = $PREMATCH;  # text preceding the command
00171          my $suffix = $POSTMATCH;  # text after the command
 Note the use of the $PREMATCH and $POSTMATCH built-in
      variables. See the Perl
        documentation if you are not familiar with this construct.
    
If errors are detected in the source files, it is helpful to
      print error messages which include the line number. This is
      challenging in this function because we are not processing on a
      line-by-line basis. It is further complicated by the fact that as
      commands are replaced by the text returned from command execution,
      the number of lines in $doctext
      changes over time. However, since the commands are processed in
      order, we can calculate the line number by looking at the number
      of lines following the
      command, and subtracting it from the total number:
      
00173          # calculate line number containing the start of this semlit command
00174          $cur_file_linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;
 As you know from the Perl
        documentation, the split()
      function will by default not create an empty entry at the end
      (after the final '\n').
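The arithmetic is easier to follow with concrete numbers. The sketch below wraps the same formula in a hypothetical helper, line_of_command, and applies it to a three-line file whose command sits on line 2.

```perl
use strict;
use warnings;

# Hypothetical helper: given the original line count and the text that
# follows a command, return the line on which the command starts.
sub line_of_command {
    my ($num_lines, $suffix) = @_;
    my @after = split("\n", $suffix);   # trailing empty field is dropped
    return $num_lines - scalar(@after) + 1;
}

# File: "line1\nline2 =semlit,insert,x= rest\nline3\n" has 3 lines.
# The suffix " rest\nline3\n" splits into 2 fields, so 3 - 2 + 1 = 2.
print line_of_command(3, " rest\nline3\n"), "\n";
```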
    
Now execute the command and capture the replacement text:
      
00176          my $repl = semlit_cmd($cmd);
 The semlit_cmd()
      function returns formatted html.
    
Replacing the matched semlit command with the resulting text is
      straightforward: 
00178          # Commands are removed, and often replaced with some result
00179          $doctext = $prefix . $repl . $suffix;
00180      }  # while
 This is where the
      number of lines in $doctext
      can change.
    
Finally: 
00184      return $doctext;
00185  }  # process_doc_file
    
For the most part, Perl takes care of allowing functions to be
      recursive. However, we do have some global variables for error
      reporting which add an extra challenge:
      
00029  my $cur_file_name = "";
00030  my $cur_file_linenum = 0;
 By keeping the file name and line
      number as global variables, any function can report a
      user-friendly error message with a minimum of fuss.
    
Once the initial steps are completed, the process_doc_file()
      function is ready to start handling commands. But before it does,
      we want to save those global variables: 
00164      my ($save_doc_filename, $save_doc_linenum) = ($cur_file_name, $cur_file_linenum);
00165      ($cur_file_name, $cur_file_linenum) = ($doc_filename, 0);
      Then the command loop executes. When that is done, just before
      returning, we want to restore the global variables:
      
00182      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
 You will see very similar code in the
      process_src_file()
      function.
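For comparison, Perl's local can do the save and restore automatically, but only for package variables. The sketch below assumes the two globals are declared with our instead of my, which is a change from semlit.pl; process_file here is a hypothetical stand-in that only records what it sees.

```perl
use strict;
use warnings;

our $cur_file_name    = "";
our $cur_file_linenum = 0;

# Hypothetical recursive processor: 'local' saves the globals on entry and
# restores them automatically when the sub returns.
sub process_file {
    my ($name, @includes) = @_;
    local ($cur_file_name, $cur_file_linenum) = ($name, 0);
    my @seen = ("enter $cur_file_name");
    push @seen, process_file($_) for @includes;   # nested call changes the globals
    push @seen, "leave $cur_file_name";           # but they are restored here
    return @seen;
}
```

With lexical (my) globals, as in semlit.pl, local is not available, which is why the explicit save and restore assignments are needed there.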
    
The process_doc_file() and process_src_file()
      functions read input files, find semlit commands, and call semlit_cmd() to execute
      them. The delimiters, "semlit",
      and the first field separator are stripped, so that the passed-in
      command string starts with the command name.
    
The tabstop command
      simply updates the $tabstop
      global variable: 
00192      # semlit tabstop - doc: source tab expansion
00193      if ($cmd =~ /^tabstop\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00194          if ($1 =~ /^\d+$/) {
00195              $tabstop = $1;  # used by Text::Tabs
00196              return "";
00197          } else {
00198              err("Tabstop value '$1' must be numeric");
00199              return "";
00200          }
00201      }
 The $tabstop variable is used
      directly by the built-in Text::Tabs
      Perl module: 
00020  use Text::Tabs;
 Note that the $tabstop variable can
      also be set on the command line: 
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
      So, where does this actually get used? Right here:
      
00373              $iline = expand($iline);  # expand tabs according to $tabstop.
 (In the function process_src_file().)
      The expand()
      function is part of the Text::Tabs
      module.
    
The tabstop command
      does not return any text (returns "" - empty string).
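The effect of $tabstop on expand() can be seen with a small experiment. In this sketch expand_line is a hypothetical wrapper, and it sets the fully qualified $Text::Tabs::tabstop under local so that the previous width is restored afterwards.

```perl
use strict;
use warnings;
use Text::Tabs;   # exports expand() and $tabstop

# Hypothetical helper: expand tabs in one line using a given tab width.
sub expand_line {
    my ($line, $width) = @_;
    local $Text::Tabs::tabstop = $width;   # restored when the sub returns
    return expand($line);
}

print expand_line("a\tb", 4), "|\n";   # tab pads to column 4
print expand_line("a\tb", 8), "|\n";   # tab pads to column 8
```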
    
The srcfile command
      is used to scan an slsrc file: 
00203      # semlit srcfile - doc: read and process source file
00204      elsif ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00205          return process_src_file($1, $2);
00206      }
      During the scan, any semlit commands found in the slsrc
      file are processed (usually block/endblock commands), and
      the source output files are created (both html and src).
    
The returned text is a link to the plaintext output src file. The author of the sldoc file expects this and positions the srcfile command such that a link to the src file makes sense. For example, near the top of this sldoc file, the srcfile commands are arranged in a bullet list. Each srcfile command is followed by the hint "(right-click and save)" followed by a short description of the file.
The initialsource command
      is used to specify a file for initial display in the source frame:
00208      # semlit initialsource - doc: set initial source frame
00209      elsif ($cmd =~ /^initialsource\s*$o_fs\s*([^\s$o_fs]+)\s*/i) {
00210          $o_initialsource = $1;
00211          return "";
00212      }
      The file must be the final file name, ending with ".html".
    
There is no returned text.
The include command
      is used to scan an sldoc file: 
00214      # semlit include - doc: read and process doc file
00215      elsif ($cmd =~ /^include\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00216          return process_doc_file($1);
00217      }
      Note that this represents a recursive call to process_doc_file().
      During the scan, the included file is processed the same way as
      the master sldoc file.
    
The returned text is simply the processed html of the included
      file.
    
The insert command is used to insert into the document
      output html file a named block of source lines (from slsrc
      files): 
00219      # semlit insert - doc: insert a source block
00220      elsif ($cmd =~ /^insert\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00221          my $block_name = $1;
00222          if (exists($srcblocks{$block_name})) {
00223              my $num_refs = 1;
00224              my $block_ref_name = $block_name;
00225              if (defined($block_numrefs{$block_name})) {
00226                  $num_refs = $block_numrefs{$block_name} + 1;
00227                  $block_ref_name = $block_name . "_ref_$num_refs";
00228              }
00229              $block_numrefs{$block_name} = $num_refs;
00230  
00231              my $block_str = $srcblocks{$block_name};
00232              return <<__EOF__;
00233  <a name="$block_ref_name" id="$block_ref_name"><\/a>
00234  <small><pre>
00235  $block_str
00236  <\/pre><!-- endblock $block_ref_name --></small>\n
00237  __EOF__
00238          } else {
00239              err("attempt to insert block named '$block_name' but block not defined");
00240              return "";
00241          }
00242      }
 Note the use of Perl's "here
      document" <<__EOF__
      ... __EOF__. See Wikipedia
      and the Perl
        documentation if you are not familiar with this construct.
      Also note that the source blocks are
        stored in the %srcblocks
      hash by the process_src_file() function.
    
The endblock html comment becomes important during the final fix up step; for blocks inserted
      multiple times, the comment is replaced with links to other
      references.
    
The returned text is the processed html of the source block.
    
The block command essentially tells the process_src_file() function
      to start storing source lines into a named block:
      
00244      # semlit block - src: start a named block of source
00245      elsif ($cmd =~ /^block\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00246          my $block_name = $1;
00247          if (defined($srcblocks{$block_name})) {
00248              err("block '$block_name' already defined");
00249              return "";
00250          }
00251          $srcblocks{$block_name} = "";
00252          $block_numrefs{$block_name} = 0;
00253          $active_srcblocks{$block_name} = $cur_file_linenum;
00254          
00255          $global_src_buffer = "<span name=\"$block_name\" id=\"$block_name\"><\/span>";
00256          return "";
00257      }
 Since it is possible for source lines to be
      included in multiple named blocks, the global hash %active_srcblocks
      is used to indicate which named blocks are accumulating.
    
The endblock command essentially tells the process_src_file() function
      to stop storing source lines into the named block:
      
00259      # semlit endblock - src: end a named block of source
00260      elsif ($cmd =~ /^endblock\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00261          my $block_name = $1;
00262          if (exists($active_srcblocks{$block_name})) {
00263              delete($active_srcblocks{$block_name});
00264              $srcblocks{$block_name} =~ s/\n$//s;
00265              return "";
00266          } else {
00267              err("found endblock for '$block_name', which is not active");
00268              return "";
00269          }
00270      }
00271  
00272      # semlit tooltip - create hover over text for a phrase
00273      elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00274          my $text_source = $1;
00275          my $text_link = $2;
00276          my $contents = file_get_contents($text_source);
00277          return <<__EOF__;
00278  <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00279  __EOF__
00280      }
00281  
 The block name needs to be supplied
      because a nested block does not need to be fully-contained by the
      outer block. I.e. if multiple blocks are active, the endblock
      does not necessarily end the most-recently opened block.
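The interaction of overlapping blocks can be sketched with two small hashes. The helper names below (start_block, end_block, add_line, block_text) are hypothetical simplifications of what the block and endblock commands and process_src_file() do together; html formatting and error checks are left out.

```perl
use strict;
use warnings;

my %srcblocks;   # block name => accumulated source text
my %active;      # names of the blocks that are currently open

sub start_block { my ($name) = @_; $srcblocks{$name} = ""; $active{$name} = 1; return; }
sub end_block   { my ($name) = @_; delete $active{$name}; return; }
sub block_text  { my ($name) = @_; return $srcblocks{$name}; }

# Append one source line to every block that is currently open.
sub add_line {
    my ($line) = @_;
    $srcblocks{$_} .= "$line\n" for keys %active;
    return;
}

# Overlapping (not nested) blocks: "setup" ends before "loop" does.
start_block("setup"); add_line("my \$i = 0;");
start_block("loop");  add_line("while (\$i < 3) {");
end_block("setup");   add_line("    \$i++;");
end_block("loop");
print "setup:\n", block_text("setup"), "loop:\n", block_text("loop");
```

The middle line lands in both blocks, which is why endblock has to name the block it closes.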
    
The tooltip command creates a keyword in a block of text
      that can be hovered over for additional information.
      
00272      # semlit tooltip - create hover over text for a phrase
00273      elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00274          my $text_source = $1;
00275          my $text_link = $2;
00276          my $contents = file_get_contents($text_source);
00277          return <<__EOF__;
00278  <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00279  __EOF__
00280      }
 The tooltip data is loaded from a 
      file on the local server, and the keyword is supplied as the last
      parameter. Note: the keyword cannot contain any spaces. 
    
The process_src_file() function is called from the srcfile command. It reads
      and processes an slsrc file (program source code). Part of
      the processing is to generate two output files: a source html
      file, and a src file, stripped of its semlit commands and
      ready for compilation.
    
This function has conceptual similarities with the process_doc_file() function,
      but there are important differences. The biggest difference is the
      overall approach to processing the file. Here, we process the slsrc
      file line-at-a-time. A line must contain either program source
      code, or a single semlit command (although that command might be
      enclosed in comment delimiters). Instead of replacing the command
      with returned content, the line containing the command is simply
      discarded.
    
Here are the high-level steps performed:
    
Almost the same code exists in process_doc_file(). Here we
      are opening an slsrc file which might be in any of the
      directories in the array @o_incdirs:
      
00299      # open source file, using one or more search directories
00300      my $incdir;
00301      my $open_success = 0;
00302      foreach $incdir (@o_incdirs) {
00303          if (open($slsrc_infd, "<", "$incdir/$src_filename")) {
00304              $open_success = 1;
00305              last;  # break out of foreach
00306          }
00307      }
00308      if (! $open_success) {
00309          err("could not open src file '$src_filename', skipping");
00310          return "";
00311      }
    
The output source html file is opened and some initial
      content is written: 
00313      # create and write initial content to html-ified source file
00314      if (! open($src_html_outfd, ">", "$src_filename.html")) {
00315          err("could not open output source html file '$src_filename.html', skipping");
00316          close($slsrc_infd);
00317          return "";
00318      }
00319      print $src_html_outfd <<__EOF__;
00320  <!DOCTYPE html><html><head><title>$plain_src_filename</title>
00321  <link rel="stylesheet" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css">
00322  <script src="//code.jquery.com/jquery-1.10.2.js"></script>
00323  <script src="//code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
00324  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/styles/default.min.css">
00325  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/highlight.min.js"></script>
00326  <script>
00327    \$(function() {
00328      \$( document ).tooltip();
00329    });
00330  </script>
00331  <style>
00332  #code {background-color:#ffffff;};
00333  </style>
00334  </head>
00335  <body><h1>$plain_src_filename</h1>
00336  <script>hljs.initHighlightingOnLoad();</script>
00337  <small><pre><code id="code"><table border=0 cellpadding=0 cellspacing=0><tr>
00338  __EOF__
 In addition to
      the initial html source, it also contains a link to the plain text
      src file which can be downloaded.
    
The output src file is opened: 
00340      # Create plaintext source file (without semlit commands)
00341      if (! open($src_outfd, ">", "$plain_src_filename")) {
00342          err("could not open output src '$plain_src_filename', skipping");
00343          close($slsrc_infd);
00344          close($src_html_outfd);
00345          return "";
00346      }
    
Read the slsrc file, line-by-line:
      
00355      my $iline;
00356      while (defined($iline = <$slsrc_infd>)) {
00357          chomp($iline);  # remove line delim
00358          $iline .= "\n";  # add newline
00359          $iline =~ s/\r//gs;  # remove carriage returns, if any
00360          $cur_file_linenum ++;
 The chomp() function removes
      the line delimiter, which, depending on platform, might be something
      other than a linefeed, and the next line adds a newline to be the
      line delimiter. Then, carriage returns, if any, are removed. This
      unifies different platforms. (Similar
        code can be found in process_doc_file().)
The slsrc file contains semlit commands, usually block and endblock. Find and process
      them: 
00362          # check for semlit commands
00363          if ($iline =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
00364              semlit_cmd($1);
00365              # discard command line
00366          }
    
If the slsrc line does not contain a semlit command, then
      it is normal source code. 
00367          else {
00368              $src_linenum ++;  # don't count semlit command lines
 Note that
      source code lines are counted separately from input slsrc
      file lines ($cur_file_linenum counted above). The input
      line count ($cur_file_linenum) includes semlit command
      lines and is used when printing error messages by err(), while the source code
      line count ($src_linenum) does not include semlit command
      lines and is used as the user-visible line number in the output
      source html file.
    
Write the plaintext source line to the src file:
      
00370              print $src_outfd $iline;
    
In advance of writing the line to the output source html
      file, expand tabs and convert the special characters '&',
      '<' and '>' to their html forms:
      
00372              # fix up source for html rendering (tab expansion, special char encoding)
00373              $iline = expand($iline);  # expand tabs according to $tabstop.
00374              $iline =~ s/\&/\&amp;/g;  $iline =~ s/</\&lt;/g;  $iline =~ s/>/\&gt;/g;
 Note that the expand()
      function is part of the Text::Tabs Perl module and uses
      the $tabstop global variable to control how it expands.
    
Check to see if this source line is inside one or more block/endblock
      constructs: 
00376              # if we are in at least one block, link the source to the earliest block's first doc reference
00377              if (scalar(keys(%active_srcblocks)) > 0) {
    
There is at least one named block active. Assuming there might be
      more than one, find the active block which was most-recently
      opened (is at the highest-numbered input line):
      
00378                  # descending sort so that element 0 is largest
00379                  my @active_blocks = sort { $active_srcblocks{$b} cmp $active_srcblocks{$a} } keys(%active_srcblocks);
 This sort construct orders the keys
      by descending content of the %active_srcblocks
      hash. Thus, $active_blocks[0] is the name of that
      most-recently opened block. This is used to construct the link
      back to the doc: 
00380                  my $targ = $active_blocks[0] . "_ref_1";
00381                  $src_lines_td .= sprintf("<a href=\"$doc_html_filename#$targ\" target=\"doc\">%05d<\/a>\n", $src_linenum);
00382                  if ($global_src_buffer) {
00383                      $src_content_td .= sprintf("%s  %s", $global_src_buffer, $iline);
00384                      $global_src_buffer = "";
00385                  }
00386                  else {
00387                      $src_content_td .= sprintf("  %s", $iline);
00388                  }
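The sort-by-hash-value idiom can be tried in isolation. In this sketch the block names and line numbers are invented, and I assume the hash values are plain line numbers, so the numeric comparison operator <=> is used. The listing above uses cmp, which compares its operands as strings:

```perl
use strict;
use warnings;

# Hypothetical data: block name => input line number where the block opened.
my %active_srcblocks = (outer => 9, middle => 40, inner => 112);

# Sort the keys by descending hash value, so element 0 is the block which
# was opened at the highest-numbered line.
my @active_blocks =
    sort { $active_srcblocks{$b} <=> $active_srcblocks{$a} } keys(%active_srcblocks);

print "most recently opened: $active_blocks[0]\n";
print "all, newest first: @active_blocks\n";
```

With a string comparison, "9" sorts after "40" and "112", so the choice of operator matters whenever the line numbers differ in their number of digits.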
    
For each active named block/endblock
      construct, add the source code to the %srcblocks hash:
      
00390                  # for each open source block on this line of source, link the doc block to that source block
00391                  foreach my $block_name (keys(%active_srcblocks)) {
00392                      my $a = sprintf("<a href=\"$cur_file_name.html#$block_name\" target=\"src\">%05d<\/a>  %s", $src_linenum, $iline);
00393                      $srcblocks{$block_name} .= $a;
00394                  }
 This is used by the insert semlit command to
      insert the source block into the output doc html file.
    
For source lines which are not contained in a block/endblock
      construct, no doc link is needed when writing to the output source
      html file: 
00395              } else {
00396                  # no active blocks
00397                  my $a = sprintf("%05d\n", $src_linenum);
00398                  my $c = sprintf("  %s", $iline);
00399                  $src_lines_td .= $a;
00400                  $src_content_td .= $c;
00401              }
    
Close the files: 
00418      close($slsrc_infd);
00419      close($src_outfd);
00420  
00421      print $src_html_outfd "</tr></table></code>\n";
00422      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00423      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00424      print $src_html_outfd "</pre></small></body></html>\n";
00425      close($src_html_outfd);
 Note the many newlines
      printed. This is so that clicking on a line number which is near the
      end of the source file will still position that line of source at the
      top of the screen.
Also, if the user
      accidentally started one or more named blocks but did not end
      them, print errors and force them ended:
      
00427      # if the source file started a block but reached eof without ending it, end it here.
00428      foreach (keys(%active_srcblocks)) {
00429          err("block named '$_' started but not ended");
00430          semlit_cmd("endblock$o_fs$_");  # end it for the user
00431      }
 This is done by calling the semlit_cmd() function,
      passing it the endblock
      command (as if the slsrc file had it).
    
But before we return, restore the $cur_file_name and $cur_file_linenum
      variables to their previous state. Then return.
      
00433      # the semlit.srcfile command writes a link to the plaintext source file
00434      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
00435      return "<a href=\"$plain_src_filename\">$plain_src_filename</a>";
 Since the call to process_src_file()
      came from semlit_cmd() executing the srcfile
      command, and that command is used in the sldoc file, the
      thing we return is a link to the plaintext output src
      file.
    
First, let's declare a couple of globals that will be used for
      helping the user: 
00025  my $tool = "semlit.pl";
00026  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
 (The usage() function also uses
      this.) In the main code, the "-h" option calls help()
      to print more extensive help: 
00470  sub help {
00471      my($err_str) = @_;
00472  
00473      if (defined $err_str) {
00474          print "$tool: $err_str\n\n";
00475      }
00476      print <<__EOF__;
00477  Usage: $usage_str
00478  Where:
00479      -h - print help screen
00480      -d delim - delimiter character at start and end of a semlit command.
00481              (default to '=')
00482      -f fs - field separator character within a semlit command.
00483              (default to ',')
00484      -i initialsource - file name for initial source frame.
00485              (default to "blank.html")  Also, initialsource semlit command.
00486      -I dir - directory to find files for 'srcfile' and 'include' commands.
00487              (default to ".")  The "-I dir" option can be repeated.
00488      -t tabstop - convert tabs to "tabstop" spaces.
00489              (default to '4')
00490      files - zero or more input files.  If omitted, inputs from stdin.
00491  
00492  __EOF__
00493  
00494      exit($exit_status);
00495  }  # help
 Note the use
      of Perl's "here document" (<<__EOF__
      ... __EOF__). See Wikipedia
      and the Perl
        documentation if you are not familiar with this construct. 
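For the novice, here is a tiny here document on its own. The text is a cut-down, made-up version of the help screen. Note that variables are interpolated just as they are in a double-quoted string, and that the terminator must appear alone at the start of a line:

```perl
use strict;
use warnings;

my $tool = "semlit.pl";

# Everything up to the line containing only __EOF__ becomes one string.
my $text = <<__EOF__;
Usage: $tool [-h]
Where:
    -h - print help screen
__EOF__

print $text;
```

Leading spaces inside the here document are kept as-is, which is why the help screen's indentation survives.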
This is a simple routine to read the contents of a file for the purpose
      of filling in the tooltips: 
00458  sub file_get_contents{
00459      my ($text_file) = @_;
00460      open FILE, $text_file or die $!;
00461      flock FILE, 1 or die $!;        # wait for lock
00462      seek(FILE, 0, 0);       # move pointer to beginning
00463      my $slurp = do{local $/; <FILE>};
00464      flock FILE, 8;          # release the lock
00465      close(FILE);
00466  
00467      return $slurp;
00468  } # file_get_contents
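The same read-the-whole-file idiom can also be written with a lexical file handle and named lock constants. This is only a sketch of an alternative style, not what semlit.pl does; the name slurp_file() is made up. The Fcntl module supplies LOCK_SH and LOCK_UN, which stand for the values 1 and 8 passed to flock in the listing above:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # LOCK_SH, LOCK_UN, ...

# Hypothetical helper: return the entire contents of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "cannot open '$path': $!";
    flock($fh, LOCK_SH) or die "cannot lock '$path': $!";   # shared (read) lock
    # With $/ (the input record separator) undefined, one read returns
    # the whole file.
    my $contents = do { local $/; <$fh> };
    flock($fh, LOCK_UN);   # release the lock
    close($fh);
    return $contents;
}
```

The "local $/" is confined to the do block, so line-by-line reads elsewhere in the program are not affected.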
 
When there is an obvious user error in the invocation of the
      semlit program, the usage() function is
      called.
    
For example: 
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
 Note the use of the
      logical OR construct (||).
      Because of Perl's C-like short-circuit
      evaluation, the right-hand expression (function call to usage()) is only executed
      if the left-hand side (function call to GetOptions()) returns false. That is, usage() is
      called if GetOptions()
      fails. Non-Perl programmers might be tempted to use a simple if/then construct, but
      the logical OR construct is such a common Perl idiom that the wise
      reader will learn it.
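The short-circuit behavior is easy to demonstrate. In this sketch the three functions are stand-ins invented for the illustration; each one records that it was called:

```perl
use strict;
use warnings;

my @calls;   # records which functions actually ran

sub succeed { push(@calls, "succeed"); return 1; }   # stands in for a successful GetOptions()
sub fail    { push(@calls, "fail");    return 0; }   # stands in for a failed GetOptions()
sub handler { push(@calls, "handler"); return 1; }   # stands in for usage()

succeed() || handler();   # left side is true, so handler() is skipped
fail()    || handler();   # left side is false, so handler() runs

print join(" ", @calls), "\n";
```

The handler appears only once in the output, after the failing call.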
    
In other cases, an error is discovered in one of the input files,
      sldoc and/or slsrc. In those cases, the err() function is called.
    
First, let's declare a couple of globals that will be used for
      helping the user: 
00025  my $tool = "semlit.pl";
00026  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
 (The help() function also uses
      this.) The usage() function allows an optional error
      message to be passed in, which is printed before the usage string:
      
00447  sub usage {
00448      my($err_str) = @_;
00449  
00450      if (defined $err_str) {
00451          print STDERR "$tool: $err_str\n\n";
00452      }
00453      print STDERR "Usage: $usage_str\n\n";
00454      $exit_status ++;
00455      exit($exit_status);
00456  }  # usage
    
The function err() is a programmer convenience which prints an
      error message, along with the file name and line number where the
      error was discovered. 
00439  sub err {
00440      my ($msg) = @_;
00441  
00442      print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
00443      $exit_status ++;
00444  }  # err
 It also increments
      $exit_status, which starts out at zero (success):
      
00043  my $exit_status = 0;  # assume success
 and is used when exiting the program:
      
00129  # All done.
00130  exit($exit_status);
 Thus, a non-zero (failure) exit status also
      indicates how many errors there were.
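Here is a small sketch of that error-counting pattern outside of semlit.pl. The file name, line number and messages are invented, and the sketch prints the status instead of calling exit() so that it can be run and inspected:

```perl
use strict;
use warnings;

my $exit_status = 0;  # assume success
my ($cur_file_name, $cur_file_linenum) = ("example.sldoc", 12);  # hypothetical values

sub err {
    my ($msg) = @_;
    print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
    $exit_status ++;  # one more error seen
}

err("first problem");
err("second problem");

print "would exit with status $exit_status\n";
```

One caveat: on Unix-like systems only the low 8 bits of the exit status reach the parent process, so the status is a reliable error count only below 256.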
    
As mentioned above, the first line of semlit.pl is a
      fairly common shebang:
      
00004  #!/usr/local/bin/perl -w
 
This Unix construct, combined with setting the executable bit on
      the file, is intended to allow the tool to be run by simply typing
      the file name as a command (assuming that the PATH environment variable
      is set up right). However, it does require that the physical
      location of the Perl interpreter be encoded directly in the file.
      Unfortunately, different Unix systems install Perl in different
      places - sometimes under /usr/local, sometimes in /bin,
      sometimes under /opt. 
    
I vaguely remember that there is a clever way to re-code that
      line such that the shell will search for the Perl interpreter in
      the PATH environment variable. But I don't remember how
      to do it. 
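One widely used way to get that effect (possibly not the one I was trying to remember) is to name the env program on the shebang line and let it search PATH for perl. This sketch assumes that env lives at /usr/bin/env, which is common but not guaranteed. Since passing switches such as -w through env is not portable, the sketch uses "use warnings" instead:

```perl
#!/usr/bin/env perl
# "env" searches PATH for an executable named "perl" and runs it.
use strict;
use warnings;   # roughly replaces the -w switch on the original shebang line

# $^X holds the path of the interpreter which is actually running this script.
my $interpreter = $^X;
print "running under $interpreter\n";
```

This trades one hard-coded location (that of perl) for another (that of env), so it reduces the problem rather than eliminating it.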
    
One way to run semlit.pl without having to set the Perl
      interpreter's full location is:
          perl SemLitPath/semlit.pl ... 
    
But even that is somewhat unsatisfying for an experienced Unix
      user. Lazy as we are, we would prefer to let PATH do the work of
      finding the program, so we just enter:
          semlit
      ...
One common way to accomplish this is with a wrapper shell script
      which encapsulates any annoying details of running the tool. Hence
      the semlit file. 
    
The first line of semlit is the standard shebang:
      
00004  #!/bin/sh
 
But this time it references the universally-respected location
      for a Bourne-compatible
        shell. No chance that this won't work on some flavor of
      Unix. 
    
Next save the initial working directory (so that we can get back
      to it). 
00008  IWD=`pwd`                    # remember initial working directory
 
I establish the convention that the wrapper script must
      be in the same directory as the Perl script. So, the next thing to
      do is figure out which directory contains the wrapper which is
      running: 
00010  # Find dir where tool is stored (useful for finding related files)
00011  TOOLDIR=`dirname $0`
00012  # Make sure TOOLDIR is a full path name (not relative)
00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
 
This might look more complicated than it needs to be. Isn't the
      first line enough? 
00011  TOOLDIR=`dirname $0`
 
No, it isn't. Suppose you are in your home directory, and you
      have placed the semlit files in $HOME/bin. Then let's
      say you execute the program by entering:
          bin/semlit
      ...
That is perfectly legal, and dirname $0 will, not
      surprisingly, return bin, a relative path. But I want a
      fully-qualified path, so I include the three commands:
      
00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD
 
You simply cd to that potentially-relative location and use pwd to get the full path. Then cd back to the initial working directory. This is easier than trying to parse all the possible return values for dirname.
All that remains to be done is to run the Perl interpreter as a
      command (such that PATH is used):
      
00015  perl "$TOOLDIR/semlit.pl" "$@"