semlit Program Description

This is a semi-literate document for the program semlit.pl and its shell wrapper semlit.sh.  This program is used to create semi-literate documentation (of which this document is an example).  Semi-literate documents like this are intended to explain the internals of a program, which is different from user documentation.  If you are an end user, you probably want the user document.

Copyright 2012, 2015 Steven Ford http://geeky-boy.com and licensed "public domain" style under:

CC0
To the extent possible under law, the contributors to this project have waived all copyright and related or neighboring rights to this work. This work is published from: United States.  The project home is https://github.com/fordsfords/semlit/tree/gh-pages.  To contact me, Steve Ford, project owner, you can find my email address at http://geeky-boy.com.  Can't see it?  Keep looking.



  1. semlit Program Description
    1. Introduction
    2. Program explanation: semlit.pl
      1. Main
        1. Process command line
        2. Read and process the input sldoc file
        3. Fix up multiple source block references
        4. Write out the main documentation html file and exit
        5. Create a few html output files
      2. Function: process_doc_file()
        1. Open sldoc input file
        2. Read the file into memory
        3. For each semlit command contained in that file
        4. Execute the command
        5. Replace the command with the html returned by the command execution and loop
        6. Return the processed html
        7. Recursion
      3. Function: semlit_cmd()
        1. Command: tabstop
        2. Command: srcfile
        3. Command: initialsource
        4. Command: include
        5. Command: insert
        6. Command: block
        7. Command: endblock
        8. Command: tooltip
      4. Function: process_src_file()
        1. Open the slsrc input file
        2. Open the output source html file
        3. Open the output src file
        4. For each input line in the slsrc file:
        5. If it is a semlit command, process it
        6. Else it is a source line:
        7. Write the line to the src file
        8. Html-ify the line
        9. If there are active source blocks accumulating:
        10. Create doc link and write source html file
        11. Add the source line to all of the active blocks
        12. Else no active block, write source html file without link
        13. Close files and wrap up
        14. Return html link to the src file
      5. Function: help()
      6. Function: file_get_contents()
      7. Error handling
        1. Function: usage()
        2. Function: err()
    3. Wrapper Shell Script

Introduction

This document describes the internals of the semlit.pl program, and is intended to be read by a programmer who wants to understand, maintain, and perhaps reuse the code.  Before reading this documentation, you are expected to have a good user-level understanding of SEMLIT.  Please be familiar with the user document before starting this, unless you are just getting a feel for what SEMLIT documentation is like.

The main program is written in Perl. The reader is assumed to have at least entry-level knowledge of Perl. Some of the more-advanced Perl constructs are explained for the benefit of the novice.

There are two program source files: semlit.pl, the main Perl program, and semlit.sh, its shell wrapper.

Program explanation: semlit.pl


Main

Here are the high-level steps performed by the main semlit program:

  1. Process command line.
  2. Read and process the input sldoc file. This has the side effect of reading the slsrc files and generating the src files.
  3. Fix up multiple source block references.
  4. Write out the main documentation html file and exit.
  5. Create a few html output files.

Process command line

Let's declare some global variables to hold values for program options, and set their default values:

00047  my $o_help;        # -h
00048  my $o_fs = ",";    # -f
00049  my $o_delim = "="; # -d
00050  my $o_initialsource = "blank.html";  # -i
00051  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
00052  $tabstop = 4;  # defined and used by Text::Tabs - see "expand()" function
00053  
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
00055  if (defined($o_help)) {
00056      help();  # if -h had a value, it would be in $opt_h
00057  }
Note the use of the GetOptions function, which comes from the standard Getopt::Long module rather than being a Perl built-in. This function does the work of parsing the command line and setting new values into the global option variables. See the Perl documentation if you are not familiar with the construct.
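To make the option handling easier to experiment with, here is a minimal sketch that parses an explicit argument list instead of the real command line. The helper name parse_args and the hash layout are my own inventions for illustration, and the no_ignore_case setting is an assumption (it keeps "-I" distinct from a lowercase "-i"); semlit.pl itself uses global variables as shown above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Assumption: option names are matched case-sensitively, so that "-I"
# stays distinct from a lowercase "-i" option.
Getopt::Long::Configure("no_ignore_case");

# Hypothetical helper: parse an argument list into a hash of settings.
sub parse_args {
    my @args = @_;
    my %opt = (help => undef, tabstop => 4, incdirs => ["."]);
    GetOptionsFromArray(
        \@args,
        "h"   => \$opt{help},
        "t=i" => \$opt{tabstop},
        "I=s" => $opt{incdirs},    # array ref: each -I appends a directory
    ) or die "error in options\n";
    $opt{files} = [@args];         # whatever is left over after the options
    return \%opt;
}

my $opt = parse_args("-t", "8", "-I", "lib", "-I", "inc", "main.sldoc");
print "tabstop=$opt->{tabstop} incdirs=@{$opt->{incdirs}} file=$opt->{files}[0]\n";
```

Because the "I=s" destination is an array, the default "." stays at the front of the list and each -I directory is appended after it.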

Read and process the input sldoc file

The program assumes that there is a single master sldoc file supplied on the command line. So, early on we get that file name:

00059  if (scalar(@ARGV) != 1) {
00060      usage("Error, .sldoc file missing");
00061  }
00062  $main_doc_filename = $ARGV[0];
00063  if ( ! -r "$main_doc_filename" ) {
00064      usage("Error, could not read '$main_doc_filename'");
00065  }
Note the use of the Perl construct scalar(@array) to determine the number of elements in the array. There are shorter ways to do this (in terms of keystrokes), but I prefer the explicit construct. Makes it easier to spot. Also note the use of -r file to test if the file exists and is readable.

Once the file name is determined, we now process that input file:

00072  # Main loop; read each line in doc file
00073  
00074  my $doc_html_str = process_doc_file($main_doc_filename);
Note that the function process_doc_file() returns the html output for the documentation.

Fix up multiple source block references

An sldoc file might insert the same block of source code multiple times. The source html will link to the first doc reference to it. How does the reader find the other references? At the end of an inserted source block, links will be inserted to adjacent references within the doc.

However, the sldoc file is processed sequentially. When a reference (insert command) is seen, it is not yet known if there will be subsequent references. So, a fixup step is added after the sldoc file is fully processed. That fixup step looks at each source block that has multiple references and adds the appropriate links (next/prev).

Since this step needs a count of the number of references, we need code in the semlit_cmd() function to do the counting:

00223              my $num_refs = 1;
00224              my $block_ref_name = $block_name;
00225              if (defined($block_numrefs{$block_name})) {
00226                  $num_refs = $block_numrefs{$block_name} + 1;
00227                  $block_ref_name = $block_name . "_ref_$num_refs";
00228              }
00229              $block_numrefs{$block_name} = $num_refs;
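If $block_numrefs{$block_name} is already defined, its value is the number of inserts seen so far for this block, and this insert becomes the next numbered reference. Note that the block command initializes the count to 0, so for a block defined in an slsrc file even the first insert is named with the "_ref_1" suffix, which is the name the fix-up step looks for.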

Back to main and the fix-up step:

00076  # fix up multiple source references
00077  foreach my $blockname (keys(%block_numrefs)) {
00078      if ($block_numrefs{$blockname} > 1) {
00079          # First ref points to next and last
00080          my $refnum = 1;
00081          my $this_block = $blockname . "_ref_" . ($refnum);
00082          my $first_block = $this_block;
00083          my $last_block = $blockname . "_ref_" . $block_numrefs{$blockname};
00084          my $next_block = $blockname . "_ref_" . ($refnum + 1);
00085          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$last_block">last ref<\/a><\/pre>/s;
00086  
00087          # Middle refs point to previous and next
00088          my $prev_block = $this_block;
00089          for ($refnum = 2; $refnum <= $block_numrefs{$blockname} - 1; $refnum ++) {
00090              # middle refs point to prev and next
00091              $this_block = $blockname . "_ref_" . ($refnum);
00092              $next_block = $blockname . "_ref_" . ($refnum + 1);
00093              $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$next_block">next ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00094              $prev_block = $this_block;
00095          }
00096  
00097          # last ref points to first and previous
00098          $this_block = $blockname . "_ref_" . ($refnum);
00099          $doc_html_str =~ s/<\/pre><!-- endblock $this_block -->/<a href="#$first_block">first ref<\/a>  <a href="#$prev_block">prev ref<\/a><\/pre>/s;
00100      }
00101  }
The first reference has no "prev", so a link to the last reference is included. The last reference has no "next", so a link to the first reference is included. (The source html always links to the first reference.)
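The link rules can be summarized in a small sketch. The helper ref_links is hypothetical (it is not part of semlit.pl); it only computes which two links each reference gets, leaving out the regular expression substitution on the html.

```perl
use strict;
use warnings;

# Hypothetical helper: the two links appended to reference $n (1-based)
# of a block that is inserted $count times ($count >= 2).
sub ref_links {
    my ($block, $n, $count) = @_;
    my $anchor = sub { return $block . "_ref_" . $_[0] };
    if ($n == 1) {
        # first reference: no "prev", so link to the last one instead
        return ("next ref", $anchor->(2), "last ref", $anchor->($count));
    }
    if ($n == $count) {
        # last reference: no "next", so link to the first one instead
        return ("first ref", $anchor->(1), "prev ref", $anchor->($n - 1));
    }
    return ("next ref", $anchor->($n + 1), "prev ref", $anchor->($n - 1));
}

for my $n (1 .. 3) {
    my @l = ref_links("getopts", $n, 3);
    print "ref $n: $l[0] -> #$l[1], $l[2] -> #$l[3]\n";
}
```

With exactly two references there is no middle case: the first links to the second as both "next" and "last", and the second links back to the first as both "first" and "prev".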

Write out the main documentation html file and exit

A bit earlier in the program, the output file was opened:

00067  # open main doc file
00068  
00069  $doc_html_filename = basename($main_doc_filename) . ".html";  # strip directory
00070  open($doc_html_outfd, ">", $doc_html_filename) || die "Error, could not open htmlfile '$doc_html_filename'";
Then, after the input file is processed, the returned html output is written to the output file:
00103  # write doc html file
00104  
00105  print $doc_html_outfd "$doc_html_str\n";
00106  close($doc_html_outfd);
00107  
00108  # Create frameset page
00109  
00110  my $index_o_file;
00111  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112  print $index_o_file <<__EOF__;
00113  <html><head></head>
00114  <frameset cols="50%,*">
00115  <frame src="$doc_html_filename" name="doc">
00116  <frame src="$o_initialsource" name="src">
00117  </frameset>
00118  </html>
00119  __EOF__
00120  close($index_o_file);
00121  
00122  # Create blank page for initial source frame
00123  
00124  my $blank_o_file;
00125  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127  close($blank_o_file);
00128  
00129  # All done.
00130  exit($exit_status);
The variable $exit_status is initialized to zero in main and is incremented each time an error is reported. Thus, if no errors are reported, the program exits with success (0).

Create a few html output files

There are a couple of misc support files needed by the html documentation. Since the documentation is best viewed with html frames, the first file is the frameset:

00108  # Create frameset page
00109  
00110  my $index_o_file;
00111  open($index_o_file, ">", "index.html") || die "Error, could not open htmlfile 'index.html'";
00112  print $index_o_file <<__EOF__;
00113  <html><head></head>
00114  <frameset cols="50%,*">
00115  <frame src="$doc_html_filename" name="doc">
00116  <frame src="$o_initialsource" name="src">
00117  </frameset>
00118  </html>
00119  __EOF__
00120  close($index_o_file);
Note the use of Perl's "here document" <<__EOF__ ... __EOF__. See Wikipedia and the Perl documentation if you are not familiar with this construct. Contrast this with the next code fragment.

By default, when the documentation is first brought up, there is no source to display in the source frame. So we need an almost blank page:

00122  # Create blank page for initial source frame
00123  
00124  my $blank_o_file;
00125  open($blank_o_file, ">", "blank.html") || die "Error, could not open htmlfile 'blank.html'";
00126  print $blank_o_file "<html><head></head><body>Click a source line number to see the line in context.</body></html>\n";
00127  close($blank_o_file);
I didn't use a "here" document for this, but I could have.
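For readers new to the construct, here is a small sketch of the same frameset built by a function that returns a here document. The function name frameset_html and its parameters are made up for this illustration; the point is only that variables interpolate inside the here document just as they do inside a double-quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper: build the frameset page as a string.
sub frameset_html {
    my ($doc_file, $src_file) = @_;
    # Everything up to the terminator line is one interpolated string.
    return <<__EOF__;
<html><head></head>
<frameset cols="50%,*">
<frame src="$doc_file" name="doc">
<frame src="$src_file" name="src">
</frameset>
</html>
__EOF__
}

print frameset_html("semlit.sldoc.html", "blank.html");
```

The terminator must start in the first column, which is why the here document text is not indented with the rest of the function.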

That initial blank page can be overridden with the "-i" command-line option, or with the "initialsource" semlit command.

Function: process_doc_file()

The process_doc_file() function is called from the main program:

00072  # Main loop; read each line in doc file
00073  
00074  my $doc_html_str = process_doc_file($main_doc_filename);
It is also called (recursively) from the semlit command include.

The function starts out with:

00136  sub process_doc_file {
00137      my ($doc_filename) = @_;
Since this function can be called recursively, we save any existing file name and line number (they are restored before return).

The main function is to open an sldoc file, read the file and process the semlit commands, and return the documentation html output. Note that the sldoc input file is already in html form; once the file is read, all that remains to be done is to process the semlit commands. In the degenerate simple case of an sldoc file containing no semlit commands at all, the output html file will be exactly equal to the sldoc input file. (An unlikely case since it is the semlit commands which provide the value of semi-literate documentation.)

Here are the high-level steps it performs:

  1. Open sldoc input file.
  2. Read the file into memory.
  3. For each semlit command contained in that file:
    1. Execute the command.
    2. Replace the command with the html returned by the command execution and loop.
  4. Return the processed html.

In an earlier version, the program read the sldoc input file one line at a time. The line was tested to determine if it was a semlit command. If so, the line was executed and the line dropped. This approach worked fine when I used a simple text editor to create the html sldoc file. But editing html with a simple text editor is painful, error-prone, and inefficient. I really wanted to use an html editor. But html editors generally do not make it easy to control how the content is arranged on lines of the physical html file. I discovered that multiple semlit commands can be packed onto a single line, and even split across lines.

Fortunately, Perl provides powerful constructs which make this kind of processing easy. Instead of looking at the file line-by-line, the entire file can be read into a single string variable, and regular expression matching can be used to find the commands, one at a time. You will see this below.

Open sldoc input file

The semlit program supports having libraries of sldoc files for standard boilerplate. I chose to model this after the C compiler: the programmer simply specifies the name of the file, and one or more search directories can be specified on the command line with the -I option. These directories are set up in the main program thus:

00051  my @o_incdirs = (".");  # GetOptions will append additional dirs for each "-I".
...
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
The GetOptions() function will parse as many -I options as the user supplies and leave the @o_incdirs array set with the directories (including ".").

Back in the process_doc_file() function, the file is opened as follows:

00140      # open source file, using one or more search directories
00141  
00142      my $incdir;
00143      my $open_success = 0;
00144      foreach $incdir (@o_incdirs) {
00145          if (open($doc_infd, "<", "$incdir/$doc_filename")) {
00146              $open_success = 1;
00147              last;  # break out of foreach
00148          }
00149      }
00150      if (! $open_success) {
00151          err("could not open doc file '$doc_filename', skipping");
00152          return;
00153      }
It loops through the array of include directories until the file can be opened. Note that almost the same code exists in process_src_file().
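The search can be tried out in isolation with a sketch like the following. The helper open_in_dirs, the temporary directories, and the file name boiler.sldoc are all assumptions made for the demonstration; the real code sets a flag and uses global file handles instead of returning values.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: try each directory in order, return the first
# handle that opens (plus the directory it was found in), or an empty list.
sub open_in_dirs {
    my ($filename, @dirs) = @_;
    foreach my $dir (@dirs) {
        if (open(my $fh, "<", "$dir/$filename")) {
            return ($fh, $dir);
        }
    }
    return;
}

# Demonstration: the file exists only in the second search directory.
my $dir_a = tempdir(CLEANUP => 1);
my $dir_b = tempdir(CLEANUP => 1);
open(my $out, ">", "$dir_b/boiler.sldoc") or die "cannot create demo file: $!";
print $out "<p>boilerplate</p>\n";
close($out);

my ($fh, $found_in) = open_in_dirs("boiler.sldoc", $dir_a, $dir_b);
my $first_line = <$fh>;
close($fh);
print "found in the second directory\n" if $found_in eq $dir_b;
```

As in a C compiler's include path, the order of the directories matters: the first directory that contains the file wins.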

Read the file into memory

Perl makes this easy:

00155      # Read entire file into memory
00156  
00157      my @doctexts = <$doc_infd>;
00158      close($doc_infd);
00159      chomp(@doctexts);  # remove line delims from every line
00160      my $num_lines = scalar(@doctexts);  # count lines in file
00161      my $doctext = join("\n", @doctexts) . "\n";  # combine as a single string
00162      $doctext =~ s/\r//gs;  # remove carriage returns, if any
The chomp() function removes line delimiters, which, depending on platform, might be something other than linefeeds, and the join() combines the lines into one long string, and inserts linefeeds as line endings. Then, carriage returns, if any, are removed. This unifies different platforms. (Similar code can be found in process_src_file().)
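Here is a minimal sketch of that normalization, reading from an in-memory file handle so it can be run without any input file. The function name slurp_normalized and the sample text are mine, not part of semlit.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from a handle and return the text
# as one string with "\n" line endings, plus the number of lines.
sub slurp_normalized {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp(@lines);                           # drop the line delimiter from each line
    my $text = join("\n", @lines) . "\n";    # rejoin with linefeeds
    $text =~ s/\r//g;                        # drop carriage returns, if any
    return ($text, scalar(@lines));
}

# Sample input with Windows-style line endings and no final newline.
my $raw = "one\r\ntwo\r\nthree";
open(my $in, "<", \$raw) or die "cannot open in-memory file: $!";
my ($text, $count) = slurp_normalized($in);
close($in);
print "$count lines\n$text";
```

Note that the result always ends with a linefeed, even when the last input line did not.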

For each semlit command contained in that file

The basic algorithm is to find the first semlit command in the file, execute it, and then replace that semlit command with the results of the command. Note that it is possible that the replacement text contains semlit commands. So, each time a semlit command is processed, the entire file must be re-scanned from the beginning. Instead of trying to process the file line-by-line, the loop executes until there are no more commands to execute:

00167      # process semlit commands
00168      while ($doctext =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/is) {
00169          my $cmd = $1;  # text of command (minus standard stuff)
The variable $o_delim contains the semlit command delimiter and defaults to "=". The variable $o_fs contains the semlit command field separator and defaults to ",".

Note that the delimiter will certainly be used in the code without being associated with a semlit command, so the mere presence of the delimiter does not flag the start of a semlit command. The word "semlit" must follow the delimiter ("=semlit"), with only optional whitespace in between, and that must be followed by a field separator (","). Also note that the match pattern captures the text between that field separator and the ending delimiter. That represents the semlit command keyword and parameters.

The next lines capture the file contents before and after the semlit command:

00170          my $prefix = $PREMATCH;  # text preceding the command
00171          my $suffix = $POSTMATCH;  # text after the command
Note the use of the $PREMATCH and $POSTMATCH variables. These are the readable names that the standard English module gives to Perl's $` and $' match variables. See the Perl documentation if you are not familiar with this construct.

If errors are detected in the source files, it is helpful to print error messages which include the line number. This is challenging in this function because we are not processing on a line-by-line basis. It is further complicated by the fact that as commands are replaced by the returned text from command execution, the number of lines in the $doctext changes over time. However, since the commands are processed in order, we can calculate the line number by looking at the number of lines following the command, and subtracting it from the total number:

00173          # calculate line number containing the start of this semlit command
00174          $cur_file_linenum = $num_lines - scalar(my @t = split("\n", $suffix)) + 1;
As you know from the Perl documentation, the split() function will by default not create an empty entry at the end (after the final '\n').
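A worked example may help. The sketch below pulls the arithmetic into a hypothetical function, line_of_command, and applies it to a five-line file whose command sits on line 2; the sample suffix strings are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: line number on which a command starts, given the
# total line count and the text that follows the command.
sub line_of_command {
    my ($num_lines, $suffix) = @_;
    my @following = split(/\n/, $suffix);    # trailing empty fields are dropped
    return $num_lines - scalar(@following) + 1;
}

# Five-line file, command on line 2, followed by more text on that line.
my $num_lines = 5;
my $suffix = " trailing text\nline 3\nline 4\nline 5\n";
print "command starts on line ", line_of_command($num_lines, $suffix), "\n";
```

The suffix splits into four pieces (the rest of line 2, then lines 3 to 5), so the result is 5 - 4 + 1 = 2. If the command is the last thing on its line, the suffix starts with a linefeed; split() then produces a leading empty field, which still counts as the remainder of line 2.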

Execute the command

Now execute the command and capture the replacement text:

00176          my $repl = semlit_cmd($cmd);
The semlit_cmd() function returns formatted html.

Replace the command with the html returned by the command execution and loop

Replacing the matched semlit command with the resulting text is straightforward:

00178          # Commands are removed, and often replaced with some result
00179          $doctext = $prefix . $repl . $suffix;
00180      }  # while
This is where the number of lines in $doctext can change.
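Putting the pieces together, the find/execute/replace loop can be sketched as a standalone function. This is not the code from semlit.pl: the name expand_commands and the callback argument are hypothetical, the match offsets in @- and @+ stand in for $PREMATCH and $POSTMATCH, and \Q...\E is added so that a delimiter with special regex meaning would be taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: repeatedly find the first semlit command in $text,
# run $handler on it, and splice the result in place of the command.
sub expand_commands {
    my ($text, $handler, $delim, $fs) = @_;
    $delim = "=" unless defined($delim);
    $fs    = "," unless defined($fs);
    while ($text =~ /\Q$delim\E\s*semlit\s*\Q$fs\E\s*([^\Q$delim\E]+)\Q$delim\E/is) {
        my $cmd = $1;
        my ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
        my $repl = $handler->($cmd);
        substr($text, $start, $end - $start, $repl);
        # The loop re-scans from the beginning, so commands contained in
        # $repl are found and processed too.
    }
    return $text;
}

# Toy handler with one made-up command: "upper,<word>".
my $handler = sub {
    my ($name, $arg) = split(/,/, $_[0]);
    return ($name eq "upper") ? uc($arg) : "";
};

print expand_commands("x=1; a =semlit,upper,hi= b\n", $handler);
```

The stray "=" in "x=1" is left alone because it is not followed by the word "semlit".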

Return the processed html

Finally:

00184      return $doctext;
00185  }  # process_doc_file

Recursion

For the most part, Perl takes care of allowing functions to be recursive. However, we do have some global variables for error reporting which add an extra challenge:

00029  my $cur_file_name = "";
00030  my $cur_file_linenum = 0;
By keeping the file name and line number as global variables, any function can report a user-friendly error message with a minimum of fuss.

Once the initial steps are completed, the process_doc_file() function is ready to start handling commands. But before it does, we want to save those global variables:

00164      my ($save_doc_filename, $save_doc_linenum) = ($cur_file_name, $cur_file_linenum);
00165      ($cur_file_name, $cur_file_linenum) = ($doc_filename, 0);
Then the command loop executes. When that is done, just before returning, we want to restore the global variables:
00182      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
You will see very similar code in the process_src_file() function.
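The effect of the save/restore can be seen in a small sketch. The function process and the @trace array are invented for the demonstration; a "file" here is just a name plus a list of files it includes.

```perl
use strict;
use warnings;

my ($cur_file_name, $cur_file_linenum) = ("", 0);
my @trace;    # records which file the globals describe at each step

# Hypothetical stand-in for a recursive file processor.
sub process {
    my ($name, @included) = @_;
    # save the caller's context, then switch to this file
    my ($save_name, $save_linenum) = ($cur_file_name, $cur_file_linenum);
    ($cur_file_name, $cur_file_linenum) = ($name, 0);

    $cur_file_linenum++;
    push(@trace, "$cur_file_name:$cur_file_linenum");
    foreach my $inc (@included) {
        process($inc);    # recursive call changes the globals ...
    }
    # ... but they describe this file again once the call returns
    push(@trace, "$cur_file_name:$cur_file_linenum");

    # restore the caller's context before returning
    ($cur_file_name, $cur_file_linenum) = ($save_name, $save_linenum);
    return;
}

process("main.sldoc", "inc.sldoc");
print join(" ", @trace), "\n";
```

Without the restore step, an error reported in main.sldoc after the include would be blamed on inc.sldoc.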

Function: semlit_cmd()

The process_doc_file() and process_src_file() functions read input files, find semlit commands, and call semlit_cmd() to execute them. The delimiters, "semlit", and the first field separator are stripped, so that the passed-in command string starts with the command name.
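As a rough illustration of how such a command string is taken apart, here is a sketch that recognizes two of the commands and returns the keyword and its parameters. The function parse_cmd and the sample file names are hypothetical; the real semlit_cmd() executes each command directly inside the matching branch.

```perl
use strict;
use warnings;

# Hypothetical helper: split a command string into keyword and parameters.
# Returns an empty list if the command is not recognized.
sub parse_cmd {
    my ($cmd, $fs) = @_;
    $fs = "," unless defined($fs);
    if ($cmd =~ /^tabstop\s*\Q$fs\E\s*([^\s\Q$fs\E]+)\s*$/i) {
        return ("tabstop", $1);
    }
    elsif ($cmd =~ /^srcfile\s*\Q$fs\E\s*([^\s\Q$fs\E]+)\s*\Q$fs\E\s*([^\s\Q$fs\E]+)\s*$/i) {
        return ("srcfile", $1, $2);
    }
    return;
}

my @parts = parse_cmd("srcfile, example_c.slsrc , example.c");
print join("|", @parts), "\n";
```

Whitespace around the field separator is ignored, but a parameter itself cannot contain whitespace or the separator.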

Command: tabstop

The tabstop command simply updates the $tabstop global variable:

00192      # semlit tabstop - doc: source tab expansion
00193      if ($cmd =~ /^tabstop\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00194          if ($1 =~ /^\d+$/) {
00195              $tabstop = $1;  # used by Text::Tabs
00196              return "";
00197          } else {
00198              err("Tabstop value '$1' must be numeric");
00199              return "";
00200          }
00201      }
The $tabstop variable is used directly by the built-in Text::Tabs Perl module:
00020  use Text::Tabs;
Note that the $tabstop variable can also be set on the command line:
00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
So, where does this actually get used? Right here:
00373              $iline = expand($iline);  # expand tabs according to $tabstop.
(In the function process_src_file().) The expand() function is part of the Text::Tabs module.

The tabstop command does not return any text (returns "" - empty string).
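To see the effect of $tabstop, here is a short sketch using Text::Tabs directly. The sample lines are made up; the module exports both the expand() function and the $tabstop variable.

```perl
use strict;
use warnings;
use Text::Tabs;    # exports expand() and $tabstop

# With a tab stop every 4 columns, a leading tab becomes 4 spaces and a
# tab after 2 characters pads out to column 4.
$tabstop = 4;
my $indented = expand("\tx = 1;");
my $aligned  = expand("ab\tc");

# The same line with the traditional tab stop of 8.
$tabstop = 8;
my $wide = expand("ab\tc");

print "[$indented]\n[$aligned]\n[$wide]\n";
```

Expanding tabs before the line is html-ified keeps the columns lined up regardless of how the browser would render a tab character.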

Command: srcfile

The srcfile command is used to scan an slsrc file:

00203      # semlit srcfile - doc: read and process source file
00204      elsif ($cmd =~ /^srcfile\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00205          return process_src_file($1, $2);
00206      }
During the scan, any semlit commands found in the slsrc file are processed (usually block/endblock commands), and the source output files are created (both html and src).

The returned text is a link to the plaintext output src file. The author of the sldoc file expects this and positions the srcfile command such that a link to the src file makes sense. For example, near the top of this sldoc file, the srcfile commands are arranged in a bullet list. Each srcfile command is followed by the hint "(right-click and save)" followed by a short description of the file.

Command: initialsource

The initialsource command is used to specify a file for initial display in the source frame:

00208      # semlit initialsource - doc: set initial source frame
00209      elsif ($cmd =~ /^initialsource\s*$o_fs\s*([^\s$o_fs]+)\s*/i) {
00210          $o_initialsource = $1;
00211          return "";
00212      }
The file must be the final file name, ending with ".html".

There is no returned text.

Command: include

The include command is used to scan an sldoc file:

00214      # semlit include - doc: read and process doc file
00215      elsif ($cmd =~ /^include\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00216          return process_doc_file($1);
00217      }
Note that this represents a recursive call to process_doc_file(). During the scan, the included file is processed the same way as the master sldoc file.

The returned text is simply the processed html of the included file.

Command: insert

The insert command is used to insert into the document output html file a named block of source lines (from slsrc files):

00219      # semlit insert - doc: insert a source block
00220      elsif ($cmd =~ /^insert\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00221          my $block_name = $1;
00222          if (exists($srcblocks{$block_name})) {
00223              my $num_refs = 1;
00224              my $block_ref_name = $block_name;
00225              if (defined($block_numrefs{$block_name})) {
00226                  $num_refs = $block_numrefs{$block_name} + 1;
00227                  $block_ref_name = $block_name . "_ref_$num_refs";
00228              }
00229              $block_numrefs{$block_name} = $num_refs;
00230  
00231              my $block_str = $srcblocks{$block_name};
00232              return <<__EOF__;
00233  <a name="$block_ref_name" id="$block_ref_name"><\/a>
00234  <small><pre>
00235  $block_str
00236  <\/pre><!-- endblock $block_ref_name --></small>\n
00237  __EOF__
00238          } else {
00239              err("attempt to insert block named '$block_name' but block not defined");
00240              return "";
00241          }
00242      }
Note the use of Perl's "here document" <<__EOF__ ... __EOF__. See Wikipedia and the Perl documentation if you are not familiar with this construct. Also note that the source blocks are stored in the %srcblocks hash by the process_src_file() function.

The endblock html comment becomes important during the final fix up step; for blocks inserted multiple times, the comment is replaced with links to other references.

The returned text is the processed html of the source block.
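The reference naming can be traced with a small runnable sketch. The function next_ref_name is a hypothetical extraction of the counting lines above, and the block name main_loop is invented; the hash starts with a count of 0, as it would after a block command.

```perl
use strict;
use warnings;

# As if a block command had already defined the block "main_loop".
my %block_numrefs = (main_loop => 0);

# Hypothetical helper: count one more insert and return its anchor name.
sub next_ref_name {
    my ($block_name) = @_;
    my $num_refs = 1;
    my $ref_name = $block_name;
    if (defined($block_numrefs{$block_name})) {
        $num_refs = $block_numrefs{$block_name} + 1;
        $ref_name = $block_name . "_ref_$num_refs";
    }
    $block_numrefs{$block_name} = $num_refs;
    return $ref_name;
}

my $first_ref  = next_ref_name("main_loop");
my $second_ref = next_ref_name("main_loop");
print "$first_ref $second_ref (count is $block_numrefs{main_loop})\n";
```

These anchor names are what the fix-up step in main reconstructs from the block name and the final count.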

Command: block

The block command essentially tells the process_src_file() function to start storing source lines into a named block:

00244      # semlit block - src: start a named block of source
00245      elsif ($cmd =~ /^block\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00246          my $block_name = $1;
00247          if (defined($srcblocks{$block_name})) {
00248              err("block '$block_name' already defined");
00249              return "";
00250          }
00251          $srcblocks{$block_name} = "";
00252          $block_numrefs{$block_name} = 0;
00253          $active_srcblocks{$block_name} = $cur_file_linenum;
00254          
00255          $global_src_buffer = "<span name=\"$block_name\" id=\"$block_name\"><\/span>";
00256          return "";
00257      }
Since it is possible for source lines to be included in multiple named blocks, the global hash %active_srcblocks is used to indicate which named blocks are accumulating.

Command: endblock

The endblock command essentially tells the process_src_file() function to stop storing source lines into the named block:

00259      # semlit endblock - src: end a named block of source
00260      elsif ($cmd =~ /^endblock\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00261          my $block_name = $1;
00262          if (exists($active_srcblocks{$block_name})) {
00263              delete($active_srcblocks{$block_name});
00264              $srcblocks{$block_name} =~ s/\n$//s;
00265              return "";
00266          } else {
00267              err("found endblock for '$block_name', which is not active");
00268              return "";
00269          }
00270      }
The block name needs to be supplied because a nested block does not need to be fully contained by the outer block. That is, if multiple blocks are active, the endblock does not necessarily end the most recently opened block.
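Overlapping blocks are easier to picture with an example. The sketch below is a simplified stand-in for the real processing: the function collect_blocks is hypothetical, it only recognizes the default "=" and "," characters, and it stores plain text rather than html.

```perl
use strict;
use warnings;

# Hypothetical helper: accumulate source lines into every active block.
sub collect_blocks {
    my @lines = @_;
    my (%blocks, %active);
    foreach my $line (@lines) {
        if ($line =~ /=semlit,block,(\w+)=/) {
            $blocks{$1} = "";
            $active{$1} = 1;                 # start accumulating
        }
        elsif ($line =~ /=semlit,endblock,(\w+)=/) {
            delete($active{$1});             # stop accumulating, by name
        }
        else {
            foreach my $name (keys(%active)) {
                $blocks{$name} .= "$line\n";
            }
        }
    }
    return \%blocks;
}

# Block "b" opens inside "a" but closes after it.
my $blocks = collect_blocks(
    "/* =semlit,block,a= */",
    "line 1",
    "/* =semlit,block,b= */",
    "line 2",
    "/* =semlit,endblock,a= */",
    "line 3",
    "/* =semlit,endblock,b= */",
);
print "a:\n$blocks->{a}b:\n$blocks->{b}";
```

Here "line 2" lands in both blocks, and the endblock for "a" arrives while "b" is still open, which is why the name is required.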

Command: tooltip

The tooltip command creates a keyword in a block of text that can be hovered over for additional information.

00272      # semlit tooltip - create hover over text for a phrase
00273      elsif ($cmd =~ /^tooltip\s*$o_fs\s*([^\s$o_fs]+)\s*$o_fs\s*([^\s$o_fs]+)\s*$/i) {
00274          my $text_source = $1;
00275          my $text_link = $2;
00276          my $contents = file_get_contents($text_source);
00277          return <<__EOF__;
00278  <a href="#" title="$contents" style="color:2222ee;border-bottom:1px dotted #2222ee;text-decoration: none;">$text_link</a>
00279  __EOF__
00280      }
The tooltip data is loaded from a file on the local server, and the keyword is supplied as the last parameter. Note: the keyword cannot contain any spaces.
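Since the file contents end up inside an html attribute, characters such as double quotes in the file matter. The following sketch shows one way the link could be built with the contents escaped first. Note that this escaping is my own addition for illustration and is not something the code above does; both function names are hypothetical, and the styling is omitted.

```perl
use strict;
use warnings;

# Hypothetical helper: make text safe for use inside a quoted attribute.
sub html_attr_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;     # must come first, so later entities are not re-escaped
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: build the hover link for a keyword.
sub tooltip_html {
    my ($contents, $keyword) = @_;
    my $title = html_attr_escape($contents);
    return qq{<a href="#" title="$title">$keyword</a>};
}

print tooltip_html(qq{say "hi" & <go>}, "greeting"), "\n";
```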

Function: process_src_file()

The process_src_file() function is called from the srcfile command. It reads and processes an slsrc file (program source code). Part of the processing is to generate two output files: a source html file, and a src file, stripped of its semlit commands and ready for compilation.

This function has conceptual similarities with the process_doc_file() function, but there are important differences. The biggest difference is the overall approach to processing the file. Here, we process the slsrc file line-at-a-time. A line must contain either program source code, or a single semlit command (although that command might be enclosed in comment delimiters). Instead of replacing the command with returned content, the line containing the command is simply discarded.

Here are the high-level steps performed:

  1. Open the slsrc input file.
  2. Open the output source html file.
  3. Open the output src file.
  4. For each input line in the slsrc file:
    1. If it is a semlit command, process it.
    2. Else it is a source line:
      1. Write the line to the src file.
      2. Html-ify the line.
      3. If there are active source blocks accumulating:
        1. Create doc link and write source html file.
        2. Add the source line to all of the active blocks.
      4. Else no active block, write source html file without link.
  5. Close files and wrap up.
  6. Return html link to the src file.

Open the slsrc input file

Almost the same code exists in process_doc_file(). Here we are opening an slsrc file which might be in any of the directories in the array @o_incdirs:

00299      # open source file, using one or more search directories
00300      my $incdir;
00301      my $open_success = 0;
00302      foreach $incdir (@o_incdirs) {
00303          if (open($slsrc_infd, "<", "$incdir/$src_filename")) {
00304              $open_success = 1;
00305              last;  # break out of foreach
00306          }
00307      }
00308      if (! $open_success) {
00309          err("could not open src file '$src_filename', skipping");
00310          return "";
00311      }

Open the output source html file

The output source html file is opened and some initial content is written:

00313      # create and write initial content to html-ified source file
00314      if (! open($src_html_outfd, ">", "$src_filename.html")) {
00315          err("could not open output source html file '$src_filename.html', skipping");
00316          close($slsrc_infd);
00317          return "";
00318      }
00319      print $src_html_outfd <<__EOF__;
00320  <!DOCTYPE html><html><head><title>$plain_src_filename</title>
00321  <link rel="stylesheet" href="//code.jquery.com/ui/1.11.4/themes/smoothness/jquery-ui.css">
00322  <script src="//code.jquery.com/jquery-1.10.2.js"></script>
00323  <script src="//code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
00324  <link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/styles/default.min.css">
00325  <script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/8.5/highlight.min.js"></script>
00326  <script>
00327    \$(function() {
00328      \$( document ).tooltip();
00329    });
00330  </script>
00331  <style>
00332  #code {background-color:#ffffff;};
00333  </style>
00334  </head>
00335  <body><h1>$plain_src_filename</h1>
00336  <script>hljs.initHighlightingOnLoad();</script>
00337  <small><pre><code id="code"><table border=0 cellpadding=0 cellspacing=0><tr>
00338  __EOF__
In addition to the initial html source, it also contains a link to the plain text src file which can be downloaded.

Open the output src file

The output src file is opened:

00340      # Create plaintext source file (without semlit commands)
00341      if (! open($src_outfd, ">", "$plain_src_filename")) {
00342          err("could not open output src '$plain_src_filename', skipping");
00343          close($slsrc_infd);
00344          close($src_html_outfd);
00345          return "";
00346      }

For each input line in the slsrc file:

Read the slsrc file, line-by-line:

00355      my $iline;
00356      while (defined($iline = <$slsrc_infd>)) {
00357          chomp($iline);  # remove line delim
00358          $iline .= "\n";  # add newline
00359          $iline =~ s/\r//gs;  # remove carriage returns, if any
00360          $cur_file_linenum ++;
The chomp() function removes the line delimiter, which, depending on platform, might be something other than a linefeed, and the next line adds a newline to be the line delimiter. Then carriage returns, if any, are removed. This unifies different platforms. (Similar code can be found in process_doc_file().)
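
The effect of those three steps is easiest to see on sample input. This is a small sketch only; normalize_line() is a hypothetical wrapper around the same statements, and the sample strings are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize one input line so that it ends in exactly
# one "\n" and contains no carriage returns.
sub normalize_line {
    my ($line) = @_;
    chomp($line);       # remove the trailing line delimiter, if present
    $line .= "\n";      # always terminate with a newline
    $line =~ s/\r//g;   # drop carriage returns (e.g. from DOS-style files)
    return $line;
}

print normalize_line("int x;\r\n");  # DOS-style line ending
print normalize_line("int y;");      # final line with no delimiter at all
```

Both calls print a line that ends in a single newline, so the rest of the loop never has to care which platform produced the slsrc file.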

If it is a semlit command, process it

The slsrc file contains semlit commands, usually block and endblock. Find and process them:

00362          # check for semlit commands
00363          if ($iline =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
00364              semlit_cmd($1);
00365              # discard command line
00366          }
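
To see what the pattern captures, here is a minimal sketch using the default delimiter and field separator from the help text ('=' and ','). The helper name extract_semlit_cmd() and the sample lines are hypothetical.

```perl
use strict;
use warnings;

my $o_delim = "=";  # default delimiter, per the help text
my $o_fs    = ",";  # default field separator, per the help text

# Hypothetical helper: return the command text found on a line, or nothing
# if the line is ordinary source code.
sub extract_semlit_cmd {
    my ($line) = @_;
    if ($line =~ /$o_delim\s*semlit\s*$o_fs\s*([^$o_delim]+)$o_delim/i) {
        return $1;  # everything between "semlit," and the closing delimiter
    }
    return;
}

my $cmd = extract_semlit_cmd("/* =semlit,block,main= */\n");
print "command: $cmd\n" if defined $cmd;
```

Since $o_delim and $o_fs are interpolated into the pattern as-is, a delimiter that happens to be a regular expression metacharacter would be interpreted as one.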

Else it is a source line:

If the slsrc line does not contain a semlit command, then it is normal source code.

00367          else {
00368              $src_linenum ++;  # don't count semlit command lines
Note that source code lines are counted separately from input slsrc file lines ($cur_file_linenum counted above). The input line count ($cur_file_linenum) includes semlit command lines and is used when printing error messages by err(), while the source code line count ($src_linenum) does not include semlit command lines and is used as the user-visible line number in the output source html file.

Write the line to the src file

Write the plaintext source line to the src file:

00370              print $src_outfd $iline;

Html-ify the line

In advance of writing the line to the output source html file, expand tabs and convert the special characters '&', '<' and '>' to their html forms:

00372              # fix up source for html rendering (tab expansion, special char encoding)
00373              $iline = expand($iline);  # expand tabs according to $tabstop.
00374              $iline =~ s/\&/\&amp;/g;  $iline =~ s/</\&lt;/g;  $iline =~ s/>/\&gt;/g;
Note that the expand() function is part of the Text::Tabs Perl module and uses the $tabstop global variable to control how it expands.
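
The two steps can be exercised on their own. This is a sketch under stated assumptions: htmlify_line() is a hypothetical wrapper, and $tabstop is set to 4 to match the default given in the help text.

```perl
use strict;
use warnings;
use Text::Tabs;   # exports expand() and $tabstop

$tabstop = 4;     # the default for semlit's -t option, per the help text

# Hypothetical helper wrapping the two steps shown above.
sub htmlify_line {
    my ($line) = @_;
    $line = expand($line);   # tabs become spaces up to the next tab stop
    $line =~ s/&/&amp;/g;    # must come first, or it would re-escape &lt; and &gt;
    $line =~ s/</&lt;/g;
    $line =~ s/>/&gt;/g;
    return $line;
}

print htmlify_line("\tif (a < b && c > d)\n");
```

The order of the substitutions matters: escaping '&' after '<' and '>' would turn the freshly written "&lt;" into "&amp;lt;".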

If there are active source blocks accumulating:

Check to see if this source line is inside one or more block/endblock constructs:

00376              # if we are in at least one block, link the source to the earliest block's first doc reference
00377              if (scalar(keys(%active_srcblocks)) > 0) {

Create doc link and write source html file

There is at least one named block active. Assuming there might be more than one, find the active block which was most-recently opened (is at the highest-numbered input line):

00378                  # descending sort so that element 0 is largest
00379                  my @active_blocks = sort { $active_srcblocks{$b} cmp $active_srcblocks{$a} } keys(%active_srcblocks);
This sort construct orders the keys by descending content of the %active_srcblocks hash. Thus, $active_blocks[0] is the name of that most-recently opened block. This is used to construct the link back to the doc:
00380                  my $targ = $active_blocks[0] . "_ref_1";
00381                  $src_lines_td .= sprintf("<a href=\"$doc_html_filename#$targ\" target=\"doc\">%05d<\/a>\n", $src_linenum);
00382                  if ($global_src_buffer) {
00383                      $src_content_td .= sprintf("%s  %s", $global_src_buffer, $iline);
00384                      $global_src_buffer = "";
00385                  }
00386                  else {
00387                      $src_content_td .= sprintf("  %s", $iline);
00388                  }

Add the source line to all of the active blocks

For each active named block/endblock construct, add the source code to the %srcblocks hash:

00390                  # for each open source block on this line of source, link the doc block to that source block
00391                  foreach my $block_name (keys(%active_srcblocks)) {
00392                      my $a = sprintf("<a href=\"$cur_file_name.html#$block_name\" target=\"src\">%05d<\/a>  %s", $src_linenum, $iline);
00393                      $srcblocks{$block_name} .= $a;
00394                  }
This is used by the insert semlit command to insert the source block into the output doc html file.
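
The accumulation is easiest to follow with nested blocks. The following sketch is a simplification, not semlit.pl code: the @input list is invented, the line-number links are left out, and block/endblock are assumed to simply add and delete a key in %active_srcblocks.

```perl
use strict;
use warnings;

my %active_srcblocks;   # names of the blocks that are currently open
my %srcblocks;          # block name => accumulated source text

# Invented input: each entry is a block/endblock marker or a source line.
my @input = (
    [block    => "outer"],
    [line     => "int a;\n"],
    [block    => "inner"],
    [line     => "int b;\n"],
    [endblock => "inner"],
    [line     => "int c;\n"],
    [endblock => "outer"],
);

foreach my $item (@input) {
    my ($kind, $text) = @$item;
    if    ($kind eq "block")    { $active_srcblocks{$text} = 1; }
    elsif ($kind eq "endblock") { delete $active_srcblocks{$text}; }
    else {
        # a source line is appended to every block that is open right now
        $srcblocks{$_} .= $text foreach keys(%active_srcblocks);
    }
}

print "outer:\n$srcblocks{outer}";
print "inner:\n$srcblocks{inner}";
```

Here "outer" ends up with all three source lines, while "inner" holds only the middle one, which is why a single source line can appear in more than one inserted block.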

Else no active block, write source html file without link

For source lines which are not contained in a block/endblock construct, no doc link is needed when writing to the output source html file:

00395              } else {
00396                  # no active blocks
00397                  my $a = sprintf("%05d\n", $src_linenum);
00398                  my $c = sprintf("  %s", $iline);
00399                  $src_lines_td .= $a;
00400                  $src_content_td .= $c;
00401              }

Close files and wrap up

Close the files:

00418      close($slsrc_infd);
00419      close($src_outfd);
00420  
00421      print $src_html_outfd "</tr></table></code>\n";
00422      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00423      print $src_html_outfd "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n";
00424      print $src_html_outfd "</pre></small></body></html>\n";
00425      close($src_html_outfd);
Note the many newlines printed. This is so that clicking on a line number which is near the end of the source file will still position that line of source at the top of the screen.

Also, if the user accidentally started one or more named blocks but did not end them, print errors and force them ended:

00427      # if the source file started a block but reached eof without ending it, end it here.
00428      foreach (keys(%active_srcblocks)) {
00429          err("block named '$_' started but not ended");
00430          semlit_cmd("endblock$o_fs$_");  # end it for the user
00431      }
This is done by calling the semlit_cmd() function and passing it the endblock command (as if the slsrc file had contained it).

Return html link to the src file

But before we return, restore the $cur_file_name and $cur_file_linenum variables to their previous state. Then return.

00433      # the semlit.srcfile command writes a link to the plaintext source file
00434      ($cur_file_name, $cur_file_linenum) = ($save_doc_filename, $save_doc_linenum);
00435      return "<a href=\"$plain_src_filename\">$plain_src_filename</a>";
Since the call to process_src_file() came from semlit_cmd() executing the srcfile command, and that command is used in the sldoc file, the thing we return is a link to the plaintext output src file.

Function: help()

First, let's declare a couple of globals that will be used for helping the user:

00025  my $tool = "semlit.pl";
00026  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The usage() function also uses this.) In the main code, the "-h" option calls help() to print more extensive help:
00470  sub help {
00471      my($err_str) = @_;
00472  
00473      if (defined $err_str) {
00474          print "$tool: $err_str\n\n";
00475      }
00476      print <<__EOF__;
00477  Usage: $usage_str
00478  Where:
00479      -h - print help screen
00480      -d delim - delimiter character at start and end of a semlit command.
00481              (default to '=')
00482      -f fs - field separator character within a semlit command.
00483              (default to ',')
00484      -i initialsource - file name for initial source frame.
00485              (default to "blank.html")  Also, initialsource semlit command.
00486      -I dir - directory to find files for 'srcfile' and 'include' commands.
00487              (default to ".")  The "-I dir" option can be repeated.
00488      -t tabstop - convert tabs to "tabstop" spaces.
00489              (default to '4')
00490      files - zero or more input files.  If omitted, inputs from stdin.
00491  
00492  __EOF__
00493  
00494      exit($exit_status);
00495  }  # help
Note the use of Perl's "here document" construct, <<__EOF__ ... __EOF__. See Wikipedia and the Perl documentation if you are not familiar with it.

Function: file_get_contents()

This is a simple routine to read the contents of a file for the purpose of filling in the tooltips:

00458  sub file_get_contents{
00459      my ($text_file) = @_;
00460      open FILE, $text_file or die $!;
00461      flock FILE, 1 or die $!;        # wait for lock
00462      seek(FILE, 0, 0);       # move pointer to beginning
00463      my $slurp = do{local $/; <FILE>};
00464      flock FILE, 8;          # release the lock
00465      close(FILE);
00466  
00467      return $slurp;
00468  } # file_get_contents
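
The key line is the do{local $/; <FILE>} idiom: with the input record separator $/ temporarily undefined, a single read returns the whole file. For comparison, here is a sketch of the same idea using a lexical filehandle and three-argument open. The name slurp_file() is hypothetical, and the locking done by the original is left out.

```perl
use strict;
use warnings;

# Hypothetical variant (not from semlit.pl): same slurp idiom with a
# lexical filehandle and three-argument open; locking is omitted here.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "could not open '$path': $!";
    my $content = do { local $/; <$fh> };  # $/ undefined: read the whole file
    close($fh);
    return $content;
}
```

Because $/ is changed with local, the normal line-by-line behavior is restored as soon as the do block ends.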

Error handling

When there is an obvious user error in the invocation of the semlit program, the usage() function is called.

For example:

00054  GetOptions("h"=> \$o_help, "d=s" => \$o_delim, "f=s" => \$o_fs, "i=s" => \$o_initialsource, "I=s" => \@o_incdirs, "t=i" => \$tabstop) || usage("Error in GetOptions");
Note the use of the logical OR construct (||). Because of Perl's C-like short-circuit evaluation, the right-hand expression (function call to usage()) is only executed if the left-hand side (function call to GetOptions()) returns false. I.e. usage is called if GetOptions() fails. Non-Perl programmers might be tempted to use a simple if/then construct, but the logical OR construct is such a common Perl idiom that the wise reader will learn it.
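
For readers new to the idiom, the following small sketch shows the short-circuit behavior in isolation. The three functions are invented stand-ins for GetOptions() and usage().

```perl
use strict;
use warnings;

my @calls;  # records which functions actually ran

sub succeeds { push(@calls, "succeeds"); return 1; }  # stand-in for a call that works
sub fails    { push(@calls, "fails");    return 0; }  # stand-in for a call that fails
sub handler  { push(@calls, "handler");  return 1; }  # stand-in for usage()

succeeds() || handler();   # left side is true, so handler() is skipped
fails()    || handler();   # left side is false, so handler() runs

print join(" ", @calls), "\n";   # prints "succeeds fails handler"
```

The handler appears only once in the output, after the failing call.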

In other cases, an error is discovered in one of the input files, sldoc and/or slsrc. In those cases, the err() function is called.

Function: usage()

First, let's declare a couple of globals that will be used for helping the user:

00025  my $tool = "semlit.pl";
00026  my $usage_str = "$tool [-h] [-d delim] [-f fs] [-I dir] [-t tabstop] [files]";
(The help() function also uses this.) The usage() function allows an optional error message to be passed in, which is printed before the usage string:
00447  sub usage {
00448      my($err_str) = @_;
00449  
00450      if (defined $err_str) {
00451          print STDERR "$tool: $err_str\n\n";
00452      }
00453      print STDERR "Usage: $usage_str\n\n";
00454      $exit_status ++;
00455      exit($exit_status);
00456  }  # usage

Function: err()

The function err() is a programmer convenience which prints an error message, along with the file name and line number where the error was discovered.

00439  sub err {
00440      my ($msg) = @_;
00441  
00442      print STDERR "Error [$cur_file_name:$cur_file_linenum], $msg\n";
00443      $exit_status ++;
00444  }  # err
It also increments $exit_status, which starts out at zero (success):
00043  my $exit_status = 0;  # assume success
and is used when exiting the program:
00129  # All done.
00130  exit($exit_status);
Thus, a non-zero (failure) exit status also indicates how many errors there were.
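
One caveat applies to using the error count as the exit status: on Unix-like systems only the low 8 bits of the status reach the parent process, so a count of exactly 256 would be reported as 0 (success). The sketch below shows one possible guard. clamp_exit_status() is a hypothetical helper that semlit.pl does not define, and the count of 300 is invented.

```perl
use strict;
use warnings;

# Hypothetical helper: keep an error count within the 0..255 range that a
# Unix exit status can carry, so a large count never wraps around to 0.
sub clamp_exit_status {
    my ($count) = @_;
    return $count > 255 ? 255 : $count;
}

my $exit_status = 300;   # invented error count for illustration
printf("raw %d, low 8 bits %d, clamped %d\n",
       $exit_status, $exit_status & 0xFF, clamp_exit_status($exit_status));
```

This prints "raw 300, low 8 bits 44, clamped 255". For the small error counts a typical run produces, the status and the count are the same.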

Program Explanation: semlit.sh

As mentioned above, the first line of semlit.pl is a fairly common shebang:

00004  #!/usr/local/bin/perl -w

This Unix construct, combined with setting the executable bit on the file, is intended to allow the tool to be run by simply typing the file name as a command (assuming that the PATH environment variable is set up right). However, it does require that the physical location of the Perl interpreter be encoded directly in the file. Unfortunately, different Unix systems install Perl in different places - sometimes under /usr/local, sometimes in /bin, sometimes under /opt.

I vaguely remember that there is a clever way to re-code that line such that the shell will search for the Perl interpreter in the PATH environment variable. But I don't remember how to do it.
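
The technique half-remembered here is most likely the env idiom: name /usr/bin/env as the interpreter and let it look up perl in PATH. The following is a minimal sketch, assuming env is installed at /usr/bin/env (true on most Unix systems, but not guaranteed). Many systems pass everything after the interpreter path as a single argument, so the -w switch is replaced with "use warnings" instead of being kept on the shebang line.

```perl
#!/usr/bin/env perl
# env searches PATH for "perl", so no absolute Perl path is hard-coded.
use strict;
use warnings;   # stands in for the -w switch on the original shebang line

print "running under $^X\n";   # $^X holds the path of the running interpreter
```

This trades one hard-coded location (the Perl interpreter) for another (env), which is why a wrapper script is still a reasonable choice.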

One way to run semlit.pl without having to set the Perl interpreter's full location is:
    perl SemLitPath/semlit.pl ...

But even that is somewhat unsatisfying for an experienced Unix user. Lazy as we are, we would prefer to let PATH do the work of finding the program, so we just enter:
    semlit ...

One common way to accomplish this is with a wrapper shell script which encapsulates any annoying details of running the tool. Hence the semlit file.

The first line of semlit is the standard shebang:

00004  #!/bin/sh

But this time it references the universally-respected location for a Bourne-compatible shell. No chance that this won't work on some flavor of Unix.

Next save the initial working directory (so that we can get back to it).

00008  IWD=`pwd`                    # remember initial working directory

I establish the convention that the wrapper script must be in the same directory as the perl script. So, the next thing to do is figure out which directory contains the wrapper which is running:

00010  # Find dir where tool is stored (useful for finding related files)
00011  TOOLDIR=`dirname $0`
00012  # Make sure TOOLDIR is a full path name (not relative)
00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD

This might look more complicated than it needs to be. Isn't the first line enough?

00011  TOOLDIR=`dirname $0`

No, it isn't. Suppose you are in your home directory, and you have placed the semlit files in $HOME/bin. Then let's say you execute the program by entering:
    bin/semlit ...

That is perfectly legal, and dirname $0 will, not surprisingly, return bin, a relative path. But I want a fully-qualified path, so I include the three commands:

00013  cd $TOOLDIR; TOOLDIR=`pwd`; cd $IWD

You simply cd to that potentially-relative location and use pwd to get the full path. Then cd back to the initial working directory. This is easier than trying to parse all the possible return values for dirname.

All that remains to be done is to run the perl interpreter as a command (such that PATH is used):

00015  perl $TOOLDIR/semlit.pl $*