Extract glossary commands from TeX file

< Previous page^Computing^Next page >


In my forthcoming book, written in LaTeX, I have glossary macros in two formats:

  • \gls<etc>{entry} and
  • \tdef<etc>{entry}

where there are several choices for <etc> and either gls or tdef can be capitalized.

Goal: to extract the glossary items being used in a given TeX file.

Solution: a small awk programme invoked at the command line.

Programme (saved as extractgloss.awk):

# awk programme to extract glossary commands from a TeX file

# before processing
BEGIN {
}
# during processing
{
#    there may be more than one match per line; repeat as necessary
     while (match($0,"(gls|Gls|tdef|Tdef)[a-z]*\{[a-z]+\}")) {
          # RSTART is where the pattern starts
          # RLENGTH is the length of the pattern
          found = substr($0,RSTART,RLENGTH);
          # replace the line with its remainder
          $0    = substr($0,RSTART+RLENGTH);
          # now find the glossary entry within the command
          match(found,"\{[a-z]+\}");
          # output it
          printf("%s\n", substr(found,RSTART+1,RLENGTH-2));
     }
}
# after processing
END {
}

(Yes, this can be written more efficiently, but this is readable! And the BEGIN and END sections are not used here, but they are provided as convenient placeholders.)

Command line command:

awk -f extractgloss.awk infile | sort | uniq >outfile

Leave a Comment