Extract glossary commands from TeX file


In my forthcoming book, written in LaTeX, I have glossary macros in two formats:

  • \gls<etc>{entry} and
  • \tdef<etc>{entry}

where there are several choices for <etc> and either gls or tdef can be capitalized.

Goal: to extract the glossary items being used in a given TeX file.

Solution: a small awk programme invoked at the command line.

Programme (saved as extractgloss.awk):

# awk programme to extract glossary commands from a TeX file

# before processing
BEGIN {
}

# during processing
{
     # there may be more than one match per line; repeat as necessary
     while (match($0, /(gls|Gls|tdef|Tdef)[a-z]*\{[a-z]+\}/)) {
          # RSTART is where the match starts
          # RLENGTH is the length of the match
          found = substr($0, RSTART, RLENGTH);
          # replace the line with its remainder
          $0    = substr($0, RSTART + RLENGTH);
          # now find the glossary entry within the command
          # (this resets RSTART and RLENGTH relative to found)
          match(found, /\{[a-z]+\}/);
          # output it, without the surrounding braces
          printf("%s\n", substr(found, RSTART + 1, RLENGTH - 2));
     }
}

# after processing
END {
}

(Yes, this can be written more efficiently, but this is readable! And the BEGIN and END sections are not used here, but they are provided as convenient placeholders.)

Command line:

awk -f extractgloss.awk infile | sort | uniq >outfile
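
For comparison, here is a more compact sketch that does the same job without a separate script file. It swaps awk for grep and sed, so it is a different technique rather than a tidier version of the programme above. It assumes a grep that supports the -o and -h options (GNU and BSD grep both do) and a sed that supports -E. The function name extract_gloss and the file sample.tex are made up for the illustration.

```shell
# extract_gloss: hypothetical helper name, not part of the original script.
# Prints the sorted, de-duplicated glossary entries used in the given files.
extract_gloss() {
    # -o prints each match on its own line, -h suppresses file names,
    # -E selects extended regular expressions (same pattern as the awk script)
    grep -ohE '(gls|Gls|tdef|Tdef)[a-z]*\{[a-z]+\}' "$@" |
        # keep only the text between the braces
        sed -E 's/^[^{]*\{([a-z]+)\}$/\1/' |
        # sort -u does the work of "sort | uniq"
        sort -u
}

# Hypothetical sample input, created only for this illustration.
printf '%s\n' 'A \gls{tensor} acts on \glspl{vector}; define \tdef{tensor}.' > sample.tex
extract_gloss sample.tex
# prints:
# tensor
# vector
```

Like the awk programme, this only recognises entries written in lower-case letters, because the pattern between the braces is [a-z]+.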
