# Extract glossary commands from TeX file

In my forthcoming book, written in LaTeX, I have glossary macros in two formats:

• `\gls<etc>{entry}` and
• `\tdef<etc>{entry}`

where `<etc>` stands for one of several suffixes of lowercase letters (possibly empty), and the initial letter of either `gls` or `tdef` can be capitalized.

Goal: to extract the glossary items being used in a given TeX file.

Solution: a small awk programme invoked at the command line.

Programme (saved as extractgloss.awk):

```
# awk programme to extract glossary commands from a TeX file

# before processing (unused placeholder)
BEGIN {
}
# during processing: this block runs once for every input line
{
    # there may be more than one match per line; repeat as necessary
    # (entries are assumed to consist of lowercase letters only)
    while (match($0, /(gls|Gls|tdef|Tdef)[a-z]*\{[a-z]+\}/)) {
        # RSTART is where the match starts
        # RLENGTH is the length of the match
        found = substr($0, RSTART, RLENGTH);
        # replace the line with its remainder
        $0 = substr($0, RSTART + RLENGTH);
        # now find the glossary entry within the command
        match(found, /\{[a-z]+\}/);
        # output it, without the surrounding braces
        printf("%s\n", substr(found, RSTART + 1, RLENGTH - 2));
    }
}
# after processing (unused placeholder)
END {
}
```

(Yes, this can be written more efficiently, but this is readable! And the BEGIN and END sections are not used here, but they are provided as convenient placeholders.)

Command-line invocation (the output is sorted and duplicates are removed):

`awk -f extractgloss.awk infile | sort | uniq >outfile`
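
To check the pipeline end to end, here is a small sketch. The file name `sample.tex` and its contents are invented for illustration, and the awk programme is inlined in a slightly condensed form (using `index` to find the brace instead of a second `match`) so that the snippet runs on its own; it is not meant to replace extractgloss.awk.

```shell
#!/bin/sh
# Hypothetical sample input; the file name and contents are made up for illustration.
cat > sample.tex <<'EOF'
A \gls{widget} is built from \glspl{gadget}.
\Tdef{gizmo} is first defined here; see also \gls{widget} and \tdefpl{sprocket}.
EOF

# Condensed inline form of the same match/substr loop as in extractgloss.awk.
awk '{
    while (match($0, /(gls|Gls|tdef|Tdef)[a-z]*\{[a-z]+\}/)) {
        found = substr($0, RSTART, RLENGTH)   # whole command, such as glspl{gadget}
        $0 = substr($0, RSTART + RLENGTH)     # continue with the rest of the line
        brace = index(found, "{")             # position of the opening brace
        print substr(found, brace + 1, length(found) - brace - 1)
    }
}' sample.tex | sort | uniq > outfile

cat outfile
```

With this sample input, `outfile` should contain the four entries `gadget`, `gizmo`, `sprocket` and `widget`, one per line; `widget` appears twice in the input but only once in the output because of `sort | uniq`.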