#LyX 1.3 created this file. For more info see http://www.lyx.org/
\lyxformat 221
\textclass article
\begin_preamble
\usepackage{url}
\end_preamble
\language english
\inputencoding latin1
\fontscheme default
\graphics default
\paperfontsize default
\spacing single 
\papersize a4paper
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\use_natbib 0
\use_numerical_citations 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default

\layout Title

The forindex utilities
\newline 
 Version 0.1
\layout Author

Guido Milanese
\newline 

\begin_inset ERT
status Collapsed

\layout Standard
 
\backslash 
url{guido.milanese@unicatt.it}
\end_inset 


\layout Standard


\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
thispagestyle{empty}
\end_inset 


\layout Date

January 2005
\layout Abstract
\noindent 
Making a good index is a very important part of the process of writing a
 document, particularly books and manuals.
 Entering data manually can be a very long process; although a certain amount
 of data must be entered manually, some tasks can be performed automatically,
 e.g.
 building an index of geographical names or other routine entries.
 This can be done with the program 
\family sans 
doindex
\family default 
, which prepares a file to be processed by 
\family sans 
makeindex
\family default 
.
 Another useful feature is to remove all the 
\family typewriter 

\backslash 
index
\family default 
 entries from a LaTeX file, obtaining a clean file with no indexing (program
 
\family sans 
cleanindex
\family default 
).
 The programs are written in Snobol4; the only requirement is to install
 the interpreter.
 For Windows, standalone executables compiled with Spitbol are also provided,
 so the programs can be run without an external interpreter.
 
\layout Standard


\begin_inset LatexCommand \tableofcontents{}

\end_inset 


\layout Section

The programs
\layout Subsection

The program 
\family sans 
doindex
\layout Standard

The program 
\family sans 
doindex
\family default 
 reads a LaTeX file, using a list file, and enters index entries in the
 file according to this list.
 Previously entered index entries are left unchanged, making it possible
 to add further indexing to an already indexed file.
\layout Standard

The input LaTeX file may have extension 
\family typewriter 
tex
\family default 
 or 
\family typewriter 
latex
\family default 
, in either uppercase or lowercase (but not mixed case, as in 
\family typewriter 
Tex
\family default 
).
\layout Standard

The list file is meant to contain all the words to be indexed.
 It must have exactly the extension 
\family typewriter 
wls
\family default 
.
 Sub-entries are identified with the separator character '/'.
 See 
\family sans 
test.wls
\family default 
 for an example:
\layout Quote


\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
begin{verbatim}
\newline 
  animals
\newline 
  dogs/animals
\newline 
  cats
\newline 
  house/nouns/english/languages
\newline 
  sleeping@sleep
\newline 
  �vita/Italian/foreign words
\newline 
  �a/French/foreign words
\newline 
  dr�cken/German/foreign words
\newline 
  
\backslash 
end{verbatim}
\end_inset 


\layout Standard

No particular order in this file is required.
 Some users will prefer alphabetical order, others different orders, so
 the program imposes no requirements on the order of entries in this file.
 Entries such as 
\family typewriter 
sleeping@sleep
\family default 
 use the standard 
\family sans 
makeindex
\family default 
 syntax and are left unchanged.
 The 
\begin_inset Quotes eld
\end_inset 

logic
\begin_inset Quotes erd
\end_inset 

 of this syntax is opposite to the internal logic of 
\family sans 
makeindex
\family default 
, which is -- I think -- very clever at the stage of typesetting an index,
 but not at the stage of designing an index.
 
\begin_inset Quotes eld
\end_inset 

A dog is an animal
\begin_inset Quotes erd
\end_inset 

 (
\family typewriter 
dogs/animals
\family default 
 in my syntax) seems to me to be more natural than 
\begin_inset Quotes eld
\end_inset 

Among animals there are dogs
\begin_inset Quotes erd
\end_inset 

 (
\family typewriter 
animals!dogs
\family default 
 in the 
\family sans 
makeindex
\family default 
 syntax).
 The LaTeX file produced by 
\family sans 
doindex
\family default 
 follows, of course, the 
\family sans 
makeindex
\family default 
 conventions.
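\layout Standard

The reversal described above can be sketched in a few lines of Python.
 This is only an illustration, not the Snobol4 code of 
\family sans 
doindex
\family default 
: the function name 
\family typewriter 
to_makeindex
\family default 
 is hypothetical, and the sketch assumes that a list entry is split at the
 separator character, reversed and joined with an exclamation mark, while
 entries without a separator pass through unchanged.

```python
def to_makeindex(entry: str) -> str:
    """Turn a list-file entry such as 'dogs/animals' into 'animals!dogs'.

    Assumption: the levels are simply reversed. makeindex limits the
    nesting depth of an entry; this sketch does not check that limit.
    """
    # Split at the list-file separator and drop stray blanks around parts.
    parts = [part.strip() for part in entry.strip().split("/")]
    # The most general heading comes last in the list file, first in makeindex.
    return "!".join(reversed(parts))


if __name__ == "__main__":
    for entry in ["animals", "dogs/animals", "sleeping@sleep"]:
        print(entry, "->", to_makeindex(entry))
```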
\layout Standard

The original LaTeX file is left unchanged.
 A new file will be written, whose name carries the suffix 
\family sans 
-ind
\family default 
.
 For example, from 
\family sans 
file.tex
\family default 
 you will get 
\family sans 
file-ind.tex
\family default 
.
 Of course, you'll have to run 
\family sans 
makeindex
\family default 
 as usual.
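\layout Standard

The naming rule can be illustrated with a short Python sketch.
 The helper 
\family typewriter 
output_name
\family default 
 and its 
\family typewriter 
tag
\family default 
 parameter are hypothetical names chosen for this example; whether the
 real program keeps the case of the extension in the output name is an
 assumption.

```python
from pathlib import Path

# Accepted extensions: all lowercase or all uppercase, never mixed case.
ACCEPTED = {".tex", ".TEX", ".latex", ".LATEX"}


def output_name(source: str, tag: str = "-ind") -> str:
    """Return the name of the file written for the given input file."""
    path = Path(source)
    if path.suffix not in ACCEPTED:
        raise ValueError(f"unsupported extension: {path.suffix!r}")
    # Insert the tag between the base name and the original extension.
    return str(path.with_name(path.stem + tag + path.suffix))
```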
\layout Standard

The purpose of the program is similar to what is provided by the program
 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
mbox{
\end_inset 


\family sans 
ixgen
\family default 

\begin_inset ERT
status Collapsed

\layout Standard
}
\end_inset 

 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{http://www.iit.upco.es/~oscar/ixgen/}
\end_inset 

) written by 
\shape smallcaps 
Oscar Lopez
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{oscar@iit.upco.es}
\end_inset 

), but 
\family sans 
forindex
\family default 
 was designed to be a bit more flexible.
\layout Subsection

The program 
\family sans 
cleanindex
\layout Standard

The program 
\family sans 
cleanindex
\family default 
 removes 
\family sans 

\backslash 
index
\family default 
 sequences from a LaTeX file.
 The program can be used e.g.
 if a user is not happy with the indexing of a file and wants to start it
 over again.
\layout Standard

The input file may have extension 
\family typewriter 
tex
\family default 
 or 
\family typewriter 
latex
\family default 
, in either uppercase or lowercase.
\layout Standard

The original file is left unchanged.
 A new file will be written, whose name carries the suffix 
\family sans 
-noind
\family default 
.
 For example, from 
\family sans 
file.tex
\family default 
 you will get 
\family sans 
file-noind.tex
\family default 
.
 In this file, lines concerning 
\family sans 
makeindex
\family default 
 are kept but commented out, in order to avoid an empty 
\emph on 
Index
\emph default 
 section in the output.
 You can uncomment these lines when you want to index the file again.
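\layout Standard

The removal of index commands can be sketched in Python as follows.
 This is a minimal illustration under stated assumptions, not the actual
 Snobol4 implementation of 
\family sans 
cleanindex
\family default 
: the function name 
\family typewriter 
strip_index
\family default 
 is hypothetical, braces are assumed to be balanced and unescaped, and
 the commenting of the lines concerning 
\family sans 
makeindex
\family default 
 is not reproduced.

```python
def strip_index(text: str) -> str:
    """Remove every index command, with its braced argument, from text."""
    marker = "\\index{"
    pieces = []
    pos = 0
    while True:
        start = text.find(marker, pos)
        if start == -1:
            pieces.append(text[pos:])
            break
        pieces.append(text[pos:start])
        # Walk forward until the brace opened by the marker is closed,
        # so that nested groups inside the argument are skipped too.
        depth = 1
        i = start + len(marker)
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth > 0:
            # Unbalanced braces: leave the remainder untouched.
            pieces.append(text[start:])
            break
        pos = i
    return "".join(pieces)
```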
\layout Section

Installation
\layout Subsection

GNU/Linux and other *nix systems
\layout Enumerate

Install 
\family sans 
snobol4
\family default 
 from 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{http://www.snobol4.org}
\end_inset 

.
 This is Philip Budne's CSNOBOL implementation.
 You need a 
\family typewriter 
C
\family default 
 compiler to compile the interpreter; it's normally a very quick and easy
 process.
 
\layout Enumerate

Make sure 
\family sans 
snobol4
\family default 
 is in your PATH, or make a symbolic link to it from a directory that is.
 
\layout Enumerate

Copy all the files from the source directory into a suitable directory (you
 do not need the 
\family typewriter 
bat
\family default 
 files, provided for Windows, and can safely remove them).
 
\layout Enumerate

Make the scripts executable (
\family sans 
doindex
\family default 
 and 
\family sans 
cleanindex
\family default 
 with no extensions), e.g.
 
\family typewriter 
chmod +x doindex
\layout Enumerate

Run the scripts as follows:
\begin_deeper 
\layout Enumerate

-- to index a text: 
\family typewriter 
./doindex file.tex
\layout Enumerate

If you want to exclude words with accents: 
\family typewriter 
./doindex file.tex --noacc
\layout Enumerate

-- to remove 
\family sans 

\backslash 
index 
\family default 
sequences: 
\family typewriter 
./cleanindex file.tex
\layout Enumerate

If the current directory is in your PATH, you do not need 
\family typewriter 
./
\family default 
 before the script name.
\newline 

\end_deeper 
\layout Subsection

Windows
\layout Standard

The package provides exe files compiled with Spitbol (see 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{http://www.snobol4.com}
\end_inset 

).
 Make a directory and copy into it all the files from the bin/windows directory.
 There should be two 
\family typewriter 
*.exe
\family default 
 files and the two 
\family typewriter 
test.*
\family default 
 files.
\layout Standard

Run the programs as follows:
\layout Standard

-- to index a text: 
\family typewriter 
doindex file.tex
\layout Standard

If you want to exclude words with accents: 
\family typewriter 
doindex file.tex -noacc
\family default 
.
 Accents must be encoded using the 
\family sans 
latin1
\family default 
 encoding (see the TODO list).
\layout Standard

-- to remove 
\family typewriter 

\backslash 
index
\family default 
 sequences: 
\family typewriter 
cleanindex file.tex
\layout Subsection

Windows from source
\layout Standard

Basically, follow the directions given for GNU/Linux, but make sure
 to use the 
\family typewriter 
bat
\family default 
 files and to install the Windows version of the interpreter.
 The sources are in Unix format; before using them, convert them to DOS/Windows
 format with a script.
 If you do not have such a script, open each file in a text editor and
 save it in DOS/Windows format.
 This can be done by reading and saving each file with the DOS 
\family sans 
edit
\family default 
 program, with 
\family sans 
vim
\family default 
 or any other editor able to deal with different file formats.
 Do not alter the files if you are not sure of what you are doing.
 Please (1) do not use a word processor (such as Word) but a simple
 text editor and (2) make sure to keep the encoding of the file 
\family sans 
acc.inc
\family default 
 as 
\family sans 
ISO-8859-1
\family default 
 or 
\family sans 
8859-15
\family default 
; do not convert it to plain DOS or Unicode.
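\layout Standard

If no conversion script is at hand, a few lines of Python can do the job.
 The following is a sketch, not part of the package; the function names
 are hypothetical.
 It works on raw bytes, so the encoding of the files is left exactly as
 it is and only the line endings change.

```python
from pathlib import Path


def unix_to_dos(data: bytes) -> bytes:
    """Return data with every line ending written as CR LF."""
    # Normalise existing CR LF pairs first so that they are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file(name: str) -> None:
    """Rewrite one source file in place with DOS/Windows line endings."""
    path = Path(name)
    path.write_bytes(unix_to_dos(path.read_bytes()))
```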
\layout Subsection

Cygwin
\layout Standard

I suggest following the directions given for GNU/Linux, but the EXE
 files provided for native Windows can also be used if preferred.
\layout Subsection

Macintosh
\layout Standard

Not yet tested (I do not have a Mac right now).
 It's in the TODO list.
\layout Section

Test files
\layout Standard

Please test the program on 
\family sans 
test.tex
\family default 
 and 
\family sans 
test.wls
\family default 
.
 The produced file will be called 
\family sans 
test-ind.tex
\family default 
 if you use 
\family sans 
doindex
\family default 
, 
\family sans 
test-noind.tex
\family default 
 if you use 
\family sans 
cleanindex
\family default 
.
\layout Section

Bugs and TODO
\layout Standard

The program does not support Unicode files.
 At this moment, most LaTeX users are still using latin1, but the situation
 is rapidly changing.
\layout Standard

List of features that I would like to add:
\layout Enumerate

Also index included files.
 
\layout Enumerate

Add typographical styles, such as italics for the most important locations
 of a word.
 
\layout Enumerate

Add support for several indexes (particularly with class 
\family sans 
memoir
\family default 
) 
\layout Enumerate

Add an option to generate a rough index for all the words.
 
\layout Enumerate

Add support for indexing words listed as regular expressions.
 E.g.
 
\family typewriter 
read*
\family default 
 should index 
\emph on 
read
\emph default 
, 
\emph on 
reads
\emph default 
, 
\emph on 
reading
\emph default 
, 
\emph on 
readings
\emph default 
, all under the same heading 
\emph on 
read
\emph default 
.
 
\layout Enumerate

Make it possible to use another separator for the list file, e.g.
 a simple blank or another character preferred by the user.
 
\layout Enumerate

Test the programs on a Mac.
 
\layout Enumerate

Add Unicode support.
 
\layout Section

Acknowledgements
\layout Standard

The program 
\family sans 
ixgen
\family default 
 gave me the idea of 
\family sans 
forindex
\family default 
.
 Many thanks to 
\shape smallcaps 
Oscar Lopez
\shape default 
 for this very good program.
\layout Standard

Some questions sent by 
\shape smallcaps 
Carlo Pellegrino
\shape default 
 (Modena University, Italy) gave me the idea of transforming a very rudimentary
 script into a general purpose utility.
 
\shape smallcaps 
Maurizio Loreti
\shape default 
 (Padua University, Italy) sent me very useful remarks on the problems of
 automatic generation of indexes, which I made use of in the introduction
 to this text.
\layout Standard

My warmest thanks to 
\shape smallcaps 
Phil Budne
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{phil@ultimate.com}
\end_inset 

) for making his excellent CSNOBOL available.
 Many thanks to the community of Snobol users, particularly to the members
 of the list 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{snobol4@mercury.dsu.edu}
\end_inset 

, and, among them, to 
\shape smallcaps 
Gordon Peterson
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{http://personal.terabites.com/}
\end_inset 

), 
\shape smallcaps 
Michael Radow
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{mikeradow@yahoo.com}
\end_inset 

), 
\shape smallcaps 
Gregory L.
 White
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{glwhite@netconnect.com.au}
\end_inset 

) and to 
\shape smallcaps 
Rafal M.
 Sulejman
\shape default 
 (
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{rafal@engelsinfo.de}
\end_inset 

) whose 
\family sans 
vim
\family default 
 syntax files are a daily blessing.
\layout Standard

Thanks to Jim Hefferon <
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{ftpmaint@alan.smcvt.edu}
\end_inset 

> who pointed out that the original name of the package,
\family sans 
 4index
\family default 
, was not acceptable due to XML syntax rules.
\layout Section

Author, copyright, license, disclaimer
\layout Standard

This program is Copyright 
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
copyright{}
\end_inset 

 2005
\newline 
 Guido Milanese <
\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
url{guido.milanese@unicatt.it}
\end_inset 

>
\newline 
 under the terms of the GNU General Public License.
\newline 

\layout Standard


\size footnotesize 

\begin_inset ERT
status Collapsed

\layout Standard

\backslash 
begin{verbatim}
\newline 
This program is free software; you can redistribute it and/or modify it under
\newline 
the terms of the GNU General Public License as published by the Free Software
\newline 
Foundation; either version 2 of the License, or (at your option) any later
\newline 
version.
\newline 

\newline 
This program is distributed in the hope that it will be useful, but WITHOUT ANY
\newline 
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
\newline 
PARTICULAR PURPOSE. See the GNU General Public License for more details.
\newline 

\newline 
If you do not have a copy of the GNU General Public License write to the Free
\newline 
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
\newline 

\newline 
If the author of this software was too lazy to include the full GPL text along
\newline 
with the code, you can find it at: http://www.gnu.org/copyleft/gpl.html.
\newline 

\backslash 
end{verbatim}
\end_inset 

 
\the_end