yasm/tools/re2c/re2c.1

.ds re \fBre2c\fP
.ds le \fBlex\fP
.ds rx regular expression
.ds lx \fIl\fP-expression
.TH RE2C 1 "8 April 1994" "Version 0.5"
\"$Log: re2c.1,v $
\"Revision 1.1  2002/04/07 22:27:06  peter
\"Initial revision
\"
\"Revision 1.2  1994/04/16  15:50:32  peterr
\"Fix bug in simple example.
\"
\"Revision 1.1  1994/04/08  15:39:09  peterr
\"Initial revision
\"
.SH NAME
re2c \- convert regular expressions to C/C++

.SH SYNOPSIS
\*(re [\fB-esb\fP] \fIname\fP

.SH DESCRIPTION
\*(re is a preprocessor that generates C-based recognizers from regular
expressions.
The input to \*(re consists of C/C++ source interleaved with
comments of the form \fC/*!re2c\fP ... \fC*/\fP which contain
scanner specifications.
In the output these comments are replaced with code that, when
executed, will find the next input token and then execute
some user-supplied token-specific code.

For example, given the following code

.in +3
.nf
#define NULL            ((char*) 0)
char *scan(char *p){
char *q;
#define YYCTYPE         char
#define YYCURSOR        p
#define YYLIMIT         p
#define YYMARKER        q
#define YYFILL(n)
/*!re2c
        [0-9]+          {return YYCURSOR;}
        [\\000-\\377]     {return NULL;}
*/
}
.fi
.in -3

\*(re will generate

.in +3
.nf
/* Generated by re2c on Sat Apr 16 11:40:58 1994 */
#line 1 "simple.re"
#define NULL            ((char*) 0)
char *scan(char *p){
char *q;
#define YYCTYPE         char
#define YYCURSOR        p
#define YYLIMIT         p
#define YYMARKER        q
#define YYFILL(n)
{
        YYCTYPE yych;
        unsigned int yyaccept;
        goto yy0;
yy1:    ++YYCURSOR;
yy0:
        if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
        yych = *YYCURSOR;
        if(yych <= '/') goto yy4;
        if(yych >= ':') goto yy4;
yy2:    yych = *++YYCURSOR;
        goto yy7;
yy3:
#line 10
        {return YYCURSOR;}
yy4:    yych = *++YYCURSOR;
yy5:
#line 11
        {return NULL;}
yy6:    ++YYCURSOR;
        if(YYLIMIT == YYCURSOR) YYFILL(1);
        yych = *YYCURSOR;
yy7:    if(yych <= '/') goto yy3;
        if(yych <= '9') goto yy6;
        goto yy3;
}
#line 12

}
.fi
.in -3

.SH OPTIONS
\*(re provides the following options:
.TP
\fB-e\fP
Cross-compile from an ASCII platform to an EBCDIC one. 
.TP
\fB-s\fP
Generate nested \fCif\fPs for some \fCswitch\fPes.  Many compilers need this
assist to generate better code.
.TP
\fB-b\fP
Implies \fB-s\fP.  Use bit vectors as well in the attempt to coax better
code out of the compiler.  Most useful for specifications with more than a
few keywords (e.g. for most programming languages).

.SH "INTERFACE CODE"
Unlike other scanner generators, \*(re does not generate complete scanners:
the user must supply some interface code.
In particular, the user must define the following macros:
.TP
\fCYYCHAR\fP
Type used to hold an input symbol.
Usually \fCchar\fP or \fCunsigned char\fP.
.TP
\fCYYCURSOR\fP
\*(lx of type \fC*YYCHAR\fP that points to the current input symbol.
The generated code advances \fCYYCURSOR\fP as symbols are matched.
On entry, \fCYYCURSOR\fP is assumed to point to the first character of the
current token.  On exit, \fCYYCURSOR\fP will point to the first character of
the following token.
.TP
\fCYLIMIT\fP
Expression of type \fC*YYCHAR\fP that marks the end of the buffer
(\fCYLIMIT[-1]\fP is the last character in the buffer).
The generated code repeatedly compares \fCYYCURSOR\fP to \fCYLIMIT\fP
to determine when the buffer needs (re)filling.
.TP
\fCYYMARKER\fP
\*(lx of type \fC*YYCHAR\fP.
The generated code saves backtracking information in \fCYYMARKER\fP.
.TP
\fCYYFILL(\fP\fIn\fP\fC)\fP
The generated code "calls" \fCYYFILL\fP when the buffer needs
(re)filling:  at least \fIn\fP additional characters should
be provided.  \fCYYFILL\fP should adjust \fCYYCURSOR\fP, \fCYYLIMIT\fP and
\fCYYMARKER\fP as needed.  Note that for typical programming languages
\fIn\fP will be the length of the longest keyword plus one.

.SH "SCANNER SPECIFICATIONS"
Each scanner specification consists of a set of \fIrules\fP and name
definitions.
Rules consist of a regular expression along with a block of C/C++ code that
is to be executed when the associated regular expression is matched.
Name definitions are of the form
``\fIname\fP \fC=\fP \fIregular expression\fP\fC;\fP''.

.SH "SUMMARY OF RE2C REGULAR EXPRESSIONS"
.TP
\fC"foo"\fP
the literal string \fCfoo\fP.
ANSI-C escape sequences can be used.
.TP
\fC[xyz]\fP
a "character class"; in this case,
the \*(rx matches either an '\fCx\fP', a '\fCy\fP', or a '\fCz\fP'.
.TP
\fC[abj-oZ]\fP
a "character class" with a range in it;
matches an '\fCa\fP', a '\fCb\fP', any letter from '\fCj\fP' through '\fCo\fP',
or a '\fCZ\fP'.
.TP
\fIr\fP\fC\e\fP\fIs\fP
match any \fIr\fP which isn't an \fIs\fP. \fIr\fP and \fIs\fP must be regular expressions
which can be expressed as character classes.
.TP
\fIr\fP\fC*\fP
zero or more \fIr\fP's, where \fIr\fP is any regular expression
.TP
\fC\fIr\fP\fC+\fP
one or more \fIr\fP's
.TP
\fC\fIr\fP\fC?\fP
zero or one \fIr\fP's (that is, "an optional \fIr\fP")
.TP
name
the expansion of the "name" definition (see above)
.TP
\fC(\fP\fIr\fP\fC)\fP
an \fIr\fP; parentheses are used to override precedence
(see below)
.TP
\fIrs\fP
an \fIr\fP followed by an \fIs\fP ("concatenation")
.TP
\fIr\fP\fC|\fP\fIs\fP
either an \fIr\fP or an \fIs\fP
.TP
\fIr\fP\fC/\fP\fIs\fP
an \fIr\fP but only if it is followed by an \fIs\fP. The s is not part of
the matched text. This type of \*(rx is called "trailing context".
.LP
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.

.SH "A LARGER EXAMPLE"
.LP
.in +3
.nf
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <string.h>

#define ADDEQ   257
#define ANDAND  258
#define ANDEQ   259
#define ARRAY   260
#define ASM     261
#define AUTO    262
#define BREAK   263
#define CASE    264
#define CHAR    265
#define CONST   266
#define CONTINUE        267
#define DECR    268
#define DEFAULT 269
#define DEREF   270
#define DIVEQ   271
#define DO      272
#define DOUBLE  273
#define ELLIPSIS        274
#define ELSE    275
#define ENUM    276
#define EQL     277
#define EXTERN  278
#define FCON    279
#define FLOAT   280
#define FOR     281
#define FUNCTION        282
#define GEQ     283
#define GOTO    284
#define ICON    285
#define ID      286
#define IF      287
#define INCR    288
#define INT     289
#define LEQ     290
#define LONG    291
#define LSHIFT  292
#define LSHIFTEQ        293
#define MODEQ   294
#define MULEQ   295
#define NEQ     296
#define OREQ    297
#define OROR    298
#define POINTER 299
#define REGISTER        300
#define RETURN  301
#define RSHIFT  302
#define RSHIFTEQ        303
#define SCON    304
#define SHORT   305
#define SIGNED  306
#define SIZEOF  307
#define STATIC  308
#define STRUCT  309
#define SUBEQ   310
#define SWITCH  311
#define TYPEDEF 312
#define UNION   313
#define UNSIGNED        314
#define VOID    315
#define VOLATILE        316
#define WHILE   317
#define XOREQ   318
#define EOI     319

typedef unsigned int uint;
typedef unsigned char uchar;

#define BSIZE   8192

#define YYCTYPE         uchar
#define YYCURSOR        cursor
#define YYLIMIT         s->lim
#define YYMARKER        s->ptr
#define YYFILL(n)       {cursor = fill(s, cursor);}

#define RET(i)  {s->cur = cursor; return i;}

typedef struct Scanner {
    int                 fd;
    uchar               *bot, *tok, *ptr, *cur, *pos, *lim, *top, *eof;
    uint                line;
} Scanner;

uchar *fill(Scanner *s, uchar *cursor){
    if(!s->eof){
        uint cnt = s->tok - s->bot;
        if(cnt){
            memcpy(s->bot, s->tok, s->lim - s->tok);
            s->tok = s->bot;
            s->ptr -= cnt;
            cursor -= cnt;
            s->pos -= cnt;
            s->lim -= cnt;
        }
        if((s->top - s->lim) < BSIZE){
            uchar *buf = (uchar*)
                malloc(((s->lim - s->bot) + BSIZE)*sizeof(uchar));
            memcpy(buf, s->tok, s->lim - s->tok);
            s->tok = buf;
            s->ptr = &buf[s->ptr - s->bot];
            cursor = &buf[cursor - s->bot];
            s->pos = &buf[s->pos - s->bot];
            s->lim = &buf[s->lim - s->bot];
            s->top = &s->lim[BSIZE];
            free(s->bot);
            s->bot = buf;
        }
        if((cnt = read(s->fd, (char*) s->lim, BSIZE)) != BSIZE){
            s->eof = &s->lim[cnt]; *(s->eof)++ = '\\n';
        }
        s->lim += cnt;
    }
    return cursor;
}

int scan(Scanner *s){
        uchar *cursor = s->cur;
std:
        s->tok = cursor;
/*!re2c
any     = [\\000-\\377];
O       = [0-7];
D       = [0-9];
L       = [a-zA-Z_];
H       = [a-fA-F0-9];
E       = [Ee] [+-]? D+;
FS      = [fFlL];
IS      = [uUlL]*;
ESC     = [\\\\] ([abfnrtv?'"\\\\] | "x" H+ | O+);
*/

/*!re2c
        "/*"                    { goto comment; }
        
        "auto"                  { RET(AUTO); }
        "break"                 { RET(BREAK); }
        "case"                  { RET(CASE); }
        "char"                  { RET(CHAR); }
        "const"                 { RET(CONST); }
        "continue"              { RET(CONTINUE); }
        "default"               { RET(DEFAULT); }
        "do"                    { RET(DO); }
        "double"                { RET(DOUBLE); }
        "else"                  { RET(ELSE); }
        "enum"                  { RET(ENUM); }
        "extern"                { RET(EXTERN); }
        "float"                 { RET(FLOAT); }
        "for"                   { RET(FOR); }
        "goto"                  { RET(GOTO); }
        "if"                    { RET(IF); }
        "int"                   { RET(INT); }
        "long"                  { RET(LONG); }
        "register"              { RET(REGISTER); }
        "return"                { RET(RETURN); }
        "short"                 { RET(SHORT); }
        "signed"                { RET(SIGNED); }
        "sizeof"                { RET(SIZEOF); }
        "static"                { RET(STATIC); }
        "struct"                { RET(STRUCT); }
        "switch"                { RET(SWITCH); }
        "typedef"               { RET(TYPEDEF); }
        "union"                 { RET(UNION); }
        "unsigned"              { RET(UNSIGNED); }
        "void"                  { RET(VOID); }
        "volatile"              { RET(VOLATILE); }
        "while"                 { RET(WHILE); }
        
        L (L|D)*                { RET(ID); }
        
        ("0" [xX] H+ IS?) | ("0" D+ IS?) | (D+ IS?) |
        (['] (ESC|any\\[\\n\\\\'])* ['])
                                { RET(ICON); }
        
        (D+ E FS?) | (D* "." D+ E? FS?) | (D+ "." D* E? FS?)
                                { RET(FCON); }
        
        (["] (ESC|any\\[\\n\\\\"])* ["])
                                { RET(SCON); }
        
        "..."                   { RET(ELLIPSIS); }
        ">>="                   { RET(RSHIFTEQ); }
        "<<="                   { RET(LSHIFTEQ); }
        "+="                    { RET(ADDEQ); }
        "-="                    { RET(SUBEQ); }
        "*="                    { RET(MULEQ); }
        "/="                    { RET(DIVEQ); }
        "%="                    { RET(MODEQ); }
        "&="                    { RET(ANDEQ); }
        "^="                    { RET(XOREQ); }
        "|="                    { RET(OREQ); }
        ">>"                    { RET(RSHIFT); }
        "<<"                    { RET(LSHIFT); }
        "++"                    { RET(INCR); }
        "--"                    { RET(DECR); }
        "->"                    { RET(DEREF); }
        "&&"                    { RET(ANDAND); }
        "||"                    { RET(OROR); }
        "<="                    { RET(LEQ); }
        ">="                    { RET(GEQ); }
        "=="                    { RET(EQL); }
        "!="                    { RET(NEQ); }
        ";"                     { RET(';'); }
        "{"                     { RET('{'); }
        "}"                     { RET('}'); }
        ","                     { RET(','); }
        ":"                     { RET(':'); }
        "="                     { RET('='); }
        "("                     { RET('('); }
        ")"                     { RET(')'); }
        "["                     { RET('['); }
        "]"                     { RET(']'); }
        "."                     { RET('.'); }
        "&"                     { RET('&'); }
        "!"                     { RET('!'); }
        "~"                     { RET('~'); }
        "-"                     { RET('-'); }
        "+"                     { RET('+'); }
        "*"                     { RET('*'); }
        "/"                     { RET('/'); }
        "%"                     { RET('%'); }
        "<"                     { RET('<'); }
        ">"                     { RET('>'); }
        "^"                     { RET('^'); }
        "|"                     { RET('|'); }
        "?"                     { RET('?'); }


        [ \\t\\v\\f]+           { goto std; }

        "\\n"
            {
                if(cursor == s->eof) RET(EOI);
                s->pos = cursor; s->line++;
                goto std;
            }

        any
            {
                printf("unexpected character: %c\\n", *s->tok);
                goto std;
            }
*/

comment:
/*!re2c
        "*/"                    { goto std; }
        "\\n"
            {
                if(cursor == s->eof) RET(EOI);
                s->tok = s->pos = cursor; s->line++;
                goto comment;
            }
        any                     { goto comment; }
*/
}

main(){
    Scanner in;
    int t;
    memset((char*) &in, 0, sizeof(in));
    in.fd = 0;
    while((t = scan(&in)) != EOI){
/*
        printf("%d\\t%.*s\\n", t, in.cur - in.tok, in.tok);
        printf("%d\\n", t);
*/
    }
    close(in.fd);
}
.fi
.in -3

.SH "SEE ALSO"
.LP
flex(1), lex(1).

.SH FEATURES
.LP
\*(re does not provide a default action:
the generated code assumes that the input
will consist of a sequence of tokens.
Typically this can be dealt with by adding a rule such as the one for
unexpected characters in the example above.
.LP
The user must arrange for a sentinel token to appear at the end of input
(and provide a rule for matching it):
\*(re does not provide an \fC<<EOF>>\fP expression.
If the source is from a null-byte terminated string, a
rule matching a null character will suffice.  If the source is from a
file then the approach taken in the example can be used: pad the input with
a newline (or some other character that can't appear within another token);
upon recognizing such a character check to see if it is the sentinel
and act accordingly.
.LP
\*(re does not provide start conditions:  use a separate scanner
specification for each start condition (as illustrated in the above example).
.LP
No [^x].  Use difference instead.
.SH BUGS
.LP
Only fixed length trailing context can be handled.
.LP
The maximum value appearing as a parameter \fIn\fP to \fCYYFILL\fP is not
provided to the generated code (this value is needed for constructing
the interface code).
Note that this value is usually relatively small: for
typical programming languages \fIn\fP will be the length of the longest
keyword plus one.
.LP
Difference only works for character sets.
.LP
The \*(re internal algorithms need documentation.

.SH AUTHOR
.LP
Please send bug reports, fixes and feedback to:
.LP
.nf
Peter Bumbulis
Computer Systems Group
University of Waterloo
Waterloo, Ontario
N2L 3G1
Internet:  peterr@csg.uwaterloo.ca
.fi
Initial revision svn path=/trunk/yasm/; revision=573 23 years ago			`.ds re \fBre2c\fP`
			`.ds le \fBlex\fP`
			`.ds rx regular expression`
			`.ds lx \fIl\fP-expression`
			`.TH RE2C 1 "8 April 1994" "Version 0.5"`
			`\"$Log: re2c.1,v $`
			`\"Revision 1.1 2002/04/07 22:27:06 peter`
			`\"Initial revision`
			`\"`
			`\"Revision 1.2 1994/04/16 15:50:32 peterr`
			`\"Fix bug in simple example.`
			`\"`
			`\"Revision 1.1 1994/04/08 15:39:09 peterr`
			`\"Initial revision`
			`\"`
			`.SH NAME`
			`re2c \- convert regular expressions to C/C++`

			`.SH SYNOPSIS`
			`\*(re [\fB-esb\fP] \fIname\fP`

			`.SH DESCRIPTION`
			`\*(re is a preprocessor that generates C-based recognizers from regular`
			`expressions.`
			`The input to \*(re consists of C/C++ source interleaved with`
			`comments of the form \fC/!re2c\fP ... \fC/\fP which contain`
			`scanner specifications.`
			`In the output these comments are replaced with code that, when`
			`executed, will find the next input token and then execute`
			`some user-supplied token-specific code.`

			`For example, given the following code`

			`.in +3`
			`.nf`
			`#define NULL ((char*) 0)`
			`char scan(char p){`
			`char *q;`
			`#define YYCTYPE char`
			`#define YYCURSOR p`
			`#define YYLIMIT p`
			`#define YYMARKER q`
			`#define YYFILL(n)`
			`/*!re2c`
			`[0-9]+ {return YYCURSOR;}`
			`[\\000-\\377] {return NULL;}`
			`*/`
			`}`
			`.fi`
			`.in -3`

			`\*(re will generate`

			`.in +3`
			`.nf`
			`/* Generated by re2c on Sat Apr 16 11:40:58 1994 */`
			`#line 1 "simple.re"`
			`#define NULL ((char*) 0)`
			`char scan(char p){`
			`char *q;`
			`#define YYCTYPE char`
			`#define YYCURSOR p`
			`#define YYLIMIT p`
			`#define YYMARKER q`
			`#define YYFILL(n)`
			`{`
			`YYCTYPE yych;`
			`unsigned int yyaccept;`
			`goto yy0;`
			`yy1: ++YYCURSOR;`
			`yy0:`
			`if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);`
			`yych = *YYCURSOR;`
			`if(yych <= '/') goto yy4;`
			`if(yych >= ':') goto yy4;`
			`yy2: yych = *++YYCURSOR;`
			`goto yy7;`
			`yy3:`
			`#line 10`
			`{return YYCURSOR;}`
			`yy4: yych = *++YYCURSOR;`
			`yy5:`
			`#line 11`
			`{return NULL;}`
			`yy6: ++YYCURSOR;`
			`if(YYLIMIT == YYCURSOR) YYFILL(1);`
			`yych = *YYCURSOR;`
			`yy7: if(yych <= '/') goto yy3;`
			`if(yych <= '9') goto yy6;`
			`goto yy3;`
			`}`
			`#line 12`

			`}`
			`.fi`
			`.in -3`

			`.SH OPTIONS`
			`\*(re provides the following options:`
			`.TP`
			`\fB-e\fP`
			`Cross-compile from an ASCII platform to an EBCDIC one.`
			`.TP`
			`\fB-s\fP`
			`Generate nested \fCif\fPs for some \fCswitch\fPes. Many compilers need this`
			`assist to generate better code.`
			`.TP`
			`\fB-b\fP`
			`Implies \fB-s\fP. Use bit vectors as well in the attempt to coax better`
			`code out of the compiler. Most useful for specifications with more than a`
			`few keywords (e.g. for most programming languages).`

			`.SH "INTERFACE CODE"`
			`Unlike other scanner generators, \*(re does not generate complete scanners:`
			`the user must supply some interface code.`
			`In particular, the user must define the following macros:`
			`.TP`
			`\fCYYCHAR\fP`
			`Type used to hold an input symbol.`
			`Usually \fCchar\fP or \fCunsigned char\fP.`
			`.TP`
			`\fCYYCURSOR\fP`
			`\(lx of type \fCYYCHAR\fP that points to the current input symbol.`
			`The generated code advances \fCYYCURSOR\fP as symbols are matched.`
			`On entry, \fCYYCURSOR\fP is assumed to point to the first character of the`
			`current token. On exit, \fCYYCURSOR\fP will point to the first character of`
			`the following token.`
			`.TP`
			`\fCYLIMIT\fP`
			`Expression of type \fC*YYCHAR\fP that marks the end of the buffer`
			`(\fCYLIMIT[-1]\fP is the last character in the buffer).`
			`The generated code repeatedly compares \fCYYCURSOR\fP to \fCYLIMIT\fP`
			`to determine when the buffer needs (re)filling.`
			`.TP`
			`\fCYYMARKER\fP`
			`\(lx of type \fCYYCHAR\fP.`
			`The generated code saves backtracking information in \fCYYMARKER\fP.`
			`.TP`
			`\fCYYFILL(\fP\fIn\fP\fC)\fP`
			`The generated code "calls" \fCYYFILL\fP when the buffer needs`
			`(re)filling: at least \fIn\fP additional characters should`
			`be provided. \fCYYFILL\fP should adjust \fCYYCURSOR\fP, \fCYYLIMIT\fP and`
			`\fCYYMARKER\fP as needed. Note that for typical programming languages`
			`\fIn\fP will be the length of the longest keyword plus one.`

			`.SH "SCANNER SPECIFICATIONS"`
			`Each scanner specification consists of a set of \fIrules\fP and name`
			`definitions.`
			`Rules consist of a regular expression along with a block of C/C++ code that`
			`is to be executed when the associated regular expression is matched.`
			`Name definitions are of the form`
			``\fIname\fP \fC=\fP \fIregular expression\fP\fC;\fP''.

			`.SH "SUMMARY OF RE2C REGULAR EXPRESSIONS"`
			`.TP`
			`\fC"foo"\fP`
			`the literal string \fCfoo\fP.`
			`ANSI-C escape sequences can be used.`
			`.TP`
			`\fC[xyz]\fP`
			`a "character class"; in this case,`
			`the \*(rx matches either an '\fCx\fP', a '\fCy\fP', or a '\fCz\fP'.`
			`.TP`
			`\fC[abj-oZ]\fP`
			`a "character class" with a range in it;`
			`matches an '\fCa\fP', a '\fCb\fP', any letter from '\fCj\fP' through '\fCo\fP',`
			`or a '\fCZ\fP'.`
			`.TP`
			`\fIr\fP\fC\e\fP\fIs\fP`
			`match any \fIr\fP which isn't an \fIs\fP. \fIr\fP and \fIs\fP must be regular expressions`
			`which can be expressed as character classes.`
			`.TP`
			`\fIr\fP\fC*\fP`
			`zero or more \fIr\fP's, where \fIr\fP is any regular expression`
			`.TP`
			`\fC\fIr\fP\fC+\fP`
			`one or more \fIr\fP's`
			`.TP`
			`\fC\fIr\fP\fC?\fP`
			`zero or one \fIr\fP's (that is, "an optional \fIr\fP")`
			`.TP`
			`name`
			`the expansion of the "name" definition (see above)`
			`.TP`
			`\fC(\fP\fIr\fP\fC)\fP`
			`an \fIr\fP; parentheses are used to override precedence`
			`(see below)`
			`.TP`
			`\fIrs\fP`
			`an \fIr\fP followed by an \fIs\fP ("concatenation")`
			`.TP`
			`\fIr\fP\fC\|\fP\fIs\fP`
			`either an \fIr\fP or an \fIs\fP`
			`.TP`
			`\fIr\fP\fC/\fP\fIs\fP`
			`an \fIr\fP but only if it is followed by an \fIs\fP. The s is not part of`
			`the matched text. This type of \*(rx is called "trailing context".`
			`.LP`
			`The regular expressions listed above are grouped according to`
			`precedence, from highest precedence at the top to lowest at the bottom.`
			`Those grouped together have equal precedence.`

			`.SH "A LARGER EXAMPLE"`
			`.LP`
			`.in +3`
			`.nf`
			`#include <stdlib.h>`
			`#include <stdio.h>`
			`#include <fcntl.h>`
			`#include <string.h>`

			`#define ADDEQ 257`
			`#define ANDAND 258`
			`#define ANDEQ 259`
			`#define ARRAY 260`
			`#define ASM 261`
			`#define AUTO 262`
			`#define BREAK 263`
			`#define CASE 264`
			`#define CHAR 265`
			`#define CONST 266`
			`#define CONTINUE 267`
			`#define DECR 268`
			`#define DEFAULT 269`
			`#define DEREF 270`
			`#define DIVEQ 271`
			`#define DO 272`
			`#define DOUBLE 273`
			`#define ELLIPSIS 274`
			`#define ELSE 275`
			`#define ENUM 276`
			`#define EQL 277`
			`#define EXTERN 278`
			`#define FCON 279`
			`#define FLOAT 280`
			`#define FOR 281`
			`#define FUNCTION 282`
			`#define GEQ 283`
			`#define GOTO 284`
			`#define ICON 285`
			`#define ID 286`
			`#define IF 287`
			`#define INCR 288`
			`#define INT 289`
			`#define LEQ 290`
			`#define LONG 291`
			`#define LSHIFT 292`
			`#define LSHIFTEQ 293`
			`#define MODEQ 294`
			`#define MULEQ 295`
			`#define NEQ 296`
			`#define OREQ 297`
			`#define OROR 298`
			`#define POINTER 299`
			`#define REGISTER 300`
			`#define RETURN 301`
			`#define RSHIFT 302`
			`#define RSHIFTEQ 303`
			`#define SCON 304`
			`#define SHORT 305`
			`#define SIGNED 306`
			`#define SIZEOF 307`
			`#define STATIC 308`
			`#define STRUCT 309`
			`#define SUBEQ 310`
			`#define SWITCH 311`
			`#define TYPEDEF 312`
			`#define UNION 313`
			`#define UNSIGNED 314`
			`#define VOID 315`
			`#define VOLATILE 316`
			`#define WHILE 317`
			`#define XOREQ 318`
			`#define EOI 319`

			`typedef unsigned int uint;`
			`typedef unsigned char uchar;`

			`#define BSIZE 8192`

			`#define YYCTYPE uchar`
			`#define YYCURSOR cursor`
			`#define YYLIMIT s->lim`
			`#define YYMARKER s->ptr`
			`#define YYFILL(n) {cursor = fill(s, cursor);}`

			`#define RET(i) {s->cur = cursor; return i;}`

			`typedef struct Scanner {`
			`int fd;`
			`uchar bot, tok, ptr, cur, pos, lim, top, eof;`
			`uint line;`
			`} Scanner;`

			`uchar fill(Scanner s, uchar *cursor){`
			`if(!s->eof){`
			`uint cnt = s->tok - s->bot;`
			`if(cnt){`
			`memcpy(s->bot, s->tok, s->lim - s->tok);`
			`s->tok = s->bot;`
			`s->ptr -= cnt;`
			`cursor -= cnt;`
			`s->pos -= cnt;`
			`s->lim -= cnt;`
			`}`
			`if((s->top - s->lim) < BSIZE){`
			`uchar buf = (uchar)`
			`malloc(((s->lim - s->bot) + BSIZE)*sizeof(uchar));`
			`memcpy(buf, s->tok, s->lim - s->tok);`
			`s->tok = buf;`
			`s->ptr = &buf[s->ptr - s->bot];`
			`cursor = &buf[cursor - s->bot];`
			`s->pos = &buf[s->pos - s->bot];`
			`s->lim = &buf[s->lim - s->bot];`
			`s->top = &s->lim[BSIZE];`
			`free(s->bot);`
			`s->bot = buf;`
			`}`
			`if((cnt = read(s->fd, (char*) s->lim, BSIZE)) != BSIZE){`
			`s->eof = &s->lim[cnt]; *(s->eof)++ = '\\n';`
			`}`
			`s->lim += cnt;`
			`}`
			`return cursor;`
			`}`

			`int scan(Scanner *s){`
			`uchar *cursor = s->cur;`
			`std:`
			`s->tok = cursor;`
			`/*!re2c`
			`any = [\\000-\\377];`
			`O = [0-7];`
			`D = [0-9];`
			`L = [a-zA-Z_];`
			`H = [a-fA-F0-9];`
			`E = [Ee] [+-]? D+;`
			`FS = [fFlL];`
			`IS = [uUlL]*;`
			`ESC = [\\\\] ([abfnrtv?'"\\\\] \| "x" H+ \| O+);`
			`*/`

			`/*!re2c`
			`"/*" { goto comment; }`

			`"auto" { RET(AUTO); }`
			`"break" { RET(BREAK); }`
			`"case" { RET(CASE); }`
			`"char" { RET(CHAR); }`
			`"const" { RET(CONST); }`
			`"continue" { RET(CONTINUE); }`
			`"default" { RET(DEFAULT); }`
			`"do" { RET(DO); }`
			`"double" { RET(DOUBLE); }`
			`"else" { RET(ELSE); }`
			`"enum" { RET(ENUM); }`
			`"extern" { RET(EXTERN); }`
			`"float" { RET(FLOAT); }`
			`"for" { RET(FOR); }`
			`"goto" { RET(GOTO); }`
			`"if" { RET(IF); }`
			`"int" { RET(INT); }`
			`"long" { RET(LONG); }`
			`"register" { RET(REGISTER); }`
			`"return" { RET(RETURN); }`
			`"short" { RET(SHORT); }`
			`"signed" { RET(SIGNED); }`
			`"sizeof" { RET(SIZEOF); }`
			`"static" { RET(STATIC); }`
			`"struct" { RET(STRUCT); }`
			`"switch" { RET(SWITCH); }`
			`"typedef" { RET(TYPEDEF); }`
			`"union" { RET(UNION); }`
			`"unsigned" { RET(UNSIGNED); }`
			`"void" { RET(VOID); }`
			`"volatile" { RET(VOLATILE); }`
			`"while" { RET(WHILE); }`

			`L (L\|D)* { RET(ID); }`

			`("0" [xX] H+ IS?) \| ("0" D+ IS?) \| (D+ IS?) \|`
			`(['] (ESC\|any\\[\\n\\\\'])* ['])`
			`{ RET(ICON); }`

			`(D+ E FS?) \| (D* "." D+ E? FS?) \| (D+ "." D* E? FS?)`
			`{ RET(FCON); }`

			`(["] (ESC\|any\\[\\n\\\\"])* ["])`
			`{ RET(SCON); }`

			`"..." { RET(ELLIPSIS); }`
			`">>=" { RET(RSHIFTEQ); }`
			`"<<=" { RET(LSHIFTEQ); }`
			`"+=" { RET(ADDEQ); }`
			`"-=" { RET(SUBEQ); }`
			`"*=" { RET(MULEQ); }`
			`"/=" { RET(DIVEQ); }`
			`"%=" { RET(MODEQ); }`
			`"&=" { RET(ANDEQ); }`
			`"^=" { RET(XOREQ); }`
			`"\|=" { RET(OREQ); }`
			`">>" { RET(RSHIFT); }`
			`"<<" { RET(LSHIFT); }`
			`"++" { RET(INCR); }`
			`"--" { RET(DECR); }`
			`"->" { RET(DEREF); }`
			`"&&" { RET(ANDAND); }`
			`"\|\|" { RET(OROR); }`
			`"<=" { RET(LEQ); }`
			`">=" { RET(GEQ); }`
			`"==" { RET(EQL); }`
			`"!=" { RET(NEQ); }`
			`";" { RET(';'); }`
			`"{" { RET('{'); }`
			`"}" { RET('}'); }`
			`"," { RET(','); }`
			`":" { RET(':'); }`
			`"=" { RET('='); }`
			`"(" { RET('('); }`
			`")" { RET(')'); }`
			`"[" { RET('['); }`
			`"]" { RET(']'); }`
			`"." { RET('.'); }`
			`"&" { RET('&'); }`
			`"!" { RET('!'); }`
			`"~" { RET('~'); }`
			`"-" { RET('-'); }`
			`"+" { RET('+'); }`
			`"" { RET(''); }`
			`"/" { RET('/'); }`
			`"%" { RET('%'); }`
			`"<" { RET('<'); }`
			`">" { RET('>'); }`
			`"^" { RET('^'); }`
			`"\|" { RET('\|'); }`
			`"?" { RET('?'); }`


			`[ \\t\\v\\f]+ { goto std; }`

			`"\\n"`
			`{`
			`if(cursor == s->eof) RET(EOI);`
			`s->pos = cursor; s->line++;`
			`goto std;`
			`}`

			`any`
			`{`
			`printf("unexpected character: %c\\n", *s->tok);`
			`goto std;`
			`}`
			`*/`

			`comment:`
			`/*!re2c`
			`"*/" { goto std; }`
			`"\\n"`
			`{`
			`if(cursor == s->eof) RET(EOI);`
			`s->tok = s->pos = cursor; s->line++;`
			`goto comment;`
			`}`
			`any { goto comment; }`
			`*/`
			`}`

			`main(){`
			`Scanner in;`
			`int t;`
			`memset((char*) &in, 0, sizeof(in));`
			`in.fd = 0;`
			`while((t = scan(&in)) != EOI){`
			`/*`
			`printf("%d\\t%.*s\\n", t, in.cur - in.tok, in.tok);`
			`printf("%d\\n", t);`
			`*/`
			`}`
			`close(in.fd);`
			`}`
			`.fi`
			`.in -3`

			`.SH "SEE ALSO"`
			`.LP`
			`flex(1), lex(1).`

			`.SH FEATURES`
			`.LP`
			`\*(re does not provide a default action:`
			`the generated code assumes that the input`
			`will consist of a sequence of tokens.`
			`Typically this can be dealt with by adding a rule such as the one for`
			`unexpected characters in the example above.`
			`.LP`
			`The user must arrange for a sentinel token to appear at the end of input`
			`(and provide a rule for matching it):`
			`\*(re does not provide an \fC<<EOF>>\fP expression.`
			`If the source is from a null-byte terminated string, a`
			`rule matching a null character will suffice. If the source is from a`
			`file then the approach taken in the example can be used: pad the input with`
			`a newline (or some other character that can't appear within another token);`
			`upon recognizing such a character check to see if it is the sentinel`
			`and act accordingly.`
			`.LP`
			`\*(re does not provide start conditions: use a separate scanner`
			`specification for each start condition (as illustrated in the above example).`
			`.LP`
			`No [^x]. Use difference instead.`
			`.SH BUGS`
			`.LP`
			`Only fixed length trailing context can be handled.`
			`.LP`
			`The maximum value appearing as a parameter \fIn\fP to \fCYYFILL\fP is not`
			`provided to the generated code (this value is needed for constructing`
			`the interface code).`
			`Note that this value is usually relatively small: for`
			`typical programming languages \fIn\fP will be the length of the longest`
			`keyword plus one.`
			`.LP`
			`Difference only works for character sets.`
			`.LP`
			`The \*(re internal algorithms need documentation.`

			`.SH AUTHOR`
			`.LP`
			`Please send bug reports, fixes and feedback to:`
			`.LP`
			`.nf`
			`Peter Bumbulis`
			`Computer Systems Group`
			`University of Waterloo`
			`Waterloo, Ontario`
			`N2L 3G1`
			`Internet: peterr@csg.uwaterloo.ca`
			`.fi`