ohboyohboyohboy edited this page Sep 13, 2010 · 12 revisions

To recognize a language, lexers and parsers must be able to iterate through their input sequentially. They must also be able to move forward and backward from a current position, like a cursor. The responsibility for tracking the current position in some input (such as a String or an Array of tokens) and for fetching portions of that input could be built into the recognizers themselves, but doing so would severely limit your control over what the recognizer receives. Instead, ANTLR code delegates the task of iterating through the input to an external stream object.
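To illustrate why moving iteration into a separate object gives you control over what a recognizer sees, here is a minimal sketch. Every name in it (PlainStream, DowncasingStream, match_keyword?) is hypothetical and is not part of the ANTLR3 runtime. The toy recognizer only ever asks its stream for the current symbol, so swapping the stream changes the input it receives without touching the recognizer.

```ruby
# Hypothetical names throughout; this is not the ANTLR3 runtime API.
class PlainStream
  def initialize(text)
    @chars = text.chars
    @position = 0
  end

  # The symbol at the current position, or nil at the end of the input.
  def look
    @chars[@position]
  end

  # Advance the cursor by one symbol.
  def consume
    @position += 1
  end
end

# Same interface, but the recognizer only ever sees lowercase characters.
class DowncasingStream < PlainStream
  def look
    char = super
    char && char.downcase
  end
end

# A toy recognizer: matches a keyword by asking the stream for one symbol
# at a time. It knows nothing about how the stream stores its input.
def match_keyword?(stream, keyword)
  keyword.each_char.all? do |expected|
    matched = stream.look == expected
    stream.consume if matched
    matched
  end
end

match_keyword?(PlainStream.new("SELECT"), "select")       # false
match_keyword?(DowncasingStream.new("SELECT"), "select")  # true
```

The recognizer code is identical in both calls; only the stream object differs.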

Stream Types

ANTLR can generate three different flavors of recognizer from a grammar: Lexer, Parser, and TreeParser. Each flavor consumes a sequence of one kind of input symbol: characters, tokens, or tree nodes, respectively. The Stream class hierarchy is therefore built on three basic stream types.

Recognizer  | Input Symbol | Stream
lexer       | character    | CharacterStream
parser      | token        | TokenStream
tree parser | tree node    | TreeNodeStream

Attributes and Operations Common to All Streams

Stream Attributes

  • position: the current position of the stream relative to its input symbols, such as a character index into a string or an array index into a list of tokens
  • size: the total number of input symbols contained within the stream
  • source_name: an optional name describing where the input was taken from, usually a file name

Navigation

  • peek(n)
  • look(n)
  • consume
  • seek(pos)
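The page lists these operations without describing them, so the following is only a minimal sketch of how the attributes and navigation methods might fit together. SketchStream and its EOF constant are hypothetical, and the semantics are assumptions: look(n) is taken to return the symbol n places ahead (with 1 meaning the current symbol), peek(n) an integer code for that symbol, consume a one-symbol advance, and seek(pos) a jump to an absolute position. The real ANTLR3 stream classes may differ in detail.

```ruby
# Hypothetical illustration only; not the ANTLR3 runtime's implementation.
class SketchStream
  EOF = -1 # assumed end-of-input sentinel

  attr_reader :position, :size, :source_name

  def initialize(symbols, source_name = nil)
    @symbols = symbols
    @position = 0
    @size = symbols.length
    @source_name = source_name
  end

  # Assumed: the symbol k places ahead (k = 1 is the current symbol),
  # or nil when that index falls outside the input.
  def look(k = 1)
    index = @position + k - 1
    index.between?(0, @size - 1) ? @symbols[index] : nil
  end

  # Assumed: an integer code for the symbol that look(k) would return.
  def peek(k = 1)
    symbol = look(k)
    symbol ? symbol.ord : EOF
  end

  # Advance the cursor by one symbol, stopping at the end of the input.
  def consume
    @position += 1 if @position < @size
    self
  end

  # Move the cursor to an absolute position, kept within 0..size.
  def seek(pos)
    @position = pos.clamp(0, @size)
    self
  end
end

stream = SketchStream.new("x = 1".chars, "example.txt")
stream.look      # "x"
stream.peek      # 120, the character code of "x"
stream.consume
stream.look(2)   # "="
stream.seek(4)
stream.look      # "1"
```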

Memory / Backtracking

  • mark
  • rewind
  • release
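These three operations are the usual way a backtracking recognizer tries an alternative and then undoes it. The sketch below shows one plausible arrangement, with assumptions stated plainly: MarkableStream is a hypothetical class, mark is assumed to return an integer marker, rewind to restore the position saved under a marker, and release to discard a marker along with any taken after it. The ANTLR3 runtime's own signatures and defaults may differ.

```ruby
# Hypothetical illustration of mark / rewind / release;
# not the ANTLR3 runtime's code.
class MarkableStream
  attr_reader :position

  def initialize(symbols)
    @symbols = symbols
    @position = 0
    @markers = [] # saved positions; marker n lives at index n - 1
  end

  def look
    @symbols[@position]
  end

  def consume
    @position += 1 if @position < @symbols.length
  end

  # Remember the current position and return a marker identifying it.
  def mark
    @markers << @position
    @markers.length
  end

  # Restore the position saved under the marker; the marker stays usable.
  def rewind(marker)
    @position = @markers.fetch(marker - 1)
  end

  # Forget the marker and any markers taken after it.
  def release(marker)
    @markers.slice!((marker - 1)..-1)
    nil
  end
end

stream = MarkableStream.new(%w[a b c d])
marker = stream.mark        # a speculative attempt starts at "a"
2.times { stream.consume }  # the cursor is now at "c"
stream.rewind(marker)       # the attempt failed: back at "a"
stream.release(marker)      # the marker is no longer needed
```

Keeping release separate from rewind lets a recognizer return to the same marker several times before giving it up, which is the design choice this sketch assumes.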

Character Streams

  • module ANTLR3::CharacterStream
    • class ANTLR3::StringStream
    • class ANTLR3::FileStream

Token Streams

  • module ANTLR3::TokenStream
    • class ANTLR3::CommonTokenStream
    • class ANTLR3::TokenRewriteStream