The parser we will write takes a string having valid syntax, such as 4.0 * atan(1.0), and returns a double. It will be built from a small set of member functions of a Parser class, one per grammar rule: assign_expr(), add_expr(), mul_expr(), pow_expr(), unary_expr(), and primary(). Since we define the language ourselves, we get to choose what expressions mean. For example, we will take (2 * 5) ^ 3 to mean 2 times 5 all raised to the power of 3. However, if C++ provided an expression parser, it would probably mimic the use of the ^ operator in code; i.e., ^ would be a bitwise exclusive or operator. If you can write your own parser, you're in control of the syntax (language structure) and semantics (language meaning).
We want to handle not just 2 + 3, but 2 + 3 + 4, and 2 + 3 + 4 + 5, and in general the addition of any number of operands. We want to devise a minimal set of rules that describes all these expressions. Let's start with three symbols, add_expr, PLUS, and NUMBER. We want to say that an add_expr can be a subexpression PLUS a NUMBER. We will write
add_expr: add_expr PLUS NUMBER
The above rule is called a production. add_expr:
is called the left hand side (lhs),
and add_expr PLUS NUMBER
is called the right hand side (rhs). Grammar productions are
applied when evaluating a language construct. For instance, 2 + 3 + 4
is supposed to be a valid
construct. We start by saying 2 + 3 + 4
is an add_expr
. This must fit the pattern
add_expr PLUS NUMBER
. If it does, then 2 + 3
must be an add_expr
. It
also must fit the pattern add_expr PLUS NUMBER
. It does if we can say that 2
is an
add_expr
. For this to happen, we need another rule. We need to say that an add_expr
can also be a NUMBER
:
add_expr: NUMBER
The result of applying our two rules recursively can be represented by the following tree:
                 add_expr
                     |
         -------------------------
         |           |           |
     add_expr      PLUS     NUMBER (4)
         |
         -------------------------
         |           |           |
     add_expr      PLUS     NUMBER (3)
         |
     NUMBER (2)

We constructed this tree from the top down, but it is evaluated from the bottom up, in a postorder sequence. So, the way our grammar is written, this expression will get evaluated from left to right.
add_expr: add_expr PLUS NUMBER
add_expr: NUMBER
These rules have the same left hand side. By convention, we usually write the left hand side once, joining the
right hand sides with the |
symbol to mean "or". We end with a ;
to show that we're
done. Using this convention, our grammar looks like:
add_expr: add_expr PLUS NUMBER
| NUMBER
;
The first rule in our grammar is called a left recursive rule. One property of left recursive rules is
that they specify a left to right evaluation of an expression. If we wanted to specify that expressions should
be evaluated right to left, we would use a right recursive rule:
add_expr: NUMBER PLUS add_expr
| NUMBER
;
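To see why the direction of recursion matters, consider subtraction rather than addition (addition gives the same answer in either order, which hides the difference). The sketch below is illustrative only; the names eval_left and eval_right are ours, not part of the calculator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left-to-right evaluation, as a left recursive rule specifies:
// 7 - 3 - 2 parsed as ((7 - 3) - 2).
double eval_left(const std::vector<double>& v)
{
    double result = v[0];
    for (std::size_t i = 1; i < v.size(); ++i)
        result -= v[i];
    return result;
}

// Right-to-left evaluation, as a right recursive rule specifies:
// 7 - 3 - 2 parsed as (7 - (3 - 2)).
double eval_right(const std::vector<double>& v, std::size_t i = 0)
{
    if (i + 1 == v.size()) return v[i];
    return v[i] - eval_right(v, i + 1);
}
```

For 7 - 3 - 2, eval_left yields 2 while eval_right yields 6; for addition both yield the same sum, which is why associativity was invisible in the add_expr example above.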
Symbols such as PLUS and NUMBER are called terminals, or tokens: you can see a 2 or a + in an expression, but there is no token representing an add_expr. A symbol like add_expr is called a nonterminal. You have to compose an add_expr out of the available tokens.
To define NUMBER, we must decide what kind of number we want. For instance, we might want an integer, or we might want a floating point number. We will use floating point numbers here: an integer part, an optional fraction, and an optional exponent part (an e or an E, an optional + or - sign, and one or more digits). As a regular expression, the pattern is
(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))([eE][+-]?[0-9]+)?
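The same pattern can be checked directly with std::regex, which is a handy way to exercise a token definition before hand-coding the lexer. The function name is_number is ours, introduced for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Matches the NUMBER pattern from the text (ECMAScript regex syntax).
bool is_number(const std::string& s)
{
    static const std::regex pattern{
        R"((([0-9]+(\.[0-9]*)?)|(\.[0-9]+))([eE][+-]?[0-9]+)?)"};
    return std::regex_match(s, pattern);
}
```

Note that a bare decimal point and a bare exponent are rejected, exactly as the hand-written lexer below will reject them.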
Next, we want multiplication to bind more tightly than addition. We introduce a new nonterminal, mul_expr, and make add_expr operate on mul_exprs:
add_expr: add_expr PLUS mul_expr
| mul_expr
;
mul_expr: mul_expr MUL NUMBER
| NUMBER
;
The expression 2 + 3 * 4
would produce the following derivation tree:
                 add_expr
                     |
         ---------------------------
         |           |             |
     add_expr      PLUS        mul_expr
         |                         |
     mul_expr          ---------------------
         |             |          |        |
     NUMBER (2)    mul_expr      MUL   NUMBER (4)
                       |
                   NUMBER (3)

Notice that when the tree is evaluated from the bottom up, the multiplication gets evaluated first. This is precisely what we want. The above grammar illustrates how to express operator precedence in a grammar. Our final grammar will contain a full set of operators and will handle things in addition to numbers, such as identifiers and parenthesized expressions.
assign_expr: ID ASSIGN add_expr
| add_expr
;
add_expr: add_expr PLUS mul_expr
| add_expr MINUS mul_expr
| mul_expr
;
mul_expr: mul_expr MUL pow_expr
| mul_expr DIV pow_expr
| mul_expr MOD pow_expr
| pow_expr
;
pow_expr: unary_expr pow_expr_suffix
| unary_expr
;
pow_expr_suffix: POW unary_expr
;
unary_expr: PLUS primary
| MINUS primary
| primary
;
primary: ID
| NUMBER
| LP add_expr RP
| SIN LP add_expr RP
| COS LP add_expr RP
| TAN LP add_expr RP
| ASIN LP add_expr RP
| ACOS LP add_expr RP
| ATAN LP add_expr RP
| LOG LP add_expr RP
| EXP LP add_expr RP
| LOG10 LP add_expr RP
| EXP10 LP add_expr RP
| SQRT LP add_expr RP
| INT LP add_expr RP
;
ID
is a letter followed by zero or more letters or digits. A NUMBER
is the same as
previously described. ASSIGN
, PLUS
, MINUS
, MUL
,
DIV
, MOD
, POW
, LP
, and RP
represent
the symbols = + - * / % ^ ( )
. The rest of the tokens are the function names. They represent keywords having the same spelling as the token names, but in all lowercase.
Our grammar as written is not suitable for a recursive descent parser, which must choose a production by looking only at the current token (a grammar with that property is called LL(1)). Consider the rules
add_expr: add_expr PLUS mul_expr
| mul_expr
;
If the current token in our token stream is a number, which right hand side do we choose? We don't know, without
looking ahead further, whether we're starting an addition expression or a multiplication expression. In general,
no grammar containing left recursion can be LL(1). The reason I have used left recursion so far is that it's more intuitive than the LL(1) equivalent. We can remove the left recursion by rewriting the rule with a new tail nonterminal:
add_expr: mul_expr add_expr_tail
;
add_expr_tail: PLUS mul_expr add_expr_tail
| // EMPTY
;
The comment // EMPTY
just refers to "no token". Now, if our current token is NUMBER
and we're at a point where we're processing an add_expr
, we only have the rule mul_expr
add_expr_tail
to choose from. If a NUMBER
can't start this rule, we would have an error.
But, a NUMBER
can start an add_expr
, so we may proceed. Going down the parse tree,
mul_expr
eventually becomes a primary
, which becomes a NUMBER
. With that
token matched, we advance to the next token. We are now processing the add_expr_tail
nonterminal in
add_expr: mul_expr add_expr_tail
. If our current token is PLUS
, the rule
add_expr_tail: PLUS mul_expr add_expr_tail
is used. Otherwise, the rule add_expr_tail: //
EMPTY
is used.
Our grammar has the same kind of problem with the assignment rules:
assign_expr: ID ASSIGN add_expr
| add_expr
;
Can you see why? An add_expr
can become a primary
which can become an ID
. So
an ID
can begin either right hand side.
We fix this by allowing any add_expr on the left, and checking later that it was in fact an ID:
assign_expr: add_expr assign_expr_suffix
;
assign_expr_suffix: ASSIGN add_expr
| // EMPTY
;
Applying these transformations throughout gives our full LL(1) grammar:
assign_expr: add_expr assign_expr_suffix
;
assign_expr_suffix: ASSIGN add_expr
| // EMPTY
;
add_expr: mul_expr add_expr_tail
;
add_expr_tail: PLUS mul_expr add_expr_tail
| MINUS mul_expr add_expr_tail
| // EMPTY
;
mul_expr: pow_expr mul_expr_tail
;
mul_expr_tail: MUL pow_expr mul_expr_tail
| DIV pow_expr mul_expr_tail
| MOD pow_expr mul_expr_tail
| // EMPTY
;
pow_expr: unary_expr pow_expr_suffix
;
pow_expr_suffix: POW unary_expr
| // EMPTY
;
unary_expr: PLUS primary
| MINUS primary
| primary
;
primary: ID
| NUMBER
| LP add_expr RP
| SIN LP add_expr RP
| COS LP add_expr RP
| TAN LP add_expr RP
| ASIN LP add_expr RP
| ACOS LP add_expr RP
| ATAN LP add_expr RP
| LOG LP add_expr RP
| EXP LP add_expr RP
| LOG10 LP add_expr RP
| EXP10 LP add_expr RP
| SQRT LP add_expr RP
| INT LP add_expr RP
;
Now we can code the parser from the grammar. The production
add_expr: mul_expr add_expr_tail
;
This can be simply coded as
void add_expr()
{
mul_expr();
add_expr_tail();
}
The production set
add_expr_tail: PLUS mul_expr add_expr_tail
| MINUS mul_expr add_expr_tail
| // EMPTY
;
can be coded as
void add_expr_tail()
{
switch (current_token) {
case PLUS:
advance(); // skip past PLUS token
mul_expr();
add_expr_tail();
break;
case MINUS:
advance(); // skip past MINUS token
mul_expr();
add_expr_tail();
break;
default:
return; // represents EMPTY.
}
}
Don't get too clever by factoring out the commonality in this code. It will disappear when you add the semantics later. Coded this way, the recursive add_expr_tail has the potential to be slow and to eat memory. When such X and X_tail productions occur, we will code them together using the following kind of optimization:
void add_expr()
{
mul_expr();
for (;;) {
switch (current_token()) {
case PLUS:
advance();
mul_expr();
break;
case MINUS:
advance();
mul_expr();
break;
default:
return;
}
}
}
We apply a similar optimization to an X and X_suffix pair:
void assign_expr()
{
add_expr();
if (current_token == ASSIGN) {
advance();
add_expr();
}
}
Before writing the lexer, we define the error types we will throw. A small class template lets us stamp out distinct types that share one implementation:
template<int N>
class Error {
public:
explicit Error(const std::string& s) : message{s} { }
std::string get_message() const { return message; }
void put(std::ostream& os) const { os << message; }
private:
std::string message;
};
using Lexical_error = Error<1>;
using Syntax_error = Error<2>;
using Runtime_error = Error<3>;
template<int N>
std::ostream& operator<<(std::ostream& os, const Error<N>& x)
{
x.put(os);
return os;
}
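Because each alias is a distinct instantiation of Error, the three kinds of error can be caught separately even though they share one implementation. A small sketch (the helper describe is ours, for illustration):

```cpp
#include <cassert>
#include <iostream>
#include <string>

template<int N>
class Error {
public:
    explicit Error(const std::string& s) : message{s} { }
    std::string get_message() const { return message; }
    void put(std::ostream& os) const { os << message; }
private:
    std::string message;
};

using Lexical_error = Error<1>;
using Syntax_error = Error<2>;

// Each instantiation is its own type, so separate catch clauses work.
std::string describe(bool lexical)
{
    try {
        if (lexical) throw Lexical_error{"bad character"};
        throw Syntax_error{"missing )"};
    } catch (const Lexical_error& e) {
        return "lexical: " + e.get_message();
    } catch (const Syntax_error& e) {
        return "syntax: " + e.get_message();
    }
}
```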
Next we define the tokens that the lexer will hand to the parser. We will use an enum class to do this.
enum class Token {
Id, Number, Sin, Cos, Tan, Asin, Acos, Atan, Log, Exp,
Log10, Exp10, Sqrt, Int, Assign='=', Plus='+', Minus='-',
Mul='*', Div='/', Mod='%', Pow='^', Lp='(', Rp=')', Eofsym=-1
};
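Giving the single-character tokens their own character codes as enumerator values is what lets the lexer convert a character to its token with a plain cast, as get_token() does with return Token(c). A quick check (abbreviated copy of the enumeration, for illustration):

```cpp
#include <cassert>

// Abbreviated copy of the Token enumeration.
enum class Token {
    Id, Number, Assign='=', Plus='+', Minus='-', Mul='*',
    Div='/', Mod='%', Pow='^', Lp='(', Rp=')', Eofsym=-1
};

// The cast the lexer uses for single-character tokens.
Token char_to_token(char c) { return Token(c); }
```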
class Lexer {
public:
explicit Lexer(std::istream& is);
explicit Lexer(std::istream* ps);
// A lexer belongs to a parser and should not be copied or moved.
Lexer(const Lexer&) = delete;
Lexer& operator=(const Lexer&) = delete;
Lexer(Lexer&&) = delete;
Lexer& operator=(Lexer&&) = delete;
~Lexer() { if (owns_input) delete p_input; }
Token get_current_token() const { return current_token; }
std::string get_token_text() const { return token_buffer; }
void advance() { current_token = get_token(); }
private:
std::istream* p_input; // The source stream (a stream of characters).
bool owns_input; // True if we can delete p_input, false if we can't.
Token current_token;
std::string token_buffer;
Token get_token(); // The workhorse. Assembles characters from p_input into tokens.
};
The constructors simply initialize p_input and owns_input, and call advance() to do a priming read on the stream.
Lexer::Lexer(std::istream& is)
: p_input{&is}, owns_input{false}
{
advance();
}
Lexer::Lexer(std::istream* ps)
: p_input{ps}, owns_input{true}
{
advance();
}
get_token()
is the workhorse of our lexer. It assembles the characters from the input stream into
tokens. It starts out as follows:
Token Lexer::get_token()
{
std::istream& input = *p_input; // shorthand to make the notation convenient.
token_buffer.clear(); // clear the buffer for the new token.
char c = input.get(); // a priming read on the stream.
// Skip whitespace.
while (isspace(c)) c = input.get();
// If there are no characters, we're at the end of the stream.
if (!input) return Token::Eofsym;
// ...
}
Next, we look for an identifier or function name. Both have the same syntax. We handle this by looking for an
identifier first. Then, we check what we have found against the function names.
Token Lexer::get_token()
{
// ...
// Look for an identifier or a function name. Both start with a letter.
if (isalpha(c)) {
token_buffer = c;
c = input.get();
// Look for zero or more letters or digits.
while (isalnum(c)) {
token_buffer += c;
c = input.get();
}
// The current character doesn't belong to our identifier, and must be put back into the stream.
input.putback(c);
// Check for a function name.
if (token_buffer == "sin") return Token::Sin;
if (token_buffer == "cos") return Token::Cos;
if (token_buffer == "tan") return Token::Tan;
if (token_buffer == "asin") return Token::Asin;
if (token_buffer == "acos") return Token::Acos;
if (token_buffer == "atan") return Token::Atan;
if (token_buffer == "log") return Token::Log;
if (token_buffer == "exp") return Token::Exp;
if (token_buffer == "log10") return Token::Log10;
if (token_buffer == "exp10") return Token::Exp10;
if (token_buffer == "sqrt") return Token::Sqrt;
if (token_buffer == "int") return Token::Int;
// Whatever is not a function name must be an identifier.
return Token::Id;
}
// ...
}
Here is the code to assemble a number from characters. We want numbers such as 304, 46., .72, 3.14, and any of these patterns followed by an exponent part, e.g., 4.92e-3, 21.34e6, .5e+7. The following code recognizes our number pattern.
Token Lexer::get_token()
{
    // ...
    // Look for a number.
    if (isdigit(c) || c == '.') {
        if (isdigit(c)) {
            token_buffer = c;
            c = input.get();
            while (isdigit(c)) {
                token_buffer += c;
                c = input.get();
            }
            if (c == '.') {
                token_buffer += c;
                c = input.get();
                while (isdigit(c)) {
                    token_buffer += c;
                    c = input.get();
                }
            }
        } else { // c == '.'
            token_buffer = c;
            c = input.get();
            if (!isdigit(c))
                throw Lexical_error{"no digits after decimal point"};
            while (isdigit(c)) {
                token_buffer += c;
                c = input.get();
            }
        }
        // Look for an optional exponent part.
        if (c == 'e' || c == 'E') {
            token_buffer += c;
            c = input.get();
            if (c == '+' || c == '-') {
                token_buffer += c;
                c = input.get();
            }
            if (!isdigit(c))
                throw Lexical_error{"no digits in exponent part"};
            while (isdigit(c)) {
                token_buffer += c;
                c = input.get();
            }
        }
        input.putback(c);
        return Token::Number;
    }
    // ...
}
The rest of our tokens are single character tokens. They are simple to handle. If we get through our single
character tokens without returning, we're left with an error.
Token Lexer::get_token()
{
// ...
// Check for a single character token.
token_buffer = c;
switch (c) {
// Note: fallthrough intentional.
case '=':
case '+':
case '-':
case '*':
case '/':
case '%':
case '^':
case '(':
case ')':
return Token(c);
}
// Anything else is an error.
throw Lexical_error{token_buffer};
}
The parser has one member function per nonterminal, except for the _tail and _suffix nonterminals, for which we provide the optimization mentioned above. For now, it is convenient to declare the parser functions as returning void. Later, when we add our semantics and our functions return values, we'll change that return type to double. The alternative is to have our functions return double from the beginning, and return a dummy value. But if we miss replacing that dummy value later on, we'll have a buggy program. Here is the first version of the Parser class.
class Parser {
public:
Parser();
// Change the return type to double when we add semantics.
void operator()(const std::string& s);
private:
Lexer* p_lexer;
// Change these return types to double when we add semantics.
void assign_expr();
void add_expr();
void mul_expr();
void pow_expr();
void unary_expr();
void primary();
void get_argument();
};
We could let the compiler provide a default constructor at this stage, but when we add our symbol table we will use the constructor to reserve a couple of identifiers as constants. The operator() function simply creates a Lexer, calls assign_expr() to start the parsing process, and then deletes the Lexer when it's done.
Parser::Parser()
{
}
void Parser::operator()(const std::string& s)
{
std::istringstream ist{s}; // Create an input stream from the string.
p_lexer = new Lexer{ist};
assign_expr(); // Our entry point into the parsing process.
delete p_lexer;
}
Now we will code the parsing routines. Each function is named after a nonterminal. The function bodies conform
closely to the right hand sides of all the productions for which the nonterminal is the left hand side. We
choose which of the rules we are processing based on the current token in the stream. Whenever we encounter a
nonterminal in an rhs, we simply call the function with that name. When we encounter a terminal, we check
whether it matches the current token in the stream. If it does, we advance the lexer. If it doesn't, we throw an
exception.
void Parser::assign_expr()
{
// We must save the current token at this point, because if it is followed by an assignment
// operator, we have to check whether it is an identifier.
Token t = p_lexer->get_current_token();
add_expr();
// Process assign_expr_suffix.
if (p_lexer->get_current_token() == Token::Assign) {
// The target of our assignment must be an Id.
if (t != Token::Id)
throw Syntax_error{"target of assignment must be an identifier"};
p_lexer->advance(); // Move past the Assign token.
add_expr();
}
}
In our grammar, an assignment to any expression is valid. However, when that target is not an identifier,
assignment doesn't make sense. When we specified our grammar, we left that requirement to semantic checking.
But, it's a check we can provide during translation. Semantics that can be checked during translation are called
static semantics. Next, we code add_expr().
void Parser::add_expr()
{
mul_expr();
// Process add_expr_tail.
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
mul_expr();
break;
case Token::Minus:
p_lexer->advance();
mul_expr();
break;
default:
return;
}
}
}
Note that the two cases in the switch statement above have common code. We don't want to factor it out, because
when we add semantics later on, the code segments will no longer be identical. The code for mul_expr() is very similar.
void Parser::mul_expr()
{
pow_expr();
// Process mul_expr_tail.
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Mul:
p_lexer->advance();
pow_expr();
break;
case Token::Div:
p_lexer->advance();
pow_expr();
break;
case Token::Mod:
p_lexer->advance();
pow_expr();
break;
default:
return;
}
}
}
The code for pow_expr() and unary_expr() is pretty straightforward.
void Parser::pow_expr()
{
unary_expr();
// Process pow_expr_suffix.
if (p_lexer->get_current_token() == Token::Pow) {
p_lexer->advance();
unary_expr();
}
}
void Parser::unary_expr()
{
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
primary();
break;
case Token::Minus:
p_lexer->advance();
primary();
break;
default:
primary();
}
}
The primary() function is just a straightforward processing of the right hand side of each production in every
case. Instead of writing the code for function calls over and over, we'll put it in a function called
get_argument()
. When we add semantic processing, get_argument()
will return the value
produced by add_expr()
, i.e., the argument to the function.
void Parser::primary()
{
switch (p_lexer->get_current_token()) {
case Token::Id:
p_lexer->advance();
break;
case Token::Number:
p_lexer->advance();
break;
case Token::Lp:
p_lexer->advance();
add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after subexpression"};
p_lexer->advance();
break;
case Token::Sin:
get_argument();
break;
case Token::Cos:
get_argument();
break;
case Token::Tan:
get_argument();
break;
case Token::Asin:
get_argument();
break;
case Token::Acos:
get_argument();
break;
case Token::Atan:
get_argument();
break;
case Token::Log:
get_argument();
break;
case Token::Exp:
get_argument();
break;
case Token::Log10:
get_argument();
break;
case Token::Exp10:
get_argument();
break;
case Token::Sqrt:
get_argument();
break;
case Token::Int:
get_argument();
break;
default:
throw Syntax_error{"invalid primary expression"};
}
}
// Change the return type to double when we add semantics.
void Parser::get_argument()
{
p_lexer->advance(); // Toss the function name. We already know it.
if (p_lexer->get_current_token() != Token::Lp)
throw Syntax_error{"missing ( after function name"};
p_lexer->advance();
add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after function argument"};
p_lexer->advance();
}
We now write a driver for the parser. At this point, the program will only produce output if something is wrong.
Don't let that discourage you.
int main()
{
Parser parser;
while (std::cin) {
std::string s;
std::getline(std::cin, s);
if (!std::cin || s == "quit") break;
try {
parser(s);
} catch (const Lexical_error& e) {
std::cerr << "Lexical error: " << e << '\n';
} catch (const Syntax_error& e) {
std::cerr << "Syntax error: " << e << '\n';
} catch (const Runtime_error& e) {
std::cerr << "Runtime error: " << e << '\n';
}
}
}
When the primary() function recognizes a number, it will need to convert the string in the token text into a double. We can provide a simple helper function to do that. It works by turning the given string into an input stream, and then reading the number from that stream.
double to_number(const std::string& s)
{
std::istringstream ist{s};
// Tell the input stream to throw an exception if it can't read the number.
ist.exceptions(std::ios_base::failbit);
double x;
ist >> x;
return x;
}
The opposite conversion, from number to string, is helpful when composing error messages.
std::string to_string(double x)
{
std::ostringstream ost;
ost << x;
return ost.str();
}
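A quick check of these helpers (reproduced here so the sketch stands alone). Note that to_number inherits the stream's parsing rules, so it accepts exactly the kinds of numbers our lexer produces:

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>

double to_number(const std::string& s)
{
    std::istringstream ist{s};
    ist.exceptions(std::ios_base::failbit);  // throw on a malformed number
    double x;
    ist >> x;
    return x;
}

std::string to_string(double x)
{
    std::ostringstream ost;
    ost << x;
    return ost.str();
}
```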
A std::map from identifier names to values is sufficient to use as a symbol table.
std::map<std::string, double> symbol_table;
The Parser() constructor now has a job to do: our calculator provides two constants, pi and e. The constructor will put their values into the symbol table. It will be the job of assign_expr() to protect against assignments to these two constants.
Parser::Parser()
{
symbol_table["pi"] = 4.0 * atan(1.0);
symbol_table["e"] = exp(1.0);
}
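4.0 * atan(1.0) is a classic way to compute pi, since atan(1) is pi/4, and exp(1.0) is e. A standalone sanity check of what the constructor puts in the table (make_constants is our illustrative stand-in for the constructor):

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Stand-in for Parser::Parser(): seed the symbol table with the constants.
std::map<std::string, double> make_constants()
{
    std::map<std::string, double> table;
    table["pi"] = 4.0 * std::atan(1.0);
    table["e"] = std::exp(1.0);
    return table;
}
```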
Now we add the semantics to primary(). When the current token is an Id, we look it up in the symbol table and return its associated value. If the Id hasn't been used yet, it's entered into the symbol table with a value of 0, and 0 is returned. When the token is a Number, we convert its string representation to a double and return that value. For a parenthesized subexpression, we return whatever add_expr() returns. For a function call, we use get_argument() to get the argument value. When we need to, we check whether the argument is valid for the function. We then return the result of calling the function on the argument. Here is primary() with the semantics added.
double Parser::primary()
{
std::string text = p_lexer->get_token_text();
double arg;
switch (p_lexer->get_current_token()) {
case Token::Id:
p_lexer->advance();
return symbol_table[text];
case Token::Number:
p_lexer->advance();
return to_number(text);
case Token::Lp:
p_lexer->advance();
arg = add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after subexpression"};
p_lexer->advance();
return arg;
case Token::Sin:
return sin(get_argument());
case Token::Cos:
return cos(get_argument());
case Token::Tan:
arg = get_argument();
if (cos(arg) == 0)
throw Runtime_error{"invalid argument to tan: " + to_string(arg)};
return tan(arg);
case Token::Asin:
return asin(get_argument());
case Token::Acos:
return acos(get_argument());
case Token::Atan:
return atan(get_argument());
case Token::Log:
arg = get_argument();
if (arg <= 0)
throw Runtime_error{"invalid argument to log: " + to_string(arg)};
return log(arg);
case Token::Exp:
return exp(get_argument());
case Token::Log10:
arg = get_argument();
if (arg <= 0)
throw Runtime_error{"invalid argument to log10: " + to_string(arg)};
return log10(arg);
case Token::Exp10:
return pow(10.0, get_argument()); // exp10() is a GNU extension; pow is portable.
case Token::Sqrt:
arg = get_argument();
if (arg < 0)
throw Runtime_error{"attempt to take square root of negative number"};
return sqrt(arg);
case Token::Int:
arg = get_argument();
if (arg < 0)
return ceil(arg);
else
return floor(arg);
default:
throw Syntax_error{"invalid primary expression"};
}
}
We also need to modify get_argument() so it returns the value of the argument.
double Parser::get_argument()
{
p_lexer->advance(); // Toss the function name. We already know it.
if (p_lexer->get_current_token() != Token::Lp)
throw Syntax_error{"missing ( after function name"};
p_lexer->advance();
double arg = add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after function argument"};
p_lexer->advance();
return arg;
}
unary_expr()
double Parser::unary_expr()
{
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
return +primary();
case Token::Minus:
p_lexer->advance();
return -primary();
default:
return primary();
}
}
pow_expr()
pow_expr()
uses the library function pow()
to do its computation. We also want to make
sure that we're not trying to take the root of a negative number. We provide a function
check_domain()
that does this. I chose to put check_domain()
in the parser class, but
made it static
because it doesn't depend on any members of the class.
double Parser::pow_expr()
{
double result = unary_expr();
if (p_lexer->get_current_token() == Token::Pow) {
p_lexer->advance();
double x = unary_expr();
check_domain(result, x);
result = pow(result, x);
}
return result;
}
void Parser::check_domain(double x, double y)
{
// There is no error if x is nonnegative.
if (x >= 0) return;
// There is no error unless 0 < abs(y) < 1.
double e = std::abs(y);
if (e <= 0 || e >= 1) return;
// We have an error.
throw Runtime_error{"attempt to take root of a negative number"};
}
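To see what check_domain accepts and rejects, here it is as a standalone predicate (pow_is_defined is our name, not the parser's). Note that the check only catches fractional exponents with magnitude below 1; std::pow also returns NaN for other non-integer exponents of a negative base, such as (-8)^1.5, which this check lets through.

```cpp
#include <cassert>
#include <cmath>

// Standalone version of the domain check for x^y.
bool pow_is_defined(double x, double y)
{
    if (x >= 0) return true;            // nonnegative base: always fine
    double e = std::fabs(y);
    if (e <= 0 || e >= 1) return true;  // e.g. (-8)^2 or (-8)^0: fine
    return false;                       // e.g. (-8)^(1.0/3.0): root of a negative number
}
```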
mul_expr()
double Parser::mul_expr()
{
double result = pow_expr();
double x;
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Mul:
p_lexer->advance();
result *= pow_expr();
break;
case Token::Div:
p_lexer->advance();
x = pow_expr();
if (x == 0)
throw Runtime_error{"attempt to divide by zero"};
result /= x;
break;
case Token::Mod:
p_lexer->advance();
x = pow_expr();
if (x == 0)
throw Runtime_error{"attempt to divide by zero"};
result = fmod(result, x);
break;
default:
return result;
}
}
}
add_expr()
double Parser::add_expr()
{
double result = mul_expr();
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
result += mul_expr();
break;
case Token::Minus:
p_lexer->advance();
result -= mul_expr();
break;
default:
return result;
}
}
}
assign_expr()
In assign_expr(), we want to guard against assigning to our constants, and we want to update any
value assigned to in our symbol table. We count assignment to a constant as a syntax error because it's
something that can be checked for during translation, and we're grouping our syntax errors and static semantic
errors together.
double Parser::assign_expr()
{
// We must save the current token at this point, because if it is followed by an assignment
// operator, we have to check whether it is an identifier.
Token t = p_lexer->get_current_token();
std::string text = p_lexer->get_token_text();
double result = add_expr();
// Process assign_expr_suffix.
if (p_lexer->get_current_token() == Token::Assign) {
// The target of our assignment must be an Id.
if (t != Token::Id)
throw Syntax_error{"target of assignment must be an identifier"};
// Check whether we're modifying a constant.
if (text == "pi" || text == "e")
throw Syntax_error{"attempt to modify the constant " + text};
p_lexer->advance(); // Move past the Assign token.
return symbol_table[text] = add_expr();
}
return result;
}
#include <iostream>
#include <sstream>
#include <map>
#include <string>
#include <algorithm>
#include <cctype>
#include <cmath>
double to_number(const std::string& s)
{
std::istringstream ist{s};
ist.exceptions(std::ios_base::failbit);
double x;
ist >> x;
return x;
}
std::string to_string(double x)
{
std::ostringstream ost;
ost << x;
return ost.str();
}
template<int N>
class Error {
public:
explicit Error(const std::string& s) : message{s} { }
std::string get_message() const { return message; }
void put(std::ostream& os) const { os << message; }
private:
std::string message;
};
using Lexical_error = Error<1>;
using Syntax_error = Error<2>;
using Runtime_error = Error<3>;
template<int N>
std::ostream& operator<<(std::ostream& os, const Error<N>& e)
{
e.put(os);
return os;
}
// The basic elements of our expressions.
enum class Token {
Id, Number, Sin, Cos, Tan, Asin, Acos, Atan, Log, Exp,
Log10, Exp10, Sqrt, Int, Assign='=', Plus='+', Minus='-',
Mul='*', Div='/', Mod='%', Pow='^', Lp='(', Rp=')', Eofsym=-1
};
class Lexer {
public:
explicit Lexer(std::istream& is);
explicit Lexer(std::istream* ps);
// A Lexer belongs to a parser and shouldn't be copied or moved.
Lexer(const Lexer&) = delete;
Lexer& operator=(const Lexer&) = delete;
Lexer(Lexer&&) = delete;
Lexer& operator=(Lexer&&) = delete;
~Lexer() { if (owns_input) delete p_input; }
Token get_current_token() const { return current_token; }
std::string get_token_text() const { return current_token_text; }
void advance(); // Read the next token in the stream.
private:
std::istream* p_input; // The source stream (a stream of characters).
bool owns_input; // True if we can delete p_input, false if we can't.
Token current_token;
std::string current_token_text;
void init(); // Code common to each constructor.
Token get_token(); // The workhorse. Assembles characters from p_input into tokens.
std::string token_buffer; // The text of the token that get_token() just found.
void exponent_part(char& c); // A helper function for get_token() when it is looking for a number.
};
Lexer::Lexer(std::istream& is)
: p_input{&is}, owns_input{false}
{
init();
}
Lexer::Lexer(std::istream* ps)
: p_input{ps}, owns_input{true}
{
init();
}
void Lexer::init()
{
current_token = get_token();
current_token_text = token_buffer;
}
void Lexer::advance()
{
if (current_token != Token::Eofsym) {
current_token = get_token();
current_token_text = token_buffer;
}
}
Token Lexer::get_token()
{
std::istream& input = *p_input; // Shorthand to make the notation convenient.
token_buffer.clear(); // Clear the buffer for the new token.
char c = input.get(); // A priming read on the stream.
// Skip whitespace.
while (isspace(c)) c = input.get();
// If there are no characters, we're at the end of the stream.
if (!input) return Token::Eofsym;
// Look for an identifier or function name.
if (isalpha(c)) {
token_buffer = c;
c = input.get();
// Look for zero or more letters or digits.
while (isalnum(c)) {
token_buffer += c;
c = input.get();
}
// The current character doesn't belong to our identifier.
input.putback(c);
// Check for a function name.
if (token_buffer == "sin") return Token::Sin;
if (token_buffer == "cos") return Token::Cos;
if (token_buffer == "tan") return Token::Tan;
if (token_buffer == "asin") return Token::Asin;
if (token_buffer == "acos") return Token::Acos;
if (token_buffer == "atan") return Token::Atan;
if (token_buffer == "log") return Token::Log;
if (token_buffer == "exp") return Token::Exp;
if (token_buffer == "log10") return Token::Log10;
if (token_buffer == "exp10") return Token::Exp10;
if (token_buffer == "sqrt") return Token::Sqrt;
if (token_buffer == "int") return Token::Int;
// Whatever is not a function name must be an identifier.
return Token::Id;
}
// Look for a number beginning with a digit.
if (isdigit(c)) {
token_buffer = c;
c = input.get();
// Look for other digits.
while (isdigit(c)) {
token_buffer += c;
c = input.get();
}
// Look for an optional decimal point.
// If there is one, it can be followed by zero or more digits.
if (c == '.') {
token_buffer += c;
c = input.get();
while (isdigit(c)) {
token_buffer += c;
c = input.get();
}
}
// Look for an optional exponent part.
exponent_part(c);
input.putback(c);
return Token::Number;
}
// Look for a number beginning with a decimal point.
if (c == '.') {
token_buffer = c;
c = input.get();
// A decimal point must be followed by a digit. Otherwise we have an error.
if (!isdigit(c)) {
throw Lexical_error{token_buffer += c};
}
while (isdigit(c)) {
token_buffer += c;
c = input.get();
}
// Check for the optional exponent part.
exponent_part(c);
input.putback(c);
return Token::Number;
}
// Check for a single character token.
token_buffer = c;
switch (c) {
// Note: fallthrough intentional.
case '=':
case '+':
case '-':
case '*':
case '/':
case '%':
case '^':
case '(':
case ')':
return Token(c);
}
// Anything else is an error.
throw Lexical_error{token_buffer};
}
void Lexer::exponent_part(char& c)
{
std::istream& input = *p_input;
if (c != 'e' && c != 'E')
return;
token_buffer += c;
c = input.get();
// Check for an optional sign.
if (c == '+' || c == '-') {
token_buffer += c;
c = input.get();
}
// We must have a digit. Otherwise, we have an error.
if (!isdigit(c))
throw Lexical_error{token_buffer += c};
while (isdigit(c)) {
token_buffer += c;
c = input.get();
}
}
std::map<std::string, double> symbol_table;
class Parser {
public:
Parser();
double operator()(const std::string& s);
private:
Lexer* p_lexer;
double assign_expr();
double add_expr();
double mul_expr();
double pow_expr();
double unary_expr();
double primary();
double get_argument();
// Check for root of a negative number.
static void check_domain(double x, double y);
};
Parser::Parser()
{
symbol_table["pi"] = 4.0 * atan(1.0);
symbol_table["e"] = exp(1.0);
}
double Parser::operator()(const std::string& s)
{
std::istringstream ist{s};
p_lexer = new Lexer{ist};
double result = assign_expr();
delete p_lexer;
return result;
}
double Parser::assign_expr()
{
Token t = p_lexer->get_current_token();
std::string text = p_lexer->get_token_text();
double result = add_expr();
if (p_lexer->get_current_token() == Token::Assign) {
if (t != Token::Id)
throw Syntax_error{"target of assignment must be an identifier"};
if (text == "pi" || text == "e")
throw Syntax_error{"attempt to modify the constant " + text};
p_lexer->advance();
return symbol_table[text] = add_expr();
}
return result;
}
double Parser::add_expr()
{
double result = mul_expr();
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
result += mul_expr();
break;
case Token::Minus:
p_lexer->advance();
result -= mul_expr();
break;
default:
return result;
}
}
}
double Parser::mul_expr()
{
double result = pow_expr();
double x;
for (;;) {
switch (p_lexer->get_current_token()) {
case Token::Mul:
p_lexer->advance();
result *= pow_expr();
break;
case Token::Div:
p_lexer->advance();
x = pow_expr();
if (x == 0)
throw Runtime_error{"attempt to divide by zero"};
result /= x;
break;
case Token::Mod:
p_lexer->advance();
x = pow_expr();
if (x == 0)
throw Runtime_error{"attempt to divide by zero"};
result = fmod(result, x);
break;
default:
return result;
}
}
}
double Parser::pow_expr()
{
double result = unary_expr();
if (p_lexer->get_current_token() == Token::Pow) {
p_lexer->advance();
double x = unary_expr();
check_domain(result, x);
return pow(result, x);
}
return result;
}
double Parser::unary_expr()
{
switch (p_lexer->get_current_token()) {
case Token::Plus:
p_lexer->advance();
return +primary();
case Token::Minus:
p_lexer->advance();
return -primary();
default:
return primary();
}
}
double Parser::primary()
{
std::string text = p_lexer->get_token_text();
double arg;
switch (p_lexer->get_current_token()) {
case Token::Id:
p_lexer->advance();
return symbol_table[text];
case Token::Number:
p_lexer->advance();
return to_number(text);
case Token::Lp:
p_lexer->advance();
arg = add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after subexpression"};
p_lexer->advance();
return arg;
case Token::Sin:
return sin(get_argument());
case Token::Cos:
return cos(get_argument());
case Token::Tan:
arg = get_argument();
if (cos(arg) == 0)
throw Runtime_error{"invalid argument to tan: " + to_string(arg)};
return tan(arg);
case Token::Asin:
return asin(get_argument());
case Token::Acos:
return acos(get_argument());
case Token::Atan:
return atan(get_argument());
case Token::Log:
arg = get_argument();
if (arg <= 0)
throw Runtime_error{"invalid argument to log: " + to_string(arg)};
return log(arg);
case Token::Exp:
return exp(get_argument());
case Token::Log10:
arg = get_argument();
if (arg <= 0)
throw Runtime_error{"invalid argument to log10: " + to_string(arg)};
return log10(arg);
case Token::Exp10:
return pow(10.0, get_argument()); // exp10() is a GNU extension; pow is portable.
case Token::Sqrt:
arg = get_argument();
if (arg < 0)
throw Runtime_error{"attempt to take square root of negative number"};
return sqrt(arg);
case Token::Int:
arg = get_argument();
if (arg < 0)
return ceil(arg);
else
return floor(arg);
default:
throw Syntax_error{"invalid primary expression"};
}
}
void Parser::check_domain(double x, double y)
{
// There is no error if x is nonnegative.
if (x >= 0) return;
// There is no error unless 0 < abs(y) < 1.
double e = std::abs(y);
if (e <= 0 || e >= 1) return;
// We have an error.
throw Runtime_error{"attempt to take root of a negative number"};
}
double Parser::get_argument()
{
p_lexer->advance();
if (p_lexer->get_current_token() != Token::Lp)
throw Syntax_error{"missing ( after function name"};
p_lexer->advance();
double arg = add_expr();
if (p_lexer->get_current_token() != Token::Rp)
throw Syntax_error{"missing ) after function argument"};
p_lexer->advance();
return arg;
}
int main()
{
Parser parser;
std::cout.precision(12);
while (std::cin) {
std::string s;
std::getline(std::cin, s);
if (!std::cin || s == "quit") break;
try {
std::cout << parser(s) << '\n';
} catch(const Lexical_error& e) {
std::cerr << "Lexical error: " << e << '\n';
} catch(const Syntax_error& e) {
std::cerr << "Syntax error: " << e << '\n';
} catch(const Runtime_error& e) {
std::cerr << "Runtime error: " << e << '\n';
}
}
}