Thursday, May 31, 2007

c tokens

Compilation of C Programs
C programs can be compiled by C compilers and by most C++ compilers. Most UNIX operating systems come with the following compilers:
cc - compiler for the C language
CC - compiler for the C++ language
In addition, two compilers from the GNU Project of the Free Software Foundation are often installed:
gcc - compiler for the C language
g++ - compiler for the C++ language
Further information is available in the UNIX reference (man) pages.
Variables
A variable is a named or unnamed place for storing mutable data. Named variables are declared either globally (outside of functions) or local to a function. A named variable is given a type as part of its declaration. For simple types a variable declaration has the following form:
type variable-name;

where
type is the name of a type, either a built-in type such as int or char, or a programmer defined type, and
variable-name is an identifier that names the variable.
For example,
int x;

declares x to be an int (integer) variable. Definitions of variables with more complex types are constructed and interpreted according to C precedence rules.
Unnamed variables can be created at execution time using dynamic memory allocation. Unnamed variables can only be accessed through pointers.
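As a rough illustration of the kinds of variables described above, the following sketch uses a global variable, a local variable, and an unnamed variable that is reached only through a pointer. The names counter and VariablesDemo are made up for this example.

```c
#include <stdlib.h>

int counter = 10;                  /* global variable: declared outside of functions */

/* Hypothetical demo function: returns the sum of a global, a local, and an
   unnamed (dynamically allocated) variable, or -1 if allocation fails. */
int VariablesDemo(void) {
    int local = 20;                /* local variable: usable only in this function */
    int *p = malloc(sizeof(int));  /* unnamed variable, reachable only through p */
    int total;

    if (p == NULL) {
        return -1;                 /* allocation failed */
    }
    *p = 30;
    total = counter + local + *p;
    free(p);                       /* release the unnamed variable */
    return total;
}
```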

Functions
A function is a named complex syntactic construction. It is defined once, but may be called upon many times to return values and/or perform actions. The actions performed by a function can include modifying variables, processing input, and generating output. Functions can have parameters. Parameters are values that are passed when a function is called. A function definition has the following form:
value-type function-name(parameter-list) {
local-variable-declarations
statements
}

where
value-type is the name of the type of value that the function returns.
function-name is an identifier that names the function.
parameter-list is a sequence of variable declarations, with semicolons omitted, but separated by commas. These declarations give names and types to the parameters that may be used as initialized variables in the statements. They are initialized by the values passed when the function is called.
local-variable-declarations is a sequence of declarations for variables that may be used in the statements.
statements is a sequence of statements that perform the desired action and return the desired value.
A function can be defined so that it does not return any value. This is done by replacing value-type with the keyword void. A function can also be defined with no parameters. Then parameter-list should be the keyword void.
For example, the following is a definition of a function named Sum with three integer parameters that returns their sum:

int Sum(int a, int b, int c) {
    return a + b + c;
}

Calling Functions
A function call consists of the function name followed by a parenthesized list of expressions that are separated by commas. The number of expressions must match the number of parameters in the definition of the function. The values of these expressions are called the arguments of the function call. The value types of the arguments must match, in order, the parameter types in the function definition. For the function Sum defined above, the argument list for a call must have three integer expressions. For example, the following is a legal call to the function Sum:
Sum(3, 27*11, 2 + 2)

A function call can appear anywhere that a value of its return type can appear. The above call could appear in a complex expression as long as an integer value is appropriate where it appears. For example, the following is a legitimate expression using the function Sum:
Sum(Sum(1, 2, 3), Sum(4, 5, 6), Sum(7, 8, 9))

The value of a function call is determined as follows. First, the argument expressions are evaluated and are used to initialize the parameters. For the first example call above, a is initialized as 3, b as 297, and c as 4. Then the statements in the function body are executed in order. A return statement completes the execution of statements and determines the returned value. The only statement in Sum just returns a + b + c, which is 304.
For the second example call above, another important fact comes into play: each call to a function uses its own variables, which are independent of the variables in other calls to the same function. This means that if a function does not refer to any variables except parameters and local variables then you can understand what its action and returned value are once you know the value of its parameters. The Sum function just returns the sum of its arguments, so the returned value for the second example is ((1 + 2 + 3) + (4 + 5 + 6) + (7 + 8 + 9)), or 45.
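The independence of variables between calls is easiest to see in a function that calls itself. In the following sketch, Factorial is an illustrative name that does not appear elsewhere in this text. Each call gets its own copy of the parameter n, so the inner calls do not disturb the value of n in the outer calls.

```c
/* Hypothetical example: computes n! for small non-negative n.
   Each call to Factorial has its own, independent parameter n. */
int Factorial(int n) {
    if (n <= 1) {
        return 1;
    }
    return n * Factorial(n - 1);   /* the inner call uses a separate n */
}
```

For example, Factorial(5) is evaluated as 5 * 4 * 3 * 2 * 1.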

Statements
A statement is a syntactic construction that performs actions. It can alter the value of variables, generate output, or process input. The simplest kind of statement is an expression statement, which is no more than an expression followed by a semicolon. The most common uses of expression statements are function call statements, where the expression is just a function call, and assignment statements, where the expression is an assignment expression. The most commonly used assignment expression has the form
variable = expression

so the assignment statement has the form
variable = expression;

This statement changes the value of the variable on the left side of the equals sign to the value of the expression on the right side.
C is unusual in treating an assignment as both an expression and as a statement. This is done to allow multiple assignments in a single statement, such as

a = b = 5;

By treating b = 5 as an expression with value 5, C makes sense of this statement, assigning the value 5 to both a and b.
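Because an assignment is an expression, its value can also be used inside a larger expression. The function below is a small sketch of this idea. ChainAssign is a made-up name.

```c
/* Hypothetical helper: demonstrates that an assignment is an expression. */
int ChainAssign(void) {
    int a, b, c;

    a = b = 5;             /* b = 5 has value 5, which is then assigned to a */
    c = (a = a + 1) * 2;   /* a becomes 6; the assignment's value 6 is doubled */
    return a + b + c;      /* 6 + 5 + 12 */
}
```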
C also has structured statements for loops and conditional execution of statements, along with statements that alter the flow of control:
while - pretest loops
for - pretest loops
do-while - posttest loops
if-else - conditional execution
switch - conditionally selecting among several statements
return - return values from and terminate functions
continue - skip remaining statements in a loop
break - exit a loop or switch


While Loops
The while statement is a looping construction which has the following form:
while (condition) {
statements
}

where
condition is an expression that can be true (nonzero) or false (zero). This expression is tested prior to each iteration of the loop. The loop terminates when it is false.
statements is a sequence of statements. If there is only one statement then the braces can be omitted.
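As a sketch of a pretest loop, the following function computes a greatest common divisor with Euclid's algorithm. The name Gcd is an assumption made for this example. Because the condition is tested before each iteration, the loop body never runs when b is already zero.

```c
/* Hypothetical example: greatest common divisor of two non-negative integers. */
int Gcd(int a, int b) {
    while (b != 0) {       /* tested before each iteration */
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}
```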
For Loops
The for statement is a looping construction which has the following form:
for (initialization; condition; increment) {
statements
}

where
initialization is an expression (typically an assignment) that is evaluated once at the beginning of the for loop.
condition is an expression that can be true (nonzero) or false (zero). This expression is tested prior to each iteration of the loop. The loop terminates when it is false.
increment is an expression that is evaluated after statements, at the end of each iteration.
statements is a sequence of statements. If there is only one statement then the braces may be omitted.
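A typical use of the for statement is stepping through an array. The following is a minimal sketch. SumArray and its parameter names are illustrative only.

```c
/* Hypothetical example: adds up the first n entries of an array. */
int SumArray(int values[], int n) {
    int i;
    int total = 0;

    for (i = 0; i < n; i++) {   /* initialization; condition; increment */
        total += values[i];
    }
    return total;
}
```

If n is zero, the condition is false the first time it is tested, so the body is never executed and the function returns 0.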
Do-While Loops
The do-while statement is a looping construction which has the following form:
do {
statements
} while (condition);

where
statements is a sequence of statements. If there is only one statement then the braces may be omitted.
condition is an expression that can be true (nonzero) or false (zero). This expression is tested after each iteration of the loop. The loop terminates when it is false.
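A posttest loop is a natural fit when the body must run at least once. In the sketch below, DigitCount is a made-up name. The body runs once even when n is 0, so the function correctly reports that 0 has one digit.

```c
/* Hypothetical example: number of decimal digits in a non-negative integer. */
int DigitCount(int n) {
    int digits = 0;

    do {
        digits++;          /* runs at least once, even when n is 0 */
        n /= 10;
    } while (n > 0);       /* tested after each iteration */
    return digits;
}
```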
If-Else Statements
The if statement has one of two forms:
if (condition) {
true-statements
}

or
if (condition) {
true-statements
} else {
false-statements
}

where
condition is an expression that can be true (nonzero) or false (zero).
true-statements and false-statements are sequences of statements. If there is only one statement in a sequence then the surrounding braces may be omitted. For both forms, the true-statements are executed only if condition is true. For the second form the false-statements are executed if condition is false.
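The false-statements of an if-else statement can themselves contain another if-else statement. The following sketch uses this to distinguish three cases. The function name Sign is an assumption made for the example.

```c
/* Hypothetical example: returns 1, -1, or 0 according to the sign of x. */
int Sign(int x) {
    int result;

    if (x > 0) {
        result = 1;
    } else {
        /* the false-statements here are themselves an if-else statement */
        if (x < 0) {
            result = -1;
        } else {
            result = 0;
        }
    }
    return result;
}
```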
Switch Statements
The switch statement allows execution of different statements depending on the value of an expression. It has the following form:
switch (expression) {
case constant-expression-1:
statements-1
.
.
.
case constant-expression-n:
statements-n
default:
default-statements
}

where
expression is an expression of a simple type, such as int, char, or an enum type. It cannot have float or double type.
constant-expression-1 through constant-expression-n are expressions of a type that converts to the type of expression. The compiler must be able to evaluate these expressions to constant values.
statements-1 through statements-n are sequences of statements.
When the switch statement is executed, expression is evaluated. The resulting value is compared to the values of constant-expression-1 through constant-expression-n in order until a matching value is found. If a match is found in constant-expression-i then statements-i through statements-n and default-statements are executed, with switch statement execution terminated if a break statement is encountered. Normally, the last statement in each sequence is a break statement so that only one sequence is executed.
The default clause is optional. If it is present then the default-statements are executed whenever the value of expression does not match any of the constant-expression-i values.
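The way execution continues from one case into the next can be put to use by stacking several case labels in front of one sequence of statements. The following is a sketch only. DaysInMonth is a hypothetical function that ignores leap years.

```c
/* Hypothetical example: number of days in a month of a non-leap year,
   or 0 if month is not in the range 1 through 12. */
int DaysInMonth(int month) {
    int days;

    switch (month) {
    case 4:
    case 6:
    case 9:
    case 11:               /* cases 4, 6, and 9 continue into this one */
        days = 30;
        break;             /* without this break, execution would continue below */
    case 2:
        days = 28;
        break;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        days = 31;
        break;
    default:               /* no constant matched */
        days = 0;
        break;
    }
    return days;
}
```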

Return Statements
The return statement is used in the definition of a function to set its returned value and to terminate execution of the function. It has two forms. For functions with returned type void use
return;

For functions with non-void returned type use
return expression;

where expression is an expression that yields the desired return value.
Continue Statements
The continue statement is used in loop (for, while, and do-while) statements to terminate an iteration of the loop. After a continue statement is executed in a for loop, execution proceeds to the increment clause and continuation test of the loop. After a continue statement is executed in a while or do-while loop, execution proceeds to the continuation test of the loop. A continue statement is formed with the keyword continue followed by a semicolon.
Break Statements
The break statement is used in loop (for, while, and do-while) statements and switch statements to terminate execution of the statement. After a break statement is executed, execution proceeds to the statement that follows the enclosing loop or switch statement. A break statement is formed with the keyword break followed by a semicolon.
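The difference between continue and break can be seen by using both in one loop. In this sketch, SumPositiveUntilZero is a made-up name, and the use of a zero entry as an end marker is an assumption of the example.

```c
/* Hypothetical example: sums the positive entries of values, stopping at
   the first zero entry or after n entries, whichever comes first. */
int SumPositiveUntilZero(int values[], int n) {
    int i;
    int total = 0;

    for (i = 0; i < n; i++) {
        if (values[i] == 0) {
            break;         /* leave the loop entirely */
        }
        if (values[i] < 0) {
            continue;      /* skip ahead to the increment clause (i++) */
        }
        total += values[i];
    }
    return total;
}
```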
Data Types
All C variables are defined with a specific type. C has the following built-in types:
char - characters
int - integers (whole numbers)
float - real numbers
double - higher precision real numbers
In addition, C provides structured types:
arrays - groups of variables of identical types, accessed using integer indices
structs - groups of variables of mixed types, accessed by using named field selectors
unions - variables that can contain values of different types, depending on a field selector
enumerated types - variables that can take on a small number of different named values
New C types can be defined with a typedef construction. This construction is identical to a variable declaration except that it is preceded by the keyword typedef. The name of the new type appears where the variable name would appear in a variable declaration.
For example, the following declaration declares str to be an array of 10 characters.

char str[10];

On the other hand the following declaration declares String to be the name of a new type for arrays containing 10 characters.
typedef char String[10];

Once a type name has been defined with typedef, it can be used to declare variables. For example, after the above typedef, the following declaration declares str to be an array of 10 characters.

String str;
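The typedef construction is used most often with structs. The following sketch combines the String type from above with a struct type. Point, PointSum, and StringSize are hypothetical names introduced only for this example.

```c
typedef char String[10];   /* as above: arrays containing 10 characters */

/* Hypothetical type: Point becomes a new name for struct Point. */
typedef struct Point {
    int x;
    int y;
} Point;

/* Returns the sum of the coordinates of a Point. */
int PointSum(Point p) {
    return p.x + p.y;
}

/* Returns the size in bytes of a variable declared with the String type. */
unsigned long StringSize(void) {
    String str;            /* same as: char str[10]; */
    return (unsigned long)sizeof(str);
}
```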


Scoping Rules
FIXME. Scoping rules are used to determine what an identifier refers to in a context.
Comments
Any text that begins with /* and continues up to the first following */ is ignored by C compilers. This can be used to add comments describing function behavior, variable usage, and algorithms. C comments cannot be nested.
Constants
A constant is a named or unnamed non-mutable program value. Named constants are defined by preceding an initialized variable definition with the keyword const. For example, the following defines a double-precision real constant named pi with its usual mathematical value:
const double pi = 3.141592653589793;

Older versions of C compilers did not use a const keyword. These compilers, and modern compilers as well, use the C preprocessor to define named constants. The above definition would be replaced by
#define pi 3.141592653589793

The preprocessor reads this definition as a command to replace the identifier pi by the literal constant wherever it appears in the program. So the compiler never sees the identifier pi - it just sees the literal constant.
There is a problem with using #define: syntax is not checked at the #define; it is only checked where the substitution is made. If there is something wrong with the #define (for example, adding a semicolon at the end), you can get error messages that are difficult to interpret.
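The two ways of naming a constant can be compared side by side. In the sketch below, the macro is spelled PI so that it does not clash with the const variable pi. The two function names are made up. The comment shows the semicolon mistake mentioned above without actually compiling it.

```c
#define PI 3.141592653589793       /* no equals sign and no semicolon */

/* A definition such as
       #define BAD_PI 3.141592653589793;
   would turn "2 * BAD_PI * r" into "2 * 3.141592653589793; * r".
   The error is reported where BAD_PI is used, not at the #define. */

const double pi = 3.141592653589793;

/* Hypothetical example: circumference of a circle, using the macro.
   The compiler sees 2 * 3.141592653589793 * r. */
double CircumferenceMacro(double r) {
    return 2 * PI * r;
}

/* The same computation using the const variable. */
double CircumferenceConst(double r) {
    return 2 * pi * r;
}
```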

Unnamed constants are often called literals. They are usually constructed similar to English or mathematical values, such as the literal constant 3.141592653589793 in the above definitions.

Conversions Between Types
FIXME.
Definitions and Declarations
A definition gives precise meaning to a program identifier. A declaration describes properties of a program identifier, usually so that the compiler can perform type checks. An identifier can only be defined once, but it may be declared more than once.
The difference between definitions and declarations is most significant for functions. The following is a definition of a function named Sum. This function has three integer parameters and it returns their sum.

int Sum(int a, int b, int c) {
    return a + b + c;
}

To form a declaration for Sum, you omit the portion in braces, replacing the braces by a semicolon:
int Sum(int a, int b, int c);

Another situation where declarations and definitions are different is in definitions and declarations of array variables. The definition of an array variable must specify the number of entries, as in
char str[81];

This defines an array variable named str that has enough room for 80 characters plus a terminating null character.
In a definition or declaration of a function, you only need to provide a declaration for parameters. For array parameters, this means that you can omit the size specification as in

void CopyString(char dest[], char src[]) {
    int i;

    i = 0;
    do {
        dest[i] = src[i];
    } while (src[i++] != '\0');
}
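A function declaration is needed whenever a function is called before its definition appears. The following sketch shows two functions that call each other, so at least one of them has to be declared ahead of its definition. IsEven and IsOdd are hypothetical names, and both assume that n is not negative.

```c
int IsOdd(int n);          /* declaration only: the definition comes later */

/* Hypothetical example: returns 1 if n is even, 0 otherwise (n >= 0). */
int IsEven(int n) {
    if (n == 0) {
        return 1;
    }
    return IsOdd(n - 1);   /* legal because IsOdd has been declared */
}

/* Definition of IsOdd: returns 1 if n is odd, 0 otherwise (n >= 0). */
int IsOdd(int n) {
    if (n == 0) {
        return 0;
    }
    return IsEven(n - 1);
}
```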

Expressions
An expression is a syntactic construction that has a value. Expressions are formed by combining constant, variable, and function call values using operators. Some C expressions can have side-effects, which are actions that are executed when the expression is evaluated.
Including Declarations from Other Files
FIXME.
The main Function
Every C program must have a function named main. This function is called by the operating system when the program is executed. For a short program, all of the statements to be executed can be put into main. For larger programs, numerous functions are written that are directly or indirectly called by main.
For programs that do not use command-line arguments the header for main should be

int main(void)

For programs that use command-line arguments the header for the main function should be
int main(int argc, char *argv[])

Then argc is the number of words in the command line and argv is an array of strings, one for each word.
The value returned by the main function is normally 0. Some programs return a non-zero value to indicate an abnormal condition. For example, C compilers return non-zero values when they attempt to compile a program that contains syntax errors.

Dynamic Allocation for C Variables
The C standard library (stdlib.h) provides three functions for dynamically allocating variables: malloc, calloc, and realloc. All of these functions return pointers to new variables or arrays of variables, unless the system runs out of memory. In that case they return NULL. It is a good habit to check the returned value. If it is NULL then your program should just print an error message and exit.
malloc is used for allocating non-array variables. It has a single argument that specifies the size of the variable. This is usually obtained by calling sizeof(type), where type is the type of the desired variable. calloc is used for allocating array variables. It has two arguments: the number of array elements and the size of an individual element. realloc is usually used for changing the number of entries for dynamically allocated array variables. It has two arguments: a pointer to the array to be reallocated and the total size of the resized array. The total size can be computed as the product of the number of elements times the size of an individual element. realloc copies all of the data from the old array into the new array.

Some compilers will give warnings when you assign the returned value from any of the memory allocation functions to a variable, because they return pointers of a generic type (void *). To quiet these warnings you can coerce the type by preceding the function call by the pointer or array type name enclosed in parentheses.

For example, suppose a stack type is declared as

/* A Stack is a struct that contains a body, its current capacity, and */
/* its current entry count. */
/* */
typedef struct Stack {
    Data *body;     /* the stack array */
    int capacity;   /* maximum number of entries in stack */
    int count;      /* index of top entry in stack */
} Stack;

and that st is a variable of type Stack. Then to create st with an initial capacity of 20 entries you would use the code
st.capacity = 20;
st.body = (Data *)calloc(st.capacity, sizeof(Data));
if (st.body == NULL) {
    fprintf(stderr, "Memory allocation failed for stack body.\n");
    exit(1);
}

To double the capacity of the stack you would use the code
st.capacity *= 2;
st.body = (Data *)realloc(st.body, st.capacity*sizeof(Data));
if (st.body == NULL) {
    fprintf(stderr, "Memory reallocation failed for stack body.\n");
    exit(1);
}

The C standard library also provides a function free for freeing up dynamically allocated memory. It has a single void * parameter, which should be a pointer from an earlier call to malloc, calloc, or realloc. Care must be taken to only use free once on a chunk of dynamically allocated memory.
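The allocation, reallocation, and freeing steps can be gathered into a few helper functions. The following is only a sketch under stated assumptions: the text does not define Data, so it is taken to be int here, and the helper names StackInit, StackPush, and StackDestroy are made up. Instead of exiting on failure, the helpers return 0 so the caller can decide what to do, and count is treated as the number of entries in the stack.

```c
#include <stdlib.h>

typedef int Data;              /* assumption: the text does not define Data */

typedef struct Stack {
    Data *body;                /* the stack array */
    int capacity;              /* maximum number of entries in stack */
    int count;                 /* number of entries currently in stack */
} Stack;

/* Hypothetical helper: creates an empty stack with the given capacity
   (at least 1). Returns 1 on success, 0 if allocation fails. */
int StackInit(Stack *st, int capacity) {
    st->body = (Data *)calloc(capacity, sizeof(Data));
    st->capacity = capacity;
    st->count = 0;
    return st->body != NULL;
}

/* Hypothetical helper: pushes x, doubling the capacity when the stack is
   full. Returns 1 on success, 0 if reallocation fails. */
int StackPush(Stack *st, Data x) {
    if (st->count == st->capacity) {
        Data *bigger = (Data *)realloc(st->body,
                                       2 * st->capacity * sizeof(Data));
        if (bigger == NULL) {
            return 0;          /* the old body is still valid and unchanged */
        }
        st->body = bigger;
        st->capacity *= 2;
    }
    st->body[st->count++] = x;
    return 1;
}

/* Hypothetical helper: frees the body and guards against a second free. */
void StackDestroy(Stack *st) {
    free(st->body);
    st->body = NULL;           /* free(NULL) does nothing, so repeating is safe */
    st->capacity = 0;
    st->count = 0;
}
```

Storing the result of realloc in a temporary pointer keeps the original array reachable if the reallocation fails. Setting body to NULL after free is one simple way to make sure the same chunk of memory is never freed twice.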
Tokens of the C Language
A token is a sequence of characters that is understood as a unit. Tokens are the basic building blocks of a C program. C tokens fall roughly into the six categories listed below.
grouping - token pairs that group things together
identifiers - tokens that name functions, variables, constants, and types
literals - tokens that specify values
operators - tokens that are used to combine values in expressions
punctuation - tokens that separate or terminate complex constructions
special - tokens with special meaning to the preprocessor or compiler

Grouping Operators
FIXME. grouping - token pairs that group things together
Identifiers
An identifier is a consecutive string of letters, digits, and underscore characters, the first of which is a letter or an underscore. Identifiers are tokens of the C language. They are used to name variables, constants, types, functions, and members of structs and unions. Certain identifiers have predefined meanings and cannot be redefined. These identifiers, called keywords, are listed below.
auto      double  int       struct
break     else    long      switch
case      enum    register  typedef
char      extern  return    union
const     float   short     unsigned
continue  for     signed    void
default   goto    sizeof    volatile
do        if      static    while

Literal Constants
A literal is a token that specifies a value. The syntax of a literal depends on the type of value: integer, real, character, or string.
An integer literal can be written in decimal, octal, or hexadecimal form. A decimal (base 10) literal is any string of digits that does not begin with the digit '0'. This is the most common form for integer literals. An octal (base 8) literal begins with the digit '0' and is followed by a string of digits, excluding '8' and '9'. A hexadecimal (base 16) literal begins with the characters '0x' or the characters '0X'. The remaining characters are hexadecimal digits, which include decimal digits and the letters 'a' through 'f' and 'A' through 'F'.

A real literal is written as a decimal integer, followed by a decimal point, followed by a fractional part, which is a string of decimal digits. It may optionally be followed by an exponent part, which consists of either the letter 'e' or the letter 'E' followed by a decimal integer, which may be signed. This integer is called the exponent. It specifies a power of ten that multiplies the preceding number, as in scientific notation. For example, 1.5e3 represents the number 1.5*1000, or 1500.0.

A string literal is a sequence of characters other than quote marks, or escape sequences, enclosed in double quotes. A character literal is a character other than a quote mark, or an escape sequence, enclosed in single quotes. For both string and character literals, the following escape sequences are used:
\n - a newline character
\t - a tab character
\' - a single quote character
\" - a double quote character
\\ - a backslash character
\0 - a null character (string terminator)
A null character is automatically appended to the end of all string literals.
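The rules for literals can be checked with a few small functions. This is only a sketch, and the function names are made up. Each function returns 1 when the stated fact holds.

```c
#include <string.h>

/* The same integer written in decimal, octal, and hexadecimal form. */
int SameIntegerInThreeBases(void) {
    return 255 == 0377 && 255 == 0xFF && 255 == 0Xff;
}

/* The exponent multiplies the preceding number by a power of ten. */
int ExponentScalesByPowerOfTen(void) {
    return 1.5e3 == 1500.0 && 2.5e-1 == 0.25;
}

/* "abc" occupies 4 chars: 'a', 'b', 'c', and the appended null character.
   An escape sequence such as \t counts as a single character. */
int StringLiteralHasTerminator(void) {
    return sizeof("abc") == 4
        && strlen("abc") == 3
        && "abc"[3] == '\0'
        && strlen("a\tb") == 3;
}
```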

Operators, Precedence, and Associativity
Operators are tokens that are used to combine values in expressions. When evaluating an expression, C uses precedence and associativity rules to determine the order in which operators are applied. In the table below, the C operators are listed in order of precedence with highest precedence at the top. The rows of the table define precedence groups. Two operators in the same group have the same precedence.
Associativity rules are only used to determine the order of evaluation for two operators that are in the same group. Most groups use left to right associativity, which means that in an expression with operators in the same precedence group, the operators are applied in left-to-right order. The order is reversed for right to left associativity.

operator group                        associativity   category
() [] -> .                            left to right   selectors
! ~ ++ -- + - (type) * & sizeof       right to left   unary
* / %                                 left to right   multiplicative
+ -                                   left to right   additive
<< >>                                 left to right   bitwise
< <= > >=                             left to right   comparison
== !=                                 left to right   comparison
&                                     left to right   bitwise
^                                     left to right   bitwise
|                                     left to right   bitwise
&&                                    left to right   boolean
||                                    left to right   boolean
?:                                    right to left   conditional
= += -= *= /= %= &= ^= |= <<= >>=     right to left   assignment
,                                     left to right   comma
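A few of the rows in the table can be checked directly. The functions below are a sketch with made-up names. Each returns 1 when the expression groups the way the table predicts.

```c
/* Multiplicative operators have higher precedence than additive ones. */
int MultiplicativeBeforeAdditive(void) {
    return 2 + 3 * 4 == 14;             /* 2 + (3 * 4), not (2 + 3) * 4 */
}

/* Additive operators associate left to right. */
int LeftToRightSubtraction(void) {
    return 10 - 4 - 3 == 3;             /* (10 - 4) - 3, not 10 - (4 - 3) */
}

/* Assignment operators associate right to left. */
int RightToLeftAssignment(void) {
    int a, b;

    a = b = 7;                          /* a = (b = 7) */
    return a == 7 && b == 7;
}

/* Unary operators associate right to left. */
int DereferenceThenAdvance(void) {
    int data[2] = {10, 20};
    int *p = data;
    int first = *p++;                   /* *(p++): reads data[0], then p advances */

    return first == 10 && *p == 20;
}
```

When in doubt about how an expression groups, adding parentheses makes the intended order explicit and costs nothing at execution time.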


Punctuation Tokens
FIXME. punctuation - tokens that separate complex constructions into parts
Special Tokens
FIXME. special - tokens with special meaning to the preprocessor or compiler
Arguments in Function Calls
FIXME.
Selector Operators
FIXME.
Unary Operators
FIXME.
Multiplicative Operators
FIXME.
Additive Operators
FIXME.
Boolean Operators
FIXME.
Bitwise Operators
FIXME.
Equality and Comparison Operators
FIXME.
Conditional Expressions
The conditional expression has one of two values, depending on a boolean condition. It has the following form:
condition ? expression-1 : expression-2

where
condition is an expression that can be true (nonzero) or false (zero).
expression-1 and expression-2 are expressions that convert to a common type.
If condition is true then expression-1 is evaluated and its value becomes the value of the conditional expression. Otherwise, expression-2 is evaluated and its value becomes the value of the conditional expression.
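Because only one of the two expressions is evaluated, a conditional expression can guard an operation that would otherwise be unsafe. The following sketch uses hypothetical function names. Returning 0 for a zero divisor is an arbitrary choice made for this example.

```c
/* Hypothetical example: returns the larger of a and b. */
int Max(int a, int b) {
    return a > b ? a : b;
}

/* Hypothetical example: integer division that yields 0 for a zero divisor.
   The division num / den is never evaluated when den is 0. */
int SafeDivide(int num, int den) {
    return den != 0 ? num / den : 0;
}
```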
Assignment Operators
FIXME.
The Comma Operator
FIXME.
