Basic concepts

This section provides definitions for the specific terminology and the concepts used when describing the C programming language. A C program is a sequence of text files (typically header and source files) that contain declarations. They undergo translation to become an executable program, which is executed when the OS calls its main function (unless it is itself the OS or another freestanding program, in which case the entry point is implementation-defined). Certain words in a C program have special meaning; they are keywords. Others can be used as identifiers, which may be used to identify objects, functions, struct, union, or enumeration tags, their members, typedef names, labels, or macros. Each identifier (other than macros) is only valid within a part of the program called its scope and belongs to one of four kinds of name spaces. Some identifiers have linkage, which makes them refer to the same entities when they appear in different scopes or translation units. Definitions of functions include sequences of statements and declarations, some of which include expressions, which specify the computations to be performed by the program. Declarations and expressions create, destroy, access, and manipulate objects. Each object, function, and expression in C is associated with a type.

Comments

Comments serve as a sort of in-code documentation. When inserted into a program, they are effectively ignored by the compiler; they are solely intended to be used as notes by the humans that read source code.

Syntax

/* comment */ (1)
// comment (2) (since C99)
  1. Often known as "C-style" or "multi-line" comments.
  2. Often known as "C++-style" or "single-line" comments.

All comments are removed from the program at translation phase 3 by replacing each comment with a single whitespace character.

C style

C-style comments are usually used to comment large blocks of text or small fragments of code; however, they can be used to comment single lines. To insert text as a C-style comment, simply surround the text with /* and */. C-style comments tell the compiler to ignore all content between /* and */. Although it is not part of the C standard, /** and */ are often used to indicate documentation blocks; this is legal because the second asterisk is simply treated as part of the comment.

Except within a character constant, a string literal, or a comment, the characters /* introduce a comment. The contents of such a comment are examined only to identify multibyte characters and to find the characters */ that terminate the comment. C-style comments cannot be nested.
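
As a small sketch of these rules, the two functions below (their names are made up for illustration) show a comment acting as a single space between tokens, and the characters /* appearing inside a string literal, where they do not start a comment.

```c
#include <string.h>

/** Documentation-style block: the second asterisk is just comment text. */
int comment_acts_as_space(void)
{
    /* The comment between "int" and "x" is replaced by one space,
       so the declaration below reads: int x = 3; */
    int/* removed in translation phase 3 */x = 3;
    return x;
}

/* Inside a string literal, a slash followed by an asterisk is ordinary text. */
size_t literal_length(void)
{
    return strlen("a/*b*/c");   /* all 7 characters belong to the string */
}
```

Calling comment_acts_as_space() yields 3 and literal_length() yields 7, since nothing inside the quotes is removed.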

C++ Style

C++-style comments are usually used to comment single lines of text or code; however, they can be placed together to form multi-line comments. To insert text as a C++-style comment, simply precede the text with // and follow the text with the new-line character. C++-style comments tell the compiler to ignore all content between // and a new line.

Except within a character constant, a string literal, or a comment, the characters // introduce a comment that includes all multibyte characters up to, but not including, the next new-line character. The contents of such a comment are examined only to identify multibyte characters and to find the new-line character that terminates the comment. C++-style comments can be nested:

// y = f(x); // invoke algorithm

A C Style comment may appear within a C++ Style comment:

// y = f(x); /* invoke algorithm */

A C++ Style comment may appear within a C Style comment; this is a mechanism for excluding a small block of source code:

/*
    y = f(x);
    // invoke algorithms
    z = g(x);
*/
ASCII Chart

The following chart contains all 128 ASCII decimal (dec), octal (oct), hexadecimal (hex) and character (ch) codes.

Note: in Unicode, the ASCII character block is known as U+0000..U+007F Basic Latin.

Example

#include <stdio.h>

int main(void)
{
    puts("Printable ASCII:");
    for (int i = 32; i < 127; ++i) {
        putchar(i);
        putchar(i % 16 == 15 ? '\n' : ' ');
    }
}
Type

Objects, functions, and expressions have a property called type, which determines the interpretation of the binary value stored in an object or evaluated by the expression.
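
To illustrate how the type determines the interpretation of stored bits, the following sketch copies the bytes of a float into an unsigned integer of the same size. The helper name float_bits is made up for this example, and the bit pattern mentioned afterwards assumes that float uses the IEEE 754 single-precision format, which is common but not required by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* This sketch only makes sense if both types have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Returns the bits stored in a float, interpreted as a 32-bit unsigned integer. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* same bytes, different type */
    return bits;
}
```

Under the IEEE 754 assumption, float_bits(1.0f) is 0x3F800000: the same bits mean 1.0 as a float and 1065353216 as an unsigned integer.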

Type classification

The C type system consists of the following types:

Objects and alignment

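C programs create, destroy, access, and manipulate objects. An object, in C, is a region of data storage in the execution environment, the contents of which can represent values (a value is the meaning of the contents of an object, when interpreted as having a specific type). Every object has a size, an alignment requirement, a storage duration, a lifetime, an effective type, a value (which may be indeterminate), and optionally an identifier that denotes it.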

Objects are created by declarations, allocation functions, string literals, compound literals, and by non-lvalue expressions that return structures or unions with array members.
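
The following sketch creates one object in each of the first four ways listed above and combines their values. The function name sum_of_objects is hypothetical and exists only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the sum of values held in differently created objects,
   or -1 if the allocation fails. */
int sum_of_objects(void)
{
    int declared = 1;                             /* created by a declaration */

    int *allocated = malloc(sizeof *allocated);   /* created by an allocation function */
    if (allocated == NULL)
        return -1;
    *allocated = 2;

    const char *literal = "abc";                  /* string literal: an array of 4 char */
    int *compound = (int[]){3, 4};                /* compound literal: an array of 2 int */

    int total = declared + *allocated + (int)strlen(literal)
              + compound[0] + compound[1];        /* 1 + 2 + 3 + 3 + 4 */
    free(allocated);                              /* ends the allocated object's lifetime */
    return total;
}
```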

Object representation

Except for bit fields, objects are composed of contiguous sequences of one or more bytes, each consisting of CHAR_BIT bits, and can be copied with memcpy into an object of type unsigned char[n], where n is the size of the object. The contents of the resulting array are known as the object representation.

If two objects have the same object representation, they compare equal (except if they are floating-point NaNs). The opposite is not true: two objects that compare equal may have different object representations because not every bit of the object representation needs to participate in the value. Such bits may be used for padding to satisfy an alignment requirement, for parity checks, to indicate trap representations, etc.

If an object representation does not represent any value of the object type, it is known as a trap representation. Accessing a trap representation in any way other than reading it through an lvalue expression of character type is undefined behavior. The value of a structure or union is never a trap representation even if any particular member is one.

For the objects of type char, signed char, and unsigned char, every bit of the object representation is required to participate in the value representation and each possible bit pattern represents a distinct value (no padding, trap bits, or multiple representations allowed).

When objects of integer types (short, int, long, long long) occupy multiple bytes, the use of those bytes is implementation-defined, but the two dominant implementations are big-endian (POWER, SPARC, Itanium) and little-endian (x86, x86_64): a big-endian platform stores the most significant byte at the lowest address of the region of storage occupied by the integer, a little-endian platform stores the least significant byte at the lowest address. See Endianness for details.

Although most implementations do not allow trap representations, padding bits, or multiple representations for integer types, there are exceptions; for example, a value of an integer type on Itanium may be a trap representation.
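
The ideas in this section can be sketched with two small helpers, both named only for this illustration. The first copies an integer's object representation into an unsigned char array and inspects the byte order. The second compares positive and negative zero, which have equal values but, assuming IEEE 754 floating point, different object representations.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 for a little-endian layout, 0 for big-endian, -1 for anything else. */
int detect_byte_order(void)
{
    uint32_t value = 0x01020304u;
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);   /* bytes now hold the object representation */

    if (bytes[0] == 0x04 && bytes[3] == 0x01)
        return 1;                          /* least significant byte at lowest address */
    if (bytes[0] == 0x01 && bytes[3] == 0x04)
        return 0;                          /* most significant byte at lowest address */
    return -1;
}

/* Returns 1 if +0.0 and -0.0 compare equal yet differ in their bytes. */
int zeros_differ_in_representation(void)
{
    double positive_zero = 0.0;
    double negative_zero = -0.0;
    return positive_zero == negative_zero
        && memcmp(&positive_zero, &negative_zero, sizeof positive_zero) != 0;
}
```

On x86 and x86_64, detect_byte_order() is expected to report little-endian; on other platforms the result depends on the implementation.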

The Main Function

Every C program coded to run in a hosted execution environment contains the definition (not the prototype) of a function called main, which is the designated start of the program.

int main (void) { body }
int main (int argc, char *argv[]) { body }
/* another implementation-defined signature */

Parameters

The names argc and argv stand for argument count and argument vector. Names and representation of the types of the parameters are arbitrary: int main(int ac, char** av) is equally valid. A common implementation-defined form of main is int main(int argc, char *argv[], char *envp[]), where a third argument, of type char*[], points at an array of pointers to the host environment variables.

Return value

If the return statement is used, the return value is used as the argument to the implicit call to exit() (see below for details). The values zero and EXIT_SUCCESS indicate successful termination, the value EXIT_FAILURE indicates unsuccessful termination.

Explanation

The main function is called at program startup, after all objects with static storage duration are initialized. It is the designated entry point to a program that is executed in a hosted environment (that is, with an operating system). The name and type of the entry point to any freestanding program (boot loaders, OS kernels, etc.) are implementation-defined.

The parameters of the two-parameter form of the main function allow arbitrary multibyte character strings to be passed from the execution environment (these are typically known as command line arguments). The pointers argv[1] .. argv[argc-1] point at the first characters in each of these strings. argv[0] is the pointer to the initial character of a null-terminated multibyte string that represents the name used to invoke the program itself (or, if this is not supported by the host environment, argv[0][0] is guaranteed to be zero).

If the host environment cannot supply both lowercase and uppercase letters, the command line arguments are converted to lowercase.

The strings are modifiable, and any modifications made persist until program termination, although these modifications do not propagate back to the host environment: they can be used, for example, with str…

The size of the array pointed to by argv is at least argc+1, and the last element, argv[argc], is guaranteed to be a null pointer.
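
As a sketch of how these guarantees are typically used, the helpers below walk an argument vector in two ways: by index up to argc, and by relying on the null pointer stored in argv[argc]. Both function names and the "-v" flag are hypothetical and chosen only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Counts the arguments equal to "-v", skipping the program name in argv[0]. */
int count_verbose_flags(int argc, char *argv[])
{
    int count = 0;
    for (int i = 1; i < argc; ++i)
        if (strcmp(argv[i], "-v") == 0)
            ++count;
    return count;
}

/* Counts the entries of the vector using only the terminating null pointer. */
int count_until_null(char *argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        ++n;
    return n;
}
```

A program would normally call these from main as count_verbose_flags(argc, argv); for a conforming argument vector, count_until_null(argv) returns the same number as argc.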

The main function has several special properties:

  1. A prototype for this function cannot be supplied by the program.

  2. If the return type of the main function is compatible with int, then the return from the initial call to main (but not the return from any subsequent, recursive, call) is equivalent to executing the exit function, with the value that the main function is returning passed as the argument (which then calls the functions registered with atexit, flushes and closes all streams, and deletes the files created with tmpfile, and returns control to the execution environment).

  3. If the main function executes a return that specifies no value or, which is the same, reaches the terminating } without executing a return, the termination status returned to the host environment is undefined. (until C99)

     If the return type of the main function is not compatible with int (e.g. void main(void)), the value returned to the host environment is unspecified. If the return type is compatible with int and control reaches the terminating }, the value returned to the environment is the same as if executing return 0; (since C99)

Example

Demonstrates how to inform a program about where to find its input and where to write its results. Invocation: ./a.out indatafile outdatafile

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("argc = %d\n", argc);
    for (int ndx = 0; ndx != argc; ++ndx) {
        printf("argv[%d] --> %s\n", ndx, argv[ndx]);
    }
    printf("argv[argc] = %p\n", (void*)argv[argc]);
}
C Keywords

This is a list of reserved keywords in C. Since they are used by the language, these keywords are not available for re-definition.

The most common keywords that begin with an underscore are generally used through their convenience macros:
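
For instance, _Bool, _Alignof, and _Static_assert are usually spelled bool, alignof, and static_assert, which the headers <stdbool.h>, <stdalign.h>, and <assert.h> provide as macros (in C23 these spellings became keywords themselves). The function names in this sketch are made up for illustration.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignof */
#include <stdbool.h>   /* bool, true, false */
#include <stddef.h>    /* size_t */

/* static_assert stands for _Static_assert and is checked at compile time. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

/* bool stands for _Bool. */
bool is_even(int n)
{
    return n % 2 == 0;
}

/* alignof stands for _Alignof. */
size_t int_alignment(void)
{
    return alignof(int);
}
```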

Preprocessor

The preprocessor is executed at translation phase 4, before the compilation. The result of preprocessing is a single file which is then passed to the actual compiler.

Directives

The preprocessing directives control the behavior of the preprocessor. Each directive occupies one line and has the following format:

The null directive (# followed by a line break) is allowed and has no effect.
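
A short sketch of a few standard directives in use follows. The macro names BUFFER_SIZE and SQUARE, the constant buffer_bytes, and the function squared_size are all invented for this example.

```c
#define BUFFER_SIZE 16            /* object-like macro */
#define SQUARE(x) ((x) * (x))     /* function-like macro */
#
/* The line above is a null directive and has no effect. */

#ifdef BUFFER_SIZE                /* conditional inclusion */
enum { buffer_bytes = BUFFER_SIZE * 2 };
#else
enum { buffer_bytes = 0 };
#endif

int squared_size(void)
{
    return SQUARE(BUFFER_SIZE);   /* expands to ((16) * (16)) */
}
```

Because BUFFER_SIZE is defined, the first branch is kept, so buffer_bytes is 32 and squared_size() returns 256.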

Capabilities

The preprocessor has the following source file translation capabilities:

The following aspects of the preprocessor can be controlled:

Footnotes
  1. These are the directives defined by the standard. The standard does not define behavior for other directives: they might be ignored, have some useful meaning, or make the program ill-formed. Even if otherwise ignored, they are removed from the source code when the preprocessor is done. A common non-standard extension is the directive #warning, which emits a user-defined message during compilation.
Functions

A function is a C language construct that associates a compound statement (the function body) with an identifier (the function name). Every C program begins execution from the main function, which either terminates, or invokes other, user-defined or library functions.

// function definition
int sum(int x, int y)
{
    return x + y;
}

Functions may accept zero or more parameters, which are initialized from the arguments of a function call operator, and may return a value to their caller by means of the return statement.

/* parameters x and y are initialized with the arguments 1 and 2 */
int n = sum(1, 2);

The body of a function is provided in a function definition. Each function must be defined only once in a program, unless the function is inline.

There are no nested functions (except where allowed through non-standard compiler extensions): each function definition must appear at file scope, and functions have no access to the local variables from the caller:

int main(void) // the main function definition
{
    // function declaration may appear in any scope
    int sum(int, int);
    int x = 1; // local variable in main
    sum(1, 2); // function call
}

// function definition
int sum(int a, int b)
{
    return a + b;
}
Statements

Statements are fragments of the C program that are executed in sequence. The body of any function is a compound statement, which, in turn is a sequence of statements and declarations:

#include <stdio.h>

int main(void)
{ // start of a compound statement
    int n = 1; // declaration (not a statement)
    n = n+1; // expression statement
    printf("n = %d\n", n); // expression statement
    return 0; // return statement
} // end of compound statement, end of function body

There are five types of statements:

  1. compound statements
  2. expression statements
  3. selection statements
  4. iteration statements
  5. jump statements
Labels

Any statement can be labeled, by providing a name followed by a colon before the statement itself.

identifier : statement (1)
case constant_expression : statement (2)
default : statement (3)
  1. Target for goto.
  2. Case label in a switch statement.
  3. Default label in a switch statement.

Any statement (but not a declaration) may be preceded by any number of labels, each of which declares identifier to be a label name, which must be unique within the enclosing function (in other words, label names have function scope).

A label declaration has no effect on its own: it does not alter the flow of control or modify the behavior of the statement that follows in any way.
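
The three kinds of labels can be sketched as follows. Both functions are hypothetical examples: the first uses case and default labels inside a switch, and the second uses an ordinary label as the target of a goto that leaves two nested loops at once.

```c
/* Classifies a small integer using case and default labels. */
const char *classify(int n)
{
    switch (n) {
    case 0:
        return "zero";
    case 1:
    case 2:                       /* two labels in front of one statement */
        return "small";
    default:
        return "other";
    }
}

/* Finds the first pair of digits (i, j) with i + j == target and
   returns i * 10 + j, or -1 if no such pair exists. */
int find_first_pair_sum(int target)
{
    int result = -1;
    for (int i = 0; i < 10; ++i)
        for (int j = 0; j < 10; ++j)
            if (i + j == target) {
                result = i * 10 + j;
                goto done;        /* jumps to the label below */
            }
done:                             /* label name is unique within this function */
    return result;
}
```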

Compound statements

A compound statement, or block, is a brace-enclosed sequence of statements and declarations.

{ statement | declaration...(optional) }(1)

The compound statement allows a set of declarations and statements to be grouped into one unit that can be used anywhere a single statement is expected (for example, in an if statement or an iteration statement):

if (expr) // start of if-statement
{ // start of block
    int n = 1; // declaration
    printf("%d\n", n); // expression statement
} // end of block, end of if-statement

Each compound statement introduces its own block scope.
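
A minimal sketch of what block scope means in practice: an identifier declared in an inner block names a different object than an identifier with the same name in the enclosing block. The function name shadow_demo is invented for this example.

```c
int shadow_demo(void)
{
    int n = 1;          /* outer n, visible in the whole function body */
    {
        int n = 2;      /* a different object that hides the outer n */
        n += 10;        /* modifies the inner n only */
    }                   /* scope of the inner n ends here */
    return n;           /* the outer n still holds 1 */
}
```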

The initializers of the variables with automatic storage duration declared inside a block and the VLA declarators are executed when flow of control passes over these declarations in order, as if they were statements:

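#include <stdio.h>

int main(void)
{ // start of block
    { // start of block
        puts("hello"); // expression statement
        int n = printf("abc\n"); // prints abc, stores 4 in n
        int a[n*printf("1\n")]; // prints 1, a has 4*2 = 8 elements
        printf("%zu\n", sizeof(a)); // prints 8*sizeof(int)
    } // end of block, scope of n and a ends
    int n = 7; // n can be reused
}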
Expression statements

An expression followed by a semicolon is a statement.

expression(optional) ;

Most statements in a typical C program are expression statements, such as assignments or function calls.

An expression statement without an expression is called a null statement. It is often used to provide an empty body to a for or while loop. It can also be used to carry a label at the end of a compound statement or before a declaration:

puts("hello"); // expression statement
const char *s = "hello"; // initialized so that the loop reads a valid string
while (*s++ != '\0')
    ; // null statement
Selection statements

The selection statements choose between one of several statements depending on the value of an expression.

if ( expression ) statement (1)
if ( expression ) statement else statement (2)
switch ( expression ) statement (3)
  1. if statement
  2. if statement with an else clause
  3. switch statement
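
The three forms can be sketched with two hypothetical helpers; the names days_in_month and parity are assumptions made for this example only.

```c
/* Number of days in a month of a non-leap year, or 0 if month is out of range. */
int days_in_month(int month)
{
    if (month < 1 || month > 12)      /* if statement without else */
        return 0;

    switch (month) {                  /* switch statement */
    case 2:
        return 28;
    case 4: case 6: case 9: case 11:
        return 30;
    default:
        return 31;
    }
}

/* Names the parity of an integer. */
const char *parity(int n)
{
    if (n % 2 == 0)                   /* if statement with an else clause */
        return "even";
    else
        return "odd";
}
```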
Iteration statements

The iteration statements repeatedly execute a statement.

while ( expression ) statement (1)
do statement while ( expression ) ; (2)
for ( init ; expression ; expression ) statement (3)
  1. while loop
  2. do-while loop
  3. for loop
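
As an illustration, the same sum 1 + 2 + ... + n can be written with each of the three loops. The function names are invented for this sketch. Note that the do-while version runs its body once even when n is 0, because its condition is checked after the body.

```c
/* while loop: the condition is tested before every iteration. */
int sum_while(int n)
{
    int sum = 0;
    int i = 1;
    while (i <= n) {
        sum += i;
        ++i;
    }
    return sum;
}

/* do-while loop: the body executes at least once. */
int sum_do_while(int n)
{
    int sum = 0;
    int i = 1;
    do {
        sum += i;
        ++i;
    } while (i <= n);
    return sum;
}

/* for loop: initialization, condition, and increment in one place. */
int sum_for(int n)
{
    int sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += i;
    return sum;
}
```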
Jump statements

The jump statements unconditionally transfer the flow of control.

break ; (1)
continue ; (2)
return expression(optional) ; (3)
goto identifier ; (4)
  1. break statement
  2. continue statement
  3. return statement with an optional expression
  4. goto statement
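
A short sketch of break, continue, and return working together follows; the function name and its rule (stop at the first zero, skip negative values) are assumptions made for this example.

```c
/* Adds up the positive values that appear before the first zero. */
int sum_positive_until_zero(const int *values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        if (values[i] == 0)
            break;          /* leaves the loop entirely */
        if (values[i] < 0)
            continue;       /* skips to the next iteration */
        total += values[i];
    }
    return total;           /* hands the result back to the caller */
}
```

For the values 1, -2, 3, 0, 5 the function adds 1 and 3, skips -2, and stops at 0 without reaching 5.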