C is a high-level, general-purpose programming language that provides a simple way for programmers to express operations using a standardized set of symbols, known as the C language character set.
Understanding this character set and how it is used in C programming is crucial for aspiring developers, as it forms the building blocks of every C program.
This comprehensive guide will discuss the C language character set in detail, covering various types of characters, tokens, keywords, and their specific uses in programming.
The character set in C language is a collection of symbols and notations used to represent program elements and to communicate meaning to both the programmer and the computer.
Standard ASCII (American Standard Code for Information Interchange) defines 128 characters, and 8-bit extended character sets provide 256, but not all of them are used in C programming. Here, we will focus on those that are relevant in C language.
The characters used in C programming fall into the following main categories:
Alphabets
Alphabets are the most basic building blocks of the C language character set. They consist of uppercase and lowercase letters from A to Z (i.e., A, B, C…Z and a, b, c…z).
Alphabets are used to form identifiers, which are names given to various program elements such as variables, functions, arrays, and structures.
Digits
Digits are the simplest numeric representations and include the numbers 0 through 9.
In C language, digits can appear in identifiers (though not as the first character) or form numeric constants (e.g., integers and floating-point numbers).
Special characters
C programming utilizes a select group of special characters to convey specific meanings and perform certain actions.
Character | Meaning | Character | Meaning |
+ | plus sign | - | minus sign (hyphen) |
* | asterisk | % | percent sign |
\ | backslash | / | forward slash |
< | less than sign | = | equal to sign |
> | greater than sign | _ | underscore |
( | left parenthesis | ) | right parenthesis |
{ | left brace | } | right brace |
[ | left bracket | ] | right bracket |
, | comma | . | period |
' | single quote | " | double quote |
: | colon | ; | semicolon |
? | question mark | ! | exclamation mark |
& | ampersand | | | vertical bar |
@ | at sign | ^ | caret |
$ | dollar sign | # | hash sign |
~ | tilde | ` | backquote (grave accent) |
White spaces
White spaces are non-printing characters used for formatting and separating program elements.
They include space ( ), horizontal tab (\t), vertical tab (\v), newline (\n), and form feed (\f).
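White spaces are essential for clarity and readability in code. The compiler uses them only to separate tokens and otherwise ignores them, except inside string literals and character constants, where they are preserved as data.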
Escape characters
In C programming, certain characters, such as newline, tab, and backspace, cannot be written directly as ordinary printable characters, so they are represented using escape sequences. An escape sequence consists of a backslash (\) followed by one or more characters, and it stands for a single character.
Escape Sequence | Meaning | ASCII Value (decimal) | Purpose |
\b | backspace | 008 | Moves the cursor back one position on the current line |
\a | bell (alert) | 007 | Produces a beep sound for alert |
\r | carriage return | 013 | Moves the cursor to the beginning of the current line |
\n | newline | 010 | Moves the cursor to the beginning of the next line |
\f | form feed | 012 | Moves the cursor to the initial position of the next logical page |
\0 | null | 000 | Marks the end of a string |
\v | vertical tab | 011 | Moves the cursor to the next vertical tab position |
\t | horizontal tab | 009 | Moves the cursor to the next horizontal tab position |
\\ | backslash | 092 | Represents a backslash character (\) |
\" | double quote | 034 | Represents a double quote character |
\' | single quote | 039 | Represents a single quote character |
Trigraph Characters
In C programming, some characters may not be available on certain keyboards or in certain character encodings. To handle these scenarios, C provides a feature known as "trigraph sequences". A trigraph sequence consists of three characters: two question marks (??) followed by a third character that selects the replacement. The language defines exactly nine trigraph sequences, listed below. Note that the C23 standard removes trigraphs, and some compilers recognize them only when a specific option or standards mode is enabled.
Trigraph Sequence | Symbol |
??< | { left brace |
??> | } right brace |
??( | [ left bracket |
??) | ] right bracket |
??! | | vertical bar |
??/ | \ backslash |
??= | # hash sign |
??- | ~ tilde |
??' | ^ caret |
Tokens in C Language
Tokens are the smallest individual units of a C program that are meaningful to the compiler.
They are formed by combining and grouping individual characters. C has six types of tokens:
Keywords
Keywords are reserved words in C language that have specific meanings and uses.
The original C standard (C89/C90) defines 32 keywords, and later standards add more. Keywords cannot be used as identifiers such as variable or function names, because they are reserved for the language’s functionality.
Some common keywords include int, float, double, char, if, else, while, for, switch, case, break, return, and void.
Identifiers
Identifiers are user-defined names given to various program elements like variables, functions, arrays, and structures.
They are created by following specific rules:
- An identifier can contain alphabets, digits, and underscores (_).
- Identifiers must begin with a letter or an underscore.
- They cannot contain white spaces or other special characters, and an identifier cannot be the same as a keyword.
- Identifiers in C are case-sensitive.
Constants
Constants represent fixed values that do not change throughout the program’s execution.
They can be of various data types, such as integer constants, floating-point constants, and character constants.
Each constant is explicitly defined by a particular notation or syntax, for example:
- Integer constant: 42
- Floating-point constant: 3.14
- Character constant: 'A'
Strings
Strings are sequences of characters enclosed within double quotes.
They are typically used to store and manipulate text in C programming.
Examples of strings include "Hello, World!" and "C Programming".
Operators
Operators are special symbols used to perform specific operations in C language.
They act on operands (usually variables or constants) to produce a single result.
Some common operators in C programming are:
- Arithmetic operators: +, -, *, /, %
- Relational operators: <, >, <=, >=, ==, !=
- Logical operators: &&, ||, !
- Assignment operators: =, +=, -=, *=, /=, %=
- Increment/decrement operators: ++, --
- Bitwise operators: &, |, ^, ~, <<, >>
Punctuators
Punctuators are symbols or characters that indicate the structure and organization of a program in C language.
They play a crucial role in managing the flow, scope, and separation of statements and blocks, as well as providing clarity and readability.
Some common punctuators in C programming include:
- Semicolon (;): Indicates the end of a statement.
- Comma (,): Separates the elements in a list or declaration.
- Parentheses (), braces {}, and brackets []: Enclose and group expressions, compound statements, or array subscripts.