C Tutorial: Basics

Processing the command line

While many applications are graphical and are launched by a window manager on a windowing system, a great number of applications are designed for use via an interactive command-line shell or within shell scripts (sequences of commands along with controlling logic). Much of the appeal and power of the command line is the ability to redirect the output of a command to a file or to another command via a pipe. For example,

ls >files

will create a file named files that contains a list of all the files in the current directory, and

ls | wc -l

pipes the output of ls (list files) to wc -l, which counts the number of lines in the standard input (the wc command counts characters, words, and lines in a file; the -l option tells it to give just a line count).

Most command line programs need to be able to read the parameters given to them on the command line. Parameters are strings that are separated by white space. The shell is responsible for parsing them into a list, called an argument list, which is passed to every program via the execve system call and which the program receives as the arguments to the main function. The shell handles things such as quoted strings and escaped characters (e.g., escaping a quote or a space to make it part of the argument). Hence, a command such as:

grep -i "my name" file\ 1.txt file\ 2.txt

will result in the shell invoking the execve call with an argument list containing the following strings:

arg[0] = "grep"
arg[1] = "-i"
arg[2] = "my name"
arg[3] = "file 1.txt"
arg[4] = "file 2.txt"
arg[5] = 0

The list is an array of pointers. Note that the last element of the list is an entry containing the number 0, the null pointer. For convenience, the main function gets invoked with a count of these arguments (not including the null pointer) as the first parameter, conventionally named argc, and the argument list itself as the second parameter, conventionally named argv.

Example 1

For the simplest example, let's just print each element of the argument list.

/* print the command line arguments
   Paul Krzyzanowski */

#include <stdio.h>

int
main(int argc, char **argv)
{
    int i;

    printf("argc = %d\n", argc);
    for (i = 0; i < argc; i++)
        printf("arg[%d] = \"%s\"\n", i, argv[i]);
    return 0;
}

Save this program in a file named echoargs.c.

Compile this program via:

gcc -o echoargs echoargs.c

If you don't have gcc, you may need to substitute cc or the name of your compiler for gcc.

Run the program:

./echoargs does she "weigh the same" as\ a duck?

You should see:

argc = 6
arg[0] = "./echoargs"
arg[1] = "does"
arg[2] = "she"
arg[3] = "weigh the same"
arg[4] = "as a"
arg[5] = "duck?"

The shell did the work of parsing the double quotes around "weigh the same" to treat that as a single argument, as well as interpreting the backslash ('\') before the space as an escape character so that "as a" is treated as a single argument.

Basic command-line data checking

Just using the argc and argv parameters directly is great for simple things. For example, suppose we write a command that takes two file names (for instance, a copy command). To ensure that the user entered no more and no less than two names on the command line, we simply check:

if (argc != 3) {
    fprintf(stderr, "usage: copy file1 file2\n");
    exit(1);
}

Note that we check whether argc is 3 because the first argument (argv[0]) is the name of the command, so two file names make three arguments in total. Also note that we send the output to stderr, the standard error stream, and that exit is declared in <stdlib.h>. Writing errors to stderr is good practice: even if the user redirects the standard output of the program to a file or a pipe, any output to standard error will still appear in the terminal window.

Using getopt

As you see, the arguments in a command line can comprise anything, and it's up to the program to interpret them as it sees fit. A convention in most (but not all) Unix system commands is to place various options, or switches, at the beginning of the command and then the requisite strings or file names after them. The options are single letters prefixed by a hyphen, and they use a flexible syntax: you can group several options behind a single hyphen or give them separately, in any order. For instance, these ls commands are all the same (try them):

ls -aFilL /etc
ls -a -il -FL /etc
ls -a -i -l -F -L /etc
ls -ailLF /etc

Incidentally, these options instruct the ls command to list hidden files, show the file's i-node number, display the output in long format, follow symbolic links, and place a /, =, *, or @ character after a filename to identify its type. Run man ls to see the manual entry for the ls command. This might look horribly cryptic if you're not familiar with UNIX system commands. The design philosophy was that it's better for a newbie to struggle a bit and learn than for a skilled user to be forever burdened with verbose commands.

We can use the getopt function to help us deal with this sort of parsing of command-line arguments. The code below illustrates the use of getopt to process a command line that takes the following options:

  • Optional -d, -m, and -p flags. The -d flag is treated as a global debug flag.
  • An optional -s followed by a name.
  • A mandatory -f followed by a name.
  • One or more command-line arguments (names) after all that.

UNIX system manual pages will often summarize the syntax as:

getopt [-dmp] [-s name] -f name file [file ...]

The elements in brackets mean that they are optional. The -dmp means that any combination of -d, -m, and -p options is allowed.

There are a few things you need to know about using getopt:

  • Declare extern char *optarg; and extern int optind; in your function, or include <unistd.h>, which declares both of them along with getopt itself. These are two external variables that getopt uses. The first, optarg, is used when parsing options that take a name as a parameter (such as -s or -f in our example). In those cases, it contains a pointer to that parameter. The second, optind, is the current index into the main function's argument list. It's used to find arguments after all the option processing is done.
  • getopt takes three parameters. The first two are straight from main: the argument count and argument list. The third defines the syntax: it is a string containing all the characters that are valid command-line options. Any option that requires a parameter is followed by a colon (:) in this string. In our example, we have "df:mps:" since we accept command flags of -d, -f, -m, -p, and -s. Of these, -f and -s require a follow-on name.
  • getopt is called repeatedly. Each time it is called, it returns the next command-line option that it finds. If there's a follow-on parameter, it is stored in optarg. If getopt encounters an option that's not in the list given to it, or an option that is missing its required parameter, it returns a '?' character to signify an error. When there are no more command-line options, getopt returns -1. We usually program calls to getopt in a while loop with a switch statement inside it that has a case for each option.
  • It's up to you to enforce which options are mandatory and which are optional. You may do this by setting variables and then checking them when getopt is done processing options. For instance, in the program below we want -f to be mandatory, so we set:

    fflag = 1;

    when we encounter the -f option and then, after getopt is done, we check:

    if (fflag == 0) {
        fprintf(stderr, "%s: missing -f option\n", argv[0]);
        fprintf(stderr, usage, argv[0]);
        exit(1);
    }
  • It's also up to you to enforce whether an option is used just once. You usually won't care, but you might want to be extra diligent for options that require parameters. For example, we could make the code below more robust by checking the following for -f:

    case 'f':
        if (fflag == 1) {
            fprintf(stderr, "%s: warning: -f is set multiple times\n", argv[0]);
        }
        fflag = 1;
        fname = optarg;
        break;
  • When getopt is done, optind contains the index of the next command-line argument. If optind == argc then there are no more command-line arguments.

Example 2

Here's the sample program:

/*
 * example of command line parsing via getopt
 * usage: getopt [-dmp] -f fname [-s sname] name [name ...]
 * Paul Krzyzanowski
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     /* declares getopt, optarg, and optind */

int debug = 0;

int
main(int argc, char **argv)
{
    extern char *optarg;
    extern int optind;
    int c, err = 0;
    int mflag = 0, pflag = 0, fflag = 0;
    char *sname = "default_sname", *fname;
    static char usage[] = "usage: %s [-dmp] -f fname [-s sname] name [name ...]\n";

    while ((c = getopt(argc, argv, "df:mps:")) != -1)
        switch (c) {
        case 'd': debug = 1; break;
        case 'm': mflag = 1; break;
        case 'p': pflag = 1; break;
        case 'f': fflag = 1; fname = optarg; break;
        case 's': sname = optarg; break;
        case '?': err = 1; break;
        }

    if (fflag == 0) {       /* -f was mandatory */
        fprintf(stderr, "%s: missing -f option\n", argv[0]);
        fprintf(stderr, usage, argv[0]);
        exit(1);
    } else if ((optind + 1) > argc) {
        /* need at least one argument (change +1 to +2 for two, etc. as needed) */
        printf("optind = %d, argc=%d\n", optind, argc);
        fprintf(stderr, "%s: missing name\n", argv[0]);
        fprintf(stderr, usage, argv[0]);
        exit(1);
    } else if (err) {
        fprintf(stderr, usage, argv[0]);
        exit(1);
    }

    /* see what we have */
    printf("debug = %d\n", debug);
    printf("pflag = %d\n", pflag);
    printf("mflag = %d\n", mflag);
    printf("fname = \"%s\"\n", fname);
    printf("sname = \"%s\"\n", sname);

    if (optind < argc)  /* these are the arguments after the command-line options */
        for (; optind < argc; optind++)
            printf("argument: \"%s\"\n", argv[optind]);
    else {
        printf("no arguments left to process\n");
    }
    exit(0);
}

Save this program in a file named getopt.c.

Compile this program via:

gcc -o getopt getopt.c

Run the program and try various permutations of options. Some valid ones:

./getopt -dp -f test -s "more stuff" blah
./getopt -f test -d -p blah
./getopt -f test blah
./getopt -f test -d -p blah blah blah

Some invalid ones (an unknown -q option, and a -f with no name after it):

./getopt -dpq -f test -s "more stuff" blah
./getopt -f
