
CSC220 :: Lecture Note :: Week 13
Assignments | Handouts | Resources | Email Thurman {Twitter::@compufoo Facebook::CSzero}



The Unix Philosophy

Here is the Unix Philosophy in a nutshell.

GDT::Bit:: The Unix Philosophy


Environment Variables

An environment variable is a variable that is made available to commands as part of the environment that the shell maintains.

Environment variables are useful for a variety of reasons:

Personal environment variables can be defined in your .profile (or .bash_profile) file stored in your HOME directory. The .profile file is executed every time you log into the system.

Prior to executing the .profile, the shell executes /etc/profile. This ensures that all users start with a common environment upon successful login.

On most systems, the following environment variables are automatically set for you.

   HOME      absolute path of your home directory
   PATH      used by your shell to find programs
   PS1       your shell prompt
   SHELL     the name of the shell program you are using
   LOGNAME   your login account name
   TERM      the type of terminal you are using
   MAIL      absolute path of where your email is stored

An environment variable has a name and an associated value. Prefixing an environment variable's name with a dollar sign causes the shell to substitute the variable's value. The echo command can be used to display the value of an environment variable. Here are some examples.

   $ echo $HOME

   $ echo $SHELL

   $ echo $MAIL

   $ echo TERM is $TERM and LOGNAME is $LOGNAME
   TERM is vt100 and LOGNAME is gthurman

One of the most useful environment variables is PATH. PATH is used by the shell to locate commands. The following is a common value assigned to the PATH environment variable: /bin:/usr/bin:/usr/local/bin. When a command is entered, the shell (using the value of PATH) first looks in the /bin directory for the program and executes it if it finds it; otherwise, it searches /usr/bin for the command, followed by /usr/local/bin. If the shell searches all the components without finding the executable file, then it prints a "command not found" error message.

By default, the shell does not look in your current working directory for the command. This can be changed by modifying your PATH as follows.

   $ PATH=$PATH:.        or    PATH=$PATH::

      Important note:  If you do put the current working directory on
      the PATH, then it should be the last component.  You never want
      it to be the first component.
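
The PATH search described above can be tried safely with a scratch directory. This is a minimal sketch, not part of the original notes: the script name greet_demo and the variable names are made up for illustration, and it assumes the temporary directory allows files to be executed.

```shell
# Create a scratch directory holding a tiny executable script.
# (greet_demo is a hypothetical command name used only for this demo.)
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho hello from demo\n' > "$demo_dir/greet_demo"
chmod +x "$demo_dir/greet_demo"

# Before the directory is on PATH, the shell cannot locate the command.
if command -v greet_demo >/dev/null 2>&1; then
    before="found"
else
    before="not found"
fi

# Append the directory as the last PATH component, then look again.
PATH=$PATH:$demo_dir
after=$(command -v greet_demo)
output=$(greet_demo)

echo "before: $before"
echo "after:  $after"
echo "output: $output"

rm -rf "$demo_dir"
```

The new directory is appended rather than prepended, so commands in the standard directories are still found first.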

If you enjoy playing games, then you may alter the PATH as follows.


You can use your own environment variables for abbreviations. For example.

   $ letters=$HOME/personal/letters
   $ cd $letters

If you have an environment variable that you want available to other commands, then you need to export it.

   $ EDITOR=/usr/bin/vi; export EDITOR

      Now the EDITOR environment variable can be used by email
      programs, pager programs, and so on.
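
The effect of export can be observed by asking a child shell what it sees. This is a small sketch; the variable name letters_demo and its value are made up for illustration.

```shell
# A plain shell variable is NOT passed to child processes.
# (letters_demo is a hypothetical variable used only for this demo.)
letters_demo=/tmp/letters
child_sees_plain=$(sh -c 'echo "${letters_demo:-unset}"')

# After export, the variable becomes part of the child's environment.
EDITOR=/usr/bin/vi; export EDITOR
child_sees_exported=$(sh -c 'echo "${EDITOR:-unset}"')

echo "plain variable in child:    $child_sees_plain"
echo "exported variable in child: $child_sees_exported"
```

The single quotes matter here: they keep the parent shell from substituting the values itself, so it is the child that reports what it finds in its own environment.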

Environment variables are often used to "hide" the complexities of the directory structure. In other words, you can access a specific directory by name without having to know its absolute path.

By convention, personal environment variables are spelled in lower case to help distinguish them from those set up by the system.

Environment variable names must begin with a letter or an underscore.

The env program can be used to display all the environment variables you have defined.

A variable can be removed from the environment using the unset command. Syntax.

   $ unset variable_name
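
A short sketch of unset in action, using a made-up variable named scratch_var. It counts matching lines in the output of env before and after the variable is removed.

```shell
# Define and export a throwaway variable (hypothetical name).
scratch_var="temporary value"
export scratch_var
before=$(env | grep -c '^scratch_var=')

# Remove it; env no longer lists it.
unset scratch_var
after=$(env | grep -c '^scratch_var=' || true)

echo "listed by env before unset: $before"
echo "listed by env after unset:  $after"
```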


Tilde Expansion [minimal notes]

Shells such as ksh and bash support tilde expansion. A bare ~ expands to the value of your HOME environment variable; ~user expands to the home directory of the named user.

   example 1
   $ echo ~thurmunit

   example 2
   $ pwd
   $ cd ~hmumford
   $ pwd

   example 3
   $ who am i
   gdt   /dev/pts/14  Sep 25 05:08
   $ cd ~/tmp
   $ pwd

GNU.org:: Bash Reference Manual: Tilde Expansion
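
The following sketch shows that a bare ~ follows the HOME setting and that quoting suppresses the expansion. The path /home/demo_user is an assumption made for illustration; it does not need to exist, because only the expansion is being shown.

```shell
# Temporarily point HOME at a hypothetical directory.
saved_home=${HOME-}
HOME=/home/demo_user

home_from_tilde=~          # bare ~ expands to the value of HOME
tmp_under_home=~/tmp       # ~/tmp expands to a path beneath HOME
quoted="~/tmp"             # inside quotes the ~ stays literal

HOME=$saved_home

echo "$home_from_tilde"
echo "$tmp_under_home"
echo "$quoted"
```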


Standard Input/Output/Error

In Unix, a file is a sequence of bytes that is stored somewhere on a storage device (e.g. a disk). The content of a file does not have any significant meaning to the OS (it is simply a sequence of bytes). The "structure" of the bytes may have to conform to a particular format in order to be used by specific applications (i.e. programs or commands).

Every command, when invoked, has three I/O (input/output) streams opened for it: two for output and one for input. The output streams are called standard output and standard error. The input stream is called standard input. In "C" terminology, these I/O streams have the names stdout, stderr and stdin, respectively.

A command can get (read) input from the standard input stream, but it doesn't know where the input comes from (it could come from a file, the keyboard, or another command).

   keyboard--->+ ---> standard input ---> command

The output of a command (if any) can be written to either the standard output stream or the standard error stream. The command doesn't need to know where the output is going (it could go to a file, the screen, or another command).

                                     +---> file
   command ---> standard output ---> +---> screen
                                     +---> command

The shell manages the standard I/O streams for a command.

The shell assigns a file descriptor to each of the I/O streams: standard input is 0, standard output is 1, and standard error is 2.

By default, standard input is the keyboard, and standard output and standard error are both the screen. Re-directing these streams is accomplished on the command-line when the command is invoked using the re-direction operators. [Note: It can also be done internally by the command itself.]

The re-direction operators are as follows.

   <     cmd < file      take input for cmd from file
   >     cmd > file      send output of cmd to file
   >>    cmd >> file     append output of cmd to file
   |     cmd1 | cmd2     run cmd1 and send output to cmd2


Redirecting Standard Output and Standard Error

In some instances you want to save the output of a command into a file. This can be accomplished using the shell's redirection operators > and >>.

Here are examples.

  $ who > who.out

    Executes the command 'who' and redirects the output into
    a file named "who.out".  No output is seen on 
    the screen.  The file "who.out" is created in 
    your current working directory.

  $ date > /tmp/now

    Executes the command 'date' and redirects the output to a
    file named "now" stored in the "/tmp" 
    directory.  (Note: If the directory "/tmp" contains 
    a directory named "now", then the shell will not execute the
    command.)

  $ ls -l>myfiles

    Use of whitespace around the > operator is optional.  
    Executes the 'ls -l' command with output redirected into a 
    file named "myfiles".

If you redirect the output of a command into a file that doesn't exist, then the shell creates the file for you (assuming permissions are not a problem). If you redirect the output of a command into a file that already exists, then the content of the existing file is replaced with the command's output (again, if allowed).

   $ ps > /etc/foo

     This command should fail on most Unix systems.  We are attempting
     to create a file named "foo" in the "/etc"
     directory.  "/etc" is a system-level directory and 
     regular users are not allowed to write to it.

If you want to redirect the output of a command and have it appended to the content of an existing file, then you must use the >> operator.

   $ who > cmd.log

     The 'who' command is executed and its output redirected into
     a file named "cmd.log".

   $ date >> cmd.log
     The 'date' command is executed and its output is appended to
     the file named "cmd.log".  (If file "cmd.log"
     doesn't exist, then the system creates it.)  Again, the use of 
     whitespace around the operator is optional.

   $ ps >> cmd.log

     The 'ps' command is executed and its output is appended to the
     file named "cmd.log".
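
The difference between > and >> can be checked with a scratch file. This is a minimal sketch; the temporary directory and the echoed words are made up for illustration.

```shell
work=$(mktemp -d)
log="$work/cmd.log"

echo "first"  >  "$log"    # file does not exist yet: the shell creates it
echo "second" >> "$log"    # appended after the existing content
lines_after_append=$(grep -c '' "$log")

echo "third"  >  "$log"    # existing content is replaced
lines_after_overwrite=$(grep -c '' "$log")
final=$(cat "$log")

echo "lines after append:    $lines_after_append"
echo "lines after overwrite: $lines_after_overwrite"
echo "final content:         $final"

rm -rf "$work"
```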

   $ echo 'mygroup:*:40000:' >> /etc/group

     This command should fail on most Unix Systems when executed as 
     a regular (i.e. non-root) user.  We are attempting to add a
     group to a file that should be read-only.

Re-directing Both Output and Error

The following are some examples of how you can re-direct the standard error stream. Note: these examples work with Bourne, Korn and Bash shells, but they do not work with "C" shell.

   Assume  cmd  is some Unix command.

   $ cmd >out 2>err
     'cmd' is executed.  Standard output is re-directed to a file
     named "out" and standard error is re-directed to a 
     file named "err".

   $ cmd >out 2>&1

     All output (both standard output and standard error) is
     redirected into a file named "out".

   $ cmd 2>foo

     The standard error is re-directed to a file named "foo".
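
To watch the two output streams being separated, the sketch below defines a stand-in for cmd as a shell function that writes one line to each stream. The function body and file names are assumptions made for illustration, and the syntax is for Bourne-style shells only.

```shell
work=$(mktemp -d)

# Hypothetical stand-in for "cmd": one line to stdout, one to stderr.
cmd() {
    echo "normal output"
    echo "error output" >&2
}

# Separate files for the two streams.
cmd >"$work/out" 2>"$work/err"

# Both streams into one file.  2>&1 must come after >file, because
# redirections are processed left to right.
cmd >"$work/both" 2>&1

out_content=$(cat "$work/out")
err_content=$(cat "$work/err")
both_content=$(cat "$work/both")

echo "out:  $out_content"
echo "err:  $err_content"
echo "both:"
echo "$both_content"

rm -rf "$work"
```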


Redirecting Standard Input

Many commands obtain their input from files specified on the command-line. Most of these commands also work if no files are specified. In these cases, the command reads input from the standard input stream, which by default is the keyboard. The cat command provides a good example.

   $ cat /etc/group

     The command opens the file  /etc/group  and uses 
     the content of the file for input.

   $ cat
   Now I am entering data (via the keyboard) that is
   going to be used as input to the 'cat' command.
   This is the standard input stream.  You tell the
   shell that you are done entering data by typing
   a <CTRL-D> character.  [Note: on an ASCII
   system, <CTRL-D> is EOT.]

     The 'cat' command is executed without any arguments; therefore,
     it gets input from the standard input stream.

   $ cat < /etc/group

     From a user perspective this has the same effect as if
     /etc/group  was specified as a command-line argument,
     but internally the standard input stream was re-directed
     from the keyboard to the file  /etc/group.

In some cases you want to execute a command and have it get input from both the standard input stream and a file.

   $ cat - /etc/group
   This is the content of /etc/group:

      On some systems, the use of  -  may not be supported by
      all commands.  The file  /dev/stdin  can be used instead.

         $ cat /dev/stdin /etc/group
         This is the content of /etc/group:

Redirecting Input and Output

You can execute a command and have both the input and output streams re-directed.

   $ cat < /etc/group > foo

     Execute the 'cat' command re-directing standard input to
     the file  /etc/group  (i.e. the content of  /etc/group
     becomes the standard input stream).  The output of the
     'cat' command is re-directed to the file  foo.
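
A sketch of the same ideas using a scratch file instead of /etc/group (the file name and its content are made up). It also uses tr, a command that reads only standard input, so redirection is the only way to feed it a file.

```shell
work=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$work/fruit"

# cat reads the file named as an argument...
by_argument=$(cat "$work/fruit")
# ...or reads standard input, redirected from the same file.
by_stdin=$(cat < "$work/fruit")

# Input and output both redirected on one command line.
tr 'a-z' 'A-Z' < "$work/fruit" > "$work/fruit.upper"
upper=$(cat "$work/fruit.upper")

echo "$upper"

rm -rf "$work"
```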



Recall, the output of a command can be redirected by using the > operator.

Many commands receive their input from a file or from the standard input stream. Input can be redirected into a command by using the < operator.

   $ sort /etc/group

      The  sort  program displays the content of  /etc/group
      sorted in alphabetical order.

   $ sort < /etc/group

      Instead of getting its input from a file, the  sort  command
      sorts the standard input stream (which in this case happens to be
      the content of the  /etc/group  file).

In many cases you need to take the output of one command and use it as input to another command. This can be easily accomplished by using the | (or pipe) operator.

   $ ls -l | grep "Oct 28"

      Do a long listing and pipe the output into the  grep
      command searching for the pattern "Oct 28".  The output
      of the command sequence will be a long listing on all
      files that were created and/or modified on "Oct 28".

   $ wc -l /etc/passwd

      wc -l  counts and prints the number of lines found
      in the file argument  /etc/passwd.

   $ who | wc -l

      The output of the  who  command is piped into the
      word count program  wc  .   wc  when executed with
      the  -l  option prints the number of lines it finds
      in its input.

   $ cut -f1 -d":" /etc/passwd | sort | uniq | wc -l

      The 'cut' command prints the values of field
      one found in the /etc/passwd file having colon
      delimited fields.  The output is sorted using
      the Unix 'sort' command and the sorted output
      is sent into the 'uniq' program which eliminates
      duplicate values.  The unique list of values is
      then counted by the 'wc -l' command.

   $ cat /etc/passwd | tr '[a-z]' '[A-Z]' | grep THURM | wc -l

      This script prints the number of lines in the /etc/passwd
      file that contain the string "thurm".  Searching
      is not case sensitive.  Exercise:  describe what is
      going on.

   $ tail -100 $logs | grep "^csnet:" 2>/dev/null | sort | pr -s | more

      Exercise:  describe what is going on.
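
The cut | sort | uniq | wc pipeline can be tried on known data. The sketch below uses a made-up sample file in /etc/passwd format (all account names are hypothetical) and counts the distinct login shells found in field seven.

```shell
work=$(mktemp -d)

# Made-up sample data in /etc/passwd format.
cat > "$work/passwd.sample" <<'EOF'
root:x:0:0:root:/root:/bin/sh
jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/ksh
asmith:x:1002:100:Al Smith:/home/asmith:/bin/sh
bnguyen:x:1003:100:B Nguyen:/home/bnguyen:/bin/bash
EOF

# Stage by stage: extract field 7, sort it, drop duplicates.
shell_list=$(cut -f7 -d":" "$work/passwd.sample" | sort | uniq)

# Same pipeline with a final stage that counts the lines.
distinct_shells=$(cut -f7 -d":" "$work/passwd.sample" | sort | uniq | wc -l)

echo "$shell_list"
echo "distinct shells: $distinct_shells"

rm -rf "$work"
```

The sort stage is needed because uniq only removes duplicate lines that are adjacent.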

Bell-Labs.com:: Why Ken Had to Invent the | [hyperlink to Dennis Ritchie's homepage]

Early Unix History and Evolution

The following was copy/pasted from Ritchie's website.

   Pipes appeared in Unix in 1972, well after the PDP-11 
   version of the system was in operation, at the suggestion 
   (or perhaps insistence) of M. D. McIlroy, a long-time 
   advocate of the non-hierarchical control flow that 
   characterizes coroutines. Some years before pipes 
   were implemented, he suggested that commands should 
   be thought of as binary operators, whose left and
   right operand specified the input and output files. 
   Thus a copy utility would be commanded by 

     inputfile copy outputfile


Some Pipe Examples

Print your LOGNAME excluding the first character in all upper-case.

   $ echo $LOGNAME

   $ echo $LOGNAME | cut -c2- | tr '[:lower:]' '[:upper:]'

Print your uid using the id command without using the id command's -u option.

   $ id
   uid=879(gthurman) gid=100(users) groups=100(users)

   $ id | cut -f2 -d"=" | cut -f1 -d"("
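
To see what each stage of these pipelines contributes, the sketch below runs them on fixed strings taken from the sample output above rather than on live command output. The variable names are made up for illustration.

```shell
# Sample line copied from the id output shown above.
sample='uid=879(gthurman) gid=100(users) groups=100(users)'

# Stage 1: the second "="-delimited field.
after_equals=$(echo "$sample" | cut -f2 -d"=")
# Stage 2: keep what precedes the first "(".
uid_only=$(echo "$sample" | cut -f2 -d"=" | cut -f1 -d"(")

# The LOGNAME example applied to a fixed login name.
shouted=$(echo gthurman | cut -c2- | tr '[:lower:]' '[:upper:]')

echo "after first cut:  $after_equals"
echo "after second cut: $uid_only"
echo "upper-cased name: $shouted"
```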

The following command-line will be analyzed during lecture.

   $ tail -100 $logs | grep "^cszero:" 2>/dev/null | sort | pr -s | more


Shell Meta-Characters

Shell meta-characters (or wildcards) can be used to reduce the amount of stuff you need to type and to concisely refer to a group of commonly related files.

The meta-characters are: * (asterisk), ? (question mark) and [] (square brackets).

   *    An asterisk matches 0 or more characters in a file name.
   ?    A question mark matches any single character.
   [ ]  Square brackets can surround a choice of characters to match.

Note: asterisk does not match file names that start with a dot (i.e. hidden files).

   $ ls -x
   d01.shtml     d02.shtml     d03.shtml   d04.shtml   d05

      Display all file names found in the current directory.

   $ ls -x *.shtml
   d01.shtml     d02.shtml     d03.shtml   d04.shtml

      Display all file names that end with the string  .shtml

   $ ls -x ?05

      Display all file names that are 3 characters long and
      end in  05

   $ ls -x d0[1-2]*
   d01.shtml     d02.shtml

      Display all file names that start with  d0  and are at 
      least three characters long.  The third character must 
      fall in the range of 1 to 2 (inclusive) and can be 
      followed by 0 or more characters.

   $ ls -x *5*l

      Display all file names that have a 5 somewhere in them
      (except the last character) and that end with a lowercase 
      ell character.

   $ ls -x d0[123].shtml
   d01.shtml     d02.shtml     d03.shtml

      Display all file names that start with  d0  followed
      by either a 1, 2, or 3 followed by  .shtml.  Every file
      name must be 9 characters long.

   $ ls -x ???

      Display all file names that are three characters long.

   $ cat d*

      Display the content of all files whose names start with
      a lowercase dee.

   $ rm *.shtml

      Removes all files that end with the string  .shtml

   $ grep "the end" d0?

      Searches for the pattern "the end" in all files having names
      that are three characters long and start with  d0

   $ mv *0* /tmp
      Moves all files having a 0 in their name to the /tmp directory.

   $ mv temp[0-9] /tmp
      Moves all files having the name  temp  followed by a single
      digit to the  /tmp  directory.

   $ mv [A-Z]* /tmp

      Moves all files beginning with an uppercase letter to 
      the /tmp directory.
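
The patterns above can be tried without risk in a scratch directory. This sketch recreates the file names from the ls -x listing, adds one hidden file (.hidden is a made-up name), and uses echo so that nothing is removed or moved.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch d01.shtml d02.shtml d03.shtml d04.shtml d05 .hidden

all_shtml=$(echo *.shtml)     # names ending in .shtml
three_chars=$(echo ???)       # names exactly three characters long
range=$(echo d0[1-2]*)        # d0, then 1 or 2, then anything
star=$(echo *)                # everything except hidden files

echo "*.shtml   -> $all_shtml"
echo "???       -> $three_chars"
echo "d0[1-2]*  -> $range"
echo "*         -> $star"

cd - >/dev/null
rm -rf "$work"
```

Previewing a pattern with echo before handing it to rm or mv is a cheap way to confirm which files it matches.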

When you enter a command-line that contains meta-characters, the shell expands the command-line to include the fully named files. In other words, commands see complete file names; they do not know about the meta-characters.

   Assume we have a directory that has the following files:
      x.out    y.out   z.err   w.out    t.err   c.obj

   $ rm *.out

      The shell expands  *.out  and the following command-line
      is executed:  rm w.out x.out y.out  (the shell sorts the
      expanded names)

   $ rm *.junk

      No file names are found ending with  .junk  therefore  *.junk 
      is the argument that is passed to the  'rm'  program.  The 'rm'
      command will try to open a file named  *.junk  and will fail
      resulting in an error message.
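
That the shell, not the command, performs the expansion can be shown with a stand-in command that reports the arguments it receives. The function show_args is made up for this sketch, and the behavior for the unmatched pattern is the default for Bourne-style shells such as sh and bash.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch x.out y.out z.err w.out t.err c.obj

# Hypothetical stand-in command: prints how many arguments it got.
show_args() {
    echo "$# argument(s): $*"
}

matched=$(show_args *.out)      # pattern matches three files
unmatched=$(show_args *.junk)   # pattern matches nothing

echo "$matched"
echo "$unmatched"

cd - >/dev/null
rm -rf "$work"
```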

Caution needs to be exercised when using meta-characters on the command-line. The following has happened to many users:

   $ rm x *

      The user wants to remove all files starting with  x  but
      the space between the  x  and  *  causes all files to be removed.
      The shell expands the * to all file names which in turn get
      passed onto the 'rm' command.

There is a maximum length that the command-line can end up being. For example, if you have a directory containing 1000 files having long file names, then a command like rm * may not work.

What happens if you have a file name that has an asterisk in it? For example, suppose we have a directory containing the following files:

   x.out  x.err  x.*  x.obj  x.c

and we want to remove the file named "x.*". Escaping the asterisk with a backslash prevents the shell from expanding it.

   $ rm x.\*
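
A sketch that confirms only the one file is removed, using the same file names in a scratch directory. Single quotes around the name have the same effect as the backslash.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch x.out x.err 'x.*' x.obj x.c

rm x.\*                    # equivalent: rm 'x.*'
remaining=$(echo x.*)      # unescaped, so the shell expands it

echo "remaining: $remaining"

cd - >/dev/null
rm -rf "$work"
```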


Introduction to the grep Command

The grep command is used to search for a pattern in a file or list of files. The pattern used by grep is called a regular expression. On some Unix systems, by default, the grep command supports basic-REs. [grep: global regular expression print (g/re/p) or general regular expression parser]

   $ grep Unix d01.shtml

      Print all lines from the file  d01.shtml  containing
      the pattern  Unix.

   $ grep -i UNIX d*.shtml

      Search all files starting with the character 'd' and ending
      in the string ".shtml" for lines that contain the pattern  UNIX.
      The search is not case sensitive; therefore, Unix, UNix,
      uniX, UNIX, unix, ..., all match.

   $ grep "#include <iostream.h>" *.c

      Look for the string  #include <iostream.h> in 
      all files ending in ".c".  Since the pattern contains
      spaces and characters that are special to the shell
      (#, <, >), it must be quoted.

   $ grep -l "while (" *.cpp

      Search all "*.cpp" files for the pattern  "while ("
      and only display the names of the files with one
      or more matching lines, not the lines themselves.

   $ grep -v UNIX foo

      Display only those lines from the file  foo  that
      do not contain the pattern   UNIX.

   $ grep -v -c UNIX  foo

      Display a count of the number of lines from the file
      foo  that do not contain the pattern  UNIX.

   $ who | grep "^jdoe "

      Find out if  jdoe  is logged in.

   $ grep Unix foo >/dev/null 2>&1
   $ echo $?

      Search the file  foo  for the pattern  Unix  and re-direct
      the output to  /dev/null.  Use the exit status of the  grep
      command to determine if the pattern was found (0 indicates
      that it was, 1 implies it wasn't).

   $ crypt some_key < roster.done | grep jdoe

      Find the password entry for user  jdoe  in the encrypted
      file  roster.done  that was encrypted using  some_key.

   $ grep -E "unix|UNIX" foo
   $ egrep "unix|UNIX" foo

      -E causes  grep  to run as  egrep.  egrep  supports 
      extended-REs (Regular Expressions).  Search for either
      the pattern  unix  or the pattern  UNIX.

   $ grep -F hello foo
   $ fgrep hello foo

      -F causes  grep  to run as  fgrep.   If the pattern is 
      a string literal (i.e. fixed), then  fgrep  may be used.  

   $ pgrep -u root

      pgrep  is a customized  grep  that is used to search
      the process table.  -u LOGNAME  displays a list of
      PIDs for user LOGNAME.
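
Several of the grep options above can be checked against a small file with known content. This is a sketch only; the three lines written to foo are made up for illustration.

```shell
work=$(mktemp -d)
printf 'Unix is a family of systems\nunix in lower case\nnothing relevant here\n' > "$work/foo"

case_sensitive=$(grep -c Unix "$work/foo")        # lines containing "Unix"
case_insensitive=$(grep -i -c UNIX "$work/foo")   # any capitalization
non_matching=$(grep -v -i -c UNIX "$work/foo")    # lines without the pattern

# Exit status: 0 when the pattern is found, 1 when it is not.
status_found=0
grep Unix "$work/foo" >/dev/null 2>&1 || status_found=$?
status_missing=0
grep Linux "$work/foo" >/dev/null 2>&1 || status_missing=$?

echo "grep -c Unix:       $case_sensitive"
echo "grep -i -c UNIX:    $case_insensitive"
echo "grep -v -i -c UNIX: $non_matching"
echo "exit status (found / not found): $status_found / $status_missing"

rm -rf "$work"
```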

GNU.org:: grep, print lines matching a pattern

Unix Systems

I suspected that grep, egrep and fgrep were all the same executable file.

This is true on a Mandrake 9.0 system. [I was expecting hard-link usage.]

   $ (cd /bin; ls -l *grep)
   lrwxr-xr-x    1 root     root            4 Oct 18 05:23 egrep -> grep*
   lrwxr-xr-x    1 root     root            4 Oct 18 05:23 fgrep -> grep*
   -rwxr-xr-x    1 root     root        88396 Aug 21 02:37 grep*

This is not true on Solaris.

   $ (cd /bin; ls -l *grep)
   -r-xr-xr-x   1 root     bin        26940 Jan  5  2000 egrep
   -r-xr-xr-x   1 root     bin        11820 May 30 15:13 fgrep
   -r-xr-xr-x   1 root     bin        10032 Jan  5  2000 grep

Nor is it true on Red Hat Linux 7.3.

   $ (cd /bin; ls -l *grep)
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 egrep
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 fgrep
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 grep

   $ (cd /bin; md5sum *grep)
   85f3b6bb02c5b67b19ff5af0b456db31  egrep
   3e1d1e10d530df11f1b9589ee5f55086  fgrep
   1af31bdf6bb72347f94a4731f276e3be  grep


Introduction to the find Command

The find command locates files that match a given set of criteria in a hierarchy of directories. The criteria may be a filename or a specified property of a file (such as its modification date, size, or type). You can also direct the command to remove, print, or otherwise act on the file.

   find  pathname  search-options   action-options

      pathname -- directory from which  find  begins the search; the
                  search is recursive (sub-directories, if any, are
                  also searched)
      search-options -- identifies the file you are interested in
      action-options -- tells what to do once the file is found

   $ find . -print

      Begin searching from current working directory and print
      the name of all files found.

   $ find / -name foo.c -print

      Begin searching for a file named  foo.c  from the root
      directory and print its name if found.  More than one
      file can be found with the name  foo.c.  Caution:  if you 
      are not super-user, then you will not have permission to 
      search many directories.  

   $ find $HOME -name "*.c" -print

      Begin searching in the HOME directory for all files
      that end in ".c".  Print their names, if found.

   $ find . -type d -print

      Begin searching from current working directory and print
      all files that are of type directory.

   $ find /tmp -type f -print

      Begin searching from the  /tmp  directory and print
      all files that are regular files.

   $ find . -mtime 10 -print

      Find and display files last modified exactly 10 days ago.

   $ find . -mtime -10 -print

      Find and display files last modified less than 10 days ago.

   $ find . -mtime +10 -print

      Find and display files last modified more than 10 days ago.

   $ find . -atime +10 -print

      Find and display files last accessed (read) more 
      than 10 days ago.

   $ find . -newer .lastbkup -print

      Find and display files modified more recently than  
      .lastbkup  was.

   $ find . -user jdoe -size +50 -print

      Find and display files owned by  jdoe  that are 
      larger than 50 blocks in size.

   $ find . -type f -exec chmod 644 {} \;

      Find and change permissions on all regular files.

   $ find . -name foo.c -mtime +30 -exec rm {} \;

      Find and remove all instances of the file  foo.c
      that were last modified more than 30 days ago.

   $ find . -name foo.c -mtime +30 -ok rm {} \;

      Find and interactively remove all instances of the 
      file  foo.c  that were last modified more than 30 days ago.

   $ find . -inum 8888 -print

      Find and display all files having i-node number 8888.
      This is a handy way to remove files that have goofy
      (e.g. non-printable) characters in their names.
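
A few of the find searches above can be tried on a small scratch tree. This is a sketch only; the directory layout and file names are made up for illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/src/lib" "$work/docs"
touch "$work/src/foo.c" "$work/src/lib/foo.c" \
      "$work/src/lib/bar.c" "$work/docs/notes.txt"

cd "$work" || exit 1

# All names ending in .c (the pattern is quoted so that find,
# not the shell, interprets the asterisk).
c_files=$(find . -name "*.c" -print | sort)

# More than one file can have the same name.
foo_count=$(find . -name foo.c -print | wc -l)

# Directories only; the starting directory "." is included.
dir_count=$(find . -type d -print | wc -l)

echo "$c_files"
echo "files named foo.c: $foo_count"
echo "directories:       $dir_count"

cd - >/dev/null
rm -rf "$work"
```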

Finding Things in Unix [ONLamp.com; part 1]
Find: Part Two [ONLamp.com]

