
CSC220 :: Lecture Note :: Week 14

Overview

Assignment(s):


The Unix Philosophy

Here is the Unix Philosophy in a nutshell.

GDT::Bit:: The Unix Philosophy

{TopOfPage} {Resources}


Redirecting Standard Input

Many commands obtain their input from files specified on the command-line. Most of these commands also work if no files are specified. In these cases, the command reads input from the standard input stream, which by default is the keyboard. The cat command provides a good example.

   $ cat /etc/group

     The command opens the file  /etc/group  and uses 
     the content of the file for input.

   $ cat
   Now I am entering data (via the keyboard) that is
   going to be used as input to the 'cat' command.
   This is the standard input stream.  You tell the
   shell that you are done entering data by typing
   a <CTRL-D> character.  [Note: on an ASCII
   system, <CTRL-D> is EOT.]
   <CTRL-D>

     The 'cat' command is executed without any arguments; therefore,
     it gets input from the standard input stream.

   $ cat < /etc/group

     From a user perspective this has the same effect as if
     /etc/group  was specified as a command-line argument,
     but internally the standard input stream was re-directed
     from the keyboard to the file  /etc/group.

In some cases you want to execute a command and have it get input from both the standard input stream and a file.

   $ cat - /etc/group
   This is the content of /etc/group:
   ==================================
   <CTRL-D>

      On some systems, the use of  -  may not be supported by
      all commands.  The file  /dev/stdin  can be used instead.

         $ cat /dev/stdin /etc/group
         This is the content of /etc/group:
         ==================================
         <CTRL-D>

Redirecting Input and Output

You can execute a command and have both the input and output streams re-directed.


   $ cat < /etc/group > foo

     Execute the 'cat' command re-directing standard input to
     the file  /etc/group  (i.e. the content of  /etc/group
     becomes the standard input stream).  The output of the
     'cat' command is re-directed to the file  foo.
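
Here is a minimal sketch of both re-directions at once. It uses a scratch file instead of /etc/group so it can be run anywhere; the names tmpdir, in.txt and out.txt are made up for this illustration.

```shell
# Work in a scratch directory so nothing important is overwritten.
tmpdir=$(mktemp -d)

# Create a small input file (three lines, deliberately unsorted).
printf 'pear\napple\nfig\n' > "$tmpdir/in.txt"

# sort reads standard input (re-directed from in.txt) and writes
# standard output (re-directed to out.txt).
sort < "$tmpdir/in.txt" > "$tmpdir/out.txt"

# Capture the result for display.
result=$(cat "$tmpdir/out.txt")
echo "$result"

rm -r "$tmpdir"
```

The order of the two re-directions on the command-line does not matter here; the shell sets up both before it starts the command.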



Pipes

Recall that the output of a command can be redirected by using the > operator.

Many commands receive their input from a file or from the standard input stream. Input can be redirected into a command by using the < operator.

   $ sort /etc/group

      The  sort  program displays the content of  /etc/group
      sorted in alphabetical order.

   $ sort < /etc/group

      Instead of getting its input from a file, the  sort  command
      sorts the standard input stream (which in this case happens to be
      the content of the  /etc/group  file).

In many cases you need to take the output of one command and use it as input to another command. This can be easily accomplished by using the | (or pipe) operator.

   $ ls -l | grep "Oct 28"

      Do a long listing and pipe the output into the  grep
      command searching for the pattern "Oct 28".  The output
      of the command sequence will be a long listing of all
      files that were last modified on "Oct 28".

   $ wc -l /etc/passwd

      wc -l  counts and prints the number of lines found
      in the file argument  /etc/passwd.

   $ who | wc -l

      The output of the  who  command is piped into the
      word count program  wc  .   wc  when executed with
      the  -l  option prints the number of lines it finds
      in its input.

   $ cut -f1 -d":" /etc/passwd | sort | uniq | wc -l

      The 'cut' command prints the values of field
      one found in the /etc/passwd file having colon
      delimited fields.  The output is sorted using
      the Unix 'sort' command and the sorted output
      is sent into the 'uniq' program which eliminates
      duplicate values.  The unique list of values is
      then counted by the 'wc -l' command.

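   $ cat /etc/passwd | tr '[a-z]' '[A-Z]' | grep THURM | wc -l

      This script prints the number of lines in the /etc/passwd
      file that contain the string "thurm".  Searching
      is not case sensitive.  The arguments to  tr  are quoted so
      that the shell does not treat the square brackets as
      meta-characters.  Exercise:  describe what is going on.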

   $ tail -100 $logs | grep "^csnet:" 2>/dev/null | sort | pr -s | more

      Exercise:  describe what is going on.
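
The following sketch runs the cut | sort | uniq | wc -l pattern on a small made-up sample in passwd format, so the result does not depend on the local /etc/passwd file. The sample entries and the variable names are assumptions for illustration only.

```shell
# Made-up sample in /etc/passwd format (7 colon-delimited fields).
sample='root:x:0:0:root:/root:/bin/sh
jdoe:x:1001:100:J Doe:/home/jdoe:/bin/ksh
asmith:x:1002:100:A Smith:/home/asmith:/bin/sh
guest:x:1003:100:Guest:/home/guest:/bin/ksh'

# Field 7 is the login shell.  sort groups identical values together
# so that uniq can collapse them; wc -l counts what is left.
shells=$(printf '%s\n' "$sample" | cut -f7 -d":" | sort | uniq | wc -l)

# wc may pad its output with blanks on some systems, so strip them.
shells=$(echo $shells)
echo "distinct login shells: $shells"
```

The sort step matters: uniq only collapses duplicate lines that are adjacent to each other.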

Bell-Labs.com:: Why Ken Had to Invent the | [hyperlink to Dennis Ritchie's homepage]

Early Unix History and Evolution

The following was copy/pasted from Ritchie's website.

   Pipes appeared in Unix in 1972, well after the PDP-11 
   version of the system was in operation, at the suggestion 
   (or perhaps insistence) of M. D. McIlroy, a long-time 
   advocate of the non-hierarchical control flow that 
   characterizes coroutines. Some years before pipes 
   were implemented, he suggested that commands should 
   be thought of as binary operators, whose left and
   right operand specified the input and output files. 
   Thus a copy utility would be commanded by 

     inputfile copy outputfile



Some Pipe Examples

Print your LOGNAME in all upper-case, excluding the first character.

   $ echo $LOGNAME
   gthurman

   $ echo $LOGNAME | cut -c2-8  | tr '[:lower:]' '[:upper:]'
   THURMAN

Print your uid using the id command without using the id command's -u option.

   $ id
   uid=879(gthurman) gid=100(users) groups=100(users)

   $ id | cut -f2 -d"=" | cut -f1 -d"("
   879
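
To see what each cut stage contributes, the same pipeline can be tried on a fixed string. This sketch re-uses the sample id output shown above; the variable names line, uid and name are made up.

```shell
# A fixed sample line in the format printed by id (taken from the
# example above) so the pipeline gives the same answer everywhere.
line='uid=879(gthurman) gid=100(users) groups=100(users)'

# First cut: split on "=" and keep field 2  -> 879(gthurman) gid
# Second cut: split on "(" and keep field 1 -> 879
uid=$(echo "$line" | cut -f2 -d"=" | cut -f1 -d"(")
echo "$uid"

# The same idea extracts the login name: keep what follows the first
# "(" and then what precedes the first ")".
name=$(echo "$line" | cut -f2 -d"(" | cut -f1 -d")")
echo "$name"
```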

The following command-line will be analyzed during lecture.

   $ tail -100 $logs | grep "^cszero:" 2>/dev/null | sort | pr -s | more



Shell Meta-Characters

Shell meta-characters (or wildcards) can be used to reduce the amount of stuff you need to type and to concisely refer to a group of commonly related files.

The meta-characters are: * (asterisk), ? (question mark) and [] (square brackets).

   *    An asterisk matches 0 or more characters in a file name.
   ?    A question mark matches any single character.
   [ ]  Square brackets can surround a choice of characters to match.

Note: asterisk does not match file names that start with a dot (i.e. hidden files).

   $ ls -x
   d01.shtml     d02.shtml     d03.shtml   d04.shtml   d05

      Display all file names found in the current directory.

   $ ls -x *.shtml
   d01.shtml     d02.shtml     d03.shtml   d04.shtml

      Display all file names that end with the string  .shtml

   $ ls -x ?05
   d05

      Display all file names that are 3 characters long and
      end in  05

   $ ls -x d0[1-2]*
   d01.shtml     d02.shtml

      Display all file names that start with  d0  and are at 
      least three characters long.  The third character must 
      fall in the range of 1 to 2 (inclusive) and can be 
      followed by 0 or more characters.

   $ ls -x *5*l
   d05.shtml

      Display all file names that have a 5 somewhere in them
      (except the last character) and that end with a lowercase 
      ell character.  [This output assumes that a file named
      d05.shtml has been added to the directory.]

   $ ls -x d0[123].shtml
   d01.shtml     d02.shtml     d03.shtml

      Display all file names that start with  d0  followed
      by either a 1, 2, or 3 followed by  .shtml.  Every file
      name must be 9 characters long.

   $ ls -x ???
   d05

      Display all file names that are three characters long.

   $ cat d*

      Display the content of all files having names that start
      with a lowercase dee.

   $ rm *.shtml

      Removes all files that end with the string  .shtml


   $ grep "the end" d0?

      Searches for the pattern "the end" in all files having names
      that are three characters long and start with  d0

   $ mv *0* /tmp
      
      Moves all files having a 0 in their name to the /tmp directory.

   $ mv temp[0-9] /tmp
      
      Moves all files having the name  temp  followed by a single
      digit to the  /tmp  directory.

   $ mv [A-Z]* /tmp

      Moves all files beginning with an uppercase letter to 
      the /tmp directory.

When you enter a command-line that contains meta-characters, the shell expands the command-line to include the fully named files. In other words, commands see complete file names; they do not know about the meta-characters.

   Assume we have a directory that has the following files:
      x.out    y.out   z.err   w.out    t.err   c.obj

   $ rm *.out

      The shell expands  *.out  (in sorted order) and the following
      command-line is executed:  rm w.out x.out y.out

   $ rm *.junk

      No file names are found ending with  .junk  therefore  *.junk 
      is the argument that is passed to the  'rm'  program.  The 'rm'
      command will try to remove a file named  *.junk  and will fail
      resulting in an error message.

Caution needs to be exercised when using meta-characters on the command-line. The following has happened to many users:

   $ rm x *

      The user wants to remove all files starting with  x  but
      the space between the  x  and  *  causes all files to be removed.
      The shell expands the * to all file names which in turn get
      passed onto the 'rm' command.

There is a maximum length that the command-line can end up being. For example, if you have a directory containing 1000 files having long file names, then a command like rm * may not work.
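
Here is one way to look at the limit and one way around it, sketched under the assumption that getconf and a POSIX find are available. The scratch directory and the file names are made up.

```shell
# ARG_MAX is the system limit (in bytes) on the combined size of the
# arguments and environment passed to a new program.
limit=$(getconf ARG_MAX)
echo "ARG_MAX is $limit bytes"

# Build a scratch directory holding a few files to remove.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.out" "$tmpdir/b.out" "$tmpdir/c.out"

# Instead of  rm *.out  (the shell builds one long command-line),
# let find hand the names to rm in batches that fit within the limit.
find "$tmpdir" -name "*.out" -exec rm {} +

# Count what is left; the pipeline prints 0 if everything was removed.
left=$(find "$tmpdir" -name "*.out" | wc -l)
left=$(echo $left)
echo "files left: $left"

rm -r "$tmpdir"
```

The pattern given to -name is quoted so that find, rather than the shell, does the matching.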

What happens if you have a file name that has an asterisk in it? For example, suppose we have a directory containing the following files:

   x.out  x.err  x.*  x.obj  x.c

and we want to remove the file named "x.*".

   $ rm x.\*
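
A cautious habit is to preview an expansion with echo before handing it to rm. The sketch below does that in a scratch directory (tmpdir is a made-up name) containing the five files listed above, and then removes only the file literally named x.*.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch x.out x.err 'x.*' x.obj x.c

# echo shows what the shell will hand to a command, without running it.
preview=$(echo rm x.*)
echo "$preview"

# Escaping (or quoting) the asterisk turns off expansion, so only the
# file literally named  x.*  is removed.
rm x.\*

# Expand the pattern again to see which files survived.
remaining=$(echo x.*)
echo "$remaining"

# Return to the previous directory and clean up.
cd "$OLDPWD" && rm -r "$tmpdir"
```

Single quotes work as well as the backslash: rm 'x.*' removes the same single file.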



Introduction to the grep Command

The grep command is used to search for a pattern in a file or list of files. The pattern used by grep is called a regular expression. On some Unix systems, by default, the grep command supports basic-REs. [grep: global regular expression print (g/re/p) or general regular expression parser]

   $ grep Unix d01.shtml

      Print all lines from the file  d01.shtml  containing
      the pattern  Unix.

   $ grep -i UNIX d*.shtml

      Search all files starting with the character 'd' and ending
      in the string ".shtml" for lines that contain the pattern  UNIX.
      The search is not case sensitive; therefore, Unix, UNix,
      uniX, UNIX, unix, ..., all match.

   $ grep "#include <iostream.h>" *.c

      Look for the string  #include <iostream.h> in 
      all files ending in ".c".  Since the pattern contains
      a space and shell meta-characters (< and >), it must be
      quoted.

   $ grep -l "while (" *.cpp

      Search all "*.cpp" files for the pattern  "while ("
      and only display the names of the files with one
      or more matching lines, not the lines themselves.
   
   $ grep -v UNIX foo

      Display only those lines from the file  foo  that
      do not contain the pattern   UNIX.

   $ grep -v -c UNIX  foo

      Display a count of the number of lines from the file
      foo  that do not contain the pattern  UNIX.

   $ who | grep "^jdoe "

      Find out if  jdoe  is logged in.

   $ grep Unix foo >/dev/null 2>&1
   $ echo $?

      Search the file  foo  for the pattern  Unix  and re-direct
      the output to  /dev/null.  Use the exit status of the  grep
      command to determine if the pattern was found (0 indicates
      that it was, 1 implies it wasn't).
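
In a script the exit status is usually tested directly by an if statement rather than by printing $?. Here is a minimal sketch using a scratch file; the file content and the variable names are made up. An exit status greater than 1 indicates an error such as a missing file.

```shell
tmpfile=$(mktemp)
printf 'Unix is a multi-tasking system\nsecond line\n' > "$tmpfile"

# The shell's if statement tests the exit status directly:
# 0 means at least one line matched, 1 means none did.
if grep Unix "$tmpfile" >/dev/null 2>&1
then
   found=yes
else
   found=no
fi
echo "pattern Unix found: $found"

# The same check for a pattern that is not in the file.
grep Linux "$tmpfile" >/dev/null 2>&1
status=$?
echo "exit status for Linux: $status"

rm "$tmpfile"
```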

   $ crypt some_key < roster.done | grep jdoe

      Find the password entry for user  jdoe  in the encrypted
      file  roster.done  that was encrypted using  some_key.

   $ grep -E "unix|UNIX" foo
   $ egrep "unix|UNIX" foo

      -E causes  grep  to run as  egrep.  egrep  supports 
      extended-REs (Regular Expressions).  Search for either
      the pattern  unix  or the pattern  UNIX.

   $ grep -F hello foo
   $ fgrep hello foo

      -F causes  grep  to run as  fgrep.   If the pattern is 
      a string literal (i.e. fixed), then  fgrep  may be used.  

   $ pgrep -u root

      pgrep  is a customized  grep  that is used to search
      the process table.  -u LOGNAME  displays a list of
      PIDs for user LOGNAME.

GNU.org:: grep, print lines matching a pattern

Unix Systems

I suspected that grep, egrep and fgrep were all the same executable file.

This is true on a Mandrake 9.0 system. [I was expecting hard-link usage.]

   $ (cd /bin; ls -l *grep)
   lrwxr-xr-x    1 root     root            4 Oct 18 05:23 egrep -> grep*
   lrwxr-xr-x    1 root     root            4 Oct 18 05:23 fgrep -> grep*
   -rwxr-xr-x    1 root     root        88396 Aug 21 02:37 grep*

This is not true on Solaris.

   $ (cd /bin; ls -l *grep)
   -r-xr-xr-x   1 root     bin        26940 Jan  5  2000 egrep
   -r-xr-xr-x   1 root     bin        11820 May 30 15:13 fgrep
   -r-xr-xr-x   1 root     bin        10032 Jan  5  2000 grep

Nor is it true on Red Hat Linux 7.3.

   $ (cd /bin; ls -l *grep)
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 egrep
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 fgrep
   -rwxr-xr-x    1 root     root        49244 Feb 27  2001 grep

   $ (cd /bin; md5sum *grep)
   85f3b6bb02c5b67b19ff5af0b456db31  egrep
   3e1d1e10d530df11f1b9589ee5f55086  fgrep
   1af31bdf6bb72347f94a4731f276e3be  grep



Introduction to the find Command

The find command locates files that match a given set of criteria in a hierarchy of directories. The criteria may be a file name or a specified property of a file (such as its modification date, size, or type). You can also direct the command to remove, print, or otherwise act on the file.

   find  pathname  search-options   action-options

      pathname -- directory from which  find  begins the search; the
                  search is recursive (sub-directories, if any, are
                  also searched)
      search-options -- identify the files you are interested in
      action-options -- tell what to do once a file is found

   $ find . -print

      Begin searching from current working directory and print
      the name of all files found.

   $ find / -name foo.c -print

      Begin searching for a file named  foo.c  from the root
      directory and print its name if found.  More than one
      file can be found with the name  foo.c.  Caution:  if you 
      are not super-user, then you will not have permission to 
      search many directories.  

   $ find $HOME -name "*.c" -print

      Begin searching in the HOME directory for all files
      that end in ".c".  Print their names, if found.

   $ find . -type d -print

      Begin searching from current working directory and print
      all files that are of type directory.

   $ find /tmp -type f -print

      Begin searching from the  /tmp  directory and print
      all files that are regular files.

   $ find . -mtime 10 -print

      Find and display files last modified exactly 10 days ago.

   $ find . -mtime -10 -print

      Find and display files last modified less than 10 days ago.

   $ find . -mtime +10 -print

      Find and display files last modified more than 10 days ago.

   $ find . -atime +10 -print

      Find and display files last accessed (read) more 
      than 10 days ago.

   $ find . -newer .lastbkup -print
 
      Find and display files modified more recently than  
      .lastbkup  was.

   $ find . -user jdoe -size +50 -print

      Find and display files owned by  jdoe  that are 
      larger than 50 blocks in size.

   $ find . -type f -exec chmod 644 {} \;

      Find and change permissions on all regular files.

   $ find . -name foo.c -mtime +30 -exec rm {} \;

      Find and remove all instances of the file  foo.c
      that were last modified more than 30 days ago.

   $ find . -name foo.c -mtime +30 -ok rm {} \;

      Find and interactively remove all instances of the 
      file  foo.c  that were last modified more than 30 days ago.

   $ find . -inum 8888 -print

      Find and display all files having i-node number 8888.
      Combined with an action such as  -exec rm {} \;  this is
      a handy way to remove files that have goofy
      (e.g. non-printable) characters in their names.
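
The pathname / search-options / action-options structure can be tried safely in a scratch directory. In this sketch the directory layout and file names are made up for illustration.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/src" "$tmpdir/docs"
touch "$tmpdir/src/foo.c" "$tmpdir/src/bar.c" "$tmpdir/docs/notes.txt"

# Search options (-type, -name) combined with the -print action.
# The pattern is quoted so that find, not the shell, expands it.
cfiles=$(find "$tmpdir" -type f -name "*.c" -print | wc -l)
cfiles=$(echo $cfiles)
echo "C source files: $cfiles"

# Count directories, including the starting directory itself.
dirs=$(find "$tmpdir" -type d -print | wc -l)
dirs=$(echo $dirs)
echo "directories: $dirs"

# Action option: run chmod on every regular file that was found.
find "$tmpdir" -type f -exec chmod 644 {} \;

rm -r "$tmpdir"
```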

Finding Things in Unix [ONLamp.com; part 1]
Find: Part Two [ONLamp.com]



What is a Process?

A program is an executable (or binary) file that resides on some sort of secondary storage device (e.g. hard drive, floppy disk, tape, CD-ROM, etc.). [Programs are sometimes called applications or commands.]

A program is typically generated by translating some sort of source code into a machine language that is readable by the CPU (Central Processing Unit). Some programs contain source code that is interpreted by some other program. These are commonly referred to as scripts.

Typically, program files are located in a common directory or collection of directories. On many Unix systems, programs are found in the following locations:

   /bin
   /usr/bin
   /usr/local/bin

Note that the term bin is used to represent the word binary. Programs are often called binary files. Binary files have a non-ASCII format to them and cannot be modified using a regular text editor. Not all binary files are programs -- in some cases, they are data base type files.

Sadly, a binary file generated on one version of Unix may not be executable on a different Unix system even though the systems may be using the same CPU. [There are different types of executable formats.]

To execute a program, the binary file must be loaded into the memory of the computer. Once this is accomplished, the program now becomes a process. Said another way: A process is an instance of a program.

The shell is usually the program responsible for getting a program loaded into memory. [The shell -- when executing -- is a process.]

Once a process has been created, it is assigned a PID or process identifier that is used to track the process while it executes.

Every process has a parent process to which it belongs. In many cases, that parent process is the shell.

The following are some crude notes that have not been incorporated into the lecture note. They are intended for the CSC178 -- Programming in the Unix Environment course.

   + process subsystem
      * process control
         + creation
         + termination
      * scheduling
      * memory management
         - swapping
         - paging
   + layout of a process
      * text: 
         machine instructions executed by the CPU; typically
         shareable and read-only
      * initialized data segment: 
         data that is defined & initialized outside any 
         function (e.g. int i = 5;)
      * uninitialized data segment:  
         (bss) data in this segment is initialized by the kernel 
         to 0 (or null pointers) before the program starts executing 
         (bss - old assembler operator "block started by symbol")
      * stack: 
         where auto variables are stored along with info that is 
         saved each time a function is called
      * heap:  dynamic memory alloc'd from here

         +-------------+
         |cmdline args |
         |env vars     |
         +-------------+
         | stack       |
         +-------------+
         |   ...       |
         |   heap      |
         |   ...       |
         +-------------+
         | uninit data |
         +-------------+--------+
         | init'd data |
         +-------------+          read from program file by exec
         | text        |
         +-------------+--------+

   + pid = fork()   (fork is the only way for Unix to create a new process)
      * in parent, pid > 0; in the child, pid is 0
      * process 0, created internally by the kernel when the system is booted,
        is the only process not created via fork
      * process creation
         - slot alloc'd in process tbl
         - unique pid assigned 
         - logical copy of the parent process is made
            (text area may not be included in this copy)
         - file and inode tbl counters incremented
         - return pid of child to parent, and 0 to child
      * child inherits the following (incomplete list):  real/effective 
         uid/gid, current working directory, root directory, 
         file mode creation mask, environment, resource limits
   + the 'ps' command
      - pid and ppid
      - pid 0 is scheduler
      - pid 1 is init
      - pid 2 is pageout (on many systems)



More on Processes

A program file loaded into memory becomes a process.

Every process is assigned a PID (process id) by the kernel. PIDs are assigned on a sequential basis and usually wrap when they reach the value 32,767 (two bytes might be used to store the PID in the process table).

At the lowest-level, a process is created by a fork system call. If the new process is a different program file, then an exec is invoked (fork and exec). [ Webopedia.com::system call]

<side-bar>
Unix is built around six basic system calls.

	open, read, write, close, fork, exec
</side-bar>

Every process has a parent process. For shell users, their shell is the parent process of the commands executed.
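
The parent/child relationship can be observed from the shell itself. In this minimal sketch the current shell starts a second shell as a child, and the child reports the PID of its parent; tmpfile is a made-up scratch file used to carry the answer back.

```shell
tmpfile=$(mktemp)

# $$ expands to the PID of the current shell.
echo "this shell has PID $$"

# Run a second shell as a child process.  Inside it, $PPID is the PID
# of its parent.  The single quotes keep the current shell from
# expanding $PPID itself.
sh -c 'echo $PPID' > "$tmpfile"
parent=$(cat "$tmpfile")
echo "the child reported parent PID $parent"

rm "$tmpfile"
```

The two PIDs printed should be the same, because the current shell is the parent of the command it executed.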

To aid with terminology, the following paragraph was taken from page 656 of the text book for the course.

There is a special metaphor that applies to processes in the Unix system. The processes have life: they are alive or dead; they are spawned (born) or die; they become zombies or they become orphaned. They are parents or children, and when you want to get rid of one, you kill it.



Process Related Commands

There are numerous process-related commands that come with Unix. Two of the more commonly used commands are ps and kill.

The ps Command

The ps (process status) command is used to see a list of active processes.

The default output of ps displays the PID, what terminal the process was invoked on, the amount of CPU time given to the process, and the process name.

   $ ps -l      # gives a long listing of your current processes
   $ ps -aux    # displays all processes (BSD-style options; ps -ef is the System V form)

When you do a long form of the ps command, the process state (S) is displayed. The following are the various states that a process can be in:

   O     running
   S     sleeping
   R     runnable process in queue
   I     idle process, being created
   Z     zombie
   T     process stopped and being traced
   X     process waiting for more memory

The pstree command gives a "tree" diagram of the process table.

The top command gives a "real-time" image of processor activity.

The kill Command

The kill command is used to stop a running process. The simplest version of the command is as follows:

   $ kill 1234

where 1234 is the PID of the process you want to stop.

The kill command is used to send a signal to the process. In some cases, the program can choose to ignore signals. If you try to kill a process but it doesn't die, then execute the following:

   $ kill -9 1234

where "-9" is an unconditional kill signal (programs cannot choose to ignore this signal).

If you execute kill without specifying a signal value, then a 15 (SIGTERM) is sent by default.

If you want to kill all the processes you may have running on a particular terminal session, then execute the following:

   $ kill 0

You are only allowed to kill your own processes. Only the super-user has random killing privileges.
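
The following sketch starts a harmless background process, kills it with the default signal, and then checks the result. The variable names are made up; the statement that the exit status is 128 plus the signal number holds for common shells such as bash and dash, and is an assumption for other shells.

```shell
# Start a long-running command in the background and note its PID.
sleep 60 &
pid=$!

# Send the default signal (15, SIGTERM).
kill "$pid"

# wait collects the exit status of the background process.  For a
# process ended by a signal, the shell reports a value above 128
# (128 + signal number in bash and dash).
wait "$pid"
status=$?
echo "exit status: $status"

# kill -0 sends no signal; it only tests whether the PID still exists.
if kill -0 "$pid" 2>/dev/null
then
   alive=yes
else
   alive=no
fi
echo "still running: $alive"
```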

The pkill Command

To be completed.



Daemons

Daemon processes are used to extend the functionality of the OS. They are not part of the kernel, but they play important roles in providing applications not directly supported by the kernel. [webserver, telnet, mail, line printer, cron, etc.]

A daemon is a process that starts at boot-time and continues as long as the system is up. Some daemons start and stop on an as needed basis and some run at scheduled time periods.

Some daemons can be thought of as service providers.

Many daemon program names end with a dee ('d'). [httpd, inetd, lpd, crond, ...]

Some popular daemon programs.

   init, cron, inetd, lpd, sendmail
   paging daemon (pageout, kpiod, pagedaemon)
   swapping daemon (swapper, kswapd)

inetd is a super-daemon. [Note: inetd has been replaced with xinetd.]

About the Word: daemon

The term daemon was first used in computing during the early 1960's. At one time the term meant "an attendant spirit that influences one's character or personality. A daemon is neither good or evil; they are creatures of independent thought and will."



Running Commands in the Background

Unix is a multi-tasking system. Not only can it support multiple users, but each user can in turn have multiple tasks running.

Usually when you execute a command, the shell takes over your terminal and you have to wait for the command to end before you see the shell prompt again. In some instances, the command you need to execute will take a long time and you don't want to sit idle waiting for it to finish; in other words, you want to "spawn" off the command letting it run in the background while you go ahead and issue more commands. This can be accomplished by using an ampersand at the end of the command-line.

   $ make &
   939
   $ who
   ...
   $ ps
   ...
   $ date
   ...

      Typically, the 'make' command can take a long time to
      execute.  Therefore, we start off the command and use
      a & on the command-line to spin it off in the background.
      The shell displays the PID (process id) of the command
      and re-issues the shell prompt.  Now we can execute more
      commands while the 'make' program runs as a background job.

When you execute commands in the background, you have to be careful with respect to processing output. If all the commands write data to the standard output stream and/or error streams, then the data from the various commands will be mixed together. In many instances, commands executed in the background have their output re-directed into a file.

   $ make 1>make.out 2>make.err &
   1411
   $ who
   ...
   $ date
   ...

You need to be careful when executing interactive programs in the background. When you do, multiple programs (at least two: the shell and the program executed as a background process) can end up reading the standard input stream, and this doesn't work. Typically, the standard input stream is re-directed (i.e. input is obtained from an object other than the keyboard) when interactive programs are executed in the background.
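
Here is a minimal sketch of two background jobs whose streams are kept apart. Each job stands in for a long-running command; the scratch directory and file names are made up. Standard input is re-directed from /dev/null so that neither job competes with the shell for the keyboard.

```shell
tmpdir=$(mktemp -d)

# Two background jobs, each with its own output files so that their
# output is not mixed together on the terminal.
(sleep 1; echo "job one done") < /dev/null > "$tmpdir/one.out" 2> "$tmpdir/one.err" &
(sleep 1; echo "job two done") < /dev/null > "$tmpdir/two.out" 2> "$tmpdir/two.err" &

# The shell prompt would normally return here.  wait with no arguments
# blocks until every background job of this shell has finished.
wait

one=$(cat "$tmpdir/one.out")
two=$(cat "$tmpdir/two.out")
echo "$one / $two"

rm -r "$tmpdir"
```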



The nohup Command

When you exit the system, a SIGHUP signal (value 1) is sent to all of the programs you have running. By default, a program dies upon receiving this signal.

The nohup command can be used to keep a program running after you exit the system (i.e. the command is run so that it ignores the SIGHUP signal).

   general syntax:

      nohup your_command_line &

   example:

      nohup make 1>make.out 2>make.err &

When you use nohup, you should always re-direct command output. If you don't, then nohup re-directs both output streams to a file called nohup.out (generally not a good idea: what happens when you nohup a couple of commands?).
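
Here is a small sketch of the recommended form, using a trivial stand-in command instead of make so that it finishes immediately. The scratch directory and the file names job.out and job.err are made up.

```shell
tmpdir=$(mktemp -d)

# Run a command under nohup with both output streams re-directed to
# separate files, so nothing ends up in nohup.out.
nohup sh -c 'echo "build finished"' > "$tmpdir/job.out" 2> "$tmpdir/job.err" &

# Only for this demonstration: wait for the job so its output can be shown.
wait

out=$(cat "$tmpdir/job.out")
echo "$out"

rm -r "$tmpdir"
```

Giving each nohup'd command its own pair of files answers the question of what happens when several of them run at once.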



Signals

A signal is a piece of data in the form of an integer value that can be sent to a process. Signals can be sent to a process using the kill command.

Signal values start at 1 and go up from there.

Each signal value has a name associated with it. A list of signal names can be obtained by using the kill command with the -l option. Here is an abbreviated list obtained from a Linux system.

    1) SIGHUP     2) SIGINT     3) SIGQUIT    4) SIGILL
    5) SIGTRAP    6) SIGABRT    7) SIGBUS     8) SIGFPE
    9) SIGKILL   10) SIGUSR1   11) SIGSEGV   12) SIGUSR2
   13) SIGPIPE   14) SIGALRM   15) SIGTERM   17) SIGCHLD
   ...

The ANSI 'C' standard defines the following signals.

   SIGABRT    SIGFPE     SIGILL
   SIGINT     SIGSEGV    SIGTERM

A process can be written to ignore or catch signals, except for SIGKILL (signal value 9) and SIGSTOP. If the process contains no signal handling logic, then the default behavior for most signals is for the process to terminate. If you want to terminate a program, then first use SIGTERM followed by a SIGKILL if the first kill doesn't work.

Typically, when working at the command-line, typing a <ctrl-c> causes a SIGINT (the interrupt signal value 2) to be sent to a process.

The operating system can send signals to a process when the process performs an illegal operation (e.g. attempts a divide-by-zero or accesses an invalid memory location).

Many daemon processes read configuration files upon startup. If the configuration is modified while the daemon is running, then a signal is usually sent to the daemon to instruct it to re-read its configuration files. SIGHUP is commonly used for this purpose.

   kill -s SIGHUP pid_of_the_daemon_process
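
A shell script can catch signals with the trap command. The following is a minimal sketch, not a real daemon: the script counts how many times it is asked to "re-read its configuration" and sends SIGHUP to itself to show the handler running. The variable name reloads is made up.

```shell
# Counter standing in for the work of re-reading a configuration file.
reloads=0

# trap registers a command to run when the named signal arrives.
trap 'reloads=$((reloads + 1))' HUP

# Send SIGHUP to this very shell ($$ is its PID).
kill -s HUP $$

echo "configuration reloaded $reloads time(s)"

# Restore the default action for SIGHUP.
trap - HUP
```

Without the trap, the default action for SIGHUP would terminate the script.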


