CSC220 :: Lecture Note :: Week 12

Overview

Assignment(s): [Note: All assignments have been assigned.]


The Unix Philosophy

Here is the Unix Philosophy in a nutshell.

GDT::Bit:: The Unix Philosophy


Introduction to the File System

The Unix file system is a logical method for organizing and storing large amounts of information in a way which makes it easy to manage. The file is the smallest unit in which information is stored.

Conceptually, a computer file is similar to a paper document. Technically, a file is a sequence of bytes that is stored somewhere on a storage device. [A byte is the smallest addressable unit. A byte is typically made up of eight bits. A bit is a binary digit (0 or 1).]

Every file has a name called its filename. Users access files using filenames.

A filename can be almost any sequence of characters; however, certain characters should be avoided:

	! # & @ $ ^ ( ) ; | < > TAB SPACE BACKSPACE \ ? * { } [ ] ' "

You should take care when naming a file: use meaningful, easy to remember filenames.

Adopt and follow a consistent file naming convention.

Remember that Unix is case sensitive. Use uppercase and lowercase letters to help make good filenames.

Many filenames use extensions to help indicate the type of file. For example, the file foo.c contains C source code, the file foo.html contains HTML code, and the file foo.au is an audio file. Extensions are not required.

On early Unix systems, filenames were limited to 14 characters in length; that restriction no longer applies. Modern systems allow filenames far longer than you would want to use (commonly up to 255 bytes).

Over the course of time, you end up creating lots of files. A directory is a special type of file that can be used to group files together. [A directory is similar to a folder.]

Directories provide a way to categorize information.

A directory can contain other directories (often called sub-directories or child directories).

A directory is a file; therefore, it must have a name.

As with naming files, care should be taken when naming directories.

Because directories can contain other directories, which can in turn contain other directories, the Unix file system is called a hierarchical file system or tree-structured file system.

A collection of directories and files creates a directory structure.

The top directory of the Unix file system is called root and is represented by the name / (slash).

UnixHelp.ed.ac.uk:: File System
PathName.com:: Filesystem Hierarchy Standard
To Be Completed in Unix II: ext2, ext3, rfs, nfs, tmpfs


Common Directory Structure

The standard system directories are shown below. Each one contains specific types of files. The details vary between Unix systems, but these directories are common to many of them.

                                 / (root)
                                    |
    ----------------------------------------------------------------
    |         |       |       |        |       |       |           |
   bin       dev     etc     usr      tmp     lib     home     kernel file

On an OpenBSD 3.5 system, the following directory structure exists.

   /
   /home
   /usr
   /var
   /altroot
   /bin
   /dev
   /etc
   /mnt
   /root
   /sbin
   /stand
   /tmp

The root directory (/) has two executable files: boot and bsd.


I-nodes

Internally, Unix keeps track of files using a structure that is called an i-node (index node).

File names are used to make Unix a "user-friendly" system. Internally, Unix refers to files by number; more specifically, their i-node number.

Each file within a file system has a unique i-node number assigned to it when the file is created.

Executing the ls command with the -i option causes the i-node numbers to be displayed.

If two files have the same i-node number, then the files are linked (i.e. they are the same physical file, but the file has two names).
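The link behavior described above can be observed directly. The following is a minimal sketch, assuming a scratch directory created with mktemp and the hypothetical file names original and second_name; it uses ls -i to show that both names carry the same i-node number.

```shell
# Work in a scratch directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

echo "hello" > original
ln original second_name        # hard link: a second name for the same file

ls -i original second_name     # both lines show the same i-node number

# Pull out just the numbers and compare them.
ino1=$(ls -i original | awk '{print $1}')
ino2=$(ls -i second_name | awk '{print $1}')
[ "$ino1" = "$ino2" ] && echo "same file, two names"
```

Running ls -l on either name should also report a link count of 2.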

An i-node structure (record) may contain the following information about a file:

   + its size (in bytes)
   + permissions
   + owner
   + group
   + date and time created, last modified, last accessed
   + link count
   + "pointers" to where the data of the file is stored


Directory and File Permissions

Recall, every file has an owner and belongs to a group. In addition, every file has a set of permissions on it that controls who has access to the file (e.g. who can read it, who can write it, and so on). The owner, group, and permissions are initialized when a file is created.

The chmod command is used to change the permissions on a file.

Typically, by default, when a file is created, it is initialized with the following file permissions.

   -rw-rw-rw-   =>  read/write by everybody
   || || || |
   || || |+---> other permissions
   || || |
   || |+---> group permissions
   || |
   |+---> owner permissions
   |
   +-> indicates a regular file 

[Sidebar] The default permissions are determined by the setting of the user's umask. Most users want the default permissions to be: -rw-r--r-- (read/write by owner, read-only for everybody else).

Numerically, the permissions -rw-r--r-- are represented by the octal (base 8) number 644.

   read = 4     write = 2     execute = 1

The octal modes are specified by adding together a 4 for read, 2 for write and 1 for execute permission. The three digits specify, as in ls -l, permissions for the owner, group, and everyone else.

Examples
   444  =>  -r--r--r--
   777  =>  -rwxrwxrwx
   602  =>  -rw-----w-
   000  =>  ----------
   222  =>  --w--w--w-
   460  =>  -r--rw----
   541  =>  -r-xr----x
   644  =>  -rw-r--r--
   711  =>  -rwx--x--x
   711  =>  drwx--x--x       # this is a directory
   777  =>  drwxrwxrwx       # this is a directory

   $ chmod 644 foo
   $ chmod 777 foo bar junk
   $ chmod 400 /etc/shadow

   If only one octal digit 'N' is used, 
   then that is equivalent to 00N.

   If two octal digits 'MN' are used,
   then that is equivalent to 0MN.

   $ chmod 7 foo                # results in ------rwx
   $ chmod 24 foo               # results in ----w-r--
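
The add-4-2-1 rule behind the examples above can be checked mechanically. The following is a rough sketch only: to_octal is a hypothetical helper (not a standard command) that assumes its argument is exactly the nine permission characters shown by ls -l, without the leading file-type character.

```shell
# to_octal PERMS: convert a string such as rw-r--r-- to its octal mode.
to_octal() {
    [ "${#1}" -eq 9 ] || return 1       # expect exactly nine characters
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        rest=${perms#???}               # everything after the first three characters
        triple=${perms%"$rest"}         # the first three characters
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac   # read
        case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
        perms=$rest
    done
    printf '%s\n' "$result"
}

to_octal rw-r--r--    # prints 644
to_octal r-xr----x    # prints 541
```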

Only the owner of a file (or super-user) can change the permissions on a file.

Permissions can also be set by symbolic description; the details are left to the reader, but the following is a brief overview. The syntax of the symbolic mode of chmod is:

      [who] op permission

who is a combination of the following letters:

      u ... user
      g ... group
      o ... others
      a ... all  (ugo)

op can be a + to add permission or - to remove or = to assign permissions.

permission is a combination of the letters r, w, and x. (note: there are additional letters).

   $ chmod a=rw foo    # set permissions to read/write for all
   $ chmod +x foo      # add execute permission for all (limited by the umask)
   $ chmod o-w foo     # remove write permission from other

Directory Permissions

Permissions have a different meaning when applied to directories. In a nutshell:

   r  =>  you can read the directory (e.g. do an 'ls' on it)
   w  =>  you can add files to and remove files from the directory
          (x permission is also needed)
   x  =>  you can 'cd' into the directory or use it as part 
          of a path

Recall that a directory is a file: it contains information about other files.

About the umask Command

The umask command is used to set the default creation mask for files and directories. It is roughly the reverse of chmod: it tells the system which permissions should not be given when a file is created.

   $ umask 000
   $ >foo
   $ ls -l foo
   -rw-rw-rw- ... foo
   $ umask 022
   $ >bar
   $ ls -l bar
   -rw-r--r-- ... bar
   $ umask
   022

Many SysAdmins put the umask 022 command in the /etc/profile file. As a result, the default permissions on directories and files are drwxr-xr-x and -rw-r--r--, respectively.
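
The arithmetic behind these defaults can be sketched in the shell itself. This assumes the usual starting modes of 666 for regular files and 777 for directories; default_mode is a hypothetical helper that clears the umask bits from a starting mode.

```shell
# default_mode BASE UMASK: print BASE with the UMASK bits removed (octal).
default_mode() {
    printf '%03o\n' $(( 0$1 & ~0$2 ))
}

default_mode 666 022    # regular file: prints 644  (-rw-r--r--)
default_mode 777 022    # directory:    prints 755  (drwxr-xr-x)
default_mode 666 000    # no mask:      prints 666  (-rw-rw-rw-)
```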


Environment Variables

An environment variable is a variable that is made available to commands as part of the environment that the shell maintains.

Environment variables are useful for a variety of reasons.

Personal environment variables can be defined in your .profile (or .bash_profile) file stored in your HOME directory. The .profile file is executed every time you log into the system.

Prior to executing the .profile, the shell executes /etc/profile. This ensures that all users start with a common environment upon successful login.

On most systems, the following environment variables are automatically set for you.

   HOME      absolute path of your home directory
   PATH      used by your shell to find programs
   PS1       your shell prompt
   SHELL     the name of the shell program you are using
   LOGNAME   your login account name
   TERM      the type of terminal you are using
   MAIL      absolute path of where your email is stored

An environment variable has a name and an associated value. Prefixing an environment variable with a dollar sign causes the shell to get the variable's value. The echo command can be used to display the value of an environment variable. Here are some examples.

   $ echo $HOME
   /export/home/gthurman

   $ echo $SHELL
   /usr/bin/ksh

   $ echo $MAIL
   /var/mail/gthurman

   $ echo TERM is $TERM and LOGNAME is $LOGNAME
   TERM is vt100 and LOGNAME is gthurman

One of the most useful environment variables is PATH. PATH is used by the shell to locate commands. The following is a common value assigned to the PATH environment variable:  /bin:/usr/bin:/usr/local/bin. When a command is entered, the shell (using the value of PATH) first looks in the /bin directory for the program and executes it if it finds it; otherwise, it searches /usr/bin for the command, followed by /usr/local/bin. If the shell searches all the components without finding the executable file, then it prints a "command not found" error message.
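
The search just described can be approximated with a short function. This is a simplified sketch, not how any particular shell implements it: find_in_path is a hypothetical helper, it ignores aliases, functions, and built-in commands, and it treats an empty PATH component as the current directory.

```shell
# find_in_path NAME: print the first executable NAME found along PATH.
# The body runs in a subshell ( ... ) so the IFS change stays local.
find_in_path() (
    IFS=:                               # split PATH on colons
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.          # empty component means current directory
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    printf '%s: command not found\n' "$1" >&2
    exit 1
)

find_in_path ls         # prints the full path of ls, e.g. a file under /bin or /usr/bin
```

Because the search stops at the first match, the order of the components in PATH decides which program runs when two directories contain the same name.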

By default, the shell does not look in your current working directory for the command. This can be changed by modifying your PATH as follows.

   $ PATH=$PATH:.        or    PATH=$PATH:     (an empty component also means ".")

      Important note:  If you do put the current working directory on
      the PATH, then it should be the last component.  You never want
      it to be the first component.

If you enjoy playing games, then you may alter the PATH as follows.

   PATH=$PATH:/usr/games

You can use your own environment variables for abbreviations. For example:

   $ letters=$HOME/personal/letters
   $ cd $letters

If you have an environment variable that you want available to other commands, then you need to export it.

   $ EDITOR=/usr/bin/vi; export EDITOR

      Now the EDITOR environment variable can be used by email
      programs, pager programs, and so on.
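
The effect of export can be seen by asking a child shell what it received. A minimal sketch, using the hypothetical variable name favorite:

```shell
unset favorite                             # start from a clean slate
favorite=blue                              # set in this shell only
sh -c 'echo "child sees: [$favorite]"'     # prints: child sees: []
export favorite                            # now part of the environment
sh -c 'echo "child sees: [$favorite]"'     # prints: child sees: [blue]
```

The single quotes matter here: they keep the parent shell from expanding $favorite before the child shell runs.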

Environment variables are often used to "hide" the complexities of the directory structure. In other words, you can access a specific directory by name without having to know its absolute path.

By convention, personal environment variables are spelled in lower case to help distinguish them from those set up by the system.

Environment variable names must begin with a letter or an underscore.

The env program can be used to display all the environment variables you have defined.

A variable can be removed from the environment using the unset command. Syntax:

   $ unset variable_name


Tilde Expansion [minimal notes]

Shells such as ksh and bash support tilde expansion. A ~ by itself expands to the value of your HOME environment variable; a ~ followed by a login name expands to that user's home directory.

   example 1
   ---------
   $ echo ~thurmunit
   /users/n-z/t/th/thurmunit

   example 2
   ---------
   $ pwd
   /users/n-z/t/th/thurmunit
   $ cd ~hmumford
   $ pwd
   /users/a-m/h/hm/hmumford

   example 3
   ---------
   $ who am i
   gdt   /dev/pts/14  Sep 25 05:08
   $ cd ~/tmp
   $ pwd
   /export/home/g/gdt/tmp
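
A short sketch of when the expansion does and does not happen, assuming HOME is set; the variable names are hypothetical.

```shell
home_from_tilde=~       # an unquoted ~ is expanded in an assignment
quoted="~"              # quoting suppresses tilde expansion

echo "$home_from_tilde" # your home directory
echo "$quoted"          # a literal ~
echo ~/tmp              # your home directory followed by /tmp
```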

GNU.org:: Bash Reference Manual: Tilde Expansion


Standard Input/Output/Error

In Unix, a file is a sequence of bytes that is stored somewhere on a storage device (e.g. a disk). The content of a file does not have any significant meaning to the OS (it is simply a sequence of bytes). The "structure" of the bytes may have to conform to a particular format in order to be used by specific applications (i.e. programs or commands).

Every command, when invoked, has three I/O (input/output) streams opened for it: two for output and one for input. The output streams are called standard output and standard error. The input stream is called standard input. In "C" terminology, these I/O streams have the names stdout, stderr and stdin, respectively.

A command can get (read) input from the standard input stream, but it doesn't know where the input comes from (it could come from a file, the keyboard, or another command).

       file--->+
               |
   keyboard--->+ ---> standard input ---> command
               |
    command--->+

The output of a command (if any) can be written to either the standard output stream or the standard error stream. The command doesn't need to know where the output is going (it could go to a file, the screen, or another command).

                                     +---> file
                                     |
   command ---> standard output ---> +---> screen
                                     |
                                     +---> command

The shell manages the standard I/O streams for a command.

The shell assigns a file descriptor to each of the I/O streams: standard input is 0, standard output is 1, and standard error is 2.

By default, standard input is the keyboard, and standard output and standard error are both the screen. Re-directing these streams is accomplished on the command-line when the command is invoked using the re-direction operators. [Note: It can also be done internally by the command itself.]

The re-direction operators are as follows.

   Symbol   Example         Function
   <        cmd < file      take input for cmd from file
   >        cmd > file      send output of cmd to file
   >>       cmd >> file     append output of cmd to file
   |        cmd1 | cmd2     run cmd1 and send its output to cmd2


Redirecting Standard Output and Standard Error

In some instances you want to save the output of a command into a file. This can be accomplished using the shell's redirection operators > and >>.

Here are examples.

  $ who > who.out

    Executes the command 'who' and redirects the output into
    a file named "who.out".  No output is seen on 
    the screen.  The file "who.out" is created in 
    your current working directory.

  $ date > /tmp/now

    Executes the command 'date' and redirects the output to a
    file named "now" stored in the "/tmp" 
    directory.  (Note: If the directory "/tmp" contains 
    a directory named "now", then the shell will not execute the
    command.)

  $ ls -l>myfiles

    Use of whitespace around the > operator is optional.  
    Executes the 'ls -l' command with output redirected into a 
    file named "myfiles".

If you redirect the output of a command into a file that doesn't exist, then the shell creates the file for you (assuming permissions are not a problem). If you redirect the output of a command into a file that already exists, then the content of the existing file is replaced with the command's output (again, if allowed).

   $ ps > /etc/foo

     This command should fail on most Unix systems.  We are attempting
     to create a file named "foo" in the "/etc"
     directory.  "/etc" is a system-level directory and 
     regular users are not allowed to write to it.

If you want to redirect the output of a command and have it appended to the content of an existing file, then you must use the >> operator.

   $ who > cmd.log

     The 'who' command is executed and its output redirected into
     a file named "cmd.log".

   $ date >> cmd.log
      
     The 'date' command is executed and its output is appended to
     the file named "cmd.log".  (If file "cmd.log"
     doesn't exist, then the system creates it.)  Again, the use of 
     whitespace around the operator is optional.

   $ ps >> cmd.log

     The 'ps' command is executed and its output is appended to the
     file named "cmd.log".

   $ echo 'mygroup:*:40000:' >> /etc/group

     This command should fail on most Unix systems when executed as 
     a regular (i.e. non-root) user.  We are attempting to add a
     group to a file that regular users are not allowed to write.

Re-directing Both Output and Error

The following are some examples of how you can re-direct the standard error stream. Note: these examples work with Bourne, Korn and Bash shells, but they do not work with "C" shell.

   Assume  cmd  is some Unix command.

   $ cmd >out 2>err
     
     'cmd' is executed.  Standard output is re-directed to a file
     named "out" and standard error is re-directed to a 
     file named "err".

   $ cmd >out 2>&1

     All output (both standard output and standard error) is
     redirected into a file named "out".

   $ cmd 2>foo

     The standard error is re-directed to a file named "foo".
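
These forms can be tried safely with a stand-in for cmd. In this sketch, noisy is a hypothetical function that writes one line to each output stream, and the files are created in a scratch directory.

```shell
# noisy: one line to standard output, one line to standard error.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

cd "$(mktemp -d)" || exit 1

noisy >out 2>err        # out holds "normal output", err holds "error output"
noisy >both 2>&1        # both lines end up in the file "both"
noisy 2>&1 >wrong       # order matters: standard error was pointed at the
                        # old standard output, so only "normal output"
                        # lands in "wrong"
```

Redirections are processed left to right, which is why 2>&1 must come after >out to capture both streams in one file.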


Redirecting Standard Input

Many commands obtain their input from files specified on the command-line. Most of these commands also work if no files are specified. In these cases, the command reads input from the standard input stream, which by default is the keyboard. The cat command provides a good example.

   $ cat /etc/group

     The command opens the file  /etc/group  and uses 
     the content of the file for input.

   $ cat
   Now I am entering data (via the keyboard) that is
   going to be used as input to the 'cat' command.
   This is the standard input stream.  You tell the
   shell that you are done entering data by typing
   a <CTRL-D> character.  [Note: on an ASCII
   system, <CTRL-D> is EOT.]
   <CTRL-D>

     The 'cat' command is executed without any arguments; therefore,
     it gets input from the standard input stream.

   $ cat < /etc/group

     From a user perspective this has the same effect as if
     /etc/group  was specified as a command-line argument,
     but internally the standard input stream was re-directed
     from the keyboard to the file  /etc/group.

In some cases you want to execute a command and have it get input from both the standard input stream and a file.

   $ cat - /etc/group
   This is the content of /etc/group:
   ==================================
   <CTRL-D>

      On some systems, the use of  -  may not be supported by
      all commands.  The file  /dev/stdin  can be used instead.

         $ cat /dev/stdin /etc/group
         This is the content of /etc/group:
         ==================================
         <CTRL-D>

Redirecting Input and Output

You can execute a command and have both the input and output streams re-directed.


   $ cat < /etc/group > foo

     Execute the 'cat' command re-directing standard input to
     the file  /etc/group  (i.e. the content of  /etc/group
     becomes the standard input stream).  The output of the
     'cat' command is re-directed to the file  foo.
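
The same pattern works with any command that reads standard input. A small sketch using sort, with hypothetical sample data written to a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
printf 'pear\napple\nfig\n' > fruit      # sample input file

sort < fruit > fruit.sorted    # input comes from fruit, output goes to fruit.sorted
cat fruit.sorted               # apple, fig, pear (one per line)

wc -l < fruit                  # counts lines; no file name is printed because
                               # wc only sees a stream, not a file argument
```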


Pipes

Recall that the output of a command can be redirected by using the > operator.

Many commands receive their input from a file or from the standard input stream. Input can be redirected into a command by using the < operator.

   $ sort /etc/group

      The  sort  program displays the content of  /etc/group
      sorted in alphabetical order.

   $ sort < /etc/group

      Instead of getting its input from a file, the  sort  command
      sorts the standard input stream (which in this case happens to be
      the content of the  /etc/group  file).

In many cases you need to take the output of one command and use it as input to another command. This can be easily accomplished by using the | (or pipe) operator.

   $ ls -l | grep "Oct 28"

      Do a long listing and pipe the output into the  grep
      command searching for the pattern "Oct 28".  The output
      of the command sequence will be a long listing of all
      files whose entries contain "Oct 28" (typically files
      last modified on that date).

   $ wc -l /etc/passwd

      wc -l  counts and prints the number of lines found
      in the file argument  /etc/passwd.

   $ who | wc -l

      The output of the  who  command is piped into the
      word count program  wc  .   wc  when executed with
      the  -l  option prints the number of lines it finds
      in its input.

   $ cut -f1 -d":" /etc/passwd | sort | uniq | wc -l

      The 'cut' command prints the values of field
      one found in the /etc/passwd file having colon
      delimited fields.  The output is sorted using
      the Unix 'sort' command and the sorted output
      is sent into the 'uniq' program which eliminates
      duplicate values.  The unique list of values is
      then counted by the 'wc -l' command.

   $ cat /etc/passwd | tr '[a-z]' '[A-Z]' | grep THURM | wc -l

      This pipeline prints the number of lines in the /etc/passwd
      file that contain the string "thurm".  Searching
      is not case sensitive.  Exercise:  describe what is
      going on.

   $ tail -100 $logs | grep "^csnet:" 2>/dev/null | sort | pr -s | more

      Exercise:  describe what is going on.

Bell-Labs.com:: Why Ken Had to Invent the | [hyperlink to Dennis Ritchie's homepage]

Early Unix History and Evolution

The following was copy/pasted from Ritchie's website.

   Pipes appeared in Unix in 1972, well after the PDP-11 
   version of the system was in operation, at the suggestion 
   (or perhaps insistence) of M. D. McIlroy, a long-time 
   advocate of the non-hierarchical control flow that 
   characterizes coroutines. Some years before pipes 
   were implemented, he suggested that commands should 
   be thought of as binary operators, whose left and
   right operand specified the input and output files. 
   Thus a copy utility would be commanded by 

     inputfile copy outputfile


Some Pipe Examples

Print your LOGNAME excluding the first character in all upper-case.

   $ echo $LOGNAME
   gthurman

   $ echo $LOGNAME | cut -c2-8  | tr '[:lower:]' '[:upper:]'
   THURMAN

Print your uid using the id command without using the id command's -u option.

   $ id
   uid=879(gthurman) gid=100(users) groups=100(users)

   $ id | cut -f2 -d"=" | cut -f1 -d"("
   879
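
A useful way to understand (or debug) a pipeline is to build it one stage at a time. The sketch below uses hypothetical sample data in the same colon-delimited layout as /etc/passwd, so the results do not depend on the local system.

```shell
# Four sample records; the last one is a deliberate duplicate.
sample='root:x:0:0
gthurman:x:879:100
guest:x:900:100
gthurman:x:879:100'

# Add one stage per line and watch the output change.
printf '%s\n' "$sample" | cut -f1 -d":"                        # four login names
printf '%s\n' "$sample" | cut -f1 -d":" | sort                 # same names, sorted
printf '%s\n' "$sample" | cut -f1 -d":" | sort | uniq          # duplicate removed
printf '%s\n' "$sample" | cut -f1 -d":" | sort | uniq | wc -l  # count: 3
```

The sort stage is needed because uniq only removes duplicates that are adjacent.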

The following command-line will be analyzed during lecture.

   $ tail -100 $logs | grep "^cszero:" 2>/dev/null | sort | pr -s | more


Shell Meta-Characters

Shell meta-characters (or wildcards) can be used to reduce the amount of stuff you need to type and to concisely refer to a group of commonly related files.

The meta-characters are: * (asterisk), ? (question mark) and [] (square brackets).

* An asterisk matches 0 or more characters in a file name.
? A question mark matches any single character.
[ ] Square brackets can surround a choice of characters you want to match.

Note: asterisk does not match file names that start with a dot (i.e. hidden files).

   $ ls -x
   d01.shtml     d02.shtml     d03.shtml   d04.shtml   d05

      Display all file names found in the current directory.

   $ ls -x *.shtml
   d01.shtml     d02.shtml     d03.shtml   d04.shtml

      Display all file names that end with the string  .shtml

   $ ls -x ?05
   d05

      Display all file names that are 3 characters long and
      end in  05

   $ ls -x d0[1-2]*
   d01.shtml     d02.shtml

      Display all file names that start with  d0  and are at 
      least three characters long.  The third character must 
      fall in the range of 1 to 2 (inclusive) and can be 
      followed by 0 or more characters.

   $ ls -x *4*l
   d04.shtml

      Display all file names that have a 4 somewhere in them
      (except the last character) and that end with a lowercase 
      ell character.

   $ ls -x d0[123].shtml
   d01.shtml     d02.shtml     d03.shtml

      Display all file names that start with  d0  followed
      by a 1, 2, or 3 followed by  .shtml.  Every file
      name must be 9 characters long.

   $ ls -x ???
   d05

      Display all file names that are three characters long.

   $ cat d*

      Display the content of all files whose names start with a
      lowercase dee.

   $ rm *.shtml

      Removes all files that end with the string  .shtml


   $ grep "the end" d0?

      Searches for the pattern "the end" in all files having names
      that are three characters long and start with  d0

   $ mv *0* /tmp
      
      Moves all files having a 0 in their name to the /tmp directory.

   $ mv temp[0-9] /tmp
      
      Moves all files having the name  temp  followed by a single
      digit to the  /tmp  directory.

   $ mv [A-Z]* /tmp

      Moves all files beginning with an uppercase letter to 
      the /tmp directory.

When you enter a command-line that contains meta-characters, the shell expands the command-line to include the fully named files. In other words, commands see complete file names; they do not know about the meta-characters.

   Assume we have a directory that has the following files:
      x.out    y.out   z.err   w.out    t.err   c.obj

   $ rm *.out

      The shell expands  *.out  (in sorted order) and the following
      command-line is executed:  rm w.out x.out y.out

   $ rm *.junk

      No file names are found ending with  .junk  therefore  *.junk 
      is the argument that is passed to the  'rm'  program.  The 'rm'
      command will try to remove a file named  *.junk  and will fail
      resulting in an error message.
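
Putting echo in front of a command is a safe way to preview what the shell will expand before anything is removed. This sketch recreates the directory above in a scratch location and assumes the default behavior of sh, ksh, and bash, where an unmatched pattern is passed through unchanged.

```shell
cd "$(mktemp -d)" || exit 1
touch x.out y.out z.err w.out t.err c.obj

echo rm *.out       # prints: rm w.out x.out y.out
echo rm *.junk      # prints: rm *.junk   (no match, pattern left as-is)

set -- *.out        # load the expansion into the positional parameters
count=$#
echo "$count files match"     # prints: 3 files match
```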

Caution needs to be exercised when using meta-characters on the command-line. The following has happened to many users:

   $ rm x *

      The user wants to remove all files starting with  x  but
      the space between the  x  and  *  causes all files to be removed.
      The shell expands the * to all file names which in turn get
      passed onto the 'rm' command.

There is a maximum length for the expanded command-line. For example, if you have a directory containing thousands of files with long file names, then a command like rm * may not work.
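
One common workaround is to let find hand the names to the command in batches instead of expanding them all on one command-line. A hedged sketch follows; the scratch directory and the .tmp pattern are hypothetical.

```shell
getconf ARG_MAX      # the system's limit, in bytes, on arguments plus environment

scratch=$(mktemp -d)
touch "$scratch/a.tmp" "$scratch/b.tmp" "$scratch/keep.txt"

# The pattern is quoted so that find, not the shell, does the matching.
find "$scratch" -type f -name '*.tmp' -exec rm {} +

ls "$scratch"        # only keep.txt remains
```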

What happens if you have a file name that has an asterisk in it? For example, suppose we have a directory containing the following files:

   x.out  x.err  x.*  x.obj  x.c

and we want to remove the file named "x.*". Escaping the asterisk with a backslash stops the shell from expanding it:

   $ rm x.\*
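
The following sketch recreates that directory in a scratch location to confirm that only the one file is removed. Single quotes are shown as an alternative to the backslash.

```shell
cd "$(mktemp -d)" || exit 1
touch x.out x.err 'x.*' x.obj x.c

rm x.\*             # removes only the file literally named x.*
ls                  # x.c  x.err  x.obj  x.out remain

touch 'x.*'
rm 'x.*'            # single quotes suppress expansion in the same way
```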
