#####################################################################
#Lesson 4
#Winter 2004
#Date: 02/02/04
#Written by : Jen Chen
#Copyrighted by Jen Chen
#####################################################################

- Assume that we have a csv (comma-separated values) file as follows:

    Jen Chen,$15.75,45
    Donald Duck,$56.32,35
    Mickey Mouse,$45.71,35
    Snow White,$67.35,43
    Donald Rump,$120.50,55
    Cinderella,$32.12,23
    Daisie Duck,$17.32,40

To convert a .csv file into a .tab (tab-delimited) file, we use the sed
command as follows:

    sed 's/,/	/g' EmployeeSalary.csv > EmpSalary.tab

Note that the blank space between the 2nd and 3rd slashes is a single tab
character, created by pressing the Tab key (at an interactive prompt you may
need to press Control + v first so that the shell inserts a literal tab).
After this your EmpSalary.tab file should look like this:

    Jen Chen	$15.75	45
    Donald Duck	$56.32	35
    Mickey Mouse	$45.71	35
    Snow White	$67.35	43
    Donald Rump	$120.50	55
    Cinderella	$32.12	23
    Daisie Duck	$17.32	40

- Sometimes when we upload a file from a Win32 OS to a Unix/Linux OS we run
into an unpleasant situation. Text files created on a Win32 OS end each line
with a carriage return followed by a line feed, while Unix/Linux uses the line
feed alone, so after the upload every line (or record) carries an extra hidden
character at its end. These hidden characters are not seen if we use the
"more" command from the shell prompt. We have to open the file using the VI
editor in order to see them.

Assume that the hidden character at the end of each line/record is a
Control + M character (it will appear in VI as "^M"). We cannot just type a
caret followed by an M as we see on the screen; instead, to generate the
hidden character we first press Control + v, followed by Control + M.
Remember that "^M" means "Control + M". Then, while we're in VI, we issue the
VI command for global substitution to remove all these hidden characters.
    Jen Chen,$15.75,45^M
    Donald Duck,$56.32,35^M
    Mickey Mouse,$45.71,35^M
    Snow White,$67.35,43^M
    Donald Rump,$120.50,55^M
    Cinderella,$32.12,23^M
    Daisie Duck,$17.32,40^M

The VI command to remove these ^M characters is as follows:

    :%s/(press Control + v, then press Control + M)//g

There are other ways to accomplish the task, but I have found this one to
work best.

- To remove every $ sign in the file EmployeeSalary.csv (that is, to
substitute it with nothing), we can use the sed command from the shell prompt
as follows:

    sed 's/\$//g' EmployeeSalary.csv

If you want to save the result to another file, add a redirection at the end
of the previous command line as follows:

    sed 's/\$//g' EmployeeSalary.csv > $$.csv

Note that $$ expands to the PID (process ID) of the current shell, so the
output file gets a name such as 4953.csv.

- To create another field (let's say field 4, the product of field 2 and
field 3) in the file above, we would issue an awk command as follows:

    sed 's/\$//g' EmployeeSalary.csv | awk -F, '{print $1","$2","$3","$2*$3}' > $$.csv
    rm EmployeeSalary.csv           #Make sure that you have a BACKUP copy of this
                                    #file BEFORE you delete it!
    mv 4953.csv EmployeeSalary.csv  #Assume that the $$.csv is the file 4953.csv.

Note that since EmployeeSalary.csv is a csv file, we HAVE to specify its
delimiter by using the option -F followed by the field separator itself (in
this case a comma).

- To add another field, let's say field 5, where field 5 is defined to be a
5% rebate if the value of field 4 exceeds 200 (and 0 otherwise), we do as
follows. The semicolon before the backslash ends the first print statement so
that awk accepts the else that follows it:

    awk -F, '{if($4 > 200) print $1","$2","$3","$2*$3","$2*$3*0.05; \
    else print $1","$2","$3","$2*$3","0}' EmployeeSalary.csv > $$.csv
                                    #Remember that we already moved the $$.csv to
                                    #EmployeeSalary.csv above.
    rm EmployeeSalary.csv
    mv 4953.csv EmployeeSalary.csv  #Again, assume that the $$.csv file is
                                    #the file 4953.csv

Now your EmployeeSalary.csv will look like:

    Jen Chen,15.75,45,708.75,35.4375
    Donald Duck,56.32,35,1971.2,98.56
    Mickey Mouse,45.71,35,1599.85,79.9925
    Snow White,67.35,43,2896.05,144.802
    Donald Rump,120.50,55,6627.5,331.375
    Cinderella,32.12,23,738.76,36.938
    Daisie Duck,17.32,40,692.8,34.64

Note that some values of field 5 have more than 2 decimals. We'll round them
as follows:

    awk -F, '{if($4 > 200) print $1","$2","$3","$2*$3","int(100*($2*$3*0.05 + 0.005))/100; \
    else print $1","$2","$3","$2*$3","0}' EmployeeSalary.csv > $$.csv
                                    #Remember that we already moved the $$.csv to
                                    #EmployeeSalary.csv above.
    rm EmployeeSalary.csv
    mv 4953.csv EmployeeSalary.csv  #Again, assume that the $$.csv file is
                                    #the file 4953.csv

Note that the int() function in awk will TRUNCATE (not ROUND) a real number:
it simply drops everything after the decimal point. Therefore, in order to
round a positive real number to 2 decimals in awk, we add 0.005 to it,
multiply the sum by 100, truncate the product with int(), and finally divide
by 100. (To round to 3 decimals, add 0.0005 and use 1000 instead of 100, and
so on.) This will accomplish the job.
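
The difference between truncating and rounding can be checked on a few of the
field 5 values from the listing above. This is only a small sketch: it is
written for a Bourne-style shell (/bin/sh), which is an assumption since the
lesson works at a csh prompt, and the variable names are made up for the
illustration.

```shell
#!/bin/sh
# Sketch only: sample values copied from field 5 of the lesson data.
result=$(printf '%s\n' 35.4375 79.9925 36.938 |
    awk '{
        truncated = int($1 * 100) / 100            # int() drops the extra digits
        rounded   = int(100 * ($1 + 0.005)) / 100  # add 0.005 first, then truncate
        print $1, truncated, rounded
    }')
echo "$result"
```

Each output line shows the original value, the truncated value and the rounded
value. awk's printf with the "%.2f" format is another way to get 2 decimals;
unlike the int() approach it also keeps trailing zeros (144.80 instead of
144.8).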
After you issue the awk command with int() above, your EmployeeSalary.csv
file will look like this:

    Jen Chen,15.75,45,708.75,35.44
    Donald Duck,56.32,35,1971.2,98.56
    Mickey Mouse,45.71,35,1599.85,79.99
    Snow White,67.35,43,2896.05,144.8
    Donald Rump,120.50,55,6627.5,331.38
    Cinderella,32.12,23,738.76,36.94
    Daisie Duck,17.32,40,692.8,34.64

- Conditions in awk (AND, OR in the IF-statement):
We can combine conditions with AND (&&) and OR (||) inside the awk command as
follows:

    awk -F, '{if($2 > 50 && $3 > 50) print}' EmployeeSalary.csv

The above awk command will print all records from the file EmployeeSalary.csv
where the values of field 2 and field 3 are both greater than 50. The result
will look like:

    Donald Rump,120.50,55,6627.5,331.38

- If we would like to print out all records where the value of field 2 is
either greater than 100 or less than 30, then we'd use the OR condition as
follows:

    awk -F, '{if($2 > 100 || $2 < 30) print}' EmployeeSalary.csv

The result of the above command should be:

    Jen Chen,15.75,45,708.75,35.44
    Donald Rump,120.50,55,6627.5,331.38
    Daisie Duck,17.32,40,692.8,34.64

- If we'd like to issue more than one awk statement inside the awk command,
we separate the statements with semicolons as follows:

    awk -F, '{sum = 0; if($2 > 50){sum += $2*$3; print $1"\t"sum}}' EmployeeSalary.csv

Note that we enclose the body of the IF-statement inside a pair of curly
brackets to indicate that both statements (the sum and the print) belong to
the IF-statement. Without the curly brackets only the first statement belongs
to the IF-statement, the print runs for every record, and we get the wrong
output. The result of the above command should look like:

    Donald Duck	1971.2
    Snow White	2896.05
    Donald Rump	6627.5

- Let's take away the pair of curly brackets and run the awk command again.
Compare the result with the one above and you will see the difference.
    awk -F, '{sum = 0; if($2 > 50)sum += $2*$3; print $1"\t"sum}' EmployeeSalary.csv

will result in the following:

    Jen Chen	0
    Donald Duck	1971.2
    Mickey Mouse	0
    Snow White	2896.05
    Donald Rump	6627.5
    Cinderella	0
    Daisie Duck	0

##############################################################################
- Unix shell scripting:
##############################################################################

* Note: All Unix C-shell scripts MUST begin with the line:

    #!/usr/bin/csh

where /usr/bin/csh is the path of your CSH. You can find out the path of your
CSH by issuing the command:

    whereis csh

and similarly for bash:

    whereis bash

The "#!" has to be the first two characters of the first line of the script.
If the line is misplaced, the script may be run by the wrong shell and
produce errors.

Inside your C-shell script you can write any Unix shell command(s) or
combination of Unix commands, together with the keywords for writing scripts,
such as:

    if() then
    endif

    if() then
    else
    endif

    while()
    end

- Example of your first C-shell script: using VI, create a new file (here
script1.csh) and add the following to it:

--------------------------------------------------------------------------
#!/usr/bin/csh
########################################
#Name: script1.csh
#Author: Jen Chen
#Date: 01/31/04
########################################
echo "Hello world\!"    #Note that we have to escape the ! sign by putting
                        #a backslash in front of it. Without the escape, this
                        #special character will cause an error message.

Save the file and quit VI, then chmod 700 this file so that you can execute
the script. A script without the execute bit set will not execute.

    chmod 700 script1.csh

Now run this script by issuing the command (if the current directory is not
in your path, type ./script1.csh instead):

    script1.csh

If everything is correct then you will see on the screen:

    Hello world!
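
The effect of the execute bit can be tried out with the short sketch below. It
is an illustration under stated assumptions, not the lesson's own script:
/bin/sh is used as the interpreter because csh may not be installed
everywhere, and hello.sh is a made-up file name created in a temporary
directory.

```shell
#!/bin/sh
# Sketch only: /bin/sh stands in for csh, and hello.sh is a hypothetical name.
workdir=$(mktemp -d)
script="$workdir/hello.sh"

# The interpreter line is the very first line and starts in column 1.
printf '%s\n' '#!/bin/sh' 'echo "Hello world!"' > "$script"

chmod 600 "$script"        # readable and writable, but NOT executable
if "$script" 2>/dev/null; then
    before="ran"
else
    before="refused"
fi

chmod 700 "$script"        # add read, write and execute for the owner
after=$("$script")

echo "without the execute bit: $before"
echo "with the execute bit:    $after"
rm -rf "$workdir"
```

The first attempt is refused by the system because no execute bit is set; after
chmod 700 the same file runs and prints its greeting.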
--------------------------------------------------------------------------
##########################################################################
#!/usr/bin/csh
#########################################
#Comments: Script 2 for W04.
#Name: script_2.csh
#Date: 01/31/04
#################################
#Comments: This script will print out the message "Try shell script using awk",
#then execute the Unix command
# sed 's/\$//' EmpSalary.tab | awk -F'\t' '{print $1,$2,$3,$2*$3}'
# Make sure that you have the file EmpSalary.tab.
# - This script will execute as if we typed the following commands at
#the command line:
# echo "Try shell script using awk"
# sed 's/\$//' EmpSalary.tab | awk -F'\t' '{print $1,$2,$3,$2*$3}'
#- Instead of retyping these commands every time, we can put them all into
#a shell script. If we'd like to modify this script later, we could easily
#do so by editing this script.
#- The quotes around \t keep the shell from removing the backslash, so that
#awk receives \t and uses the tab character as the field separator.
echo "Try shell script using awk"
sed 's/\$//' EmpSalary.tab | awk -F'\t' '{print $1,$2,$3,$2*$3}'
--------------------------------------------------------------------------
##########################################################################
#!/usr/bin/csh
#################################
#Comments: Script 3 for W04.
#Name: script3.csh
#Date: 01/31/04
#################################
#A script to show how to pass command line argument(s) into a script.
#All command line arguments are stored in the array "argv".
#The total number of elements in the array "argv" is given by
#"$#argv".
#To refer to the elements of the array "argv" we do as follows:
# $argv[1], $argv[2], $argv[3],...
#- Use a loop to print these elements.
#- The argument(s) are placed AFTER the name of your script, each
#separated by a space.
#- For example:
# %script3.csh EmpSalary.tab 2 3
# When we run our script there will be 3 arguments passed into our script.
#- These arguments are: $argv[1] = EmpSalary.tab, $argv[2] = 2, $argv[3] = 3
#- Note that the array argv holds only the arguments that follow the name of
#the script at the shell prompt. The name of the script itself is available
#as $0.
#To initialize/define a numerical variable (or any variable) we use the keyword "set".
set count = 1                       #Initialize the variable count to be 1.
echo "The number of arguments passed from the command line is: $#argv"
if ($#argv < 1) then                #The word "then" has to be on the same line as the
    echo "Wrong number of arguments\!"  #word "if"!!!
else
    while ($count <= $#argv)        #Loop thru all elements in the array argv.
        echo "$argv[$count]"        #Display the elements of the array argv.
        @ count = $count + 1        #Put the @ sign in front of every computation.
    end
endif
# Now pass the 1st argument from the command line, $argv[1] (the name of the
#data file), to the sed command, then pipe the result to the awk command as
#follows:
sed 's/\$//' $argv[1] | awk -F'\t' '{print $1"\t"$2"\t"$2*$3}'
#Do not forget to UNSET all defined variables.
unset count
# This works like script_2.csh, but instead of hard-coding the name of the
#file (EmpSalary.tab) we pass the name of this file from the command line
#into our script. This makes our script more versatile and more re-usable.
--------------------------------------------------------------------------
##########################################################################
#!/usr/bin/csh
#################################
#Comments: Script 4 for W04.
#Name: script4.csh
#Date: 01/31/04
#################################
set usage="Please supply arguments when using this script."
#To initialize/define a numerical variable (or any variable) we use the keyword
# "set".
set count = 1                       #Initialize the variable count to be 1.
if ($#argv < 2) then                #The word "then" has to be on the same line as the
    echo "$usage"                   #word "if"; otherwise we'll get an error message.
    exit 1                          #This script needs 2 arguments (a file name and
                                    #a field number), so we quit if any is missing.
endif
if (-e $argv[1]) then               #The option -e inside the IF-statement means to
                                    #check for file existence.
# To PASS a shell variable into the awk command, CLOSE the SINGLE QUOTE right
#before the variable and OPEN it again right after the variable, so that the
#shell (and not awk) replaces the variable by its value. Assume that we pass
#the file EmpSalary.tab as $argv[1], and that $argv[2] represents the field
#number that we'd like to pass to the awk command.
    sed 's/\$//' $argv[1] | awk -F'\t' '{if($'$argv[2]' > 50) print}'
else
    echo "$argv[1] does not exist"
endif
--------------------------------------------------------------------------
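
As an alternative to closing and reopening the single quotes, awk can receive
a shell value through its -v option. The sketch below shows the same idea as
script 4 under a few assumptions: it is written for a Bourne-style shell
(/bin/sh) rather than csh, filter_field is a hypothetical helper name, and the
sample file is built from three records of the lesson data.

```shell
#!/bin/sh
# Sketch only: filter_field is a hypothetical helper, not one of the lesson scripts.
filter_field() {
    if [ $# -lt 2 ]; then
        echo "usage: filter_field FILE FIELD_NUMBER" >&2
        return 1
    fi
    file=$1
    field=$2
    if [ ! -e "$file" ]; then          # -e checks that the file exists
        echo "$file does not exist" >&2
        return 1
    fi
    # Remove the dollar signs, then let awk test the column named by col.
    sed 's/\$//' "$file" | awk -F'\t' -v col="$field" '$col > 50'
}

# Build a small tab-delimited sample from the lesson data.
sample=$(mktemp)
printf 'Jen Chen\t$15.75\t45\nDonald Duck\t$56.32\t35\nDonald Rump\t$120.50\t55\n' > "$sample"

matches=$(filter_field "$sample" 2)    # keep records whose field 2 exceeds 50
echo "$matches"
rm -f "$sample"
```

Here the awk program stays inside one pair of single quotes, and col is an
ordinary awk variable, so $col selects the field whose number was given on the
command line.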