UNIX TOOLS FOR CODE DEVELOPMENT

Tom Statler, Dept. of Physics and Astronomy, Ohio University
Feb. 1996, rev. Aug. 1996, rev. (again) June 1998

This document is intended to help students with some experience in computing and programming, but little experience with the Unix operating system, learn what they need to know in order to get some real work done. This is not a Unix primer. There are a number of books available that serve that purpose. Instead I am assuming the reader is familiar with basic concepts like logging on and off, files, directory tree structure, listing and printing files, and so on. What follows is an introductory-level discussion of the most important tools needed to write, compile, execute, and debug original programs. In short, it's the stuff I wish somebody had told me the first time I sat down in front of a Unix machine with a programming assignment due the next day.

0. Conventions

The default Unix prompt is "%", though many people have it set to something else. Anything you are supposed to type at the Unix prompt is written as:

    % vi fred.f

meaning that you should type "vi fred.f" (followed by the return key) at the prompt, whatever it is. Names of Unix commands are in single quotes, e.g. 'vi', 'more', or 'awk'. Like the example above, commands may require one or more arguments to be typed with them in order to actually do anything. Special characters are <RET> (the return key), <ESC> (the escape key), ^C (control-c, i.e. hold down control and type c), etc. If you don't have an escape key, ^[ is the same thing.

1. Basic steps

There are 3 basic steps to programming: writing the source code, compiling, and executing. The source code is the "program", written in Fortran, C, or some other language. (Fortran will be discussed here.) You create a file containing the source code using a text editor. The usual Unix text editor is called 'vi'; it is rather an archaic editor but simple and fast.
The compiling step is actually two steps: compiling, which translates your Fortran into "object code" (which is mostly language-independent), and linking, which turns object code into an executable file. Finally, executing means running your program, which may or may not require additional action from you depending on what your program does.

2. Using 'vi'.

I am assuming you are familiar with what a piece of Fortran code should look like; this section is just to help you write code using 'vi'. Files containing Fortran source code should have names ending in ".f". To write a piece of code called "fred.f", first start the editor:

    % vi fred.f

Vi has two modes, input and edit. Input mode means every letter you type goes into your file. Edit mode means every letter you type is a command to the editor for moving around in the file, deleting letters, saving, and so on. The cursor marks where you are in the file, so anything you do in edit mode will take effect at the cursor unless you specify otherwise. Basic commands are:

    i        Go into input mode, inserting new text just before the cursor position
    <ESC>    Exit input mode, go back to edit mode
    h        move cursor left   -  (You can also use the arrow keys to do this,
    j        move cursor down   |   but 'vi' was written before terminals had
    k        move cursor up     |   arrow keys!)
    l        move cursor right  -
    x        delete one character
    dd       delete one line
    5dd      delete 5 lines (this kind of construction works for most commands)
    ZZ       quit and write the file, overwriting any previous version
    :wq      same as ZZ
    :q       quit the editor but don't write the file (if you have made changes
             that you want to throw away, use :q! instead)

Anything you need to do you can accomplish using these commands; some extra things that are helpful are

    a        Go into input mode, appending new text just after the cursor position
    o        Go into input mode, starting a new line after the current one
    O        Go into input mode, starting a new line before the current one
    ^D       Move down about half a screenful
    ^U       Move up about half a screenful
    ^L       Redraw the screen
    dw       delete a word
    cw       change a word; puts you in input mode (experiment to see how it works)
    s        substitute one character; works like cw
    r        replace this character with the next character you type
    Y        "Yank" a line of text; known in the PC lexicon as "copy"
    p        "Put" (paste) the last thing that was yanked or deleted after the cursor
    P        "Put" before the cursor
    /string  Find the next occurrence of "string"

There are lots more, but these are more than enough. One subtle thing that can sometimes be extremely useful is this: suppose you have to change every occurrence of "joe" in a file to "george". Then you'd type

    :%s/joe/george/g

The : says you are starting a long command, the % means every line in the file, the next part says search for the string "joe" and replace it with "george", and the trailing g means get ALL of the "joe"s, rather than just the first one in each line. In the "search" part of this construction you have to be careful of some special characters: /, \, ., ^, and $ (and probably a few more that I'm forgetting) need to be preceded by a backslash if you are actually searching for those characters. Without the backslash, ^ means the beginning of a line, $ means the end of a line, and . means any character.
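
If you want to experiment with the substitute syntax without risking a real file, the following is a minimal sketch. It assumes you have 'sed', a standard Unix stream editor that is not otherwise covered in this document but accepts the same s/old/new/g construction; it is a stand-in for 'vi', not part of it. The sample text and the variable names are made up for the illustration.

```shell
# Hypothetical sample text; sed's s/old/new/g mirrors vi's :%s/old/new/g.
all=$(printf 'joe met joe\n' | sed 's/joe/george/g')
# Without the trailing g, only the first "joe" on each line is changed.
first=$(printf 'joe met joe\n' | sed 's/joe/george/')

# Unescaped, "." matches any character, so both "a.b" and "axb" are changed.
loose=$(printf 'a.b axb\n' | sed 's/a.b/X/g')
# With a backslash, "\." matches only a literal period.
strict=$(printf 'a.b axb\n' | sed 's/a\.b/X/g')

printf '%s\n%s\n%s\n%s\n' "$all" "$first" "$loose" "$strict"
```

The first two results show why the trailing g matters, and the last two show the difference a single backslash makes.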
The substitute command is powerful but can lead to some really cryptic commands:

    :%s/^\.\.\//..\/..\//

It's left as an exercise to figure out what this does and why.

One obvious and one non-obvious thing about Fortran files in particular: first, every line ends with a "newline" -- that is, what you get when you hit <RET>. Second, old standard Fortran from the days of punched cards required that program statements start in column 7; nowadays it is good enough to hit <TAB> and then start typing the line (but the old way works too). Statement labels, of course, still have to be in the first 5 columns, and column 6 is for continuation lines just like in the olden days. Telling 'vi' to ":set autoindent" can save you a little typing time. (Autoindent is counteracted temporarily with ^D in input mode and turned off with ":set noautoindent".)

3. Compiling and Linking

You now have your source file called fred.f. Exit the editor. You can then do any of the following:

    % f77 fred.f

This will compile and link your program, and create a file called "a.out" that is the executable code. Depending on how the particular computer is configured you may also get a "fred.o" file with the object code.

    % f77 fred.f -o fred

Same thing but call the executable "fred" rather than "a.out". Most people do it this way.

    % f77 -c fred.f
    % f77 fred.o -o fred

This just breaks the compile and link procedure into the two separate parts. This is pointless for short programs, but a good thing to know if you are developing a larger program with several subroutines.

The C compiler is called 'cc', and for the most part is used the same way.

4. Executing

If you called your executable file "fred", then to run the program you type

    % fred

and it runs! Yay! Any output the program produces gets printed on the screen. If you want to save the output (as you probably will) you'd type instead

    % fred > fred.out

which redirects the output into the file fred.out.
This could also be done by writing the source code appropriately, as shown in the next section.

5. Subtleties of I/O

You may also have written your program so that it asks you to type in some stuff, say some numbers to operate on. Suppose fred.f looks like this:

      implicit none
      real x,y,z
      print*,'I want a number.'
      read*,x
      print*,'I want another number.'
      read*,y
      z=x+y
      print*,'The sum is ',z
      stop
      end

(The first 2 lines are good programming style, and prevent errors caused by misspelling variable names. The compiler will complain about anything that isn't explicitly defined. The 'print*' and 'read*' are examples of 'list-directed I/O'. If you have been learning Fortran format statements, this is a lot easier and generally useful unless you want nicely-aligned output and/or want control over how many significant figures you print.)

You compile and run, answering the questions, and the whole escapade looks like this:

    % f77 fred.f -o fred
    % fred
    I want a number.
    5
    I want another number.
    6
    The sum is 11.0000
    %

If you have a complicated program with a lot of inputs then you will get sick and tired of typing them again and again and again while you test the program. So instead you use 'vi' to create a file containing exactly what you would type. Call the file "fred.in", and it looks like:

    5
    6

Now you can be a lazy bum:

    % fred < fred.in
    I want a number.
    I want another number.
    The sum is 11.0000
    %

You could even do this:

    % fred < fred.in > fred.out
    %

Note that this types NOTHING to the screen, but the file "fred.out" contains:

    I want a number.
    I want another number.
    The sum is 11.0000

Now what if you don't want the "I want a number" stuff in your output file but you still want the program to ask for the numbers if you don't have a "fred.in" file? Then you have to rewrite the code:

      implicit none
      real x,y,z
      print*,'I want a number.'
      read*,x
      print*,'I want another number.'
      read*,y
      z=x+y
      open(7,file='fred.out')
      write(7,*)'The sum is ',z
      close(7)
      stop
      end

OK, but now your output is going to be called "fred.out" every time, and any pre-existing "fred.out" is going to be overwritten. What if you want to decide on the name of the output file when you run the program? Well, then you'd do this:

      implicit none
      real x,y,z
      character*32 name
      print*,'I want a number.'
      read*,x
      print*,'I want another number.'
      read*,y
      print*,'Give me a file name for output.'
      read*,name
      z=x+y
      open(7,file=name)
      write(7,*)'The sum is ',z
      close(7)
      stop
      end

If you are lucky, you will be able to do this:

    % fred
    I want a number.
    5
    I want another number.
    6
    Give me a file name for output.
    fred2.out
    %

and the file "fred2.out" will contain the output. Some systems, however, will give you an error message something like "type conversion error" or some such thing, meaning that it was expecting a number and got confused by the character string. There are two solutions: you can either remember to enclose the file name in quotes when you type it in, or you can change the read statement in the source code to a formatted read, i.e.,

      read(5,'(a32)')name

and then you won't need the quotes.

6. Typical drudgery

Often you will need to do some basic chore, like read a file of numbers output from some other program, manipulate them, and write a new file with the manipulated data. The basic structure of a line-by-line manipulation is something like this:

      character*32 infile,outfile
      open(4,file=infile)
      open(7,file=outfile)
      do 10 i=1,nlines
         read(4,*)data1,data2,...
         (manipulations)
         write(7,*)data3,data4,...
 10   continue
      close(4)
      close(7)

Remember that units 5 and 6 are reserved for standard input and output (i.e. the keyboard and the screen). You could also use a do/end do construction in Fortran 90 instead of do/continue.

For careful work this is always best: write a program to do the job.
However, for quick and dirty work it's good to be aware that there are lots of Unix utilities for doing data manipulations, like 'sort', 'paste', and the most important, 'awk'. 'Awk' is way too complicated to explain fully, so I'll just give a simple example. Suppose you have a data file (fred.out!) of x, y, and z values that looks like this:

    2.4 325.4 34.2
    64.0 35.0 612.4
    6.1 23.1 26.5

and you decide what you really need is x, y+z, and y-z. You COULD write a Fortran program, OR you could just do this:

    % awk '{print $1,$2+$3,$2-$3}' fred.out > fred.2.out

The $1 means column 1 of the file, and so on. If you have some junk in the file, like...

    The answers are
    2.4 325.4 34.2
    The answers are
    64.0 35.0 612.4
    The answers are
    6.1 23.1 26.5

you could operate on only every OTHER line by doing this:

    % awk 'NR%2==0 {print $1,$2+$3,$2-$3}' fred.out > fred.2.out

The first part selects only those lines whose line numbers ("record numbers"), modulo 2, are equal to 0; that is, the even-numbered lines, which here are the ones holding the data.

7. Code development

If you are writing a complicated program it's often a good idea, depending on the nature of the job, to break it up into subroutines. The big advantage of this is that you can test each subroutine individually before putting it together. If you put each subroutine into its own file, then you can use a really great Unix utility called 'make'. Suppose you have the source code for your main program in "fred.f", and two subroutines called "arthur.f" and "harry.f". Use 'vi' to create a file called "Makefile" that looks like this:

    .f.o: ; f77 -c $*.f

    fred: fred.o arthur.o harry.o
            f77 fred.o arthur.o harry.o -o fred

(The TAB at the beginning of line 4 is required; spaces will not do. The blank second line is there for readability.) This is a set of instructions that says:

    (line 1) To make a ".o" file out of a ".f" file, take the ".f" file and execute 'f77 -c' on it.
    (line 3) To make the executable file "fred" you are going to need the files "fred.o", "arthur.o", and "harry.o".
    (line 4) To make "fred", use f77 to link the 3 object files.

If you have several programs, you can use the same Makefile; leave line 1 as it is, and add more units like lines 2-4 containing the instructions to make each of your executables. Now, to compile and link your program, you don't have to type

    % f77 fred.f arthur.f harry.f -o fred

you just type

    % make fred

The beauty of this is that 'make' keeps track of what you have changed, so if you only make a change to "harry.f", 'make' will only recompile "harry" and then link it with the existing "fred.o" and "arthur.o". If you've got a big program, with, say, 15 subroutines, this can be a HUGE timesaver. If you haven't changed anything, 'make' will just say "fred is up to date". If you want to FORCE a subroutine to be recompiled, you can delete the .o file (that's the .o file, NOT the .f file), or, more safely,

    % touch arthur.f

which doesn't change the contents of the file but updates its "last modified" time, which is what 'make' compares, so it LOOKS like you did something to it.

8. Debugging

If your program runs correctly the first time you are the luckiest person on the planet, and don't expect it to ever happen again. EVERY piece of code needs to be tested and debugged before it is used for anything. Keep in mind that there are two different aspects to this: the code has to run without crashing AND it has to produce correct results. System error messages tell you only about the former, and precious little even about that; the latter is your responsibility.

One of the worst things about Unix is that error messages are almost always unhelpful; but then again this is true of every operating system there is. Gradually one learns what kind of mistakes produce what kind of error messages. When you've made the same mistake enough times you'll learn to recognize the signs.

Symbolic debugging programs exist on most systems to help you track down errors in your code.
Whether or not they are useful depends strongly on what operating system and what compiler you are running. Recently, Sun Microsystems has done an admirable job of bringing the long-neglected Unix debugger, dbx, into the modern era. Sun's implementation of dbx is integrated into a code-development environment called 'workshop'. If you are on an up-to-date Sun system running Solaris 2.5 or higher, try typing

    % workshop &

If you get a funky little window with a bunch of icon buttons, you are in business; but you are also on your own since using 'workshop' is far beyond the scope of this document. If the compilers are installed in the right place, you should find hypertext documentation (readable with a web browser) in the file

    /opt/SUNWspro/DOC4.0/lib/locale/C/html_docs/index.html

If you are using the debugger, you need to compile all of your code with the debugging options, like this:

    % f77 -g -C fred.f -o fred

The '-g' part writes extra stuff in the object files that the debugger needs, and '-C' checks to see if you use an invalid subscript on an array. Always keep in mind, however, that the debugger may or may not trap your error correctly, and may or may not point to the line of code where the problem actually lies. Recently I have seen an illegal character in a Fortran statement produce an error, not at the offending line, but at the line where the variable closest to the illegal character was dimensioned; and a misuse of a complex variable precipitate a "segmentation fault" at a print statement far downstream.
And here is a particularly scary example I encountered using the previous version of dbx:

    % f77 -g fred.f -o fred
    % dbx fred
    Reading symbolic information for fred
    Reading symbolic information for rtld /usr/lib/ld.so.1
    Reading symbolic information for /opt/SUNWspro/lib/libF77.so.2
    Reading symbolic information for /usr/lib/libm.so.1
    Reading symbolic information for /usr/lib/libc.so.1
    Reading symbolic information for /usr/lib/libdl.so.1
    (dbx) run
    Running: fred
    (process id 6519)
    I want a number.
    3
    I want another number.
    4
    The sum is 0.
    execution completed, exit code is 0

Despite a horrible mistake in the source code, the program happily executed and produced the wrong answer. This is a good example to keep in mind, whether or not you are compiling with the debugging options -- in fact, especially if you're not.

For good reason, modern compilers and operating systems are generally designed to run fast, without a lot of checking that could alert you to trouble but slow down execution. Truly egregious programming errors that once produced evil error messages now produce nothing but wrong answers. Taking the square root of a negative number is no problem; you can happily go on computing and notice nothing until you look at your output file and find it filled with "NaN" (Not a Number). Here you are lucky, because you have a visible sign that something is wrong. It's just as likely that you will simply get a wrong answer without warning.

If this sounds utterly terrifying, it should. It's up to you to test your code well enough to know that it is doing the right thing, because the computer is NOT going to tell you. Remember, the only way to test a code is to run it on known problems with known solutions. JUST BECAUSE YOUR CODE RUNS DOESN'T MEAN IT'S PRODUCING CORRECT RESULTS.

(Incidentally, one of the most common ways to go astray (and, in fact, what I did in the example above) is to call a subroutine and pass to it a different list of variables than what it is expecting.
If you are fortunate you will get a 'segmentation fault' and the program will crash.)

9. Some good habits for scientific computing.

1. Always write code in small pieces and TEST EACH PIECE THOROUGHLY on fake data (where you know what the answer is supposed to be) before going on to the next piece or applying it to real data.
2. See #1.
3. See #2.
4. Notice a pattern here?
5. Most machines have enough memory, so always declare floating-point variables as "double precision" rather than "real" unless you have a really compelling reason not to, and you'll hardly ever have to worry about round-off error. Always use the double precision version of functions, e.g., dsqrt rather than sqrt, and always specify constants as double precision, e.g., 2.d0 instead of 2.
6. Use variable names that mean something relevant (like "mass" and "luminosity" as opposed to "m" and "l"), so that somebody else can understand what your code does.
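
Habit #1 can be sketched in a few lines of shell. This is only an illustration under stated assumptions: it uses the 'awk' one-liner from section 6 as the piece being tested, and the file names (fake.in, fake.expected, fake.out) and the use of 'mktemp' and 'cmp' are choices made for this sketch rather than anything prescribed above. The expected answers were worked out by hand: 5+2=7, 5-2=3, 4+4=8, 4-4=0.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Fake data (hypothetical) whose correct answers are known in advance.
printf '1 5 2\n10 4 4\n' > "$workdir/fake.in"
printf '1 7 3\n10 8 0\n' > "$workdir/fake.expected"

# The piece under test: print x, y+z, and y-z for each line.
awk '{print $1,$2+$3,$2-$3}' "$workdir/fake.in" > "$workdir/fake.out"

# cmp -s prints nothing and succeeds only if the two files are identical.
if cmp -s "$workdir/fake.out" "$workdir/fake.expected"; then
    result=PASS
else
    result=FAIL
fi
echo "$result"
```

The same pattern works for a compiled program: feed it an input file with a known answer, redirect the output to a file, and compare that file against what you expected before trusting the program with real data.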