Book Excerpt: Linux Programming by Example, Part 2
In Part 1 of this article, we began a tour of the V7 Unix ls program. We looked at how it parses options and at how it maintained the flist array as a two-level stack of files to print information about. We then saw how ls found and printed out the long format information about each file. This article continues the tour through the code.
238 long _long nblock(long size)_ 239 nblock(size) 240 long size; 241 { 242 return((size+511)>>9); 243 }
The nblock() function reports how many disk blocks the file uses. This calculation is based on the file's size as returned by stat(). The V7 block size was 512 bytes--the size of a physical disk sector.
The calculation on line 242 looks a bit scary. The >>9 is a right-shift by nine bits. This divides by 512, to give the number of blocks. (On early hardware, a right-shift was much faster than division.) So far, so good. Now, a file of even one byte still takes up a whole disk block. However, 1 / 512 comes out as zero (integer division truncates), which is incorrect. This explains the size+511. By adding 511, the code ensures that the sum produces the correct number of blocks when it is divided by 512.
This calculation is only approximate, however. Very large files also have indirect blocks. Despite the claim in the V7 _ls_(1) manpage, this calculation does not account for indirect blocks.
Furthermore, consider the case of a file with large holes (created by seeking way past the end of the file with lseek()). Holes don't occupy disk blocks; however, this is not reflected in the size value. Thus, the calculation produced by nblock(), while usually correct, could produce results that are either smaller or larger than the real case.
For these reasons, the st_blocks member was added into the struct stat at 4.2 BSD, and then picked up for System V and POSIX.
245 int m1[] = { 1, S_IREAD>>0, 'r', '-' }; 246 int m2[] = { 1, S_IWRITE>>0, 'w', '-' }; 247 int m3[] = { 2, S_ISUID, 's', S_IEXEC>>0, 'x', '-' }; 248 int m4[] = { 1, S_IREAD>>3, 'r', '-' }; 249 int m5[] = { 1, S_IWRITE>>3, 'w', '-' }; 250 int m6[] = { 2, S_ISGID, 's', S_IEXEC>>3, 'x', '-' }; 251 int m7[] = { 1, S_IREAD>>6, 'r', '-' }; 252 int m8[] = { 1, S_IWRITE>>6, 'w', '-' }; 253 int m9[] = { 2, S_ISVTX, 't', S_IEXEC>>6, 'x', '-' }; 254 255 int *m[] = { m1, m2, m3, m4, m5, m6, m7, m8, m9}; 256 257 pmode(aflag) void pmode(int aflag) 258 { 259 register int **mp; 260 261 flags = aflag; 262 for (mp = &m[0]; mp < &m[sizeof(m)/sizeof(m[0])];) 263 select(*mp++); 264 } 265 266 select(pairp) void select(register int *pairp) 267 register int *pairp; 268 { 269 register int n; 270 271 n = *pairp++; 272 while (--n>=0 && (flags&*pairp++)==0) 273 pairp++; 274 putchar(*pairp); 275 }
Lines 245-275 print the file's permissions. The code is compact and rather elegant; it requires careful study.
Lines 245-253: The arrays m1 through m9 encode the permission bits to check for along with the corresponding characters to print. There is one array per character to print in the file mode. The first element of each array is the number of (permission, character) pairs encoded in that particular array. The final element is the character to print in the event that none of the given permission bits are found.
Note also how the permissions are specified as I_READ<<0, I_READ>>3, I_READ<<6 and so on. The individual constants for each bit (S_IRUSR, S_IRGRP, etc.) had not been invented yet. (See Table 4.5 in my book.)
Line 255: The m array points to each of the m1 through m9 arrays.
Lines 257-264: The pmode() function first sets the global variable flags to the passed-in parameter aflag. It then loops through the m array, passing each element to the select() function. The passed-in element represents one of the m1 to m9 arrays.
Lines 266-275: The select() function understands the layout of each m1 through m9 array. n is the number of pairs in the array (the first element); line 271 sets it. Lines 272-273 look for permission bits, checking the global variable flags set previously on line 261.
Note the use of the ++ operator, both in the loop test and in the loop body. The effect is to skip over pairs in the array as long as the permission bit in the first element of the pair is not found in flags.
When the loop ends, either the permission bit has been found, in which case pairp points at the second element of the pair, which is the correct character to print, or it has not been found, in which case pairp points at the default character. In either case, line 274 prints the character that pairp points to.
A final point worth noting is that in C, character constants (such as x) have type int, not char. This is different in C++: there, character constants do have type char. This difference does not affect this particular code. So there's no problem putting such constants into an integer array; everything works correctly.
277 char * _char *makename(char *dir, char *file)_ 278 makename(dir, file) 279 char *dir, *file; 280 { 281 static char dfile[100]; 282 register char *dp, *fp; 283 register int i; 284 285 dp = dfile; 286 fp = dir; 287 while (*fp) 288 *dp++ = *fp++; 289 *dp++ = '/'; 290 fp = file; 291 for (i=0; i<DIRSIZ; i++) 292 *dp++ = *fp++; 293 *dp = 0; 294 return(dfile); 295 }
Lines 277-295 define the makename() function. Its job is to concatenate a directory name and a filename, separated by a slash character, and produce a string. It does this in the static buffer dfile. Note that dfile is only 100 characters long and that no error checking is done.
The code itself is straightforward, copying characters one at a time. makename() is used by the readdir() function.
297 readdir(dir) _void readdir(char*dir)_ 298 char *dir; 299 { 300 static struct direct dentry; 301 register int j; 302 register struct lbuf *ep; 303 304 if ((dirf = fopen(dir, "r")) == NULL) { 305 printf("%s unreadable\n", dir); 306 return; 307 } 308 tblocks = 0; 309 for(;;) { 310 if (fread((char *)&dentry, sizeof(dentry), 1, dirf) != 1) 311 break; 312 if (dentry.d_ino==0 313 || aflg==0 && dentry.d_name[0]=='.' && (dentry.d_name[1]=='\0' 314 || dentry.d_name[1]=='.' && dentry.d_name[2]=='\0')) 315 continue; 316 ep = gstat(makename(dir, dentry.d_name), 0); 317 if (ep==NULL) 318 continue; 319 if (ep->lnum != -1) 320 ep->lnum = dentry.d_ino; 321 for (j=0; j<DIRSIZ; j++) 322 ep->ln.lname[j] = dentry.d_name[j]; 323 } 324 fclose(dirf); 325 }
Lines 297-325 define the readdir() function, whose job is to read the contents of directories named on the command line.
Lines 304-307 open the directory for reading, returning if fopen() fails. Line 308 initializes the global variable tblocks to 0. This was used earlier (lines 153-154) to print the total number of blocks used by files in a directory.
Lines 309-323 are a loop that reads directory entries and adds them to the flist array. Lines 310-311 read one entry, exiting the loop upon end-of-file.
Lines 312-315 skip uninteresting entries. If the inode number is zero, this slot isn't used. Otherwise, if -a was not given and the filename is either . or .., skip it.
Lines 316-318 call gstat() with the full name of the file, and a second argument of false, indicating that it's not from the command line. gstat() updates the global lastp pointer and the flist array. A NULL return value indicates some sort of failure.
Lines 319-322 save the inode number and name in the struct lbuf. If ep->lnum comes back from gstat() set to -1, it means that the stat() operation on the file failed. Finally, line 324 closes the directory.
The following function, gstat() (lines 327-398), is the core function for the operation of retrieving and storing file information.
327 struct lbuf * _struct lbuf *gstat(char *file, int argfl)_ 328 gstat(file, argfl) 329 char *file; 330 { 331 extern char *malloc(); 332 struct stat statb; 333 register struct lbuf *rep; 334 static int nomocore; 335 336 if (nomocore) _Ran out of memory earlier_ 337 return(NULL); 338 rep = (struct lbuf *)malloc(sizeof(struct lbuf)); 339 if (rep==NULL) { 340 fprintf(stderr, "ls: out of memory\n"); 341 nomocore = 1; 342 return(NULL); 343 } 344 if (lastp >= &flist[NFILES]) { 345 static int msg; _Check whether too many files given_ 346 lastp--; 347 if (msg==0) { 348 fprintf(stderr, "ls: too many files\n"); 349 msg++; 350 } 351 } 352 *lastp++ = rep; _Fill in information_ 353 rep->lflags = 0; 354 rep->lnum = 0; 355 rep->ltype = '-'; _Default file type_
The static variable nomocore [sic] indicates that malloc() failed upon an earlier call. Since it's static, it's automatically initialized to 0 (that is, false). If it's true upon entry, gstat() just returns NULL. Otherwise, if malloc() fails, ls prints an error message, sets nomocore to true, and returns NULL (lines 334-343).
Lines 344-351 make sure that there is still room left in the flist array. If not, ls prints a message (but only once; note the use of the static variable msg), and then reuses the last slot in flist.
Line 352 makes the slot lastp points to point to the new struct lbuf (rep). This also updates lastp, which is used for sorting in main() (lines 142 and 152). Lines 353-355 set default values for the flags, inode number, and type fields in the struct lbuf.
356 if (argfl || statreq) { 357 if (stat(file, &statb)<0) { _stat() failed_ 358 printf("%s not found\n", file); 359 statb.st_ino = -1; 360 statb.st_size = 0; 361 statb.st_mode = 0; 362 if (argfl) { 363 lastp--; 364 return(0); 365 } 366 } 367 rep->lnum = statb.st_ino; _stat() OK, copy info_ 368 rep->lsize = statb.st_size; 369 switch(statb.st_mode&S_IFMT) { 370 371 case S_IFDIR: 372 rep->ltype = 'd'; 373 break; 374 375 case S_IFBLK: 376 rep->ltype = 'b'; 377 rep->lsize = statb.st_rdev; 378 break; 379 380 case S_IFCHR: 381 rep->ltype = 'c'; 382 rep->lsize = statb.st_rdev; 383 break; 384 } 385 rep->lflags = statb.st_mode & ~S_IFMT; 386 rep->luid = statb.st_uid; 387 rep->lgid = statb.st_gid; 388 rep->lnl = statb.st_nlink; 389 if(uflg) 390 rep->lmtime = statb.st_atime; 391 else if (cflg) 392 rep->lmtime = statb.st_ctime; 393 else 394 rep->lmtime = statb.st_mtime; 395 tblocks += nblock(statb.st_size); 396 } 397 return(rep); 398 }
Lines 356-396 handle the call to stat(). If this is a command-line argument or if statreq is true because of an option, the code fills in the struct lbuf as follows:
Lines 357-366: Call stat() and if it fails, print an error message and set values as appropriate, then return NULL (expressed as 0).
Lines 367-368: Set the inode number and size fields from the struct stat if the stat() succeeded.
Lines 369-384: Handle the special cases of directory, block device, and character device. In all cases the code updates the ltype field. For devices, the lsize value is replaced with the ltype field. For devices, the lsize value is replaced with the st_rdev value.
Lines 385-388: Fill in the lflags, luid, lgid and lnl fields from the corresponding fields in the struct stat. Line 385 removes the file-type bits, leaving the 12 permissions bits (read/write/execute for user/group/other, and setuid, setgid, and save-text).
Lines 389-394: Based on command-line options, use one of the three time fields from the struct stat for the lmtime field in the struct lbuf.
Line 395: Update the global variable tblocks with the number of blocks in the file.
400 compar(pp1, pp2) _int compar(struct lbuf_ **pp1, 401 struct lbuf **pp1, **pp2; _struct lbuf_ **pp2) 402 { 403 register struct lbuf *p1, *p2; 404 405 p1 = *pp1; 406 p2 = *pp2; 407 if (dflg==0) { 408 if (p1->lflags&ISARG && p1->ltype=='d') { 409 if (!(p2->lflags&ISARG && p2->ltype=='d')) 410 return(1); 411 } else { 412 if (p2->lflags&ISARG && p2->ltype=='d') 413 return(-1); 414 } 415 } 416 if (tflg) { 417 if(p2->lmtime == p1->lmtime) 418 return(0); 419 if(p2->lmtime > p1->lmtime) 420 return(rflg); 421 return(-rflg); 422 } 423 return(rflg * strcmp(p1->lflags&ISARG? p1->ln.namep: p1->ln.lname, 424 p2->lflags&ISARG? p2->ln.namep: p2->ln.lname)); 425 }
The compar() function is dense: there's a lot happening in little space. The first thing to remember is the meaning of the return value: A negative value means that the first file should sort to an earlier spot in the array than the second, zero means the files are equal, and a positive value means that the second file should sort to an earlier spot than the first.
The next thing to understand is that ls prints the contents of directories after it prints information about files. Thus the result of sorting should be that all directories named on the command line follow all files named on the command line.
Finally, the rflg variable helps implement the -r option, which reverses the sorting order. It is initialized to 1 (line 30). If -r is used, rflg is set to -1 (lines 89-91).
The following pseudocode describes the logic of compar(); the line numbers in the left margin correspond to those of ls.c:
407 if LS HAS TO READ DIRECTORIES # dflg == 0 408 if P1 IS A COMMAND-LINE ARG AND P1 IS A DIRECTORY 409 if P2 IS NOT A COMMAND-LINE ARG AND IS NOT A DIRECTORY 410 return 1 # first comes after second else FALL THROUGH TO TIME TEST 411 else # p1 is not a command-line directory 412 if P2 IS A COMMAND-LINE ARG AND IS A DIRECTORY 413 return -1 # first comes before second else FALL THROUGH TO TIME TEST 416 if SORTING IS BASED ON TIME # tflg is true # compare times: 417 if P2'S TIME IS EQUAL TO P1'S TIME 418 return 0 419 if P2'S TIME > P1'S TIME 420 RETURN THE VALUE OF RFLG (POSITIVE OR NEGATIVE) # p2's time < p1's time 421 RETURN OPPOSITE OF RFLG (NEGATIVE OR POSITIVE) 423 Multiply rflg by the result of strcmp() 424 on the two names and return the result
The arguments to strcmp() on lines 423-424 look messy. What's going on is that different members of the ln union in the struct lbuf must be used, depending on whether the filename is a command-line argument or was read from a directory.
The V7 ls is a relatively small program, yet it touches on many of the fundamental aspects of Unix programming: file I/O, file metadata, directory contents, users and groups, time and date values, sorting and dynamic memory management.
The most notable external difference between V7 ls and modern ls is the treatment of the -a and -l options. The V7 version has many fewer options than do modern versions; a noticeable lack is the -R recursive option.
The management of flist is a clean way to use the limited memory of the PDP-11 architecture yet still provide as much information as possible. The struct lbuf nicely abstracts the information of interest from the struct stat; this simplifies the code considerably. The code for printing the nine permission bits is compact and elegant.
Some parts of ls use surprisingly small limits, such as the upper bound of 1024 on the number of files or the buffer size of 100 in makename().
Arnold Robbins is a professional programmer and technical author. He is the long-time maintainer of gawk, the GNU Project's version of the Awk programming language. He is the author of numerous well-known technical books, such as Linux Programming by Example: The Fundamentals, recently published by Prentice Hall, as well as Unix In A Nutshell, Learning the vi Editor and Effective awk Programming, published by O'Reilly Media Inc. He is happily married with four wonderful children. Arnold is an amateur Talmudist and particularly enjoys studying the Jerusalem Talmud. He can be reached at arnold@skeeve.com.