Word Frequency Count: Part 3
Background
Implement ‘catlike’ interface pattern
- user1: implement and test the ‘catlike’ interface pattern as shown in class.
  Any additional command line arguments that do not start with `-` should be treated as file names to read from. Attempt to open each argument as a file and append the words it contains to a single word list (replace the call to `split_words` with a call to `append_words`; implementation shown below).
  If no additional arguments are given, the program should work exactly as it did for part 2, i.e. read from standard input.
    void append_words(word_list_t wl, FILE* fp)
    {
        /* append words found in fp to list wl */
        token_inject(fp, " \n\t.,?!-", wl, word_list_injector, TOK_INJECT_COMPACT);
    }

    int main(int argc, char *argv[])
    {
        /* variable declarations */
        word_list_t wl;
        FILE* fp;

        /* call getopt in while loop */

        wl = word_list_create(100); /* used to be done in split_words,
                                     * which we are no longer using */

        /* optind will contain the offset from the start of the original
         * argv to the first argument not consumed by getopt */
        argv += optind;
        argc -= optind;

        /* now use argv as a list of remaining command line arguments. */
        if ( argc > 0 ) {
            /* any arguments left over after getopt should be treated as
             * file names */
            do {
                fp = fopen(*argv, "r");
                /* TODO: user2 implement error checking */
                append_words(wl, fp);
            } while (*++argv);
        } else {
            /* no additional arguments means read from standard input */
            append_words(wl, stdin);
        }

        /* remainder of program shouldn't need to change from part 2,
         * but since append_words replaces split_words, you won't need
         * that line */
    }
  Update the usage message to reflect the new functionality:

    $ ./wordfreq -h
    Usage: ./wordfreq [-hr] [-k N] [FILE ...]
- user2: after user1 has completed their work, add error checking to handle file names that cannot be read.
  If a given command line argument refers to a file that does not exist or cannot be read, print a corresponding message to standard error and continue to the next file in the list.
  Assuming files named `file1` and `file2` exist and are readable, but no file named `not_a_file` exists, then the standard output of

    $ ./wordfreq file1 file2

  and

    $ ./wordfreq file1 not_a_file file2

  should be identical, but the latter command should also produce a message on standard error:

    not_a_file: No such file or directory
  Remember, don’t do more work than you have to. You don’t need to know the reason `fopen` failed to open a file, just that it sets the global `errno` to some value. Use `strerror` to generate a human-readable message from the global `errno`.

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    ...
    fprintf(stderr, "%s: %s\n", *argv, strerror(errno));
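  Alternatively, you could use the `warn` function declared in `err.h`. It prefixes the message with the program name and appends the `strerror(errno)` text for you.

    #include <err.h>
    ...
    warn("%s", *argv);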
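The open, report, and continue behaviour described for user2 can be sketched on its own, without the course libraries. This is only one possible shape, not the reference solution. The names `open_or_report`, `visit_files`, and `consume` are hypothetical, and `consume` merely counts characters as a stand-in for `append_words(wl, fp)`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for append_words(wl, fp): counts characters so the sketch
 * does not depend on the course libraries. */
static long chars_seen = 0;

static void consume(FILE *fp)
{
    while (fgetc(fp) != EOF)
        chars_seen++;
}

/* Try to open `name` for reading. On failure, write "name: reason" to
 * `errout` and return NULL so the caller can move on to the next file. */
FILE *open_or_report(const char *name, FILE *errout)
{
    FILE *fp;

    assert(name != NULL && errout != NULL);
    fp = fopen(name, "r");
    if (fp == NULL)
        fprintf(errout, "%s: %s\n", name, strerror(errno));
    return fp;
}

/* Call `visit` on every file in the NULL-terminated list `names` that
 * can be opened. Returns the number of names that were skipped. */
int visit_files(char *names[], void (*visit)(FILE *), FILE *errout)
{
    int skipped = 0;

    for (; *names != NULL; names++) {
        FILE *fp = open_or_report(*names, errout);
        if (fp == NULL) {
            skipped++;      /* message already printed; keep going */
            continue;
        }
        visit(fp);
        fclose(fp);
    }
    return skipped;
}
```

In the real program the call would look like `visit_files(argv, ..., stderr)` after `argv` has been advanced past the options. Passing the error stream as a parameter is a choice made here only so the behaviour is easy to check.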
Compiling and Linking
If you receive linking errors about ‘token_inject’ being undefined, you may have to tell the linker explicitly to link against the `streamtoken` library:
$ clang -o wordfreq -lanalytics -lstreamtoken main.o
The extra `-lstreamtoken` option is not needed if compiling with clang 3.4 on the ece2524 VM, but I found it necessary when using clang 3.3 on my local machine.
Submission
The source files should exist in their own git repository. If you change to the directory containing your source files and run `ls -a`, you should see a directory named `.git`. If not, run `git init` to initialize a git repository in the current directory. You should only run `git init` once for each new project.
Push your git repository to the remote at git@ece2524.ece.vt.edu:USER/wordfreq.git where USER is your git user name.
If you have initialized a new repo but have not added a remote yet:
$ git remote add origin git@ece2524.ece.vt.edu:USER/wordfreq.git
where USER is your git user name.
If you have already added a remote named `origin`, but the URL is incorrect, replace `add` with `set-url` in the above command. You can always check which remotes you have added by running `git remote -v`.
Remember, if this is the first time pushing to a new remote you need to specify a destination branch (usually `master`). Using the `-u` option will save this default destination for future pushes.
$ git push -u origin master
Testing
Feature repo path: features/wordfreq
The following features will be tested using cucumber:
@compile
Feature: Compile
Background:
Given I am working from a clean git clone to "wordfreq"
And I cd to "wordfreq"
Scenario: Clean Repo
Then a file named "wordfreq" should not exist
Scenario: Compile
When I successfully run `clang -c -o main.o main.c`
Then a file named "main.o" should exist
When I successfully run `clang -o wordfreq -lanalytics main.o`
Then a file named "wordfreq" should exist
@part3 @no-clobber
Feature: catlike interface pattern
Background:
Given I cd to "wordfreq"
And a file named "fox.txt" with:
"""
the quick brown fox jumped over the lazy cow.
but the cow jumped over the moon!
what does the fox say?
"""
And a file named "numbers" with:
"""
four two four one
two four three three
three four
"""
Scenario: One file argument
When I run the shell command "./wordfreq numbers"
Then its stdout should contain exactly 4 lines
And its stdout lines should match:
| ^\s*4\s+four$ |
| ^\s*3\s+three$ |
| ^\s*2\s+two$ |
| ^\s*1\s+one$ |
Scenario: Two file arguments
When I run the shell command "./wordfreq numbers fox.txt"
Then its stdout should contain exactly 10 lines
And its stdout lines should match:
| ^\s*5\s+the$ |
| ^\s*4\s+four$ |
| ^\s*3\s+three$ |
| ^\s*2\s+cow$ |
| ^\s*2\s+fox$ |
| ^\s*2\s+jumped$ |
| ^\s*2\s+over$ |
| ^\s*2\s+two$ |
| ^\s*1\s+brown$ |
| ^\s*1\s+but$ |
Scenario: A bad file argument
Given the file "not_a_file" should not exist
When I run the shell command "./wordfreq numbers not_a_file fox.txt"
Then its stdout should contain exactly 10 lines
And its stdout lines should match:
| ^\s*5\s+the$ |
| ^\s*4\s+four$ |
| ^\s*3\s+three$ |
| ^\s*2\s+cow$ |
| ^\s*2\s+fox$ |
| ^\s*2\s+jumped$ |
| ^\s*2\s+over$ |
| ^\s*2\s+two$ |
| ^\s*1\s+brown$ |
| ^\s*1\s+but$ |
And its stderr should contain exactly 1 line
And its stderr should contain "not_a_file: No such file or directory"
@part3 @no-clobber
Feature: Command Line Arguments
Background:
Given I cd to "wordfreq"
And a file named "fox.txt" with:
"""
the quick brown fox jumped over the lazy cow.
but the cow jumped over the moon!
what does the fox say?
"""
And a file named "numbers" with:
"""
four two four one
two four three three
three four
"""
Scenario: "-k argument error checking"
When I run the shell command "./wordfreq -k five < fox.txt"
Then its stdout should contain exactly 0 lines
And its stderr should contain exactly 1 line
And its stderr should contain "No digits were found"
You can run the tests manually with
$ cucumber /usr/share/features/wordfreq
when logged in to your shell account. This command assumes your current working directory is your project directory.
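The "-k argument error checking" scenario expects a single line on standard error reading "No digits were found", the same wording used in the example in the Linux `strtol(3)` manual page. A minimal sketch of such a check follows, assuming the argument of `-k` is converted with `strtol`. The helper name `parse_count` and its signature are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert the argument of -k to a long.
 * On success stores the value in *out and returns 0. On failure writes
 * exactly one line to errout, leaves *out untouched, and returns -1.
 * Trailing characters after the digits (e.g. "5abc") are ignored here. */
int parse_count(const char *text, long *out, FILE *errout)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL && errout != NULL);

    errno = 0;                      /* so a range error can be detected */
    value = strtol(text, &end, 10);

    if (errno != 0) {
        fprintf(errout, "%s: %s\n", text, strerror(errno));
        return -1;
    }
    if (end == text) {              /* strtol consumed nothing */
        fprintf(errout, "No digits were found\n");
        return -1;
    }

    *out = value;
    return 0;
}
```

Since the scenario also expects no output on standard output, the program would presumably exit with a failure status right after this check fails rather than go on to read its input.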