Grep exercise
Problem statement
Write a command-line program that implements Unix grep-like functionality. Read the man page for grep if you don't know what it does. Implement options such as -i (case-insensitive search), -r (recursive search across directories), and -A, -B, -C (show lines before and after the match, and only show count of matches). Also, support reading the grep input from STDIN.
The program should have the following features.
Story 1
Ability to print lines containing a search string in a file. The program should print the results on standard output (STDOUT). Feel free to assume case-sensitive and exact word matches of the string for now.
Here's how a sample grep command will be executed along with the sample output it may produce.
Assumptions:
Our grep utility is packaged as a standalone binary (./mygrep). Depending on the programming language you're using, you may be able to create a standalone utility (e.g. in Golang). If not, use any CLI execution method your language provides (e.g. java -jar grep.jar for Java).
The file filename.txt is found in the current directory and contains multiple lines, two of which contain the word "search_string". These lines are printed on the output.
The output is printed on STDOUT
Expectations:
Write down test cases for zero, one and many matches of search string in a file
Your code should handle errors when the file doesn't exist, is not readable, or is a directory instead of a file.
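The error cases listed above can be sketched roughly as follows. This is only one possible shape, written in Python for illustration: the names grep_file and run, the wording of the error messages, and the return codes (0 for success, 1 for a file error) are all assumptions, and matching is done as a plain substring check.

```python
import sys


def grep_file(pattern, path):
    """Return the lines of the file at `path` that contain `pattern` (case-sensitive)."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if pattern in line]


def run(pattern, path, out=sys.stdout, err=sys.stderr):
    """Print matching lines to `out`; return 0 on success, 1 on a file error.

    The return codes and message wording are assumptions made for this sketch.
    """
    try:
        matches = grep_file(pattern, path)
    except FileNotFoundError:
        print(f"mygrep: {path}: No such file or directory", file=err)
        return 1
    except IsADirectoryError:
        print(f"mygrep: {path}: Is a directory", file=err)
        return 1
    except PermissionError:
        print(f"mygrep: {path}: Permission denied", file=err)
        return 1
    for line in matches:
        print(line, file=out)
    return 0
```

Passing the output and error streams as parameters keeps the function easy to test for zero, one and many matches: a test can hand in in-memory streams and inspect what was written.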
Story 2
Ability to search for a string from standard input (STDIN) and print the results on standard output (STDOUT). Your program should be able to read the text from standard input.
should produce the following output
Assumptions:
Assume case-sensitive search. In the above example, Foobar is not printed in the output since we are performing a case-sensitive search.
We are NOT matching exact full words. Instead, grep works similarly to a string's contains functionality. In the above example, food is matched in the output even if we search for foo.
You need to assume some way to indicate the "end of input", e.g., using Ctrl + d. The ^D character above shows that there's no more input and the program should execute and print the results. Your program should stop accepting input once it reads this end-of-input signal (Ctrl + d).
Expectations:
How will you write test cases for this code? Note that you can't pause for user input in a test case.
Can you reuse code from the previous story? How will you refactor your existing code to make this reuse possible?
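One way to approach both questions is to make the matching logic work on any iterable of lines, so that a file, STDIN, or an in-memory stream in a test can all feed the same function. The sketch below assumes hypothetical names (grep_lines, grep_stream) and uses made-up sample input built around the Foobar and food cases mentioned in the assumptions.

```python
import io
import sys


def grep_lines(pattern, lines):
    """Yield each line from `lines` that contains `pattern` (case-sensitive substring match)."""
    for line in lines:
        if pattern in line:
            yield line.rstrip("\n")


def grep_stream(pattern, stream=None):
    """Search a text stream; falls back to sys.stdin when no stream is given."""
    source = stream if stream is not None else sys.stdin
    return list(grep_lines(pattern, source))


# In a test, an in-memory stream stands in for the keyboard, so nothing blocks.
fake_stdin = io.StringIO("bar\nbarbazfoo\nFoobar\nfood\n")
print(grep_stream("foo", fake_stdin))  # prints ['barbazfoo', 'food']
```

Because an open file object is also an iterable of lines, the file-based story could call grep_lines directly instead of duplicating the loop.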
Story 3
Ability to write output to a file instead of standard output. When a -o filename flag is passed, the output from the program should be written to the file instead of being printed on standard output.
Example:
should create an out.txt file with the output from mygrep. For example,
Assumptions:
Assume the output file doesn't exist in the current directory. If the file already exists, the program should print an appropriate error.
Expectations:
Reuse code from previous stories as much as possible. Make your code modular and extensible.
Write a test case for this story.
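A minimal sketch of the output step, assuming a hypothetical helper named write_results that receives the already-matched lines. Python's exclusive-creation mode ("x") is used here so that an existing file is reported rather than overwritten. The error message and return codes are assumptions.

```python
import sys


def write_results(lines, out_path=None):
    """Write `lines` to `out_path`, or to STDOUT when `out_path` is None.

    Returns 0 on success and 1 if the output file already exists
    (both codes are assumptions of this sketch).
    """
    if out_path is None:
        for line in lines:
            print(line)
        return 0
    try:
        # Mode "x" creates the file and raises FileExistsError if it is already there.
        with open(out_path, "x", encoding="utf-8") as handle:
            for line in lines:
                handle.write(line + "\n")
    except FileExistsError:
        print(f"mygrep: {out_path}: File exists", file=sys.stderr)
        return 1
    return 0
```

Keeping the output step separate from the matching step means the search code from the earlier stories does not need to know where its results end up.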
Story 4
Ability to perform case-insensitive grep using the -i flag. For example,
should produce the following output
Assumptions:
Please note that various flags can be combined, i.e., we may use ./mygrep -i foo filename.txt -o outfile.txt in a single command execution. This should perform a case-insensitive search for the word foo in the file filename.txt and the output should be saved in outfile.txt.
Expectations:
As previously mentioned, we need to reuse code as much as possible. We should also write modular and extensible code.
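The combined invocation above could be parsed along these lines. This is a sketch using Python's standard argparse module. The helper names (matches, build_parser) and the destination names (ignore_case, out_path) are assumptions, and the file argument is optional so that STDIN can be used when it is absent.

```python
import argparse


def matches(pattern, line, ignore_case=False):
    """Return True when `pattern` occurs in `line`, optionally ignoring case."""
    if ignore_case:
        return pattern.casefold() in line.casefold()
    return pattern in line


def build_parser():
    """Build a parser for the flags introduced so far (-i and -o)."""
    parser = argparse.ArgumentParser(prog="mygrep")
    parser.add_argument("pattern")
    parser.add_argument("path", nargs="?")  # absent means "read from STDIN"
    parser.add_argument("-i", dest="ignore_case", action="store_true")
    parser.add_argument("-o", dest="out_path", metavar="FILE")
    return parser


# Parse the example command line from the assumptions above.
args = build_parser().parse_args(["-i", "foo", "filename.txt", "-o", "outfile.txt"])
print(args.ignore_case, args.pattern, args.path, args.out_path)
# prints: True foo filename.txt outfile.txt
```

With the case handling isolated in one small predicate, the earlier search functions only need an extra ignore_case parameter to support the new flag.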
Story 5
Ability to search for a string recursively in any of the files in a given directory. When searching in multiple files, the output should indicate the file name (as the grep -r command does). Also, all the output from one file should be grouped together in the final output (in other words, output from two files shouldn't be interleaved in the final output being printed). A sample invocation of the program could be as follows:
Assumptions:
Assume the directory contains only text files (no binary files like images, pdf, mp4, etc).
If no matches are found in a file, the file shouldn't be included in the output, i.e., the program should generate output that is similar to the one produced by the grep command. So in case of any doubt, refer to the grep command manual page.
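A rough sketch of the recursive search, assuming a hypothetical function grep_recursive that returns path:line strings. Files are processed one at a time, so their output is grouped naturally, and files without matches contribute nothing. Sorting the directory and file names is an extra assumption made here to keep the output order predictable.

```python
import os


def grep_recursive(pattern, root):
    """Return 'path:line' strings for every match under `root`, grouped by file."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort so os.walk descends in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Assumes every file is UTF-8 text, per the assumption above.
            with open(path, "r", encoding="utf-8") as handle:
                for line in handle:
                    if pattern in line:
                        text = line.rstrip("\n")
                        results.append(f"{path}:{text}")
    return results
```

If files were ever searched concurrently, collecting each file's matches into its own list before printing would preserve the same grouping guarantee.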
Story 6, 7, 8
Implement other grep options:
-A: print n lines after the match
-B: print n lines before the match
-C: only print count of matches instead of actual matched lines
Feel free to make suitable assumptions if needed, and ensure to document them in README.md.
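The before/after and count behaviours could be sketched as below. The function names and the before and after parameters are hypothetical. This version reads all lines into memory and omits the -- separator that grep prints between non-adjacent groups, both of which are simplifying assumptions worth noting in README.md. For reference, GNU grep itself spells the count option -c and uses -C for combined before-and-after context.

```python
def grep_with_context(pattern, lines, before=0, after=0):
    """Return matching lines plus up to `before`/`after` neighbours, in order, without duplicates."""
    lines = [line.rstrip("\n") for line in lines]
    keep = set()
    for index, line in enumerate(lines):
        if pattern in line:
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            keep.update(range(start, end))  # overlapping windows merge in the set
    return [lines[i] for i in sorted(keep)]


def count_matches(pattern, lines):
    """Return how many lines contain `pattern`."""
    return sum(1 for line in lines if pattern in line)


sample = ["one", "two", "foo", "three", "four", "five", "foo again", "six"]
print(grep_with_context("foo", sample, before=1, after=1))
# prints ['two', 'foo', 'three', 'five', 'foo again', 'six']
print(count_matches("foo", sample))  # prints 2
```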