In this lab, you’ll get to use grep and sed to run some regular expressions you’ve written. Writing good regular expressions is like solving a puzzle!
You’ll clone your lab repository from GitLab as usual.
You should make an answers text file (name it answers.txt) to answer assignment questions.
Getting stuck?
Try using https://regex101.com/ to build your regular expressions. You can test them out in your browser before you use them with sed and grep.
https://regex101.com helps you build PCRE (Perl Compatible Regular Expressions) patterns. We can tell grep that we’re giving it a PCRE using the -P flag.
# Here's a PCRE example
$ echo "banana" | grep -P 'an+'
banana
# Here's the non-PCRE equivalent
$ echo "banana" | grep 'an\+'
banana
# Here's another PCRE example. This time there's grouping.
$ echo "banana" | grep -P 'b(an)+a'
banana
# Here's the non-PCRE equivalent
$ echo "banana" | grep 'b\(an\)\+a'
banana
sed can also understand PCRE-ish regular expressions. It needs the -r flag.
# Here's a PCRE-ish example
$ echo 'bananananana' | sed -r 's/(an)+/an/'
bana
# Here's the non-PCRE equivalent.
$ echo 'bananananana' | sed 's/\(an\)\+/an/'
bana
You’re going to need some input files for this assignment. Here are some links, so that you can download them.
Don’t add these to your repository. Consider adding a couple of lines to your .gitignore file to avoid committing them by accident.
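One way to add those .gitignore lines is sketched below. It assumes you run it from the root of your repository and that you download the data files into that same directory; adjust the names if you keep them somewhere else.

```shell
# Append one ignore entry per data file to .gitignore in the current directory.
# Assumption: the data files live in the repository root under these names.
printf '%s\n' story-plain.txt story-space.txt phonebook.txt numbers.log >> .gitignore

# Show the result so you can double-check it.
cat .gitignore
```

After this, git status should no longer list the downloaded files as untracked.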
You can download them real quick using the following shell script:
# Just copy/paste this in your shell to download the files to your
# current directory. There's no need to create a .sh file for this.
for NAME in story-plain.txt story-space.txt phonebook.txt numbers.log
do
  wget "http://dsl.mwisely.xyz/labs/5/data/$NAME"
done
So it turns out that all that C++ could be done with grep instead!
Write a command to filter out lines beginning with #. (You can test it with story-plain.txt.)
Modify the previous command to filter out lines beginning with whitespace and then #, as well. (You can test it with story-space.txt.)
Write a sed command to remove all vowels from its input.
Jake the dog runs a web service that generates random numbers. Every time a user requests a number, a log message is recorded in a log file with the following format:
YYYY-MM-DD HH:MM:SS number
Jake wants to know how many requests in numbers.log were made in January of any year. (Hint: grep -c REGEX will count the number of lines of input that match the regular expression.)
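To see how grep -c behaves before pointing it at numbers.log, here is a small sketch on made-up input. The three lines of text and the pattern are invented for illustration and have nothing to do with the log format.

```shell
# Count how many input lines match the pattern; grep -c prints the count
# instead of the matching lines themselves.
# The sample lines below are made up for this example.
count=$(printf '%s\n' 'apple pie' 'banana split' 'apple tart' | grep -c '^apple')
echo "$count"   # prints 2
```

Note that grep -c counts matching lines, not individual matches, so a line that matches several times still counts once.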
Jake the dog has a phonebook of his customers (phonebook.txt) that contains phone numbers. However, those numbers are not written in a consistent format… Now he wants them to be formatted in a consistent manner.
Write a grep command to match all the phone numbers in the file.
Write a sed command to format all the numbers like so: (ddd) ddd-dddd
sed doesn’t have \d, so you’ll have to use [0-9] instead.
Note: Your regular expression for this problem should not exhaustively match the numbers (e.g., (555) 123 - 4567|573-555-1234|314 342 6678). That’s silly.
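As a quick illustration of using [0-9] where you might reach for \d, here is a sketch on made-up input. The sample text and the replacement letter N are invented for this example; it is not a solution to the phonebook problem.

```shell
# Replace every run of digits with the letter N.
# [0-9]+ stands in for \d+, and -r turns on extended regular expressions.
result=$(echo 'room 42, floor 7' | sed -r 's/[0-9]+/N/g')
echo "$result"   # prints: room N, floor N
```

The trailing g makes sed replace every match on the line rather than only the first.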
answers.txt
The one exception: If a command is greater than 80 characters in length, you can keep going. It is easier to grade a line that is a little too long than a line like this:
cat file.txt
| grep "boy howdy look how long this \
expression is i bet that it's going to go more than 100 characters \
oh look it did time to make my code look terrible but at least \
it's fewer than 80 chars."
As with previous labs, your git repo on http://git-classes.mst.edu is your submission. Don’t forget to commit and push all relevant files. Make sure you see everything you expect on GitLab!
We expect to see the following files on your master branch:
README.md
answers.txt
.gitignore, if you chose to make one