In this lab, you’ll get to use sed to run some regular expressions you’ve written. Writing good regular expressions is like solving a puzzle!
You’ll clone your lab repository from GitLab as usual.
You should make an answers text file (name it answers.txt) to answer the questions in this lab.
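Try using https://regex101.com/ to build your regular expressions. You can test them out in your browser before you use them with grep. Given the -P flag, grep treats the pattern as a Perl-compatible regular expression (PCRE); without it, grep uses basic regular expressions, in which +, (, and ) must be escaped with a backslash to act as operators: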
# Here's a PCRE example
$ echo "banana" | grep -P 'an+'
banana
# Here's the non-PCRE equivalent
$ echo "banana" | grep 'an\+'
banana
# Here's another PCRE example. This time there's grouping.
$ echo "banana" | grep -P 'b(an)+a'
banana
# Here's the non-PCRE equivalent
$ echo "banana" | grep 'b\(an\)\+a'
banana
sed can also understand PCRE-ish[2] regular expressions. It needs the -r flag to do so:
# Here's a PCRE-ish example
$ echo 'bananananana' | sed -r 's/(an)+/an/'
bana
# Here's the non-PCRE equivalent.
$ echo 'bananananana' | sed 's/\(an\)\+/an/'
bana
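The -r flag is a GNU sed option. GNU sed also accepts -E for the same extended syntax, and -E is the spelling BSD/macOS sed uses, so the following sketch (assuming either GNU or BSD sed) should print the same bana as the -r version:

```shell
# Same substitution as the -r example, spelled with -E instead.
# Assumes GNU sed (where -E and -r are synonyms) or BSD/macOS sed (which only has -E).
echo 'bananananana' | sed -E 's/(an)+/an/'
```

If you ever work on a Mac, reaching for -E saves you from a confusing "illegal option" error.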
You’re going to need some input files for this assignment: story-plain.txt, story-space.txt, phonebook.txt, and numbers.log.
Don’t add these to your repository. Consider adding a couple of lines to your .gitignore file to avoid committing them by accident.
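One way to do that, sketched here under the assumption that you run it from the root of your lab repository, is to append the four data file names (the same ones the download script uses) to .gitignore:

```shell
# Append one file name per line to .gitignore so git ignores the lab data.
# Assumes the current directory is the root of your lab repository.
printf '%s\n' story-plain.txt story-space.txt phonebook.txt numbers.log >> .gitignore
```

Afterwards, git status should no longer list those files as untracked. Remember to commit .gitignore itself.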
You can download them real quick using the following shell script:
# Just copy/paste this in your shell to download the files to your
# current directory. There's no need to create a .sh file for this.
for NAME in story-plain.txt story-space.txt phonebook.txt numbers.log
do
    wget "http://dsl.mwisely.xyz/labs/5/data/$NAME"
done
So it turns out that all that C++ could be done with grep and sed.
Write a command to filter out lines beginning with #. (You can test it with one of the story files.)
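Two grep features tend to come up for this kind of filtering: the ^ anchor, which ties a pattern to the start of a line, and the -v flag, which prints only the lines that do not match. The sketch below uses made-up input and a different starting character (the letter a) so it illustrates the tools without solving the exercise:

```shell
# Made-up input: print only the lines that do NOT start with "a".
printf 'apple\nbanana\navocado\n  apricot\n' | grep -v '^a'
```

Notice that "  apricot" survives, because ^a only matches an a in the very first column.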
Modify the previous command to filter out lines beginning with whitespace and then #, as well. (You can test it with one of the story files.)
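To allow for leading whitespace, a POSIX bracket expression such as [[:space:]] followed by * (zero or more repetitions) can sit between the anchor and the character you care about. Again, this is a sketch with made-up input, with the letter a standing in for the real target character:

```shell
# Drop lines that start with "a", optionally preceded by spaces or tabs.
printf 'apple\nbanana\n  apricot\n\tavocado\n' | grep -v '^[[:space:]]*a'
```

Because * also allows zero repetitions, the same pattern still catches lines with no leading whitespace at all.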
Write a sed command to remove all vowels from its input.
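One sed detail matters here: s/REGEX/REPLACEMENT/ replaces only the first match on each line unless you add the g (global) flag. A quick illustration with a made-up substitution:

```shell
# Without g, only the first "a" on the line is replaced: prints "bonana".
echo 'banana' | sed 's/a/o/'
# With g, every "a" on the line is replaced: prints "bonono".
echo 'banana' | sed 's/a/o/g'
```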
Jake the dog runs a web service that generates random numbers[3]. Every time a user requests a number, a log message is recorded in a log file with the following format:
YYYY-MM-DD HH:MM:SS number
Jake wants to know how many requests in numbers.log were made in January of any year. (Hint: grep -c REGEX will count the number of lines of input that match the regular expression.)
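grep -c prints a count instead of the matching lines. Here is a sketch on a few made-up log lines in the same YYYY-MM-DD HH:MM:SS number format (they are not from numbers.log), counting requests made in March rather than January:

```shell
# Made-up log lines; anchor at the start of the line, skip the four-digit
# year with [0-9]\{4\}, then require the month field to be "-03-".
printf '%s\n' \
    '2019-03-02 10:15:00 42' \
    '2020-11-03 08:00:01 7' \
    '2021-03-30 23:59:59 13' \
    '2021-04-03 12:00:00 99' |
    grep -c '^[0-9]\{4\}-03-'
```

This prints 2. A looser pattern such as 03 would print 4 on the same input, since it also matches the day field; pinning the month's position is what keeps the count honest.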
Jake the dog has a phonebook of his customers (phonebook.txt) that contains phone numbers[4]. However, those numbers are not written in a consistent format… Now he wants them to be formatted in a consistent manner.
Write a grep command to match all the phone numbers in the file.
Write a sed command to format all the numbers like so:
sed doesn’t understand \d, so you’ll have to use [0-9] instead.
Note: Your regular expression for this problem should not exhaustively match the numbers (e.g., (555) 123 - 4567|573-555-1234|314 342 6678). That’s silly.
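Capture groups and backreferences are the sed feature that does this kind of reformatting: each (...) group can be reused in the replacement as \1, \2, and so on, and [0-9] stands in for the missing \d. Here is a sketch with made-up dates rather than phone numbers, assuming GNU sed for the -r flag:

```shell
# Rewrite a made-up ISO date (YYYY-MM-DD) as MM/DD/YYYY.
# "#" is used as the s delimiter so the "/" in the replacement needs no escaping.
echo 'Due 2021-03-15' | sed -r 's#([0-9]{4})-([0-9]{2})-([0-9]{2})#\2/\3/\1#'
```

This prints Due 03/15/2021. Any character can follow s as the delimiter, which is handy whenever the replacement contains slashes.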
The one exception: if a command is longer than 80 characters, you can let it keep going on the same line. It is easier to grade a line that is a little too long than a line like this:
cat file.txt | grep "boy howdy look how long this \
expression is i bet that it's going to go more than 100 characters \
oh look it did time to make my code look terrible but at least \
it's fewer than 80 chars."
As with previous labs, your git repo on http://git-classes.mst.edu is your submission. Don’t forget to commit and push all relevant files. Make sure you see everything you expect on GitLab!
We expect to see the following files on your master branch:
.gitignore, if you chose to make one