Introduction

In this lab, you’ll get to use grep and sed to run some regular expressions you’ve written. Writing good regular expressions is like solving a puzzle!

You’ll clone your lab repository from GitLab as usual.

You should make an answers text file (name it answers.txt) to answer assignment questions.

WHAT IS HAPPENING?

Getting stuck?

Try using https://regex101.com/ to build your regular expressions. You can test them out in your browser before you use them with sed and grep.

https://regex101.com helps you build PCRE[1] regular expressions. We can tell grep that we’re giving it a PCRE using the -P flag.

# Here's a PCRE example
$ echo "banana" | grep -P 'an+'
banana

# Here's the non-PCRE equivalent
$ echo "banana" | grep 'an\+'
banana

# Here's another PCRE example. This time there's grouping.
$ echo "banana" | grep -P 'b(an)+a'
banana

# Here's the non-PCRE equivalent
$ echo "banana" | grep 'b\(an\)\+a'
banana
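
Every example above prints the whole line, because grep prints any line that contains a match. If you want to see which part of the line actually matched, the -o flag can help. The snippet below is a small sketch, not part of the assignment: it uses -E (extended regular expressions), which for these two simple patterns is written the same way as the PCRE version, and the variable names pieces and whole are made up for illustration.

```shell
# -o prints each matched piece on its own line instead of the whole line.
pieces=$(echo "banana" | grep -oE 'an+')
echo "$pieces"

# With the grouped pattern, a single match covers the entire word.
whole=$(echo "banana" | grep -oE 'b(an)+a')
echo "$whole"
```

The first command prints "an" twice (one line per match), and the second prints "banana" once.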

sed can also understand PCRE-ish[2] regular expressions. It needs the -r flag.

# Here's a PCRE-ish example
$ echo 'bananananana' | sed -r 's/(an)+/an/'
bana

# Here's the non-PCRE equivalent.
$ echo 'bananananana' | sed 's/\(an\)\+/an/'
bana
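
In the sed examples above, (an)+ swallows the whole run of "an"s in one go, so a single substitution is enough. In general, though, s/// only replaces the first match on each line unless you add the g flag. Here is a minimal sketch of the difference, using a made-up input string and hypothetical variable names:

```shell
# Without g, only the first "-" on the line is replaced.
first_only=$(echo 'a-b-c' | sed 's/-/+/')
echo "$first_only"   # a+b-c

# With g, every "-" on the line is replaced.
every_match=$(echo 'a-b-c' | sed 's/-/+/g')
echo "$every_match"  # a+b+c
```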

Problem 0: Some input files

You’re going to need some input files for this assignment. Here are some links, so that you can download them.

Don’t add these to your repository. Consider adding a couple of lines to your .gitignore file to avoid committing them by accident.

You can download them real quick using the following shell script:

# Just copy/paste this in your shell to download the files to your
# current directory. There's no need to create a .sh file for this.

for NAME in story-plain.txt story-space.txt phonebook.txt numbers.log
do
    wget "http://dsl.mwisely.xyz/labs/5/data/$NAME"
done
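
If you’d like to keep those files out of your repository, one way (a sketch, assuming you run it from the top of your repository, where .gitignore lives) is to append the four file names to .gitignore with the same kind of loop:

```shell
# Append each data file name to .gitignore in the current directory.
# >> creates the file if it doesn't exist and appends if it does.
for NAME in story-plain.txt story-space.txt phonebook.txt numbers.log
do
    echo "$NAME" >> .gitignore
done
```

Running it twice will add duplicate lines, which git tolerates, so there’s no harm beyond a slightly messier file.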

Problem 1: That filter problem… AGAIN

So it turns out that all that C++ could be done with grep instead!

  1. Write a command to filter out lines beginning with #. (You can test it with story-plain.txt.)

  2. Modify the previous command to filter out lines beginning with whitespace and then #, as well. (You can test it with story-space.txt.)

Problem 2: Lk m, n vwls!

Write a sed command to remove all vowels from its input.

Problem 3: Counting Lines

Jake the dog runs a web service that generates random numbers[3]. Every time a user requests a number, a log message is recorded in a log file with the following format:

YYYY-MM-DD HH:MM:SS number

Jake wants to know how many requests in numbers.log were made in January of any year. (Hint: grep -c REGEX will count the number of lines of input that match the regular expression.)
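
To get a feel for grep -c before pointing it at numbers.log, here is a small sketch on made-up input (the sample lines and variable names are invented for illustration). Note that -c counts matching lines, not individual matches.

```shell
# Three made-up lines; two of them contain "err".
sample=$(printf '%s\n' 'error: disk full' 'ok' 'stderr closed')
count=$(printf '%s\n' "$sample" | grep -c 'err')
echo "$count"        # 2

# "banana" contains "an" twice, but it is still only one matching line.
line_count=$(echo "banana" | grep -c 'an')
echo "$line_count"   # 1
```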

Problem 4: Phone numbers

Jake the dog has a phonebook of his customers (phonebook.txt) that contains phone numbers[4]. However, those numbers are not written in a consistent format… Now he wants them to be formatted in a consistent manner.

  1. Write a grep command to match all the phone numbers in the file.

  2. Write a sed command to format all the numbers like so: (ddd) ddd-dddd

    • Hint: backreferences are your friend!
    • Hint 2: Sadly, sed doesn’t have \d, so you’ll have to use [0-9] instead.

Note: Your regular expression for this problem should not exhaustively match the numbers (e.g., (555) 123 - 4567|573-555-1234|314 342 6678). That’s silly.
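
If backreferences are new to you, here is a sketch on a different kind of input (a made-up date, not a phone number). Each parenthesized group in the pattern is captured, and \1, \2, \3 in the replacement refer to those groups in order. The | delimiter is used instead of / so that the slashes in the replacement don’t need escaping.

```shell
# Capture year, month, and day, then write them back in a new order.
reordered=$(echo '2016-03-14' | sed -r 's|([0-9]{4})-([0-9]{2})-([0-9]{2})|\2/\3/\1|')
echo "$reordered"   # 03/14/2016
```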

Format of answers.txt

  • Make sure no line goes over 80 columns in length.
    • The one exception: If a command is greater than 80 characters in length, you can keep going. It is easier to grade a line that is a little too long than a line like this:

      cat file.txt
          | grep "boy howdy look how long this \
          expression is i bet that it's going to go more than 100 characters \
          oh look it did time to make my code look terrible but at least \
          it's fewer than 80 chars."
      
  • If you are asked for a command, I should be able to copy/paste your answer into bash and run it.
    • You’re welcome to include an explanation, but make sure the complete command is there, too.
    • Put the command on a line by itself, so there is no confusion where it starts and stops.
  • Label your responses clearly. It should be crystal clear which response goes with which part of which problem.

Epilogue

As with previous labs, your git repo on http://git-classes.mst.edu is your submission. Don’t forget to commit and push all relevant files. Make sure you see everything you expect on GitLab!

We expect to see the following files on your master branch:

  • README.md
  • answers.txt
  • .gitignore, if you chose to make one

Footnotes

  1. Perl Compatible Regular Expressions

  2. sed calls it “extended regular expressions”

  3. It’s a lucrative business. Their IPO is next month.

  4. Surprise, surprise.