Homework 7: Hashing

Before You Start,

Welcome to HW7: Hashing!

Spec Edits Will Be Listed Here:

Problem 1: Oh Nana, What’s My Num? [20m]

Help Rihanna hash her num! The way you help her is by answering the questions in Nana.java in the designated corresponding methods of Nana.java. Remember, you're encouraged to do the questions in a nonconventional order! If you're failing autograder tests, check to ensure the formatting check, the formatting check test must pass, or else the other tests will always fail.

If you need a quick 1-page refresher on what hash codes are, take a look at Chris's Hashing Refresher for HW7 Purposes. This is also available under the course website's resources page.

Problem 2: Hashing A Card [5m]

See the given skeleton file Card.java and implement both the .equals() and the .hashCode() functions. The hash function should never have any collisions (aka must be a perfect hash function). Although it won't graded, you should write your own tests to ensure that your code works.

No seriously, you should write some tests. Please!

Okay fine, let's compromise; we'll give you the submit/hw7 autograder test file for free: CardAutoGrader.java. You can run this autograder locally from your terminal just like a normal java program: javac CardAutoGrader.java && java CardAutoGrader && echo wheee!. Please feel free to use this to your advantage and cheat the autograder by hardcoding solutions that pass the autograder. But in return, you should test your own code for the other problems!

Problem 3: Fibonacci Optimization [20m]

See the given skeleton file FibonacciMemo.java. Observe that fibNoMemo is a classic recursive implementation of calculating Fibonacci numbers.

And it sucks.

If you try calculating the 44th Fibonacci, you'll see that it's very slow. Let's fix this! We have two options:

  1. Optimize this method entirely by using an iterative solution, because iterative Fibonacci is a lot faster, and not limited by fear of StackOverflow errors.
  2. Optimize this method partially by introducing memoization. Memoization means the program will remember previous inputs and outputs in hopes of computing future inputs more quickly.

You actually do not get to choose; you must do #2.

Write fibMemo, the optimized recursive Fibonacci calculator. You should implement your memoization so that your program will avoid calculating redundant values. You may make additional private variables as necessary (hint: one way is to make a static variable). Here are some Fibonacci pairs for your convenience:

Note: The autograder will ask for 1000 random Fibonacci numbers Fx 1 < x < 9000 (ish)and ensure that your program calculates it quickly enough.

Problem 4: Bin15 [15m]

Let's get some more exercises with writing hash functions. Write the equals and hashCode() functions of Bin15.java. Your hash code implementation must be a perfect hash function. You may use Java's built-in Math operations, but it should be noted the staff solution didn't. For this problem, you may not use any of Java's built-in hash code functions.

It is recommended that you try a few examples in your main method to ensure your code works for some simple cases.

Problem 4B: Logins And Usernames [60m]

Congratulations! You have earned a prestigious unpaid software engineering internship at GoogMicrosofPalantir! This means you will be writing code, and we will be taking credit for it.

Your first task will be to implement a database of usernames and emails. Users will supply pairs of usernames and emails and should then later be able to retrieve the username that corresponds to a given email as well as the email that corresponds to a given username. The full task is described below. You should submit two files for this portion of the assignment: Username.java and UsernameBank.java.

Username.java

A username is just 2 to 3 alphanumeric characters (0-9, a-z, A-Z). Two usernames are defined to be equal if they have the same username string, case insensitive, i.e. "9bc" is equal to "9Bc". Your implementation of Username must include a perfect hash function. Exhaustive tests are highly recommended to check for a perfect hash. A skeleton file has been provided. Your implementation of Username should include these methods:

Please observe that we have maliciously placed the above bullet points in a tricky order. Be sure to check for wrong Usernames in a way that gracefully handles every case.

Hint: If you wish to store Usernames in a HashMap or HashSet, then make sure to override the default behavior for the equals and hashCode methods.

Optional challenge: Look up regular expressions (often referred to as regex) and use them to quickly check if a given string is a valid username.

UsernameBank.java

It is now time to implement the actual database of usernames and emails. Your underqualified boss heard somewhere that constant time is a good thing, so all of the functions below must run in constant time or else you'll be fired. You are encouraged to use clever abstraction and make helper methods to avoid writing duplicate code.

As mentioned above, you'll be implementing a database of usernames and emails. Users should be able to add pairs of usernames and emails. Users can retrieve an email if they have the corresponding username, and they can retrieve a username if they have the corresponding email. However, if the user tries to retrieve an email corresponding to a non-existent (i.e. not in the database) or invalid username (i.e. if the user supplies an invalid username or one that isn't in the databse while trying to retrieve an email), then the bad username should be recorded (to be sent to the police). Similarly, requests to get a username corresponding to an invalid or non-existent email should also be recorded (and sent to the police). Emails are case sensitive, whereas usernames are not. All emails must contain an '@' character. Your code will need to implement all of the following functionalities:

WARNING: Beware of bad wording! If you have trouble understanding the instructions, you should make a post on the corresponding pinned post if someone has not already asked your question.

(If the email or username are not valid or are already in the database, make sure not to add them to the database.)

UPDATE: For generateUsername, you do not need to handle bad username/email recording. The recording of bad usernames and emails will be much more rigorous in getEmail and getUsername.

Follow-up question: Supposing that all usernames had to be a minimum of two characters, what’s the maximum allowed length of logins that could still allow a perfect hash when implemented in Java? Write your answer in a method in UsernameBank.followUp() that returns your answer, an int.

(Hint: The answer is between 3 and 15.)

You will need to submit two files, Username.java and UsernameBank.java.

Problem 5: MyHashMap [80m]

Write a MyHashMap.java from scratch (no skeleton file. But you can tell your kids that you implemented a HashMap from scratch!) that is an implementation of Map61B<K, V>. You are required to implement all of the methods as declared in the Map61B interface file. Additionally, you should implement the following constructors:

public MyHashMap();
public MyHashMap(int initialSize);
public MyHashMap(int initialSize, float loadFactor);

It should be noted that Eclipse can easily auto-generate method headers based on an interface file. You can do this by creating a class and declaring that it implements Map61B<K,V>, and then clicking the red x compilation and choosing "Add unimplemented methods".

You should handle collisions by chaining. If you would like to touch up on what chaining is, here is the CS 61BL lesson on it. You may not import any libraries other than ArrayList, LinkedList, HashSet, and Set.

You must also write a HashMapTest.java JUnit test file (JUnit 4 strongly recommended over JUnit 3) that proves all of your implemented methods work. It just so happens that in the previous homeworks, tests that weren't graded were tests that less than 5% of the class made. In hopes that you can catch some mistakes on your own, you must write tests, and we will grade your tests.

(remember that any Object has a .hashCode() function that you can call to determine it's placement position in your structure)

EDIT: You are allowed to assume that the returned Set<K> from keySet() won't ever be used to remove.

In fact, it is HIGHLY recommended to write the test cases first as you review what each method does (use a static type of Map61B in your test code), then to write your MyHashMap implementation. This is called Black Box Testing and is used in industry and stuff. This time, the job is harder, so the kitten isn't going to work, and you have to actually write the tests.

You will need to submit MyHashMap.java and MyHashMapTest.java.

Problem 6: Hashing A Checkers61B Board [30m]

Remember proj0? Initially, we wanted to have you implement an AI that plays for you, but unfortunately we weren't able to squeeze it into the jam-packed curriculum. Instead, we'll just have you hash a Checkers61B board configuration, which would be useful in the development of an AI.

A minimally implemented version of Board.java and Piece.java have been provided for you. Write the .equals() and .hashCode() methods for both files. The autograder will run the same test as the submit test, which is just a check to ensure that random Board configurations have a hash collision chance of less than 10%.

Problem 7: Runtime of TreeMap vs. HashMap [30m]

So after implementing HashMap from scratch, and seeing a few cases where it can be used, you might be wondering what advantages it has over the BSTMap you implemented in HW6. The short answer to this is, they both have their ups and downs: TreeMap structures like BSTMap sacrifice a bit of speed to maintain an ordering among its keys, whereas HashMap structures completely disregard internal key ordering in hopes of faster operations.

It should be noted that Java's built-in TreeMap is actually a Red-Black tree, and thus is superior to your BSTMap because the worst case runtime for put, get, and remove are an attractive O(log n) instead of O(n).

We will now write some basic tests to compare the two runtimes of select operations, namely put, get, and remove. As promised earlier, we don't want you to write tedious code, so we have provided a skeleton file that encourages you to write flexible code (not hard-coded).

Testing the speed of .put

See the file MapRace.java. Finish implementing method timePuts61B that takes in a Map61B, puts num_puts random mappings, and returns a long representing in milliseconds how long the operation took. By "random mapping", we mean a key-value pairing such that:

(For our purposes, you can consider <= to also mean < sign; we don't care about the nitpicky edge cases)

Testing the speed of .get

Finish implementing method timeGet61B that takes in an already populated Map61B, attempts to get num_gets random mappings, and returns a long representing how long in milliseconds the operation took. By "random mapping", we mean a key-value pairing such that 0 <= key <= key_range. Notice that sometimes, the get should return null because the mapping doesn't exist; this is fine.

Testing the speed of .remove

Finish implementing method timeRemove61B that takes in an already populated Map61B, attempts to remove a random mapping num_removes times, and returns a long representing how long in milliseconds the operation took. Again, pick a key such that 0 <= key <= key_range. Sometimes, the remove will be unsuccessful; this is also fine.

Run these tests

You'll see that there are already calls to these methods in main. Run the program to have Java do all the grunt work and tell you the timing results.

Follow Up Question

Based on this data, answer the follow up question by hard-coding it into String output of the followUp method:

Based on what you know about the runtimes of TreeMap and HashMap, were your results consistent? If not, why might this be the case? Example answers:

You are encouraged to tweak the code to make your data better. However, such modifications need to be mentioned in your follow up. You may also use the built-in HashMap and TreeMap instead of your own implementation as long as you mention so in your follow up.