CS61B Review Doc

Created by Yunhao Cao (Github@ToiletCommander) in Fall 2021 for UC Berkeley CS61B.

Reference Notice: Material highly and mostly derived from Prof. Hillfinger's lecture slides, some ideas were borrowed from CS61C Fall 2021 Lectures(Wawrzynek & Weaver) as well as wikipedia lol.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Basics

Language Features

Pointers

Destructive / non-destructive functions

Destructive - means if a parameter is passed in as pointer / reference, the place the parameter is pointed to would be changed

Non-destructive - means if the parameter is passed in as pointer / reference, the place the parameter is pointed to would not changed

Lambda Expressions

Should remind you of a special class Consumer<T>, has signature

class Consumer<T>{
	public void accept(T something);
}

To implement Consumer<T>, either use:

//Anonymous class expression
Consumer<String> printerLambda = new Consumer<String>(){
	public void accept(String something){
		System.out.println(something);
	}
}

//Single-line lambda expression
Consumer<String> printerLambda = (something) -> System.out.println(something);

//Multi-line lambda expression
Consumer<String> printerLambda = (something) -> {
	System.out.println(something);
	return;
}

Static Imports

Say I want to use System.out.println many times, but println is actually a static method, and out is a static class in System, so what do we do?

import static System.out;

And now we can use out.println as many times as we want.

Arrays

To declare an array of type T, use the following expression

T[] varName = new T[lengthOfArray];

Note that T can be any primitive / class types, including array as an acceptable type!

Some fun things to look at may include:

Accessing Array Elements

//this prints out ith element of varName, i ranging from 0 to varName.length
System.out.println(varName[i]);

Copying Array Contents

import java.lang.System;

System.arrayCopy(src,srcStartIndex,dest,destStartIndex,length);

Note that System.arrayCopy has the following signature

public static void arraycopy(Object src, int srcPos, Object dest, int destPos, int length)

Parameters

    src − This is the source array.

    srcPos − This is the starting position in the source array.

    dest − This is the destination array.

    destPos − This is the starting position in the destination data.

    length − This is the number of array elements to be copied.

Exception

    IndexOutOfBoundsException − if copying would cause access of data outside array bounds.

    ArrayStoreException − if an element in the src array could not be stored into the dest array because of a type mismatch.

    NullPointerException − if either src or dest is null.

Access modifiers

Determines if a class method(function) / member(variable) can be seen by itself / its sprouts / other classes

We have:

public - can be seen by everyone

protected - can be seen by itself and its sprouts

private - can be only seen by itself

Other modifiers

static - means there's only one instance per class

final - means the value cannot be changed after its construction

default - in an interface declaration you can provide a default implementation for abstract methods.

Some Commonly Used Interfaces / Types in 61B

Comparable (Java Platform SE 8 )

AbstractChronology, AbstractRegionPainter.PaintContext.CacheMode, AccessMode, AclEntryFlag, AclEntryPermission, AclEntryType, AddressingFeature.Responses, Authenticator.RequestorType, BigDecimal, BigInteger, Boolean, Byte, ByteBuffer, Calendar, CertPathValidatorException.BasicReason, Character, Character.UnicodeScript, CharBuffer, Charset, ChronoField, ChronoUnit, ClientInfoStatus, CollationKey, Collector.Characteristics, Component.BaselineResizeBehavior, CompositeName, CompoundName, CRLReason, CryptoPrimitive, Date, Date, DayOfWeek, Desktop.Action, Diagnostic.Kind, Dialog.ModalExclusionType, Dialog.ModalityType, DocumentationTool.Location, Double, DoubleBuffer, DropMode, Duration, ElementKind, ElementType, Enum, File, FileTime, FileVisitOption, FileVisitResult, Float, FloatBuffer,

https://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html

/**
 * The comparable interface defines a class that can be compared to other classes
 * compares to returns value that < 0 if this item is smaller, = 0 is this item is the
 * same as o, or > 0 is this item is bigger than o
 *
 * In normal settings like int, usually A.compareTo(B) returns A - B;
 */
interface Comparable<T>{
	int compareTo(T o);
}

Collection (Java Platform SE 8 )

size int size() Returns the number of elements in this collection. If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE. the number of elements in this collection isEmpty boolean isEmpty() Returns true if this collection contains no elements. true if this collection contains no elements iterator Returns an iterator over the elements in this collection.

https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html

Typings

Integer Typings

Yeah, it sucks, but 61B actually teaches you something that 61C should be teaching you - type conventions and integer radices, bitwise operations etc.

But good news is that it's not as hard as 61C, so yeahhh?

Java Integer Primitive Types Taken From CS61B Fall 2021 Lecture 14 by Prof. Hillfinger

We will touch on how Java represents its integers in memory later on...

But for now, think about how many numbers can each type represent?

Say we have an integer type that has n bits to store, we know each bit position can take 0 / 1 as its value, so how many numbers(N) can this type represent?

$N = 2^n$ because we can take 2 different values for each bit position and we have n different positions!

Convert Numbers

We have number 159 in decimal, how do we convert this into binary?

$159 = 0*2^8 + 1*2^7 + 0*2^6 + 0*2^5 + 1 * 2^4 + 1*2^3 + 1*2^2 + 1*2^1 + 1*2^0 = 128 + 16 + 8 + 4 + 2 + 1$

Thus 159 in binary is 0b010011111, I left the leftmost 0 there to avoid confusion with negative numbers.

Then what is 159 in hexidecimal? I love to view it starting from binary perspective

159 = 0b1001.1111

You see that I removed the front 0 because each 4 bit corresponds to a hexidecimal character.

159 = 0x9F since 0b1001 = 9 in decimal, and 9 in dec. corresponds to 9 in hex, and 0b1111 = 15 in decimal, and 15 corresponds to F in hex.

Note: A lot of confusion when talking about bitwise memories is the notion of Bit and Byte

And also just a side note: Let's say our data is 0xABCDEFGHIJKLMN, let the alphabets represent bit positions, we say that the N position is the LSB(Least Significant Bit), and A position is MSB(Most Significant Bit)

Just for a little bit extension and fun, see how this idea of MSB and LSB extends to the unit of byte and how data is laid out in memory (a concept in 61C)

A little bit extension on how bytes are stored (material of 61C)

Overflow - how does java deal with bigger number than type bounds?

Ans: Using finite fields, overflowed numbers will always be "wrapped around".

Complex Two's - How java represents signed integers

So basically the MSB(bit) is used as a sign(0 for positive, 1 for negative).

And when the sign is positive, just interpret it as an unsigned value.

If the sign is negative, you can calculate the value of remaining bits as k, and suppose your integer type has n bit positions, the real value of your memory is $-(2^n-k)$

For example, if I have a 16-bit integer, a value of 0b1111111111111111(16 of 1)

we would interpret it as $k = 0b111111111111111 = 32767$ , and the value of it being $val = -(2^{16}-k) = -(32768-32767)=-1$

Easier two ways to convert complex two's negative values:

flip the bits first and add 1 to the final result.

minus the number by 1 first and then flip the bits.

The two methods both work!

Think about this representation, why not just use first bit as the positive/negative sign and the rest bit as magnitude? Well the "magnitude" representation also exists and you will probably learn it in CS61C. But think about our complex 2's representation - adding 1 to max positive(in binary, a zero bit followed by arbitrary 1 bits) will result in a new sign bit of 1, followed by arbitrary 0 bits and the result will be exactly the min negative representation in our complex two's - so the java overflow automatically happens inside circuit logic - how cool is that! And also think about addition...

Addition in Complex Two's, an overflow bit of 1 is generated but discarded.

Bit operations in Java

Operation	&	\|	^	!
Description	AND(Bitwise Mask)	OR(Bitwise Set)	XOR(Bitwise Flip)	NOT(Flip ALL)
A	00101100	00101100	00101100
B	10100111	10100111	10100111	10100111
A op B	00100100	10101111	10001011	01011000

Operation	<<	>>	>>>
Description	Left Shift	Arithmetic Right (Sign-extended right shift)	Logical Right (Zero-extended right shift)
A	10101101	10101101	10101100
B	3	3	3
A op B	01101000	11110101	00010101

Algorithms

General

Complexity

We will define cost first.

In engineering scenarios, there are lots of costs, including:

Operational cost(for programs, time to run, space requirements).

Development costs: How much engineering time? When delivered?

Maintenance costs: Upgrades, bug fixes.

Costs of failure: How robust? How safe?

Currently stuck in lecture16 Complexity

In 61B we use asymptotic bound to determine the time complexity of a program

What does this mean?

We will use notion $f \in O(g(x))$ where $g(x)$ is an coefficient-free function based on N, the size of input of function f.

We have $O(g)$ being the upper bound ( $f(x) \leq g(x)$ ), $\Theta(g)$ being the tight bound ( $f(x) = g(x)$ ) and $\Omega(g)$ being the lower bound( $f(x) \geq g(x)$ )

Amortized Time 平摊时间

Note that there's also the concept of amortized time: actual costs of a sequence of N operations are $c_0, c_1, ..., c_{N-1}$ , which may differ from each other by arbitrary amounts and where $c_i \in O(f(i))$ .

Now if there exist another sequence $a_0, a_1, ..., a_{N-1}$ where $a_i \in O(g(i))$ .

If $\forall k (\sum_{0 \leq i < k} a_i \geq \sum_{0 \leq i < k} c_i)$ then we will say the operations all run in $O(g(i))$ amortized time

Tree

Traversal Order

Depth-first traversal (dotted path) of a binary tree:

Pre-order (node visited at position red ●): F, B, A, D, C, E, G, I, H;

In-order (node visited at position green ●): A, B, C, D, E, F, G, H, I;

Post-order (node visited at position blue ●): A, C, E, D, B, H, I, G, F.

Source: https://en.wikipedia.org/wiki/Tree_traversal

Hashing

Characteristics

Takes in a binary stream and outputs a fixed-length interger

Must output exactly the same integer when the same binary stream is given

For it to be good:
- Very hard to find two binary streams that outputs the same fixed-length integer (avoid hash collision)

Hash Table

Array called "buckets" to store Linked List of items

Load Factor = $L = N / M$ where N is the number of items stored and M is the bucket size

Open Addressing

Linear probes: If there's a collision at $h(K)$ , try $h(K) + m, h(K) + 2m, ...$

Quadratic Probes: try $h(K) + 1 \cdot m, h(K) + 2^2 \cdot m, h(K) + 3^2 \cdot m$

Double hashing: try $h(K) + h'(K), h(K) + 2h'(K)$

Heapify:

Heap can be viewed as array representing a priority queue, which the 0th item of the array is discarded.

Heapify: we have an array already, and we can heapify from the middle index downward moving up an index for each operation

Merge Sort

divide data in 2 equal parts; recursively sort halves ⇒ merge results

Quicksort

Partition data into pieces, everything > a pivot value at the high end of the sequence to be sorted, and everything ≤ on the low end

Usually the pivot is chosen as the median between first, middle, and last element

When the array is small enough, use insertion sort

Insertion Sort

Start with empty sequence of outputs, add each item from input, inserting into output sequence at right point.

Inversion Sort

Shell's Sort

Radix Sort

Sort keys one character at a time

we have

LSD(Least Significant Digit) radix sort (Direction: ←)

MSD radix sort (Direction →)
- Must keep lists from each step seperate

The Trie 字典树

Multi-way decision tree with one decision per character of key

Skip List

heights are random

Red-Black Tree 红黑树

Properties:

Root is black

Every leaf node is NULL and is black

Every leaf node has same number of black ancestors
- From any point on graph to every leaf, we pass through same number of black nodes

Every internal node has two children

Every red node has two black children.

Guarentees $O(lg(n))$ searches

Dijkstra's Algorithm

Use priority queue to update distance to every node.

A* Search A* 寻路算法

A hueristic distance is needed, but for consistance the heuristic funciton $h(C)$ has to be:

Has to follow triangular inequality
1. $h(A) \leq h(B) + d(A,B)$

Not too high

having $h(A) = 0$ for every heuristic value makes it a dijkstra

We will search just like what dijkstra, but we will stop when we take target from queue.

Minimum Spanning Trees

Prim's Algorithm

Start from an arbitraty node, at each step we added the shortest edge connecting some node already in the tree to one that isn't yet.

So obtain a subgroup and another subgroup that contains vertices not selected in the first subgroup. Whatever edges that hasn't been selected yet that connects the first group with the second and has the smallest weight will be selected.

Kruskal's Algorithm,

Sort the edges by their weights, everytime check if the smallest edge would add a cycle to the graph. If it doesn't then it gets added to the graph.

Compression

Shannon-Fano Coding

Count frequencies of all characters in text to be compressed

Break grouped characters into groups of roughly equal frequency

Encode left group with leading 1, right group with leading 1

Repeat until all groups are of size 1

Huffman Coding 哈夫曼编码

Put each symbol in a node labeled with the symbol's relative frequency

Repeat the following until there is just one node:
- Combine the two nodes with smallest frequencies as childre of a new single node whose frequency is the sum of those of the two nodes being combined.
- Let the edge to the left child be labeled as "0" and to the right be labeled as "1"

LZW Coding (multiple symbols per codeword)

Encoding:

Start with a trivial mapping of codewords to single symbols

After outputting a codeword that matches the longest possible prefix X, of the remaining input, add a new codeword Y that maps to the substring X followed by the next input symbol.

Decoding:

Reconstruct the table as we process
- $S(X)$ ⇒ The symbol encoded by the codeword X
- $Y_k$ ⇒ character k of string Y
- For each codeword X, add $S(X)$ to our result
- Whenever we decoded two consecutive codewords, X1 and X2, add a new codeword that maps to $S(X_1) + S(X_2)_0$