Elements of Programming Interviews
Interviewing Strats
- Clarify the question: if the question seems unnecessarily hard, then clarify it
- work on concrete examples
- spell out the brute force solution
- think out loud
- apply patterns
- clarify assumptions that inputs are valid
- test for corner cases
- write unit tests if there is time
- add code checking for the result
- syntax - lots of bad syntax shows you have no coding experience
- memory management: avoid manual memory management altogether where possible
- keep the solutions short
Bit Primitives
- integers in Python3 are unbounded - the maximum integer is limited only by available memory
- sys.maxsize can be used to find the word size - the maximum value integer that can be stored in a machine word
- python does not have unsigned shifts, since integers have infinite precision
- how to use bitwise operators?
- how to use masks?
- fast ways to clear the lowermost set bit (see the sketch at the end of this section)
- understand signedness and implications to shifting
- consider using cache for brute-forcing small inputs
- commutativity and associativity can be used to perform operations in parallel and reorder ops
- key methods for numeric types: abs, math.ceil, math.floor, min, max, pow, math.sqrt
- floats are not infinite precision; it is easy to refer to infinity as float('inf') and float('-inf')
- key methods for random: random.randrange, random.randint, random.random(), random.shuffle(A), and random.choice(A)
- parity - the parity of a binary word is 1 if the number of set bits is odd, otherwise 0
- parity checks are used to detect single bit errors in data storage and communication
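A minimal sketch of the bit tricks above - clearing the lowermost set bit with x & (x - 1), and using that trick to compute parity; the function names are illustrative:

```python
def clear_lowest_set_bit(x):
    # x & (x - 1) drops the lowest set bit, e.g. 0b1100 -> 0b1000.
    return x & (x - 1)

def parity(x):
    # Each iteration clears one set bit, so the loop runs once per set bit.
    result = 0
    while x:
        result ^= 1
        x &= x - 1  # clear the lowest set bit
    return result

assert parity(0b1011) == 1  # three set bits -> odd -> parity 1
assert parity(0b1001) == 0  # two set bits -> even -> parity 0
```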
Arrays
- bisect.bisect, bisect.bisect_left, and bisect.bisect_right give you insertion points for sorted lists - where to put a new element so the list remains sorted (bisect is an alias for bisect_right)
- bisect_left and bisect_right differ by one when the target appears exactly once in the list (see the sketch below)
- any([]) -> False
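A short usage sketch of the bisect behavior described above:

```python
import bisect

A = [1, 3, 3, 5]
# Leftmost and rightmost insertion points differ for a value already present.
assert bisect.bisect_left(A, 3) == 1
assert bisect.bisect_right(A, 3) == 3  # bisect.bisect is an alias for this
# For a value that appears exactly once, the two differ by one.
assert bisect.bisect_right(A, 5) - bisect.bisect_left(A, 5) == 1
# insort keeps the list sorted on insert.
bisect.insort(A, 4)
assert A == [1, 3, 3, 4, 5]
```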
Strings
- strings in python are immutable
- concatenating a single character n times to a string in a for loop has O(n^2) complexity
- string problems often have brute-force solutions that use O(n) space but can be reduced to O(1) space
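Because strings are immutable, each += can copy the whole string built so far; a sketch of the quadratic pattern and the usual linear fix:

```python
n = 10_000

# O(n^2) in principle: every concatenation may copy the accumulated string.
s = ''
for _ in range(n):
    s += 'a'

# O(n): collect pieces in a list/generator and join once at the end.
t = ''.join('a' for _ in range(n))
assert s == t
```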
Linked Lists
- often conceptually simple, and more about cleanly coding what’s specified
- consider using a dummy head to avoid checking for empty lists
- easy to forget update next/previous for head and tail
- algorithms that operate on singly linked lists often benefit from using two iterators
- example: hierarchical navigable small worlds, skip lists
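A minimal sketch of the two-iterator pattern on a singly linked list - a fast and a slow pointer locating the middle node; the ListNode class is illustrative:

```python
class ListNode:
    def __init__(self, data=None, next=None):
        self.data, self.next = data, next

def find_middle(head):
    # fast advances two steps per iteration, slow one step, so when
    # fast reaches the end, slow is at the middle.
    slow = fast = head
    while fast and fast.next:
        slow, fast = slow.next, fast.next.next
    return slow

# 1 -> 2 -> 3 -> 4 -> 5
head = ListNode(1, ListNode(2, ListNode(3, ListNode(4, ListNode(5)))))
assert find_middle(head).data == 3
```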
Stacks and Queues
- Stacks
- push/pop on a LL implemented stack has O(1) ops
- push/pop on an array implemented stack has O(1) ops, but maximum size, unless there is dynamic resizing, where the amortized is O(1)
- last-in/first-out semantics makes it good for making reverse iterators
- parsing typically benefits from a stack (see the sketch after this list)
- consider augmenting the basic stack or queue data structure to support additional ops
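A small parsing sketch using a stack - checking that brackets are well matched:

```python
def is_well_formed(s):
    # Push openers; each closer must match the most recent opener (LIFO).
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for c in s:
        if c in '([{':
            stack.append(c)
        elif c in pairs:
            if not stack or stack.pop() != pairs[c]:
                return False
    return not stack

assert is_well_formed('([]){()}')
assert not is_well_formed('([)]')
```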
- Queue
- queues with LLs have O(1) ops for enqueue and dequeue
- often has a “peek” operation
- push/inject for front/back ops in a deque
- note that you can use collections.deque, which has append/pop for the back and appendleft/popleft for the front (see the sketch below)
- queues are ideal when order needs to be preserved
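A quick sketch of collections.deque as a double-ended queue:

```python
from collections import deque

q = deque()
q.append(1)        # inject at the back
q.append(2)
q.appendleft(0)    # push at the front
assert q[0] == 0   # "peek" at the front without removing
assert q.popleft() == 0  # dequeue from the front, O(1)
assert q.pop() == 2      # eject from the back, O(1)
```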
Binary Trees
- a full binary tree is one where every node other than the leaves has two children
- perfect binary tree is a full binary tree where all the leaves are at the same depth, and every parent has two children
- a perfect binary tree of height h has 2^(h+1)-1 nodes, of which 2^h are leaves
- a complete binary tree is a binary tree where every level, except possibly the last, is completely filled, and all leaves are as far left as possible
- n nodes -> height floor(log2 n)
- recursive algorithms are well suited to problems on binary trees (see the sketch at the end of this section)
- you might be able to reuse existing tree nodes to reduce space complexity to O(1)
- consider left and right skewed trees when doing complexity analysis, O(h) complexity, where h is the tree height, translates into O(log n) complexity for balanced trees, and O(n) complexity for skewed trees
- “parent” field makes code simpler
- easy to make the mistake of treating a node that has a single child as a leaf
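A minimal sketch of the recursive style these notes recommend - computing height on an illustrative TreeNode class, which also shows the skewed-tree case from the complexity bullet:

```python
class TreeNode:
    def __init__(self, data=None, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def height(tree):
    # An empty tree has height -1; a single leaf has height 0.
    if not tree:
        return -1
    return 1 + max(height(tree.left), height(tree.right))

# A right-skewed tree of 3 nodes has height 2: O(n) levels, not O(log n).
skewed = TreeNode(1, None, TreeNode(2, None, TreeNode(3)))
assert height(skewed) == 2
```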
Heaps
- Specialized binary tree
- a complete binary tree that satisfies the heap property - the key at each node is at least as great as the keys stored at its children
- max-heap supports O(log n) insertions, O(1) time lookup, and O(log n) deletion of the max element. The extract max op is defined to delete and return max element
- sometimes referred to as a priority queue, each element has a “priority”, and deletion removes the element with the highest priority
- a min-heap is a completely symmetric version of the data structure and supports O(1) lookup of the minimum element
- you can use a heap to keep track of the k longest strings seen in a stream: with a min-heap keyed on length, each insertion takes time logarithmic in the heap size, and you can evict the shortest string whenever the heap grows past k
- use a heap when you care about the largest or smallest elements, and do not have to support fast lookup, delete, or search operations
- a heap is a good choice when you need to compute the k largest or k smallest elements; the former should use a min-heap, the latter a max-heap (see the sketch below)
- python has the heapq module to implement a heap, but it only provides a min-heap; for a max-heap, insert the negative for integers or floats, or for objects, implement __lt__() appropriately
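A sketch of the k-largest pattern with heapq's min-heap:

```python
import heapq

def k_largest(stream, k):
    # Keep a min-heap of size k; its root is the smallest of the
    # k largest seen so far, so it is the one to evict.
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop root, push x: one O(log k) op
    return sorted(heap, reverse=True)

assert k_largest([5, 1, 9, 3, 7, 2], 3) == [9, 7, 5]
```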
Searching
- Search algorithms can be classified in a number of ways: is the underlying collection static or dynamic, and are inserts/deletes interleaved with searching?
- can you preprocess the data?
- are there special statistical properties of the data that can be exploited?
- should we operate directly on the data or transform it?
binary search
- O(log n) search time, but requires a sort that takes at least O(n log n) time
- how do you implement a binary search?

```python
def bsearch(t, A):
    # Standard binary search on a sorted list A for target t.
    L, U = 0, len(A) - 1
    while L <= U:
        M = L + (U - L) // 2  # written this way to avoid overflow in fixed-width languages
        if A[M] < t:
            L = M + 1
        elif A[M] == t:
            return M
        else:
            U = M - 1
    return -1
```
- binary search is applicable to more than just sorted arrays; you can use it to search an interval of real numbers or integers (see the sketch after this list)
- if we use sorting, and the computation performed after sorting is faster than sorting, look for solutions that do not perform a complete sort
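A sketch of binary search over an interval of reals rather than an array - computing a square root to a tolerance:

```python
def square_root(x, tolerance=1e-7):
    # Search the real interval [0, max(1, x)] for the value whose square is x.
    lo, hi = 0.0, max(1.0, x)
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return lo

assert abs(square_root(2.0) - 1.41421356) < 1e-6
```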
general searching
- for user-defined types to be comparable, objects should implement __lt__() and __gt__() - the language already knows how to compare built-in types
Hash tables
- Updating keys in hash tables -> don’t do it
- delete the key, update the object, then re-insert it
- keys should generally be not mutable
- we want to aim for rolling hash functions
- hash functions should examine all the characters in a string, and not let one character dominate
- we also want a rolling hash function: when one character is removed from the front and another added to the end, the new hash can be computed incrementally from the old one
- A hash table is a good data structure to represent a dictionary
- sometimes a trie gets used to store a dynamic set of strings, and has computational advantages when many letters are repeated
- anagram example: to find anagrams, sort the characters of each string and use the sorted string as a key, appending the original string to that key's list (see the sketch at the end of this section)
- designing a hash table class
- you can implement __hash__; hash is a builtin
- hash tables have the best theoretical and real world performance for lookup, insert, and delete.
- consider using a hash table as a signature to enhance performance
- consider using a precomputed lookup table
- when defining your own type that will be put in a hash table, make sure you know the relationship between logical equality and the fields the hash function inspects
- anytime equality is implemented, it is imperative that the hash function is also implemented
- sometimes a multimap can be used, which contains multiple values for a single key
- hash table types: set, dict, collections.defaultdict, collections.Counter
- set.remove raises an error if the value is not in the set; set.discard does not
- note that not every type is hashable, mutable containers are notably not hashable
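A sketch of the anagram grouping idea above, using the sorted string as a hash key:

```python
from collections import defaultdict

def group_anagrams(words):
    # Anagrams share the same sorted character sequence, so it works as a key.
    groups = defaultdict(list)
    for w in words:
        groups[''.join(sorted(w))].append(w)
    return [g for g in groups.values() if len(g) > 1]

assert group_anagrams(['debitcard', 'badcredit', 'levis', 'elvis', 'money']) == [
    ['debitcard', 'badcredit'], ['levis', 'elvis']]
```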
Sorting
- sorting is a common problem
- naive sorting algorithms run in O(n^2) time; better ones run in O(n log n) time
- sorting table - in place (O(1) space)? stable (duplicate items remain in the same order)?
    - heapsort: in place, not stable
    - merge sort: not in place, stable
    - quicksort: in place, not stable
- a well implemented quicksort is usually the best choice
- for shorter arrays (<10), insertion sort is easier to code and faster
- if every element is known to be at most k places away from its final location, a min-heap can be used to get O(n log k) algorithm
- if there are a small number of distinct keys, counting sort works well
- if there are many duplicate keys, we add the keys to a BST, with linked list for elements which have the same key; the sorted result can be derived from an in-order traversal of the BST
- most sorting algorithms are not stable; merge sort, carefully implemented, can be made stable
- most sorting routines are based on a compare function. It’s also possible to use numerical attributes directly with radix sort
- sorted() returns a sorted copy, comparing elements using __lt__
- list.sort sorts in place
- the time complexity of any reasonable library sort is O(n log n); most are based on quicksort, which has O(1) space
- sorting problems usually come in two flavors:
    - using sorting to make the subsequent steps in an algorithm simpler - use a library sort function, possibly with a custom comparator
    - designing a custom sorting routine - use a data structure like a BST, heap, or array indexed by values
- certain problems become easier to understand, as well as solve, when the input is sorted
    - best when inputs have a natural order, or as a preprocessing step to speed up searching
- for specialized input (a very small range of values, or a small number of values), it's possible to sort in O(n) time
- sorting can sometimes be implemented with less space than a brute-force approach requires
- sometimes it is not obvious what to sort on
- sort -> in place for lists: implements a stable in-place sort for list objects, returning None
    - takes two arguments, both optional:
        - key - assumed to be a function which takes list elements and maps them to objects which are comparable
        - reverse - whether to sort in descending order; by default sort is ascending
- to sort an iterable, use sorted(), which takes identical key and reverse arguments (see the sketch below)
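A quick sketch of key and reverse with sorted() and list.sort:

```python
students = [('Greg', 3.5), ('Ann', 3.9), ('Bea', 3.9)]

# Sort by GPA descending; Python's sort is stable, so Ann stays before Bea.
by_gpa = sorted(students, key=lambda s: s[1], reverse=True)
assert by_gpa == [('Ann', 3.9), ('Bea', 3.9), ('Greg', 3.5)]

# list.sort does the same in place and returns None.
students.sort(key=lambda s: s[0])
assert students[0][0] == 'Ann'
```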
Binary Search Trees
- BSTs are workhorse data structures and can be used to solve almost every data structures problem reasonably efficiently, and offer the ability to efficiently search for a key as well as find min and max elements
- Just a binary tree where the nodes store keys that are comparable, such as integers or strings
- Key lookup, insertion, and deletion take time proportional to the height of the tree, which can be worst-case O(n)
- some implementations of insert and delete guarantee that the tree height is O(log n)
- Common mistake with BSTs -> object that’s present in the BST is not updated
- avoid putting mutable objects in the BST! If you must mutate, remove, update, then add it back
- BSTs give you the ability to find min and max elements, and find the next largest/next smallest element
- alongside this ordering, finding the min, the max, and the next largest/smallest element, as well as lookup and delete, take O(log n) time in a balanced tree
- both BSTs and hash tables use O(n) space
- BST implementation:

```python
class BSTNode:
    def __init__(self, data=None, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def search_bst(tree, key):
    # Descend one level per comparison: O(h) time for a tree of height h.
    return (tree
            if not tree or tree.data == key
            else search_bst(tree.left, key)
            if key < tree.data
            else search_bst(tree.right, key))
```
- Since the program descends the tree with each step, and spends O(1) time per level, the time complexity is O(h), where h is the height of the tree
- We can iterate through the elements in sorted order in time O(n) regardless of balance
- Some problems need a combination of a BST and a hash table: the BST keeps the elements ordered, while the dict holds additional, mutable information and can refer directly to nodes in the tree
- Sometimes you need to augment a BST to make it possible to manipulate more complicated data such as intervals and give more complex queries, such as the number of elements in a range
- the BST property is a global property - a binary tree may satisfy the property that each node's key is greater than the key at its left child and smaller than the key at its right child, and yet not be a BST
- NO BST LIBRARY IN PYTHON - sortedcontainers or bintrees can be used (see the sketch below)
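A sketch of BST-like operations with the third-party sortedcontainers package (assuming it is installed via pip install sortedcontainers):

```python
from sortedcontainers import SortedList

s = SortedList([7, 1, 5])
s.add(3)                       # logarithmic-time insert, keeps sorted order
assert s[0] == 1 and s[-1] == 7        # min and max
assert list(s.irange(3, 6)) == [3, 5]  # elements in a range
i = s.bisect_right(3)          # index of the next largest element after 3
assert s[i] == 5
s.remove(5)
```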
Recursion
- Recursion is an approach where the solution depends partially on solutions to smaller instances of related problems
- Often appropriate when the input is expressed using recursive rules, such as a computer grammar
- Searching, enumeration, divide and conquer, and decomposing a complex problem into a set of similar smaller instances are all scenarios where recursion may be suitable
- A recursive function consists of base cases and calls to the same function with different arguments
- Two main ingredients for successful recursion:
- identifying the base cases
- ensuring progress
- A divide and conquer algorithm works by repeatedly decomposing a problem into two or more smaller independent subproblems of the same kind, until it gets to instances that are simple enough to be solved directly
- solutions to the subproblems are then combined to give a solution to the original problem
- divide and conquer is not synonymous with recursion
- in divide and conquer, the problem is divided into two or more independent smaller problems that are of the same type as the original; recursion is more general - there may be a single subproblem, the subproblems may not be independent (dynamic programming), and they may not be of the same type as the original (regex matching)
- In addition, sometimes to improve runtime and reduce space complexity, you can implement divide and conquer using iteration
- use recursion as an alterative to deeply nested iteration loops
- removing recursion can be done by mimicking call stack with the stack data structure
- recursion can easily be removed from tail-recursive programs by using a while loop - no stack is needed (see the sketch after this list)
- if recursive functions may end up being called with the same arguments more than once, cache the results (aka dp)
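A sketch of removing tail recursion with a while loop - Euclid's GCD, first recursive, then iterative:

```python
def gcd_recursive(a, b):
    # Tail call: the recursive result is returned unchanged.
    return a if b == 0 else gcd_recursive(b, a % b)

def gcd_iterative(a, b):
    # The tail call becomes rebinding the arguments inside a loop.
    while b:
        a, b = b, a % b
    return a

assert gcd_recursive(48, 18) == gcd_iterative(48, 18) == 6
```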
DP
- General technique for solving optimization, search, and counting problems that can be decomposed
- consider using DP when you have to make choices to arrive at the solution, especially when it involves subproblems
- the key difference between divide-and-conquer and DP is that in DP the same subproblem may reoccur, so caching its solution pays off (see the sketch below)
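A sketch of caching repeated subproblems with functools.lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion recomputes the same subproblems
    # exponentially often; with it, each n is computed once: O(n) time.
    return n if n <= 1 else fib(n - 1) + fib(n - 2)

assert fib(50) == 12586269025
```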
Greedy Algorithms and Invariants
- a greedy algorithm computes a solution in steps; at each step it makes a decision that is locally optimal, and it never changes that decision
- does not necessarily produce the optimal solution: consider making change for 48 pence in old British currency, where the coins came in 30, 24, 12, 6, 3, and 1 - greedy takes 30, 12, and 6 (three coins), but 24 + 24 uses only two (see the sketch below)
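A sketch of the greedy change-making above, showing where greedy falls short:

```python
def greedy_change(amount, coins=(30, 24, 12, 6, 3, 1)):
    # Always take the largest coin that fits, and never reconsider.
    result = []
    for coin in coins:
        while amount >= coin:
            result.append(coin)
            amount -= coin
    return result

assert greedy_change(48) == [30, 12, 6]  # three coins, but 24 + 24 is optimal
```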