Building an Expression Tree
I recently solved a problem in yet another online judge website (LintCode) which asked to build an expression tree from an infix expression. I found this problem to be quite challenging and not as straightforward as I originally anticipated. Hence, I'm dedicating this post to it.
To start lets define an expression tree. This is a binary tree in which every inner node represents an operator and every leaf represents an operand. The input for the problem is an infix expression of the form: "2" "*" "6" "-" "(" "23" "+" "7" ")" "/" "(" "1" "+" "2" ")". The quotation marks are deliberate, as each operand/operator is a separate string. The set of strings is received in a vector.
We are given the definition of the tree structure as followed:
If we look carefully at the definition of the tree we notice that we need to transform the infix notation into a prefix notation. The reason is that the tree itself is built from the prefix notation. The problem then comes down to turning the infix expression into a corresponding prefix expression. To do so we realize that in order to transform the infix expression to its equivalent prefix notation we need to scan the infix backwards (from end to start). This way we put the operands last and the operators first with special care to operator precedence.
To implement the design we'll leverage two stacks. The first will hold only operators: "+,-,*,/,(,)". The stack will hold them according to their precedence. More precisely, "+" or "-" can follow only ")" and other "+" and "-" already pushed to the stack. However, if the top of the stack contains "*" or "/" then we need to handle them first before pushing the "+" or "-" to the stack. Also, when we encounter "(" that means that we just finished processing an inner expression. In this case we have to first handle all the operators that have been pushed after the matching ")". When we encounter: ")","*","/" we needn't do anything special and we can just push them into the stack.
The second stack that we'll use is an expression stack. this stack will contain all of our (Tree Node) expressions as we build them. Since we are building a binary tree, we have to make sure that every operator (from the operator stack) that is being handled, will consume two expressions from the expression stack.
The run time of this algorithm is O(N), for N - the size of the input vector.