Trie Data Structure
1. Trie Introduction
Tree data structure has many variants like Binary Search Tree, R- Tree, X-tree etc. Trie is one of them. It is similar to Binary Search Tree. Trie was first developed by Rene de la Briandais in 1959. Trie also called Prefix Tree is an ordered tree data structure that is used for indexing and searching the string sequence in a text. The difference between Trie and Binary Search Tree is that In Trie, the position of the node determines which key it is connected with, instead of using that node for storing keys. Here the keys are the strings.
One interesting fact is its pronunciation. Since, the main idea behind this data structure is retrieval of data. That is why it is named as Trie. Because of its origin, it is pronounced as tree. But to distinguish it from tree data structure, it is pronounced as try.
Trie consists of nodes and edges. Edges connect parent node to its child nodes. Each node contains an array of pointers, these pointers are 26 in number. One pointer for each alphabet. Empty string is associated with the root node and all the successors of a node have a common prefix string associated with the parent node. Generally, the values are associated with the leaf nodes. On the basis of prefix, strings are stored in top to bottom fashion. That is why, Trie is also known as Prefix Tree.
2. Binary Trie
2.1 In Binary Trie, only the leaf nodes contain keys.
In this section, we’ll discuss some properties of Binary trie.
- Binary trie follows order of keys strictly. For each node, all keys in its left subtree are smaller than keys in its right subtree. Hence, lexicographically ordered.
- The structure of the trie is dependent on its keys only.
- The traversal of trie is done in sorted order.
- If there are two keys, binary representation of one key is prefix of the other, then these two keys cannot coexist in trie. This condition is automatically satisfied when there are same number of bits of all keys. But if number of bits is not same for all the keys. Then, this condition may not be satisfied. To solve this problem, the concept of end bit was introduced.
2.2 Binary tries with end bits
- As we said earlier, the main objective behind this end bit concept is to store keys having binary representations as prefix of each other.
- The end bit stored in a node is used to represent the end of a stored key. In other words, an end bit containing in a node will be true if that node represents key.
- In addition to leaf nodes, internal nodes can also contains keys.
- The pre-order traversal provides keys in lexicographical order.
3. Multi way Trie
Multiway tries are also called R-ary tries.
- Multiway trie strictly follows the order of keys. That means, For each node, all the keys in left subtree are smaller than that of right subtree. Hence, giving lexicographical ordering.
- To reach up to a node say ‘N’, the sequence of the digits which will be followed would be the same sequence as prefix contained by all the keys of the subtree having ‘N’ as its root.
- If the radix is R=2^b that means number of bits per digit is ‘b’ and the maximum number of bits contained in a key is ‘B’. Then, worst case height containing ‘K’ keys will be B/b.
- The main disadvantage of this data structure is its additional space requirement.
4. Basic Operations in Trie
In this section, we’ll discuss different operations of trie.
Before moving towards the operations, Let’s first create the basic entities that is node and a class for using it.
NODE
public class Node
Trie class
public class Trie
Now, We are all set to discuss basic operations. First we’ll discuss the Insert operation.
Insertion : Here for adding an element, we need to search first. First it checks if no more nodes are there, it inserts the new node and then that new chain is followed. Here we are not considering empty strings.
public void insert(string a)
Search: In search operations, it scans through each part of the trie until no more parts or more specifically nodes remain.
Public bool search(string a)
Auto-complete : With a given prefix string, using this algorithm we can determine a list of available suffixes. This will help in completing chains using its child nodes.
{` public List< string> AutoComplete(string a) private void AutoComplete(Node node, string temp, List< string> suffixes) `}
Deletion : Find the node you want to delete. The main complication in this operation is to delete only the matching chain while preserving the suffix chain.
Pseudocode for recursive approach:
{` public void delete(string a) private bool delete(Node root, string a) `}
Now comes the iterative approach:
public void delete(string a)
5. Program in Java
In this section, we have given complete implementation of Trie
class Node
6. Analysis of Trie
To compare between characters stored in the trie and the characters of input string, at most 26 iterations needs to be done. Therefore, it will take O(n) time where n is the length of the input string.
Insertion will have the same asymptotic complexity.
Tries lie between the binary search trees and hash table.
Search: Search operation is faster in a trie as compared to binary search tree. But it is slower than that of hash table, where the time complexity is O(1) with an O(n) hash execution.
Trie Construction: For constructing trie, time complexity is O(Nn) where N is the number of elements and n is the length of largest element.
Both Insertion and deletion takes O(n) time.