performance - How Can I Speed Up This Anagram Algorithm
I am making a mobile app to find anagrams and partial matches. Mobile is important because there is not a whole lot of computational power, so efficiency is key.
The algorithm takes any number of letters, including repeats, and finds the longest words that can be made from those letters, using every letter only once. I am interested in finding the top results quickly, and am not concerned with the bottom ones (the shorter ones) as long as n is met. For example:
stack => stack, tacks, acts, cask, cast, cats…
I have done some googling and have found a few algorithms, and I came up with one that I thought would be efficient, but it is not as efficient as I would like.
I have a pre-made lookup dictionary that maps a sorted key to the real words that generate that key.
"aelpp" => ["apple", "appel", "pepla"]
I have further split the dictionary into different ones based on the length of the key. So keys that are 5 letters long are in one dictionary, and keys that are 6 letters long are in another. Each of these dictionaries is stored in an array, at the index equal to the length of the keys found in that dictionary.
anagramarray[5] => dictionary5
dictionary5["aelpp"] => ["apple", "appel", "pepla"]
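As a rough illustration of the lookup structure described above, here is a minimal Python sketch. The language, the function name `build_anagram_index`, and the tiny word list are assumptions made for illustration; the original app's language and data are not stated.

```python
from collections import defaultdict

def build_anagram_index(words):
    """Group words by their sorted-letter key, bucketed by key length.

    index[length][key] -> list of real words that sort to that key.
    """
    index = defaultdict(lambda: defaultdict(list))
    for word in words:
        key = "".join(sorted(word))
        index[len(key)][key].append(word)
    return index

# Tiny stand-in word list (hypothetical); the real dictionary is far larger.
index = build_anagram_index(
    ["apple", "appel", "pepla", "stack", "tacks", "acts", "cats"]
)
print(index[5]["aelpp"])  # ['apple', 'appel', 'pepla']
```

A nested dictionary keyed by length stands in for the array of dictionaries; an actual list indexed by length would work the same way.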
My algorithm starts by taking an input word, "lappe", and sorts it:
"lappe" => "aelpp"
Now, for each dictionary whose keys have at most 5 letters, I do a comparison to pull out the matches. Here it is in pseudocode:
word = input.sort
for (i = word.length; i > 0; i--)
    dictionaryn = array[i]
    for (key in dictionaryn)
        if word matches key
            add to returnarray
        end
    end
    if returnarray count > n
        break
    end
end
returnarray.sort by longest word, alphabetize
The dictionary has 170,000 words in it, and searches are taking up to 20 seconds for 12 letter inputs. The match method makes a regex out of the key:
"ackst" => /a.*c.*k.*s.*t.*/
such that, for example, a 4 letter key such as acst (acts) will match ackst (stack) because:
"ackst" matches /a.*c.*s.*t.*/
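To make the matching step concrete, here is a small Python sketch of the regex test described above, assuming Python's `re` module stands in for whatever regex engine the app uses. The helper names `key_to_pattern`, `matches`, and `is_subsequence` are hypothetical. The last function is not from the original: it is a plain subsequence check that should give the same answer without building a regex, because both the key and the input are sorted.

```python
import re

def key_to_pattern(key):
    """Turn a sorted key like 'acst' into the pattern a.*c.*s.*t.*"""
    return re.compile("".join(re.escape(ch) + ".*" for ch in key))

def matches(sorted_input, key):
    """True if the key's letters appear, in order, in the sorted input."""
    return key_to_pattern(key).search(sorted_input) is not None

def is_subsequence(key, sorted_input):
    """Same test without a regex: walk the input once, consuming letters."""
    remaining = iter(sorted_input)
    return all(ch in remaining for ch in key)

print(matches("ackst", "acst"))         # True
print(is_subsequence("acst", "ackst"))  # True
print(matches("aelp", "app"))           # False: only one 'p' available
```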
I have seen other apps do the same thing in much less time, and I am wondering if my approach is garbage or just needs some tweaking.
How can I get the maximum computational efficiency for generating the top n anagrams from a word, sorted by max length?
If you think of (and perhaps even represent) the dictionary as a tree of letters, you can avoid looking at lots of nodes. If "stack" is in the dictionary, then there will be a path from the root to a leaf labelled a-c-k-s-t. If the input word is "attacks", then sort this to get aackstt. You can write a recursive routine to follow links down from the root, consuming letters from aackstt as you go. When you reach a-c-k you will have stt left in your string, so you can follow the s to reach ackst, but you can rule out following u to reach a-c-k-u and its descendants, v to reach a-c-k-v and its descendants, and so on.
In fact, with this scheme, you could use just one tree to hold words of any number of letters, which should save you doing multiple searches, one for each target length.
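A minimal Python sketch of this tree idea follows. It is an illustration under assumptions rather than a tuned implementation: the names `Node`, `build_trie`, and `find_sub_anagrams` are hypothetical, the word list is a tiny stand-in, and the search collects every match before sorting instead of stopping early once n results are found.

```python
class Node:
    """One trie node: children keyed by letter, plus words ending here."""
    def __init__(self):
        self.children = {}
        self.words = []

def build_trie(words):
    """Insert each word under the path spelled by its sorted letters."""
    root = Node()
    for word in words:
        node = root
        for ch in sorted(word):
            node = node.children.setdefault(ch, Node())
        node.words.append(word)
    return root

def find_sub_anagrams(root, letters, n):
    """Return up to n words buildable from letters, longest first."""
    letters = sorted(letters)
    found = []

    def walk(node, start):
        found.extend(node.words)
        prev = None
        for i in range(start, len(letters)):
            ch = letters[i]
            if ch == prev:
                continue  # a repeated letter at this depth was already tried
            prev = ch
            child = node.children.get(ch)
            if child is not None:  # no child: the whole subtree is ruled out
                walk(child, i + 1)

    walk(root, 0)
    found.sort(key=lambda w: (-len(w), w))  # longest first, then alphabetical
    return found[:n]

# Tiny stand-in word list (hypothetical).
root = build_trie(["stack", "tacks", "acts", "cask", "cast", "cats", "attack", "tusk"])
print(find_sub_anagrams(root, "stack", 10))
# ['stack', 'tacks', 'acts', 'cask', 'cast', 'cats']
print(find_sub_anagrams(root, "attacks", 3))
# ['attack', 'stack', 'tacks']
```

Because the input letters are sorted and each trie path is sorted, the walk only ever moves forward through the input, and a single tree covers every word length.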