performance - How Can I Speed Up This Anagram Algorithm


I am making a mobile app to find anagrams and partial matches. Mobile is important because there is not a whole lot of computational power, so efficiency is key.

The algorithm takes any number of letters, including repeats, and finds the longest words that can be made from those letters, using every letter only once. I am interested in finding the top results quickly, and I am not concerned with the bottom ones (the shorter ones) as long as N is met. For example:

stack => stack, tacks, acts, cask, cast, cats… 

I have done some googling and have found a few algorithms, and I came up with one that I thought would be efficient, but it is not as efficient as I would like.

I have a pre-made lookup dictionary that maps a sorted key to the real words that generate that key.

"aelpp" => ["apple", "appel", "pepla"] 

I have further split the dictionary into different ones based on the length of the key. So keys that are 5 letters long are in one dictionary, and keys that are 6 letters long are in another. Each of these dictionaries is in an array, at the index equal to the length of the keys found in that dictionary.

anagramarray[5] => dictionary5
dictionary5["aelpp"] => ["apple", "appel", "pepla"]
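
A minimal Python sketch of that layout, assuming the input is a plain list of lowercase words. The names `build_buckets` and `buckets` are illustrative and not taken from the app:

```python
from collections import defaultdict

def build_buckets(words):
    """Group words by sorted-letter key, then bucket the keys by length."""
    max_len = max((len(w) for w in words), default=0)
    # buckets[n] maps each n-letter sorted key to the words that produce it
    buckets = [defaultdict(list) for _ in range(max_len + 1)]
    for w in words:
        key = "".join(sorted(w))
        buckets[len(key)][key].append(w)
    return buckets

buckets = build_buckets(["apple", "appel", "pepla", "stack", "tacks", "acts"])
print(buckets[5]["aelpp"])  # ['apple', 'appel', 'pepla']
```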

My algorithm starts by taking an input word, such as "lappe", and sorting it:

"lappe" => "aelpp" 

Now, for each dictionary whose keys have at most 5 letters, I do a comparison to pull the matches out. Here is the pseudocode:

word = input.sort
for (i = word.length; i > 0; i--)
    dictionaryn = array[i]
    for (key in dictionaryn)
        if word matches key
            add to returnarray
        end
    end
    if returnarray count > n
        break
    end
end

returnarray.sort by longest word, then alphabetize

The dictionary has 170,000 words in it, and searches are taking up to 20 seconds for 12-letter inputs. My match method makes a regex out of the key:

"ackst" => /a.*c.*k.*s.*t.*/ 

This is so that, for example, a 4-letter key such as acst (acts) will match ackst (stack), because:

"ackst" matches /a.*c.*s.*t.*/ 

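Since both the key and the input are sorted, the regex test above amounts to checking that the key is a subsequence of the sorted input. A sketch of that check as a single linear pass, with no regex compiled per key, follows. `is_subsequence` is a hypothetical helper name, and whether it is actually faster on a given device would need to be measured:

```python
def is_subsequence(key, word):
    """True if the letters of `key` appear in `word` in order (gaps allowed)."""
    remaining = iter(word)
    # `ch in remaining` advances the iterator until it finds ch or runs out,
    # so each letter of `word` is consumed at most once
    return all(ch in remaining for ch in key)

print(is_subsequence("acst", "ackst"))   # True
print(is_subsequence("acstu", "ackst"))  # False
```
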
I have seen other apps do the same thing in much less time, and I am wondering whether my approach is garbage or just needs some tweaking.

How can I get the maximum computational efficiency for generating the top N anagrams from a word, sorted by max length?

If you think of (and perhaps even represent) the dictionary as a tree of letters, you can avoid looking at lots of nodes. If "stack" is in the dictionary, then there will be a path from the root to a leaf labelled a-c-k-s-t. If the input word is "attacks", sort it to get aackstt. You can write a recursive routine to follow links down from the root, consuming letters from aackstt as you go. When you reach a-c-k you will have stt left in your string, so you can follow s to reach ackst, but you can rule out following u to reach a-c-k-u and its descendants, v to reach a-c-k-v and its descendants, and so on.

In fact, with this scheme, you could use just one tree to hold words of any number of letters, which should save you from doing multiple searches, one for each target length.
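
One way the tree idea could be sketched in Python is shown below. The nested-dict representation, the `"$"` entry used to store words at a node, and the names `build_trie` and `search` are all assumptions made for illustration. The sketch also assumes lowercase words that never contain `"$"`.

```python
def build_trie(words):
    """Trie keyed by sorted letters; words are stored under the '$' entry."""
    root = {}
    for w in words:
        node = root
        for ch in sorted(w):
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(w)
    return root

def search(node, letters, start=0, found=None):
    """Collect every word whose sorted key is a subsequence of `letters`."""
    if found is None:
        found = []
    found.extend(node.get("$", []))
    prev = None
    for i in range(start, len(letters)):
        ch = letters[i]
        if ch == prev:  # this letter was already tried at this node
            continue
        prev = ch
        child = node.get(ch)
        if child is not None:  # letters with no child prune a whole subtree
            search(child, letters, i + 1, found)
    return found

trie = build_trie(["stack", "tacks", "acts", "cask", "cast", "cats", "stuck"])
hits = search(trie, sorted("attacks"))
hits.sort(key=lambda w: (-len(w), w))  # longest first, then alphabetical
print(hits)  # ['stack', 'tacks', 'acts', 'cask', 'cast', 'cats']
```

This version collects every match and sorts afterwards. Stopping early once N results are found would need a traversal that visits longer keys first, which is not shown here.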

