Programming challenge: Which words have the most anagrams?
While on the train to work this morning, I wondered which English words have the most anagrams that are also valid English words. So, I knocked up few lines of Mathematica code and came up with 4 sets of 7:
{{"ates", "east", "eats", "etas", "sate", "seat", "teas"}, {"pares", "parse", "pears", "rapes", "reaps", "spare", "spear"}, {"capers", "crapes", "pacers", "parsec", "recaps", "scrape", "spacer"}, {"carets", "caster", "caters", "crates", "reacts", "recast", "traces"}}
So, according to my program (and Mathematica’s dictionary), the best you can do is 7. I’m not going to post the code until later because I don’t want to influence the challenge which is ‘Write a program in your language of choice that queries a dictionary to find which English words have the most anagrams that are also valid English words.’
Here are a couple of ways of doing it in Python: https://gist.github.com/christianp/6610845.
Because my wordlist has loads of not-really-words in it, I get:
[‘apers’, ‘apres’, ‘asper’, ‘pares’, ‘parse’, ‘pears’, ‘prase’, ‘presa’, ‘rapes’, ‘reaps’, ‘spare’, ‘spear’]
You forgot {“least”, “slate”, “Stael”, “stale”, “steal”, “tales”, “teals”, “Tesla”}
Or names don’t count? If not we get your solution, but a 1 set of 8 words exist. :)
Code Bellow:
char = CharacterRange[“a”, “z”];
char = char ~ Join ~ ToUpperCase@char;
n = Length@char;
WordToNumber[str_String] := Position[char, #] & /@ StringSplit[str, “”] // Flatten // Sort // FromDigits[#, n] &
SetAttributes[WordToNumber, {Listable}];
words = ToLowerCase /@ DictionaryLookup[{“English”, All}] // DeleteDuplicates;
wordnumb = WordToNumber@words;
tally = Transpose@Tally@wordnumb;
count = Last@tally;
vals = First@tally;
pos = Extract[vals, Position[count, Max@count]];
wordPos = (Position[wordnumb, #] & /@ pos);
wordAnagram = Extract[words, #] & /@ wordPos;
wordAnagram /. str_String :> First@DictionaryLookup[str, IgnoreCase -> True]
* Bonus Function (Slow as hell for big words, 1st function try, LOL)
Anagram[str_String] := Select[StringJoin /@ Permutations@StringSplit[str, “”], Length@DictionaryLookup@# > 0 &]
@Thales Fernandes
Improved version for non-english words. In Portuguese (my language) we have an anagram of 15 words!
{“aporeis”, “aproeis”, “aprosei”, “arpoeis”, “esporai”, “operais”, “opiares”, “poeiras”, “Poiares”, “posarei”, “proseai”, “proseia”, “raposei”, “repoisa”, “sopeira”}
{“arrastei”, “arrestai”, “arretais”, “arriaste”, “aterrais”, “atirares”, “iteraras”, “rasteira”, “rastreai”, “rastreia”, “ratareis”, “restaria”, “retraias”, “tarareis”, “traseira”}
{“arroteis”, “estoirar”, “ostreira”, “rasteiro”, “rastreio”, “reitoras”, “retroais”, “rotareis”, “soterrai”, “tesoirar”, “torareis”, “torreais”, “torreias”, “traseiro”, “troareis”}
Code:
Clear[words, lang, char, n, WordToNumber, wordnumb, tally, count, vals, pos, wordPos, wordAnagram, Anagram]
lang = “Portuguese”;
Anagram[str_String] := Select[StringJoin /@ Permutations@StringSplit[str, “”], Length@DictionaryLookup@# > 0 &]
words = ToLowerCase /@ DictionaryLookup[{lang, All}] // DeleteDuplicates;
Print[“Number of words to test (“, lang, “): “, Length@words];
char = Union@StringSplit[StringJoin@words, “”];
n = Length@char;
Print[“Character used(“,n,”): “, StringJoin@char];
WordToNumber[str_String] := Position[char, #] & /@ StringSplit[str, “”] // Flatten // Sort // FromDigits[#, n] &
SetAttributes[WordToNumber, {Listable}];
wordnumb = WordToNumber@words;
tally = Transpose@Tally@wordnumb;
count = Last@tally;
vals = First@tally;
Print[“Word with maximum number of anagrams: “, Max@count];
pos = Extract[vals, Position[count, Max@count]];
wordPos = (Position[wordnumb, #] & /@ pos);
wordAnagram = Extract[words, #] & /@ wordPos;
wordAnagram /. str_String :> First@DictionaryLookup[{lang, str}, IgnoreCase -> True]
Mathematica, without much thinking:
SortBy[GatherBy[DictionaryLookup[“*”],
Sort@Characters@ToLowerCase[#] &], Length][[-10 ;;]]
It gives the 10 longest lists. Should be a good exercise for other languages, which I’ll try later.
OK, so its not quite a week but here’s mine. Much longer than @Szabolcs!
(*Given a word, this function finds all permutations that are also \
words*)
wordperms = Flatten[
Map[DictionaryLookup, Map[StringJoin, Permutations[Characters[#]]]]
] &;
words = DictionaryLookup[]; (*Get all words from the dictionary*)
sorted = Sort[
Map[Sort,
Map[Characters,
words]]]; (*Sort letters of each word into alphabetical order \
and then sort the resulting words*)
found = Part[Select[Tally@sorted, Part[#, 2] > 6 &], All,
1]; (*Select all lists that are repeated more than 6 times. More \
than 7 gives an empty list*)
Map[wordperms, found]
I missed {“least”, “slate”, “Stael”, “stale”, “steal”, “tales”, “teals”, “Tesla”} because I didn’t consider case.
words = {#, Sort[Characters[ToLowerCase[#]]]} & /@ DictionaryLookup[];
words = SortBy[GatherBy[words, Last], Length][[All, All, 1]];
words[[-20 ;;]] // Grid
Same idea as Szabolcs
So far there are five Mathematica solutions and a single Python one. I would like to see solutions in other languages, in particular R. People on Unix based platforms can use /usr/share/dict/words as the database.
organ has 9 anagrams: rogan, ronga, orang, nagor, groan, grano, goran, argon, angor
reset has 8 anagrams: reest, estre, ester, stree, stere, steer, terse, tsere
trace has 8 anagrams: react, recta, crate, creta, creat, cater, carte, caret
plate has 7 anagrams: pleat, palet, patel, pelta, petal, leapt, tepal
This is using Ruby with words of length 7 or smaller.
WORDS = File.readlines(‘/usr/share/dict/words’).select{|w| w.length < 7 }
puts "There are #{WORDS.count} WORDS in the dictionary."
WORDS_HASH = Hash[WORDS.map{|w| [w.strip,w.strip]}]
results = []
class Result
include Comparable
attr_reader :original_word, :valid_anagrams, :size
def initialize(original_word, valid_anagrams)
@original_word = original_word
@valid_anagrams = valid_anagrams
end
def other
self.size other.size
end
def size
valid_anagrams.count
end
end
n = 0
for word in WORDS
word.strip!
word.downcase!
word_chars = word.chars
word_chars_a = word_chars.to_a
permutations = word_chars_a.permutation(word.length)
permutations = permutations.map(&:join)
valid_anagrams = []
permutations.each do |perm|
valid_anagrams < 0
results <= 25
#break
end
permutations = nil
word = nil
valid_anagrams = nil
word_chars = nil
word_chars_a = nil
end
results.sort.reverse.take(30).each do |result|
puts “#{result.original_word} has #{result.size} anagrams: #{result.valid_anagrams.join(‘, ‘)}”
end
Looks like some characters and formatting are lost. There is a gist here: https://gist.github.com/seanhandley/48d4e42c90b533f6d966
Still needs some work with regard to efficiency!
Here’s a Haskell version. (I don’t consider case.)
import Control.Arrow
import Data.Function
import Data.List
import Data.Ord
import System.Environment
import qualified Data.Map as M
import qualified Data.Set as S
anagrams :: [String] -> [[String]]
anagrams = map (S.toList . snd) . M.toList
. M.fromListWith S.union
. map (sort &&& S.singleton)
rank :: [[a]] -> [[a]]
rank = sortBy (compare `on` (Down . length))
mostPopular :: [[a]] -> [[a]]
mostPopular = concat . take 1 . groupBy ((==) `on` length)
mostAnagrams :: [String] -> [[String]]
mostAnagrams = mostPopular . rank . anagrams
main :: IO ()
main = do
args readFile path >>= mapM_ print . mostAnagrams . words
_ -> error “Supply one argument giving the path to a dictionary.”
When run on MacOS I get:
$ ./anawords /usr/share/dict/words
[“caret”,”carte”,”cater”,”crate”,”creat”,”creta”,”react”,”recta”,”trace”]
[“angor”,”argon”,”goran”,”grano”,”groan”,”nagor”,”orang”,”organ”,”rogan”]
[“ester”,”estre”,”reest”,”reset”,”steer”,”stere”,”stree”,”terse”,”tsere”]
$
listIta = {Sort[Characters[#]], #} & /@ DictionaryLookup[{“Italian”, All}];
Reverse[Sort[Length /@ GatherBy[listIta, First]]][[1 ;; 20]]
Reverse[Sort[GatherBy[listIta, First]]][[1 ;; 3, All, 2]]
Result
{{“aperti”, “aprite”, “pareti”, “patire”, “patrie”, “perita”,
“pietra”, “rapite”, “ripeta”}, {“avresti”, “restavi”, “servita”,
“stivare”, “svitare”, “versati”, “viraste”, “vistare”}, {“parasti”,
“piastra”, “rapasti”, “raspati”, “satrapi”, “sparati”, “sparita”,
“spirata”}}
Here’s how I did it in MATLAB, which gave me the same 10 “apers”:
wordList = urlread(‘http://www-01.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt’);
words = regexp(wordList, ‘\s+’, ‘split’);
map = containers.Map(‘KeyType’, ‘char’, ‘ValueType’, ‘any’);
for i = 1:numel(words)
key = sort(lower(words{i}));
if isKey(map, key)
map(key) = [map(key) words(i)];
else
map(key) = words(i);
end
end
vals = values(map);
counts = cellfun(@numel, vals);
maxCount = max(counts);
vals{counts == maxCount}
A bash one-liner (the line got a bit longer than I intended) It prints out any sets of anagrams with 10 or more words.
bash-4.2$ c=0; while read l; do echo `echo $l | fold -1 | sort ` @ $l;done < /usr/share/dict/words |sort | while read x; do oldk=$k;k=${x%%@*}; if [ "$k" = "$oldk" ] ; then c=$(($c + 1)); w="$w ${x#*@}"; else if [ $c -gt 9 ]; then echo $c $w; fi; c=1; w=${x#*@}; fi; done
14 anestri antsier asterin eranist nastier ratines resiant restain retains retinas retsina stainer starnie stearin
11 alerts alters artels estral laster lastre rastle ratels relast resalt salter
12 astel least salet setal slate stale steal stela taels tales teals tesla
10 aster astre rates reast resat serta stare strae tares tarse
10 estrous oestrus ousters sestuor sourest souters stoures toruses trousse tussore
so /usr/share/dict/words claims 14 (somewhat dubious:-) anagrams for nastier
This problem reminds me of playing Boggle: http://en.wikipedia.org/wiki/Boggle. Where if you can spell A R T from a triangle of letters, then you can spell R A T and T A R. I am not sure how you would do it, but could you modify the programs to look for a long word that would give you other smaller words in Boggle. I know it would really depend on the layout of the Boggle dice, and you would have less anagrams than without the Boggle constraint.