1402. Problem - Word CounterWord Counter
Count the number of words in file.
1. Requirement
Write a program to count the words in a given file. You can assume the file is very small.
2. Solution
Use Map to count the number for each word from the given words list.
public class WordCounter {
public SortedMap<String, Integer> process(List<String> words) {
if (words == null || words.size() == 0) {
return null;
}
SortedMap<String, Integer> map = new TreeMap<>();
for (String word : words) {
if (!map.containsKey(word)) {
map.put(word, 1);
} else {
map.put(word, map.get(word) + 1);
}
}
return map;
}
}
Test Class.
public class WordCounterTest {
@Test
public void testWordCounter() {
System.out.println("testWordCounter");
List<String> list = read("input1.txt");
WordCounter wordCounter = new WordCounter();
SortedMap<String, Integer> map = wordCounter.process(list);
assertEquals(3, map.size());
assertEquals(3, (int)map.get("hello"));
assertEquals(1, (int)map.get("howdy"));
assertEquals(2, (int)map.get("world"));
List<String> list2 = read("input2.txt");
SortedMap<String, Integer> map2 = wordCounter.process(list2);
assertEquals(6, map2.size());
assertEquals(1, (int)map2.get("apple"));
assertEquals(1, (int)map2.get("banana"));
assertEquals(7, (int)map2.get("honey"));
assertEquals(1, (int)map2.get("mango"));
assertEquals(1, (int)map2.get("orange"));
assertEquals(6, (int)map2.get("world"));
}
private List<String> read(String filename) {
List<String> list = new ArrayList<>();
ClassLoader classLoader = WordCounterExample.class.getClassLoader();
Path path = Paths.get("files", filename);
try (InputStream inputStream = classLoader.getResourceAsStream(path.toString())) {
System.setIn(inputStream);
Scanner sc = new Scanner(System.in);
while (sc.hasNext()) {
list.add(sc.next());
}
sc.close();
} catch (IOException e) {
e.printStackTrace();
}
return list;
}
}
Input 1.
hello world
world hello
hello
howdy
Output 1.
hello 3
howdy 1
world 2
Input 2.
apple banana orange mango
world world world world world world
honey honey honey honey honey honey honey
Output 2.
apple 1
banana 1
honey 7
mango 1
orange 1
world 6
3. More Requirements
- 1) What if the given file is very large? Read file by chunk, NIO maybe?
- 2) What if most of the words are unique? Use Trie to reduce the memory storage.
- 3) What if you are not given a file but a stream?
- 4) If you are given many servers, how will you use them to improve the performance? Map Reduce.
- 5) What if we just need to return the first 10 popular words?
- 6) How to return the top 10 words in the last one hour? Use hashmap. Create one hashmap for every minute.