Making informed decisions on the application layer

During the playtest session for our ACM puzzlehunt, I needed a tool to find an undisclosed list of words from the English dictionary in a word search. All of the existing tools required a list of search terms as input, and dumping exhaustive lists of words from the English dictionary usually crashed these applications. Frustrated by the absence of such a basic tool, I decided to build one and investigate the effect of application layer design decisions on network throughput.

Overview

The application consisted of two main tasks: answering membership and prefix-membership queries against a corpus of English words, and searching the grid for every word that the corpus contains.

Implementation

As modern frontend web frameworks like AngularJS and BackboneJS take on responsibilities traditionally reserved for the server, anticipating the computational limitations of the client becomes increasingly important.

With respect to the word search tool, I wasn’t sure how to handle the corpus of English words. Should I serve the 1MB corpus of English words upfront and defer the overhead of constructing the trie to the client? Or should I construct the trie on the server and expose an API endpoint to answer membership and prefix membership queries from the client?
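Both options hinge on the same two query types: exact membership and prefix membership. A minimal trie supporting both might look like the following sketch (the class and method names are illustrative, not the tool's actual code):

```javascript
// Minimal trie supporting exact-membership and prefix-membership queries.
class Trie {
  constructor() {
    this.root = {}; // each node maps a character to a child node
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      node = node[ch] ?? (node[ch] = {}); // create missing children on the way down
    }
    node.end = true; // mark that a complete word terminates here
  }

  // Exact membership: is `word` itself in the corpus?
  contains(word) {
    const node = this.walk(word);
    return node !== null && node.end === true;
  }

  // Prefix membership: does any corpus word start with `prefix`?
  hasPrefix(prefix) {
    return this.walk(prefix) !== null;
  }

  // Follow `s` character by character; null if we fall off the trie.
  walk(s) {
    let node = this.root;
    for (const ch of s) {
      node = node[ch];
      if (node === undefined) return null;
    }
    return node;
  }
}
```

In Approach 1 these are cheap in-memory calls on the client; in Approach 2 each call becomes a request to the server.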

Results

In Approach 1, I served the 1MB corpus of English words upfront and constructed the trie on the client.

In Approach 2, I constructed the trie on the server and exposed an API endpoint.
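The solver's query pattern is the same under both approaches: walk a straight line from every cell in every direction, issuing a prefix query at each step to prune dead branches. A sketch of that loop, assuming a `dict` object exposing `contains(word)` and `hasPrefix(prefix)` (local calls in Approach 1, HTTP round trips in Approach 2; all names here are illustrative):

```javascript
// Find all dictionary words in a letter grid. Word searches run in
// straight lines, so fix a direction per starting cell and extend until
// the trie says no word begins with the accumulated prefix.
function findWords(grid, dict, minLen = 3) {
  const rows = grid.length, cols = grid[0].length;
  const found = new Set();
  const dirs = [[-1,-1],[-1,0],[-1,1],[0,-1],[0,1],[1,-1],[1,0],[1,1]];

  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      for (const [dr, dc] of dirs) {
        let word = "", i = r, j = c;
        while (i >= 0 && i < rows && j >= 0 && j < cols) {
          word += grid[i][j];
          if (!dict.hasPrefix(word)) break; // prune: nothing starts this way
          if (word.length >= minLen && dict.contains(word)) found.add(word);
          i += dr; j += dc;
        }
      }
    }
  }
  return found;
}
```

The inner loop issues a query per extension, so a large grid generates thousands of queries, which is exactly the cost that Approach 2 multiplies by a network round trip.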

For each approach, I measured the time required to solve the word search from the puzzlehunt, with the backend served either from localhost or from a remote DigitalOcean instance.

Approach   local (s)      remote (s)
1          0.31 ± 0.03     0.32 ± 0.02
2          9.44 ± 0.27    26.06 ± 1.54

Conclusion

These results are consistent with expectations: Approach 1 pays a one-time cost to download the corpus and build the trie, after which every query is a local lookup, while Approach 2 pays for a round trip to the server on every query, a cost that grows sharply once the backend moves off localhost.

Users with a reliable Internet connection should not have any issues retrieving the 1MB corpus of English words. With the corpus in hand, the client should be able to construct the trie without much trouble. The trie requires memory on the order of the size of the corpus, much less than the 100MB at which thrashing becomes a concern on machines with less than 512 MB of RAM.

This approach does not scale with the size of the corpus, but the corpus would need to exceed 10MB for the download time to be significant; on a typical 10 Mbps connection, a 10MB download already takes about eight seconds.