I recently discovered the Paper Machines add-on to Zotero, which allows you to perform visualizations and topic modeling analyses on papers in your Zotero collection. I just so happened to have the complete proceedings of both GECCO 2014 and ALife 2014 kicking around in my Zotero database, so I decided to try comparing them. As a quick background, GECCO, which focuses on Genetic and Evolutionary Computation, and ALife, which focuses on Artificial Life, are the two main computer science* conferences that we in the Devolab tend to go to. There is substantial overlap between these conferences (GECCO has an Artificial Life track, after all), but there are also some fundamental differences in approach and focus.
The word clouds above do a pretty good job of capturing these differences, I think. Evolutionary Computation tends to be focused on finding solutions to problems, as suggested by the prevalence of words like “solutions”, “objective”, and “search.” As a result, there is also a greater emphasis on developing algorithms, hence the presence of Greek letters. Artificial Life, on the other hand, tends to focus more on understanding how a system as a whole works. This is reflected in the popularity of words like “complexity”, “dynamics”, and “behavior.” Artificial Life, as a field, also tends to care a lot more about biology, which explains the frequency of words like “biological,” “natural,” and “species.”
Paper Machines also allows you to create phrase nets: you write a query of the form “x [regular expression] y,” and it builds a map of the most common x-to-y pairings that match that regular expression. Here’s an example of searching for “x and y” in the GECCO proceedings:
As you can see, this reveals a variety of words commonly paired using “and.” These include a lot of commonly paired concepts, such as “time and space”, “exploration and exploitation”, and “theory and practice.” This sort of analysis also turns up frequently-cited pairs of authors, such as Lehman and Stanley.
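If you want to experiment with the same idea outside of Zotero, here is a minimal Python sketch of a phrase net. It is not how Paper Machines implements the feature; the function name `phrase_net`, its parameters, and the sample text are all made up for illustration. It counts the word immediately before and immediately after a connector regular expression.

```python
import re
from collections import Counter

def phrase_net(text, connector=r"and", top_n=10):
    """Count (x, y) word pairs joined by `connector`, a regular expression.

    Hypothetical helper for illustration; not the Paper Machines implementation.
    """
    # y is captured inside a lookahead so it is not consumed and can also
    # serve as the x of the next match (e.g. "a and b and c").
    pattern = re.compile(
        r"\b(\w+)\s+(?:" + connector + r")\s+(?=(\w+)\b)",
        re.IGNORECASE,
    )
    pairs = Counter()
    for x, y in pattern.findall(text.lower()):
        pairs[(x, y)] += 1
    return pairs.most_common(top_n)

# Toy text, invented for this example (not taken from the proceedings).
sample = (
    "We balance exploration and exploitation. Time and space costs matter. "
    "Exploration and exploitation trade off, as do theory and practice."
)
edges = phrase_net(sample)
for (x, y), count in edges:
    print(f"{x} -> {y}: {count}")
```

Each counted pair is an edge in the net, and the count would set how heavily the edge is drawn. On a real corpus you would first extract plain text from the PDFs, which this sketch does not attempt.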
You can also use phrase nets to get a very rough summary of some important findings across a set of papers. For instance, a lot of the findings at ALife that I would be interested in involve something “being sufficient to evolve” something else, or “favoring the evolution of” something else, and so on. By stringing these commonly used phrases together into the search pattern, you can create a phrase net such as the following:
Obviously this is an imperfect summary. Clearly some of these words are part of longer phrases (predators -> sophisticated, for instance). Also, many of these phrases may have been from literature review sections rather than conclusions. Still, there are some interesting sounding connections here (I’m definitely curious about that eavesdropping -> competition connection!), and a number of phrases that might make good jumping off points. We could get much more precise results with a fuller set of text-mining tools, but that is a topic for a future post.
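To make the idea of stringing phrases together concrete, here is a rough Python sketch that joins several connector phrases into a single regular expression with alternation. The exact pattern used for the figure isn't reproduced here, so the two connectors below are guesses based on the phrases quoted above, and the sample sentences are invented, not findings from the proceedings.

```python
import re
from collections import Counter

# Assumed connector phrases, loosely based on the ones quoted in the post.
CONNECTORS = [
    r"(?:is|are|was|were|being)\s+sufficient\s+to\s+evolve",
    r"favou?r(?:s|ed|ing)?\s+the\s+evolution\s+of",
]

# Alternation turns the list into one connector; a single word is
# captured on each side of it.
pattern = re.compile(
    r"\b(\w+)\s+(?:" + "|".join(CONNECTORS) + r")\s+(\w+)\b",
    re.IGNORECASE,
)

def causal_edges(text):
    """Return a Counter of (x, y) pairs linked by any of the connectors."""
    return Counter((x.lower(), y.lower()) for x, y in pattern.findall(text))

# Invented sentences, for illustration only.
sample = (
    "Spatial structure is sufficient to evolve cooperation. "
    "Parasites favor the evolution of complexity, "
    "and noise favors the evolution of robustness."
)
edges = causal_edges(sample)
for (x, y), count in edges.items():
    print(f"{x} -> {y}: {count}")
```

Because only one word is captured on each side, a phrase like “sophisticated communication” would be cut down to “sophisticated,” which is the same truncation visible in the phrase net above.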
What do you think? Does this seem useful? Notice any interesting patterns that I didn’t comment on? Does this raise new questions about either of these bodies of text? Anything I should explore in a follow-up post (potentially using the R tm library)?