The start of summer can often mean the start of working on new projects and teaching new skills to undergraduate students. On the one hand, you want to teach students the best practices for various tasks. On the other hand, using all of the most appropriate libraries can incur a pretty hefty cost in the amount of time it takes to get up and running. Recently, I was talking an undergrad through the basics of data analysis. The data analysis for this project shouldn’t be too complicated, but I felt like it would be irresponsible not to introduce him to numpy, scipy, and pandas. However, we were having some issues getting all of these libraries to play nicely with said undergrad’s current Windows set-up. On the one hand, since he had been thinking for a while of switching his coding over to Linux, it was pretty tempting to say we should just hold off on data analysis until he could get a Linux partition set up. That is by far the more likely environment for his future coding.
On the other hand…
Where do you draw the line, call your environment good enough, and move on to actual science? You could spend an entire summer just trying to get the ideal workflow set up. On the flip side, a bad workflow can dramatically slow progress. How do you guess correctly where the optimum is?
In this case, I decided that installing a whole new operating system was probably overkill (and I didn’t want to be responsible if it went awry). So the undergrad learned to make plots by running data analysis scripts on our high-performance computing cluster and SCPing them over to his local machine. Not an ideal workflow to be sure, but somehow he managed. Of course, within a week he was running Linux anyway, and much happier for it*. I think this means I erred a little too far on the side of moving on to the science in this case, but I suspect that’s largely a product of the undergrad in question, so I’m not sure how much to extrapolate. How does everyone else balance these competing factors?
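For the curious, the stopgap workflow amounted to something like the sketch below. The hostname, username, script name, and paths are all placeholders, not the actual setup; the only assumption is that the analysis script writes its figures to disk (e.g. via a non-interactive matplotlib backend) rather than trying to open a window over SSH.

```shell
# 1. Run the analysis remotely; the script saves its plots to disk
#    instead of displaying them (no X forwarding needed).
ssh undergrad@cluster.example.edu 'cd ~/project && python analyze.py'

# 2. Copy the resulting figures back to the local machine for viewing.
scp 'undergrad@cluster.example.edu:~/project/plots/*.png' ./plots/
```

Clunky compared to plotting locally, but it only requires an SSH client on the Windows side, which is why it worked as a stopgap.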