Chromebook for Data Science

I think in terms of three tiers of machines (laptops or towers) for Data Science:

1) A very thin client (laptop running ChromeOS or Ubuntu Linux) that does almost everything in the cloud, ideally with some client-side capabilities

2) A modest laptop that can do some Digital Humanities/Data Science/Computer Science work locally (Mac OS X or Ubuntu Linux)

3) A powerful laptop/PC tower that can handle processor-intensive tasks, especially deep neural nets with CUDA/GPU acceleration (most likely a custom-tuned Linux OS)

As an initial foray into Digital Humanities, we decided to go with the least expensive option, (1) above. I’ve designed the exercises for this fall’s Digital Humanities course so that most or all of the labs run in the cloud and are accessible via the Chrome web browser and/or a terminal command line.

Because the capabilities and underlying technologies of Chromebooks shift every 2-3 years, it would be useful to test the practical capabilities of current Chromebooks (2017) and their compatibility with key software packages for Digital Humanities.

Here is a list of software to install on the Chromebook, ordered from need-to-have (e.g. Chrome browser) to nice-to-have but not necessary (Docker/VirtualBox). A preliminary search suggests that all the must-haves and nice-to-haves are available in one form or another (e.g. SSH via a Chrome extension). Not all solutions are ideal, but they seem to work.

** Latest Chrome browser + compatible extensions

** ChromeOS/*nix-like command line interface

* typical *nix utilities

* SSH client

http://www.techrepublic.com/article/pro-tip-how-to-use-secure-shell-from-your-chromebook/

Enhanced terminal client

** basic IDE like Vim with configs for Python and JavaScript

* GitHub

* Miniconda or the full Anaconda distribution (with Miniconda we’ll need to install various specific Python libraries ourselves)

* additional Python libraries

(*) JavaScript dev environment, e.g. Node.js, npm, webpack, etc.

* enhanced IDE like LightTable (free), JetBrains (free for students)

virtualenv (for lightweight Python envs)

* more dev apps I could test out like various Python/JavaScript web frameworks, databases, data science apps, etc.

* more specialized apps, such as data science and visualization clients

– Docker/VirtualBox

** Must have

* Should have

(*) Should have, not absolutely necessary for fall DH class

– Unlikely to work/perform well

Ideally, we would work on a Mac OS X or Linux machine, because Chrome OS is a locked-down variant of vanilla Linux. In addition, Chrome OS may diverge from *nix standards as Google grows more confident in making it proprietary. Still, it is easier to manage than other Linux solutions, at the cost of configuration flexibility.

From a management perspective, we’d like to have one or several standard recovery images we could quickly reinstall should any Chromebook end up in an inconsistent state. I’m designing the course/labs so that as much as possible, including data and programs, is backed up to the cloud. Rather than spending x? hours debugging a problem, we’ll just blow everything away with a fresh OS install and/or a batch script that reinstalls everything else. Little if anything should be lost, because almost all student data and programs are either stored in or mirrored to the cloud.
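As a rough illustration of the post-recovery step, here is a minimal Python sketch that checks whether the expected command-line tools are back on the PATH after a reinstall. The tool list is my own assumption based on the software named in this post, not a tested recovery image manifest, and the command names may differ on a given Chromebook setup.

```python
import shutil

# Hypothetical tool list drawn from the software mentioned in this post;
# adjust the command names to whatever the recovery image actually installs.
REQUIRED_TOOLS = ["ssh", "git", "vim", "python3", "node", "npm"]


def find_missing_tools(tools):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

A check like this could run at the end of a batch install script so that a half-restored machine is caught before a student sits down at it.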

Another option worth exploring is dual-booting both x86 and ARM Chromebooks into either ChromeOS or an open Linux OS. This is definitely a route we should explore, if only to know our options when we run into restrictions in ChromeOS and need to evaluate the cost of workarounds. We may be able to squeeze noticeably more performance out of the machines if we dropped the GUI and any bloatware potentially embedded in ChromeOS.

Finally, I’d like to run a number of tests/evaluations to benchmark the limits of these machines under various conditions. The results would help us set guidelines for the practical limits of the assignments/work we can do locally on these machines. For example, could we set up at least a trivial “hello world”-type intro TensorFlow DNN running on a dual-booted, stripped-down CoreOS server? If so, how complex can our network get, and how much data can we train with?
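Before TensorFlow is even installed, a crude first probe of raw compute could look like the sketch below. It times one forward pass through a single dense layer written in pure Python at a few layer widths. This is only a stand-in I am assuming for illustration: it is not TensorFlow, it uses random weights, and the widths are arbitrary, so the numbers would only indicate relative scaling on a given machine.

```python
import random
import time


def dense_forward(inputs, weights):
    """One dense layer: multiply an input vector by a weight matrix, then apply ReLU."""
    return [max(0.0, sum(x * w for x, w in zip(inputs, row))) for row in weights]


def time_layer(width, repeats=5):
    """Return the best wall-clock time (seconds) for one forward pass at `width`."""
    rng = random.Random(0)  # fixed seed so runs are comparable
    inputs = [rng.uniform(-1, 1) for _ in range(width)]
    weights = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(width)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dense_forward(inputs, weights)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Arbitrary example widths; widen the range on faster machines.
    for width in (64, 128, 256):
        print(f"width={width:4d}  best forward pass: {time_layer(width):.6f} s")
```

Taking the best of several repeats reduces noise from background processes, which matters on low-powered hardware.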

Maybe we will have to scale back our expectations and run only simpler machine learning algorithms on these Chromebooks. That raises the question of which ones we could reasonably run, on what size of data set, and with what expected run times. I’m hoping we’ll be able to run reasonably complex ML algorithms on small datasets, such as one or a few dozen novels, for a planned digital literary analysis class. We’ll need to run preliminary tests/benchmarks to quantify these limits.
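To get a first feel for the "one or a few dozen novels" scale, a sketch like the following times a plain word-frequency pass over synthetic text. Everything here is an assumption for illustration: the 100,000-word novel length is a rough guess rather than a measured figure, the "novels" are random tokens rather than real text, and word counting is only a stand-in for the feature-extraction step that would precede an actual ML algorithm.

```python
import random
import time
from collections import Counter

WORDS_PER_NOVEL = 100_000  # assumed rough novel length, not a measured figure


def make_fake_novel(n_words, vocab_size=5_000, seed=0):
    """Build a synthetic 'novel' as a list of tokens (placeholder for real text)."""
    rng = random.Random(seed)
    return [f"w{rng.randrange(vocab_size)}" for _ in range(n_words)]


def time_word_count(n_novels, words_per_novel=WORDS_PER_NOVEL):
    """Return (total_tokens, seconds spent counting) for `n_novels` synthetic novels."""
    counts = Counter()
    elapsed = 0.0
    for i in range(n_novels):
        # Novels are generated one at a time to keep memory use low.
        tokens = make_fake_novel(words_per_novel, seed=i)
        start = time.perf_counter()
        counts.update(tokens)
        elapsed += time.perf_counter() - start
    return sum(counts.values()), elapsed


if __name__ == "__main__":
    for n_novels in (1, 12, 24):
        total, seconds = time_word_count(n_novels)
        print(f"{n_novels:2d} novels: {total:,} tokens counted in {seconds:.3f} s")
```

Swapping `make_fake_novel` for a loader that reads real course texts would turn this into a usable baseline for the planned class.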

We could explore additional packages and client-side installs that could be useful outside our upcoming Digital Humanities course. We could write up some guidelines, develop some support procedures and write some scripts to automate recoveries/installs.

Here are some Chrome developer extensions that should run on a Chromebook.

Chrome Applications/Extensions:

– Enable Developer Mode and follow these instructions to install (careful with Node.js)

* Anaconda/Python

* Chromebrew Package Manager

* git client

* Node.js/JavaScript (follow these updated instructions)

– Configure Spyder IDE (installed with Anaconda)

– Configure Jupyter Notebook

– Vim extension

– SSH extension

– Postman or Insomnia REST Client