
PolyCogBlog

Technology, Cognition and The Digital Humanities

3 Stages of HackOH5

Although the iterative cycle of Data Science is usually more involved, for our purposes the HackOH5 Hackathon will roughly break down into three (not necessarily sequential) steps that reflect our three teams:  Data Wranglers, Data Analysts and Data Visualizers.

Here are a few resources to give you a first look at each stage and a tangible idea of what tools we’ll be using on Saturday.

  1. Data Wranglers (Python+Libraries/Jupyter Notebook)
    1. Introduction to Jupyter Notebook
  2. Data Analysts (Python+Libraries/Jupyter Notebook and RapidMiner)
    1. Introduction to RapidMiner
    2. Federal Reserve Text Mining (Sentiment, Clustering)
  3. Data Visualizers (Tableau Public)
    1. Tableau Public Overview

 

Don’t worry if you don’t absorb all the lingo on your first viewing.  There will be numerous experts at the Hackathon to answer any questions you have and walk you through each step.  The main purpose of this first Hackathon is to expose you to the overall process and tools as well as the critical analysis and general flow of Data Science.

 


 

Essential Python for Data Science

 

Dmitry Zinoviev does a decent job of isolating and explaining the core of Python used most frequently for Data Science in his book Data Science Essentials in Python (Pragmatic Bookshelf, 2016).  In it, he focuses on several areas of Python in which Data Scientists need to become especially fluent:

  • String functions
    • Case:  lower, upper, capitalize
    • Predicates (T/F): isupper, islower, isspace, isdigit, isalpha
    • Encoding: b"<bin array>" vs "<string>"; decode() converts a bin array to a string, encode() converts a string to a bin array
    • String Cleaning: lstrip, rstrip, strip
    • String Munging:  split("x"), " ".join(ls)
    • String Counting:  find(".com"), count(".")
  • Data structures
    • Lists:  arrays; searching is O(n), so not for membership tests on large data
    • Tuples:  immutable lists; searching is O(n)
    • Sets:  unordered/unindexed; hash-based, so membership tests are fast (O(1) on average)
    • Dictionaries:  map keys (hashable objects: num, bool, str, tuple) -> values; lookup is O(1) on average
    • Dictionaries from sequences:  dict(enumerate(seq)), dict(zip(kseq, vseq)), range
  • List comprehensions
    • Transform a collection into a List
    • Faster and cleaner than loops
    • Nested for performance: [line for line in [l.strip() for l in infile] if line]
    • Generator expression with (): (x**2 for x in myList) # evaluates lazily to <generator obj <genexpr>…>
    • Counter class to find most/least common in resulting list
  • Counters
    • Dictionary-style collection for counting items in another collection
    • from collections import Counter
    • cntr = Counter(phrase.split())
    • cntr.most_common(n)
    • cf:  pandas:  uniqueness, counting, membership
  • File
    • f = open(name, mode="<r|w|a or rb|wb|ab>"); <read the file>; f.close()
    • with open(name, mode="<r|w|a>") as f: <read the file> (auto closed)
    • f.read(<n>), f.readline(<n>), f.readlines(<n>): \n not removed; reading a whole file at once is unsafe unless the file is reasonably small
    • f.write(line), f.writelines(["list", "of", "strings"]): \n not added
  • Web
    • urllib.request.urlopen(URL): open a URL and save what you download into a cache directory
    • Behaves like a read-only file handle:  read, readline, and readlines
    • Higher failure rate than local files:  wrap in try:/except:/finally: exception handling
    • urllib.parse.urlparse(URL) for decomposing URL
    • urllib.parse.urlunparse(parts) for building URL
  • Regular expressions
    • compiledPattern = re.compile(pattern, flags=0) – flags at compile or execution
    • Most common flags:  re.I(gnore case), re.M(ultiline) works with ^start/end$
    • Raw strings do not interpret \ as an escape character (r"\n" == "\\n")
    • Two forms:  re.function(rawPattern, …) or compiledPattern.function(…)
    • split(pattern, string, maxsplit=0, flags=0) returns a List of substrings
    • match(pattern, string, flags=0) returns a match object if the beginning of the string matches, otherwise None
    • mo = re.match(r"\d+", "067 string"); mo.group(), mo.start(), mo.end()
    • search(pattern, string, flags=0) returns a match object for the first match anywhere in the string, otherwise None
    • re.search(r"[a-z]+", "001 Has at least one 010 letter", re.I)
    • findall(pattern, string, flags=0) returns a List of all non-overlapping matches
    • re.findall(r"[a-z]+", "0010 Has at least one 010 letter", re.I)
    • sub(pattern, repl, string, count=0, flags=0) replaces non-overlapping matches of pattern in string with repl; the optional argument count= restricts the number of replacements
  • Globbing
    • Match specific file names and wildcards *(0<=chars), ?(1 char)
    • import glob; glob.glob("*.txt")
  • Data pickling
    • Can store more than one object, read out sequentially
    • Can store intermediate results, faster
    • with open("myData.pickle", "wb") as oFile:  pickle.dump(object, oFile)
    • with open("myData.pickle", "rb") as iFile:  object = pickle.load(iFile)
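To make a few of these items concrete, here is a minimal sketch that chains a nested list comprehension, string munging, a Counter, a compiled regular expression and a pickle round trip. The sample lines are made up for illustration, and an in-memory buffer stands in for the pickle file so the snippet leaves nothing on disk.

```python
import io
import pickle
import re
from collections import Counter

# Made-up sample lines standing in for a file handle such as `infile`.
raw_lines = ["  Johnson wins in a landslide \n", "\n", "Democrats add seats; Johnson leads\n"]

# Nested comprehension: strip every line, then drop the empty ones.
lines = [line for line in [l.strip() for l in raw_lines] if line]

# String munging: split on whitespace, trim punctuation, lowercase.
words = [w.strip(";.,").lower() for line in lines for w in line.split()]

# Counter: dictionary-style tally of the items in another collection.
cntr = Counter(words)
top_word, top_count = cntr.most_common(1)[0]

# Regular expression: compile once, then reuse the compiled pattern.
capitalized = re.compile(r"\b[A-Z][a-z]+\b").findall(" ".join(lines))

# Pickling: store an intermediate result and read it back.
buffer = io.BytesIO()  # stands in for open("myData.pickle", "wb")
pickle.dump(cntr, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
```

Swapping `raw_lines` for a real file handle inside a `with open(...)` block gives the same pipeline over a file.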

 

This is one of the most concise summaries of Data Science-specific commands in Python.  The book does not go into depth, but I highly recommend it for a quick and simple overview of various aspects of Python core to Data Science (tabular data, database, network data, visualization, etc).


Command Line Data Science

With Data Science tools proliferating, from programming languages like Python (general), R (statistical) and Scala (big data) with their specialized libraries, to advanced drag-and-drop solutions like RapidMiner and Orange and similar solutions appearing in the cloud like Azure ML, it’s easy to forget the lowly Data Science tools available at the command line (mostly Unix-based).

Unix command line tools are particularly suited to the early phases of Data Science, and there are both low-profile books and courses on the subject that try to bring the bearded 70’s hippie Unix guru into the age of Deep Neural Nets.  The evolution of IPython into Jupyter, which integrates multiple popular Data Science languages like Python, R and Julia with graphical libraries like matplotlib, ggplot and D3.js in a sharable and replayable notebook, has replaced much of command line Data Science.  Even the command line, including shell scripts, can be integrated into Jupyter notebooks.

Still, there is nothing as fast and effortless as the command line for those who are familiar with the ubiquitous environment.   Early and agile data munging and exploration, automated big data processing and working on GUI-less servers in the cloud all seem ideal use cases for command line Data Science over Jupyter notebooks.

Doing Data Science on the command line has several inherent advantages over graphical interfaces like Jupyter:

  1. Nearly every Data Science platform has an underlying Unix-like command line shell
  2. There are a plethora of well used and reliable Unix commands suitable for data exploration
  3. These Unix commands are generally simple on their own but can easily be combined into more complex operations with Unix pipelines
  4. Related to the last point, many of these commands work on streams rather than entire files, so they can easily work on very large files without most typical memory constraints.
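The pipeline and streaming points carry over to Python, which is handy when a notebook is all you have. The sketch below mimics something like `grep | cut | sort | uniq -c` with generators, so each stage pulls one line at a time from the previous stage. The function names and the three CSV rows are hypothetical, chosen only to illustrate the idea.

```python
import io
from collections import Counter


def grep(lines, needle):
    """Yield only the lines that contain needle, like `grep needle`."""
    for line in lines:
        if needle in line:
            yield line


def cut(lines, field, sep=","):
    """Yield one delimited field per line, like `cut -d, -f`."""
    for line in lines:
        yield line.rstrip("\n").split(sep)[field]


def count_unique(values):
    """Tally the values, like `sort | uniq -c`."""
    return Counter(values)


# Hypothetical CSV stream; a real run would pass an open file handle instead.
stream = io.StringIO(
    "1964-11-04,Kenyon,election\n"
    "1976-09-17,Oberlin,union\n"
    "1976-09-17,Oberlin,dance\n"
)

# No stage holds more than one line at a time; only the final tally grows,
# and it grows with the number of distinct values rather than the file size.
college_counts = count_unique(cut(grep(stream, "19"), field=1))
```

Because file handles are themselves line iterators, the same chain works unchanged on a multi-gigabyte file.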

 

Jeroen Janssens’ Data Science at the Command Line by O’Reilly gives one of the more interesting illustrations and reviews of this topic.  The author also generously compiled a Vagrant configuration/VirtualBox Ubuntu 14.04 instance that contains dozens of Unix commands, utilities, programs and shell scripts useful for doing Data Science at the Command Line.  You can see Janssens present his perspective, argument and an interesting demonstration for doing Data Science on the command line on YouTube.

I also recommend the excellent reference Unix Power Tools, 3rd Edition from O’Reilly.  Of the tools below sed and awk deserve their own book to explain how to harness their powerful capabilities (Sed & Awk, 2nd Edition by O’Reilly).  A more in-depth reference on Unix shell scripting like Mastering Unix Shell Scripting by Randal Michael, Wiley helps to explain more complex automated workflow from a sysadmin perspective.

Here is a summary of the most useful Unix commands for Data Science per Jeroen Janssens’ book:

Big 3:  

grep, awk, sed

Retrieving Information from the Internet:

curl, curlicue, wget, httpie, scrape

Displaying Text Files:

cat, more, head, tail, less (big files), body, header

Manipulating Text Files:

sort, tr, uniq, cols, cut, fieldsplit, paste, split, wc

Piping

tee, tr

Data Generation:

echo, seq, sample, shuf

Date Manipulations:

dseq (gen seq date rel today)

File Conversion:

xml2json, json2csv

CSV Manipulations: (csvkit)

csvcut, csvgrep, csvjoin, csvlook, csvsort, csvsql, csvstack, csvstat, in2csv, sql2csv

JSON Manipulations:

jq

File Compression:

unpack, unrar, unzip

OS-level Commands:

drake (workflow), env, alias, parallel, type, tree, which, find

Python:

run_experiment (scikit-learn)

R Language:

Rio (CSV to data.frame to PNG/CSV), Rio-scatter

Images:

display (imagemagick), feedgnuplot

Machine Learning

tapkee (dimensionality reduction), weka


Parsing XML and the HackOH5 Dataset

There are a variety of ways to parse structured/marked up text like HTML and XML files.  A few of the most common alternatives using Python are:

  1. Python built-in RegEx module (re)
  2. Python built-in xml.etree.ElementTree API
  3. BeautifulSoup4 (using the lxml as the plug-in parser)
  4. lxml direct
  5. Other parsing libraries (untangle, xmltodict, html5lib, HTMLParser, htmlfill, Genshi)

The first three are the most popular choices, and each option suits a different scenario:

  1. Python RegEx:  Non-nested/simpler searches in relatively well-formed marked-up text:  RegEx (which may be the fastest when compiled and cached as object)
  2. Python ElementTree XML:  Although part of the standard Python distribution, this solution excels at neither speed nor leniency and is insecure in the face of malicious data, so we will not consider it here.
  3. BeautifulSoup:  This is a popular, fairly high-level and friendly parsing framework whose performance characteristics depend upon which parse engine is plugged in (great summary in the book Data Science Essentials in Python)
    1. html.parser (default):  fast and inflexible for relatively simple HTML
    2. lxml:  very fast and flexible
    3. xml:  for XML only
    4. html5lib:  very slow and very lenient for complicated HTML or where speed is not an issue
  4. Parse with lxml directly:  Less friendly but very complete parsing of complicated marked-up text (extremely expressive and fast due to the underlying C parsing libraries)
  5. Other solutions are slower and/or less capable of handling malformed markup, but the well-maintained alternatives find some usage for XML parsing in simpler cases with simpler syntax.

 

I’ve been told our dataset will be provided in at least two formats:  (a) an apparently older ABBYY OCR set of scans stored as one scanned page per file according to the standard ALTO Open XML Schema we’ll call ALTO XML Format, and (b) an apparently newer ABBYY OCR scan stored in a single 1.47GB text file stripped of all layout and formatting information and each scanned page stored within <pagetext></pagetext> tags we’ll call Simplified XML Format.

There are two fundamental and meaningful distinctions between the two formats in which the dataset will be provided:  (1) XML tag hierarchy complexity and (2) file size.  This suggests two different approaches to processing these two different formats:

[1] ALTO XML Format – numerous small *.xml files within a nested file directory structure with more complex xml tag hierarchy

APPROACH:  With thousands of files to process this will be an I/O bound task, so we’ll probably want to distribute the corpus across 5 computers (one college subset per machine).  Secondarily, we’ll place the largest fragments on the machines with the fastest SSD drives.  Finally, we’ll use the faster lxml parser within BeautifulSoup to extract the actual text.  The text is stored as individual words in deeply nested <String> tags within the attribute CONTENT=”<word>”.
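As a rough, dependency-free sketch of the extraction step, the snippet below pulls every CONTENT value out of the <String> tags with the standard library re module (the first option in the parser list) rather than the BeautifulSoup/lxml route planned above. The function names, the UTF-8 encoding and the directory walk are assumptions for illustration; the sample string is abbreviated from the Kenyon headline quoted in this post.

```python
import re
from html import unescape
from pathlib import Path

# Captures the CONTENT attribute of every <String .../> element.
STRING_CONTENT = re.compile(r'<String\b[^>]*?\bCONTENT="([^"]*)"')


def page_text(alto_xml):
    """Return the words of one ALTO page, in document order, joined by spaces."""
    return " ".join(unescape(word) for word in STRING_CONTENT.findall(alto_xml))


def iter_corpus(root_dir):
    """Yield (path, text) for every *.xml file below root_dir (assumed UTF-8)."""
    for path in sorted(Path(root_dir).rglob("*.xml")):
        yield path, page_text(path.read_text(encoding="utf-8"))


# Two <String> tags and one <SP> tag, abbreviated from the headline example.
sample = (
    '<String ID="TB.Img0001.6_1_0" HEIGHT="284.0" WIDTH="1884.0" CONTENT="Johnson" WC="1.0"/>'
    '<SP WIDTH="448.0" HPOS="2748.0" VPOS="9680.0"/>'
    '<String ID="TB.Img0001.6_1_1" HEIGHT="284.0" WIDTH="2244.0" CONTENT="Landslide" WC="1.0"/>'
)
headline = page_text(sample)
```

Each machine could run `iter_corpus` over its own college subset, which keeps the I/O-bound work independent per machine.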

In addition, we want to use the richer ALTO XML tag hierarchy to identify words of special importance such as (Sub)Titles and proper nouns.  We can do this by searching within each page scan for words with larger font size and/or special capitalization.  Font size information is contained in each <String> tag within the attribute fields of HEIGHT and WIDTH while capitalization is reflected in the actual word of the CONTENT attribute.  See the block below for an illustration of 4 levels of importance for words.

Normal Font Words (indicated by HEIGHT and WIDTH attributes)

  1. Regular text words (e.g. “):
  2. Proper Noun text words (e.g. “Wednesday”, “Kenyon Lords”):
  3. ALL CAPS text words (e.g. “):

Larger Font Words (indicated by HEIGHT and WIDTH attributes)

  1. Regular text words (e.g.  “sees”, “unexpected”, “growth”)
  2. Proper Noun text words (e.g. “President Johnson”)
  3. ALL CAPS text words (e.g. “LORDS WINS IN OVERTIME”)
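The two-by-three grid above can be sketched as a small classifier over <String> tags. The height cut-off of 200 is a hypothetical value picked only because it separates the headline and body examples in this post, and treating any capitalized word as a proper noun is a crude stand-in (sentence-initial words will be caught too).

```python
import re

STRING_TAG = re.compile(r"<String\b[^>]*>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical cut-off between body text and headline text.
LARGE_FONT_HEIGHT = 200.0


def capitalization(word):
    """Bucket a word by its capitalization pattern."""
    if len(word) > 1 and word.isupper():
        return "ALL CAPS"
    if word[:1].isupper():
        return "Proper Noun"  # crude: any capitalized word lands here
    return "Regular"


def classify(alto_xml):
    """Return (word, font bucket, capitalization bucket) for each <String> tag."""
    results = []
    for tag in STRING_TAG.findall(alto_xml):
        attrs = dict(ATTRIBUTE.findall(tag))
        word = attrs.get("CONTENT", "")
        if not word:
            continue
        height = float(attrs.get("HEIGHT", 0))
        font = "Larger" if height >= LARGE_FONT_HEIGHT else "Normal"
        results.append((word, font, capitalization(word)))
    return results


# One headline word and one body word, abbreviated from the examples in this post.
sample = (
    '<String HEIGHT="284.0" WIDTH="1884.0" CONTENT="Johnson" WC="1.0"/>'
    '<String HEIGHT="124.0" WIDTH="580.0" CONTENT="majority" WC="1.0"/>'
)
levels = classify(sample)
```

A real run would want the cut-off derived from each page's own height distribution rather than hard-coded.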

 

 

 

Here is an example of (Sub)Titles from the scan of the first page of the Nov 4th, 1964 Kenyon Review (notice the size of the HEIGHT and WIDTH attributes compared to the normal text further below):

<String ID="TB.Img0001.6_1_0" STYLEREFS="TS_10.0" HEIGHT="284.0" WIDTH="1884.0" HPOS="864.0" VPOS="9704.0" CONTENT="Johnson" WC="1.0"/>
<SP WIDTH="448.0" HPOS="2748.0" VPOS="9680.0"/>
<String ID="TB.Img0001.6_1_1" STYLEREFS="TS_10.0" HEIGHT="284.0" WIDTH="2244.0" HPOS="3196.0" VPOS="9680.0" CONTENT="Landslide" WC="1.0"/>
</TextLine>
<TextLine ID="TB.Img0001.6_2" HEIGHT="300.0" WIDTH="4700.0" HPOS="884.0" VPOS="10140.0">
<String ID="TB.Img0001.6_2_0" STYLEREFS="TS_10.0" HEIGHT="276.0" WIDTH="2228.0" HPOS="884.0" VPOS="10164.0" CONTENT="Democrats" WC="1.0"/>
<SP WIDTH="244.0" HPOS="3112.0" VPOS="10140.0"/>
<String ID="TB.Img0001.6_2_1" STYLEREFS="TS_10.0" HEIGHT="276.0" WIDTH="848.0" HPOS="3356.0" VPOS="10144.0" CONTENT="Add" WC="1.0"/>
<SP WIDTH="244.0" HPOS="4204.0" VPOS="10140.0"/>
<String ID="TB.Img0001.6_2_2" STYLEREFS="TS_10.0" HEIGHT="272.0" WIDTH="1136.0" HPOS="4448.0" VPOS="10140.0" CONTENT="Seats" WC="1.0"/>

TEXT:  “Johnson Landslide Democrats Add Seats”

Here is an example of normal text scanned from the same page of the Kenyon Review.

<String ID="TB.Img0001.6_6_2" STYLEREFS="TS_10.0" HEIGHT="72.0" WIDTH="76.0" HPOS="1776.0" VPOS="11388.0" CONTENT="a" WC="1.0"/>
<SP WIDTH="52.0" HPOS="1852.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_3" STYLEREFS="TS_10.0" HEIGHT="124.0" WIDTH="580.0" HPOS="1904.0" VPOS="11356.0" CONTENT="majority" WC="1.0"/>
<SP WIDTH="72.0" HPOS="2484.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_4" STYLEREFS="TS_10.0" HEIGHT="104.0" WIDTH="128.0" HPOS="2556.0" VPOS="11352.0" CONTENT="of" WC="1.0"/>
<SP WIDTH="60.0" HPOS="2684.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_5" STYLEREFS="TS_10.0" HEIGHT="108.0" WIDTH="632.0" HPOS="2744.0" VPOS="11348.0" CONTENT="36802500" WC="1.0"/>
<SP WIDTH="68.0" HPOS="3376.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_6" STYLEREFS="TS_10.0" HEIGHT="100.0" WIDTH="336.0" HPOS="3444.0" VPOS="11352.0" CONTENT="votes" WC="1.0"/>
<SP WIDTH="68.0" HPOS="3780.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_7" STYLEREFS="TS_10.0" HEIGHT="100.0" WIDTH="128.0" HPOS="3848.0" VPOS="11348.0" CONTENT="to" WC="1.0"/>
<SP WIDTH="64.0" HPOS="3976.0" VPOS="11332.0"/>
<String ID="TB.Img0001.6_6_8" STYLEREFS="TS_10.0" HEIGHT="108.0" WIDTH="800.0" HPOS="4040.0" VPOS="11336.0" CONTENT="Goldwateis" WC="1.0"/>

TEXT:  “a majority of 36802500 votes to Goldwateis” (Goldwater)

 

[2] Simplified XML Format:  One 1.47GB file has all scanned pages from all newspaper editions across all 5 Ohio Colleges.  In contrast to the ALTO XML Format, the Simplified XML Format has collapsed all text scanned on each page into a single <pagetext></pagetext> tag losing all original layout information and associated semantic meaning.

 

APPROACH:  Because of the large size of this file we may have to read the file in a streaming fashion unlike the approach for ALTO XML Format where we simply read the entire file into memory before searching for and extracting out terms.  With 8GB of memory, we may not encounter problems using the same BeautifulSoup/lxml parser technique above, but we’ll prepare to use one of two streaming techniques: (a) lxml with SAX or (b) RegEx reading in a line at a time.
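Option (b) might look something like the sketch below: a generator that reads one line at a time and yields the text of each <pagetext> element, so only a single page is held in memory. It assumes at most one <pagetext> element opens per line (anything after a closing tag on the same line is ignored), and the function name and sample are hypothetical.

```python
import io

OPEN_TAG, CLOSE_TAG = "<pagetext>", "</pagetext>"


def iter_pagetext(lines):
    """Yield the text of each <pagetext> element, reading one line at a time."""
    buffer = None  # None means we are currently outside a <pagetext> element
    for line in lines:
        if buffer is None:
            start = line.find(OPEN_TAG)
            if start == -1:
                continue  # skip metadata and other tags
            line = line[start + len(OPEN_TAG):]
            buffer = []
        end = line.find(CLOSE_TAG)
        if end == -1:
            buffer.append(line)  # page text continues on the next line
        else:
            buffer.append(line[:end])
            yield " ".join(part.strip() for part in buffer if part.strip())
            buffer = None


# Tiny made-up stand-in for the 1.47GB file; page text may span several lines.
sample = io.StringIO(
    "<page>\n"
    "<pagetitle>Page 2</pagetitle>\n"
    "<pagetext>Page 2 THE OBERLIN REVIEW \n"
    "Friday September 17 1976\n"
    "</pagetext>\n"
    "</page>\n"
    "<page>\n"
    "<pagetext>Page 3</pagetext>\n"
    "</page>\n"
)
pages = list(iter_pagetext(sample))
```

On the real corpus the argument would be an open file handle, and the pages would be consumed in a loop rather than collected into a list.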

Here is an extract from the Simplified XML Corpus:

<pagemetadata>
<title>Page 1</title>
<description></description>
<subject></subject>
<creator></creator>
<publisher></publisher>
<contributor></contributor>
<unmapped></unmapped>
<unmapped></unmapped>
<date>1976-09-17</date>
<type></type>
<format>.jp2</format>
<identifier></identifier>
<publisher>Oberlin College</publisher>
<language></language>
<relation></relation>
<coverage></coverage>
<rights></rights>
<audience></audience>
<isPartOf></isPartOf>
<unmapped>Oberlin Review (Oberlin, Ohio), 1976-09-17</unmapped>
</pagemetadata>
</page>
<page>
<pagetitle>Page 2</pagetitle>
<pageptr>4288</pageptr>
<pagefile>
<pagefiletype>thumbnail</pagefiletype>
<pagefilelocation>http://server15963.contentdm.oclc.org/cgi-bin/thumbnail.exe?CISOROOT=/p15963coll9&amp;CISOPTR=4288</pagefilelocation>
</pagefile>
<pagefile>
<pagefiletype>access</pagefiletype>
<pagefilelocation>http://server15963.contentdm.oclc.org/cgi-bin/showfile.exe?CISOROOT=/p15963coll9&amp;CISOPTR=4288</pagefilelocation>
</pagefile>
<pagefile>
<pagefiletype>master</pagefiletype>
<pagefilelocation></pagefilelocation>
</pagefile>
<pagetext>Page 2 THE OBERLIN REVIEW Friday September 17 1976 v vv VVVVV Ve Tl I II i ne aamimstration aeiays while the union pays Administration backpedalling is needlessly prolongingimportant contract negotiations with College secretaries and administrative assistants The delay has been costly to the union After four months of talks between Oberlin College Office and Professional Employees union OCOPE and theadministration an agreement on principles was reached June 30 On August 17 the Colleges lawyer presented a first draft of the contract supposedly settled a month and a half earlier except for details in language The draft differed in almost every major area from the agreement of June 30 Items not even discussed such as a costofliving decrease were included and OCOPEs four and a half to six percentincrease mysteriously became a straight four percent hike Neither OCOPE nor its members are rich Administrative assistants receive as little as 4400 for the school year and they are not forced to join the union If dues are increased as they almost certainly will be thanks to the expense of extra months of negotiations then membership could very well fall off In this light the administrations deliberate footdragging looks less like an understandable attempt at cutting costs and more like unionbreaking A four to six and a half percent pay increase to some of the lowest paid workers at Oberlin is both overdue and thoroughly acceptable Further administration delay of OCOPEs contract is not Tht REVIEW encourages comrades and adversaries to submit articles and letters Both must be typed on a S space line double spaced and signed and may be mailed to The Oberlin REVIEW Student Union Box 34 Deadlines are Sunday and Wednesday after noons for the Tuesday and Friday issues respectively IK emove Darners Grant disabled students some independence Graduate laments To the editor I looked at my transcriptyesterday On it appeared the pattern of my education for four years Once or twice 
I selected courses that were minor variations ofinformation I already knew I chose those courses because I knew I could do well with average effort I did it for the old CPA Other courses I winced to think about floundering helplessly I had insufficentbackground to even formulate the questions that might have rescued me from incomprehension I did respectably only by the grace of fanatically researched term papers And about the remaining courses Im not complaining I made good choices The point of this letter is that I made those choices alone The academic advisor was a person I visited as an irritating formality because I couldntregister without his signature At least I was comforted by his efficency I was always out the door in ten minutes he never inquired about my course grades or capabilities I was happy with this freedom to take what I pleased withoutchallenge I encouraged the faculty belief that Oberlin students could decide for themselves that it was our decision and in the end we knew best The purpose of anadvisor was to agree But Ive discovered that I didnt know everything about the courses I selected Nor did I know much The College can do itself andhandicapped students a favor by making campus buildings more accessible By quickly making the top priority changes suggested in a recent report about obstacles on campus for the handicapped the College can put out a welcome mat for handicapped prospective students and make life easier for those now on campus The little expense required to make the changes could easily be offset by theadditional sources of outside financial aid that handicapped students can tap Augmenting the natural accessibility provided by the flat terrain here can make the College a desirable campus for the one out of every six Americans who ishandicapped A drawing card for talented highly motivated handicapped students would add handsomely to Oberlins other attractions After the priority items Mudds steep ramp and heavy doors for example have 
been taken care of more expensive projects like elevators for Kettering and Severance should be considered The Colleges first order of business however should be removing those first barriers that prevent most disabledstudents from leading a reasonably inde pendent life on campus f V VVWVV v about designing an integrated and comprehensive education for a four year period My foresight was based upon the experience of previous semesters an experience which was less than the four years of college I now possess It is only now that I can begin to suggest the sequence of courses a person might pursue in English or Psychology my major areas Half the time I felt no connectionbetween one course and another I was not building upon knowledge I was only widening my mud hole of unconnected facts Now that college is over I wish an advisor had come alongheedless of the current attitudes toward the Oberlin studentindependent wants to make his own decision who scrutinized my course schedule who was even at times disagreeable Jonathan Brakarsh Class of 76 Injustice to dancers To the editor We are writing to inform this community about what we feel is an injustice to the dancers in our College Basically we do not agree with the priorities that the Oberlin Dance Company has established for this year We feel our dance company should serve the Collegecommunity by being a place where dancers can learn more about all aspects of dance It should not be an organization whose primary concern is to further the creative work of the dance professors Realizing that the professors are artists themselves struggling to do their own work makesapparent the conflict betweensimultaneously being an artist and a teacher of art Metzker and Woideck are being paid to teach dance to Oberlin students This should be their first responsibility By accepting only eight new dancers to the company and devoting a large part of the companys time to Metzkers work the company is serving the professors needs more than 
the students needs As shown by the large turnout at the auditions there is a significant number of dancers who desire the experience of being in thecompany This year especially when there is no advanced class and all other classes are filled with long waiting lists it is apparent to us that this small company is going to be an organization which fails to serve the needs of too many eager dancers in this College We hope that in the future the Oberlin Dance Company can be more responsive to the needs of the dancers in this community Ann Scheman Marcy Olmsted and other members of the Oberlin Dance Community Ms murder political To the editor The Oberlin Tradition and Oberlini history linger on our campus like latenight fog and I for one am weary of the lie Rhetoric is useless and I will not waste my time or yours explaining the numerous times Third World people have had to hear a motto and know that it overtly ignored them I simply want to expressfrustration and disappointment at the difficulty so many students of all colors seem to have in expressing support of a campus movement to express disapproval of the daily murders of black colored people in South Africa One student refused to voice an opinion on the subject saying Im trying to lay low on political issues I ask all of you is murder political I have seen Oberlin students campaign for saving trees byrecycling paper with moreenthusiasm than they have shown in See LETTERS p 3 obSEVIEW VOLUME 105 NUMBER 3 FRIDAY SEPTEMBER 17 1978 Published by the students of Oberlin College every Tuesday and Friday during the lad and spring semester excepting holidays and examination periods and on Fridays during Winter Term Subscription 1800 per year Second Class postage paid at Oberlin Ohio Entered as second class matter at the Oberlin Ohio post office April 2 1911 Offices 60 South Pleasant Street Oberlin Ohio 44074 Telephone 2161 775 8123 775 5440 TOM ROSENSTIEL EXECUTIVE EDITOR JEFF HORTY BUSINESS MANAGER Carolyn Butter Scon 
Maier Evelyn Shunaman Managing Editors C S Heinbockel Steve Maas Editorial Board Chairman Kiren Ghei Dave Meardon Hal Straus Naws Editors Josh Levin Pern Sommera Commentary Editors C S Heinbockel Robin Wallace Arts Editors Peggy Dorf Bill Warner Sports Editors Daniel Friedman Photography Editor Welling Had Advertising Managar Francit Alley Assistant Business Managar Editorial comment and policy are collectively determined by the members of the editorial board composed of the editors business manager and senior staff The opinions expressed in editorials are the ultimate responsibility of the elected chairman and are not necessarily those of Oberlin College or of the Association of Students of Oberlin College CAROLYN DULLER ISSUE EDITOR
</pagetext>

Page Two of the Oberlin Review, Sep 17th, 1976


Potential additional areas of improvement:

  • Spell Check for topic words (too expensive for entire corpus)
  • Contrast/correct spelling based upon the output of different OCR engines (FineReader ver 9 and 12)
  • Identify Topic Sentences and Summary Sentences at the start and end of each identifiable paragraph to help find key terms.
  • Exploit journalism “funnel” style to augment key word/topic discovery
  • Use known domain space (news: political, sports, arts, etc) to help clean text and guide machine learning algorithms
  • Some form of Entity Recognition to identify multi-word Proper Nouns like “Kenyon College”
  • Compare/Contrast unsupervised categorization on individual pages as well as entire newspaper issue, synthesize to overcome chopping of articles across page boundaries
  • Seed supervised classification with mined topic words
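As one possible shape for the first item, the sketch below corrects a topic word against a small vocabulary using fuzzy matching from the standard library difflib module. The vocabulary, the function name and the 0.8 cut-off are all hypothetical; a real run would seed the vocabulary from a gazetteer or from the mined topic words.

```python
import difflib

# Hypothetical vocabulary of known topic terms (lowercase).
KNOWN_TERMS = ["goldwater", "johnson", "landslide", "democrats", "oberlin", "kenyon"]


def correct_term(word, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest known term, or the word unchanged if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


fixed = correct_term("Goldwateis")    # OCR error taken from the Kenyon example
untouched = correct_term("majority")  # not in the vocabulary, left as-is
```

Restricting this to candidate topic words keeps the cost manageable, which matches the concern about running spell check over the entire corpus.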

 


A note on why the ALTO XML Format is of particular value for Digital Humanities research as opposed to the Simplified XML Format:

The ALTO (Analyzed Layout and Text Object) XML Schema was designed to preserve as much layout meta-information as necessary to enable recreation of the original appearance of the document.  ALTO XML Schema is currently in version 3.1 as of January 2016.

Although the single 1.47GB file may have slightly better OCR accuracy and be marginally easier to parse, it is not nearly as useful without all the layout meta-information encoded in the ALTO tag set.  The OCR accuracy over the entire corpus ranges from acceptable to horrible depending largely upon the quality of the scanned microfiche and the complexity of the original columnar newsprint layout.  The number of articles per page varies tremendously, and frequently articles on the same page are scrambled together, with numerous articles split across pages.  Since a scanned page is the smallest unit of text that can be fed into machine learning algorithms, the OCR errors combined with the scrambled text severely handicap common algorithms like topic extraction or sentiment analysis.

So much of what machine learning algorithms could tell us in the best possible case is already encoded in the ALTO layout tags.  For example, we could directly extract nearly all article titles and subtitles based upon font size tags.   Positional tags could also potentially indicate topic sentences and summary sentences, which are stylistically more informative in newspapers.  The larger fonts could also assist spell correction of key terms and focus more computationally intensive machine learning algorithms on key subsets of the relatively large corpus.

Best OCR Settings for Creating XML Files

 

Word_Spotting_and_Recognition_with_Embedded_Attributes.jpg

OCR with CNN from Interesting Github OCR Resource Page

 

Our HackOH5 Hackathon has newspaper OCR scans saved as *.xml files with two different structures.  One makes it easier to simply grab all the text on the page.  The other is slightly more difficult to parse but provides a lot more syntactic and semantic information that would be valuable for downstream natural language processing.

The full *.xml dataset was released a few days ago.  You can download it as a *.zip file (514MB) which decompresses into one large *.xml file (1.47GB).  The dataset will be available in a variety of batch file formats (pdf, jp2, html, xml) as well as via online interactive REST-like API requests based upon OCLC’s CONTENTdm digital content server.  A sample dataset with several college newspapers is available on the HackOH5 website and has a wider variety of formats including one with a richer *.xml tag structure.

The OCR/scan configuration for generating *.xml files differs between the full *.xml dataset and the smaller sample dataset.  These different xml tag structures can be visualized using online xml visualizers.

The full *.xml dataset has each page of OCR text embedded within the text area of <pagetext> tags.  The *.xml files in the smaller sample dataset have OCR text set as the value of the attribute “CONTENT” in <String> tags.

Full *.xml dataset with simplified *.xml file with all text between <pagetext> tags

<metadata>

..<record>

….<structure>

……<page>

……..<pagefile>

……..<pagetext>(all the newspaper text as one long string)</pagetext>

……..<pagemetadata>

 

The full *.xml dataset makes it easier to pull out each page of OCR text by concatenating it within a <pagetext> tag for each scanned page, but this ignores potentially valuable structural information found in the smaller sample dataset *.xml files.  For example, the more complex *.xml markup in the smaller sample dataset has the following tag structure.

Sample dataset with complex *.xml tag structure that preserves more syntactic and semantic information

<alto>

..<Description>

..<Styles>

..<Layout>

….<Page>

……<PrintSpace>

……..<TextBlock>

……….<TextLine>

…………<String>

…………<SP>

…………<String … HEIGHT=”<float>” HPOS=”” VPOS=”” CONTENT=”word”>

… (any number of String e.g. words alternating with SP e.g. spaces)

……….</TextLine>

… (any number of TextLines e.g. Sentences)

……..</TextBlock>

… (any number of TextBlocks e.g. Paragraphs)

……</PrintSpace>

….</Page>

..</Layout>

</alto>

Although it takes more work, the more complex *.xml structure of the smaller sample dataset provides us structural/syntactic information that denotes meaning/semantics:

  • <TextLine> tags mark sentence units as far as ABBYY can detect (parsing for a period cannot always successfully split a character stream into sentence units, especially with noisy text from newspaper OCR)
  • <TextBlock> tags mark paragraph units (this valuable syntactic/semantic information is completely lost when all page text is concatenated together)
  • <String… HEIGHT=”<float>”> the HEIGHT attribute of the <String> tag gives valuable semantic clues as to which words on the page are titles and which are simply the body of the text.
  • <String… STYLEREFS=””> may give some clue as to special text like titles or italicized text depending on scanner settings, font sets, etc.  I didn’t see a lot of information conveyed by this attribute in the few files I looked at, but it may apply in other scans or future rescans.
  • <String… HPOS=”<float>” VPOS=”<float>”> show the exact position of each word on the page and could prove useful in disambiguating layout.  Probably too complex for this exercise, but as a rule no metadata/information should be deleted unnecessarily.
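As a rough illustration of how the HEIGHT attribute could flag likely titles, here is a minimal sketch using Python’s standard-library xml.etree.ElementTree.  The sample markup, the function name title_candidates and the 1.5x-median threshold are all assumptions made for illustration, not values taken from the dataset; a real file would need its threshold chosen by looking at the actual distribution of heights.

```python
import statistics
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO-style snippet (not copied from the dataset).
SAMPLE = """
<alto><Layout><Page><PrintSpace>
  <TextBlock>
    <TextLine>
      <String HEIGHT="388.0" CONTENT="Wooster"/>
      <SP/>
      <String HEIGHT="388.0" CONTENT="Voice"/>
    </TextLine>
  </TextBlock>
  <TextBlock>
    <TextLine>
      <String HEIGHT="120.0" CONTENT="This"/>
      <String HEIGHT="124.0" CONTENT="issue"/>
      <String HEIGHT="118.0" CONTENT="begins"/>
      <String HEIGHT="122.0" CONTENT="a"/>
      <String HEIGHT="120.0" CONTENT="new"/>
      <String HEIGHT="121.0" CONTENT="era"/>
    </TextLine>
  </TextBlock>
</PrintSpace></Page></Layout></alto>
""".strip()


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def title_candidates(xml_text, ratio=1.5):
    """Return words whose HEIGHT exceeds ratio * median HEIGHT (assumed heuristic)."""
    root = ET.fromstring(xml_text)
    words = [
        (el.get("CONTENT"), float(el.get("HEIGHT")))
        for el in root.iter()
        if local_name(el.tag) == "String" and el.get("HEIGHT")
    ]
    median = statistics.median(height for _, height in words)
    return [word for word, height in words if height > ratio * median]


print(title_candidates(SAMPLE))
```

Using the median rather than the mean keeps a handful of very large headline words from dragging the baseline upward.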

Of all the formats potentially available for our HackOH5 hackathon, the richer xml tag structure provides the most information in a ready-to-use format.  Much of this information could also be retrieved from *.htm files.  While the HTML markup is not as precise as XML, the OCR conversion process to *.htm does some of the fuzzy categorization for us by binning font sizes for titles (in *.xml, font sizes are given as floating point numbers that are not uniform, so they require statistical analysis).  Although ABBYY FineReader can output *.htm files, no *.htm files were provided in either sample dataset, so they may not exist.

The only other format that provides more information is the *.jp2 image files of the original microfiche, but these would need to be run through another OCR program to extract information.  My initial experiments suggest that rescanning older ABBYY *.jp2 files can only be done on a sampled basis or for small subsets identified by NLP algorithms.

In sum, when creating ABBYY OCR documents opt for outputting the following four file types:  *.txt (raw text), *.htm (marked-up text), *.xml (more precise marked-up text) and *.jp2 (default res should enable future re-scans with newer OCR software although a higher res lossless TIFF format might be a way to take advantage of future OCR enhancements).  For the *.xml configuration, try to preserve as much structural and font information with XML tags rather than simplifying the tag structure.

Parse engines like BeautifulSoup4 and lxml make it simple to extract text from even the most complex xml structures.  But once you simplify your XML tag structure, you lose potentially valuable semantic information for subsequent NLP.  That information can never be recovered, and its loss results in less accurate textual analysis.
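Here is a minimal sketch of extracting text while keeping the paragraph and line structure.  It uses the standard library’s xml.etree.ElementTree instead of BeautifulSoup4 or lxml only so that it runs without extra installs; the sample markup and the names descendants and blocks_to_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ALTO-style snippet (not copied from the dataset).
SAMPLE = """<alto><Layout><Page><PrintSpace>
<TextBlock>
  <TextLine><String CONTENT="The"/><SP/><String CONTENT="Wooster"/><SP/><String CONTENT="Voice"/></TextLine>
</TextBlock>
<TextBlock>
  <TextLine><String CONTENT="BOARD"/><SP/><String CONTENT="OF"/></TextLine>
  <TextLine><String CONTENT="EDITORS"/></TextLine>
</TextBlock>
</PrintSpace></Page></Layout></alto>"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def descendants(element, name):
    """All elements under `element` whose local tag name equals `name`."""
    return [e for e in element.iter() if local_name(e.tag) == name]


def blocks_to_text(xml_text):
    """Return a list of TextBlocks, each a list of TextLine strings."""
    root = ET.fromstring(xml_text)
    blocks = []
    for block in descendants(root, "TextBlock"):
        lines = [
            " ".join(s.get("CONTENT", "") for s in descendants(line, "String"))
            for line in descendants(block, "TextLine")
        ]
        blocks.append(lines)
    return blocks


for block in blocks_to_text(SAMPLE):
    print(block)
```

Flattening the nested lists afterwards is trivial, while rebuilding them from one long string is not, which is the argument for keeping the richer tag structure.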

 

Chromebook for Data Science

Toshiba-Chromebook-13-inch1.jpg
I think in terms of 3 tiers for towers/laptops for Data Science:
1) The very thin client (laptop running ChromeOS or Ubuntu Linux) to do most everything in the cloud, hopefully with some client-side capabilities
2) A modest laptop to be able to do some Digital Humanities/Data Science/Computer Science locally (Mac OS X or Ubuntu Linux)
3) A powerful laptop/PC tower that can do processor-intensive tasks, especially deep neural nets with CUDA/GPU acceleration (most likely a custom tuned LinuxOS)
As an initial foray into Digital Humanities, we decided to go with the least expensive option (1) above.  I’ve designed the exercises for this fall’s Digital Humanities course so that most or all of the labs run in the cloud, accessible via a Chrome web browser and/or terminal command line.
 
As the capabilities and underlying technologies of Chromebooks shift every 2-3 years, it would be useful to test the practical capabilities of current Chromebooks (2017) and how compatible they are with key software packages for Digital Humanities.
 
Here is a list of software to install on the Chromebook, from need-to-have (eg Chrome browser) to nice-to-have but not necessary (Docker/VirtualBox).  I’ve done a preliminary search and it seems all the must-haves and nice-to-haves are available in one form or another (eg SSH via Chrome extension).  Not all solutions are ideal but they seem to work.
 
** Latest Chrome browser + compatible extensions

** ChromeOS/*nix-like command line interface

* typical *nix utilities
* SSH client
Enhanced terminal client
** basic IDE like VIM with configs for Python and JavaScript
* github
* miniconda to full anaconda distribution (if mini we’ll need various specific python libraries installed)
* additional python libraries
(*) JavaScript dev environment, eg: node.js, npm, webpack, etc
* enhanced IDE like LightTable (free), JetBrains (free for students)
virtualenv (for lightweight Python envs)
* more dev apps I could test out like various Python/JavaScript web frameworks, databases, data science apps, etc.
* more specialized apps like for data science and visualization clients
– Docker/VirtualBox
 
** Must have
* Should have
(*) Should have, not absolutely necessary for fall DH class
– Unlikely to work/perform well
 
Ideally, we would work on a Mac OS X or Linux variant machine because Chrome OS is a locked-down version of vanilla Linux.  In addition, Chrome OS may diverge from *nix standards as Google feels confident in making Chrome OS more proprietary.  Still, it is easier to manage than other Linux solutions, at the cost of configuration flexibility.
 
From a management perspective, we’d like to have one or several standard recovery images we could quickly reinstall should any Chromebook end up in an inconsistent state.  I’m designing the course/labs so that as much as possible is backed up to the cloud, including data and programs.  Rather than spending hours debugging any problems, we’ll just blow everything away with a fresh OS install and/or a batch script that installs everything else.  Little if anything should be lost because almost all student data and programs are either stored in or mirrored to the cloud.
 
Another option worth exploring is dual-booting both x86 and ARM Chromebooks into either ChromeOS or an open LinuxOS.  This is definitely a route we should explore, if only to know our options when we run into restrictions in ChromeOS and need to evaluate the cost of workarounds.  We may be able to squeeze noticeably more performance out of the machines if we dropped the GUI and any bloatware embedded in ChromeOS.
Finally, I’d like to run a number of tests/evaluations to benchmark the limits of these machines under various conditions.  The information we get would help us set guidelines for the practical limits of assignments/work we can do locally on these machines.  For example, could we set up at least a trivial “hello world” type intro tensorflow DNN running on a dual-boot, stripped-down CoreOS server?  If so, how complex can our network get and how much data can we train with?
 
Maybe we have to scale back our expectations and run only simpler machine learning algorithms on these Chromebooks.  That raises the question of which ones we could reasonably run, on what sized datasets and with what expected run times.  I’m hoping we’ll be able to run reasonably complex ML algorithms on small datasets, like one or a few dozen novels, for a planned digital literary analysis class.  We’ll need to run preliminary tests/benchmarks to quantify these limits.
 
We could explore additional packages and client-side installs that could be useful outside our upcoming Digital Humanities course.  We could write up some guidelines, develop some support procedures and write some scripts to automate recoveries/installs.
Here are some Chrome developer extensions that should run on Chromebook.
Chrome Applications/Extensions:
– Enable Developer Mode and follow these instructions to install (careful with Node.js)
     * Anaconda/Python
     * Chromebrew Package Manager
     * git client
     * Node.js/JavaScript (follow these updated instructions)
– Configure SpyderIDE (installed with Anaconda)
– Configure Jupyter Notebook
– Install at least one local IDE and cloud IDE Extension listed below
Chrome Browser Local IDE Extensions:
Text
Web-based IDE:
Cloud9
 Prolong SSD/Flash Memory Lifespan by tuning your system
 

Optimizing Python for Data Cleaning

ns-as-secs

If 0.3 nanosecond were 1 second it would take 32 millennia to reboot your computer

 

In our HackOH5 hackathon we have 24 hours to clean, analyze and visualize over 170,000 files.  The files consist of unstructured and relatively noisy OCR’d newspaper text dating from 1856.  Even in the best of situations (eg with clean, structured numeric data), machine learning and especially neural network algorithms/model building can take relatively long times on even modestly sized datasets.

The quality of our analysis will depend upon the quality of our dataset, which we’ll have to clean up in a number of ways.  To make the most of our limited 24-hour window, we want to optimize both the cleaning of the dataset and our ML/NN algorithms.  In this post, we’ll talk about cleaning our dataset.

Our Python data scrubbing could be optimized at a number of points like any other program.  In thinking about optimizing, there is a clear priority that will result in the fastest performance boost.  Although it was published in 2012 and needs to be updated around the edges (7200rpm HDD and SSD numbers are probably different, but not orders of magnitude so), here is a good guide by which to set your priorities:

Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Credit
------
By Jeff Dean:               http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers

Contributions
-------------
Some updates from:       https://gist.github.com/2843375
'Humanized' comparison:  https://gist.github.com/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png
Animated presentation:   http://prezi.com/pdkvgys-r0y6/latency-numbers-for-programmers-web-development/

There is also a good Prezi visualization that expands upon these numbers

A great presentation that shows how these relative numbers change by year thru 2017
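To get a feel for the relative sizes in the latency table, we can rescale it to human time.  The sketch below takes a few rows from the table and pretends 1 ns lasts 1 second; the function names scaled and humanize are made up for illustration, and the five-minute reboot mentioned afterwards is an assumed duration, not a measured one.

```python
# A few rows from the latency table, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory reference": 100,
    "Read 1 MB sequentially from memory": 250_000,
    "Read 1 MB sequentially from SSD": 1_000_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def scaled(ns, ns_per_second=1.0):
    """Seconds the operation would take if `ns_per_second` ns lasted one second."""
    return ns / ns_per_second


def humanize(seconds):
    """Format a duration using the largest unit that fits."""
    units = (("years", 365 * 86400), ("days", 86400), ("hours", 3600), ("minutes", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for name, ns in LATENCY_NS.items():
    print(f"{name:38s} {humanize(scaled(ns))}")
```

On this scale a main memory reference takes under two minutes, a disk seek takes almost four months and a transatlantic round trip takes close to five years.  Calling scaled(300e9, 0.3) for an assumed five-minute reboot gives roughly 32 millennia, which matches the caption above.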

Basically, what we see is that we need to prioritize our Python data scrubbing in order of efficiently utilizing (1) Network Bandwidth, (2) Disk I/O, (3) Memory I/O, with each element having an order of magnitude or two less impact than the previous one.  As our datasets get larger and our algorithms more complex, even optimizing the lesser factors becomes noticeable.  Here are common ways we can optimize these three factors:

  1. Network Bandwidth
    1. Filter out and prune noise in the data files to shrink file size like unused metatags or misspelled words that cannot be corrected
    2. Preprocess dataset into a more condensed form without losing valuable information if possible either semantically or syntactically
    3. Split the dataset across multiple computers transferring data in parallel
    4. Compress datasets like plaintext that reduce file sizes significantly before transferring over the network, decompress on the other end in streaming or parallel fashion
  2. Disk I/O
    1. Use the fastest disk possible:  SSD > HDD.  Among HDDs, 7200rpm generally beats 5400rpm, although the less-advertised throughput figure, which also reflects caching strategies and other factors, is a more meaningful overall measure.
    2. If available, use an SSD over a mechanical spinning HDD; some cloud storage providers offer this at a premium price.  Intel claims SSDs are 8x faster than mechanical HDDs, but SSDs also cost about 5x more per GB (2017).
    3.  If possible, configure the disk for optimal performance including:  multiple striped disks with fast bus, performant file system type (eg ZFS with a lot of fast memory vs the default Linux Ext4), file system configuration (eg Journaling off, not fragmented, etc).
  3. Memory
    1. The general rule is more memory at all levels is better since all data must pass through memory of some sort to be processed or transmitted/received.  With proper configuration and programming techniques, the more memory the fewer disk and network accesses are necessary, the smaller the relative overhead of handling these data transfers and the faster the CPU/GPU can process the data whether it be a machine learning algorithm or de/compression.
    2. Faster memory also generally helps:  DDR3 SDRAM, for example, has up to twice the data transfer bandwidth of its predecessor DDR2.  DDR2, DDR3 and the newest DDR4 all have different physical specs, are not interoperable and require computer systems specifically designed for each type.  DDR5 is in the design phase and DDR4 has not reached PC market penetration yet.  As an aside, CPU cache hit rate is an interesting topic that relates more to ML/NN algorithms but explains some of the underlying mechanisms at work.
  4. Programming
    1. Unix command line programs like awk may be faster if processing is simple and single pass
    2. To reduce file I/O at the Python level, read in large blocks or entire files at a time.
    3. Open a file as few times as possible rather than opening and closing it for each processing step (at the cost of less reusable code)
    4. Using different versions of Python, Cython, etc may make a difference for unusual edge cases, like network processing that bypasses TCP for a faster network protocol for certain data types/sizes.  It also helps to understand the interactions between Python, the C code it’s often calling and the underlying Unix OS.
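Several of the tips above (compress before transfer, read in large blocks, open each file only once) can be combined in one pass.  This is only a sketch under stated assumptions:  the 1 MB block size, the NOISE character filter and the function names clean_stream and clean_gzip_file are all made up for illustration, and a real cleaning step would be more involved than deleting unexpected characters.

```python
import gzip
import io
import re

BLOCK_SIZE = 1024 * 1024  # characters read per call; an assumed, tunable value

# Assumed notion of "noise": anything outside basic letters, digits,
# whitespace and common punctuation.
NOISE = re.compile(r"[^A-Za-z0-9\s.,;:'\"!?-]+")


def clean_stream(src, dst, block_size=BLOCK_SIZE):
    """Copy text from src to dst in large blocks, dropping NOISE characters."""
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(NOISE.sub("", block))


def clean_gzip_file(in_path, out_path):
    """Decompress, clean and recompress in a single streaming pass."""
    with gzip.open(in_path, "rt", encoding="utf-8") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        clean_stream(src, dst)


# Small in-memory demo; a tiny block size shows that block boundaries do not matter.
demo_in, demo_out = io.StringIO("Tlie W§ooster Voi¤ce"), io.StringIO()
clean_stream(demo_in, demo_out, block_size=4)
print(demo_out.getvalue())
```

Because the filter works one character at a time, splitting the input at arbitrary block boundaries cannot change the result.  A filter that matches multi-character patterns would need to handle matches that straddle two blocks.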

Finally, all of these are general guesstimations.  The only real way to know what improves performance is by finding well designed, realistic and applicable benchmarks and iteratively refining.  Both Python and Unix provide a wealth of tools for assessing performance and automating testing.
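As one example of such a benchmark, the standard-library timeit module can compare two ways of reading the same file.  This is a sketch on a synthetic file, so the timings say nothing about the real dataset; the two word-counting functions and the file contents are hypothetical.

```python
import os
import tempfile
import timeit


def count_words_by_line(path):
    """Count words while iterating over the file line by line."""
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += len(line.split())
    return total


def count_words_whole_file(path):
    """Count words after reading the entire file in one call."""
    with open(path, encoding="utf-8") as f:
        return len(f.read().split())


def benchmark(path, repeat=5):
    """Best-of-`repeat` wall-clock time in seconds for each approach."""
    results = {}
    for fn in (count_words_by_line, count_words_whole_file):
        results[fn.__name__] = min(timeit.repeat(lambda: fn(path), number=1, repeat=repeat))
    return results


# Build a synthetic test file (7 words per line, 20,000 lines).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("the wooster voice is published every saturday\n" * 20000)
    sample_path = tmp.name

words_a = count_words_by_line(sample_path)
words_b = count_words_whole_file(sample_path)
timings = benchmark(sample_path)
os.remove(sample_path)

print(words_a, words_b)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Checking that both versions return the same count before comparing their times guards against benchmarking a faster but wrong implementation.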

These optimizations are for the data cleaning stage.  For our analytic stage there are additional optimization steps with distributed processing and multiple GPUs being the most significant, but that remains for another post.

ABBYY FineReader Versions 9 vs 14

For our HackOH5 Hackathon there is a sample dataset posted on the event’s website at https://hackoh5.ohio5.org/.  According to the metatags in the *.xml version of files in the sample dataset the file was parsed with <OCRProcessing… abbyy9.version:9.0.0.7394-3> into the original <fileName>…/TheFiveCollegesOfOhio_2012-Paper.  The obvious interpretation is that these scans were created using the ABBYY OCR engine FineReader version 9 released 2007-10-01.

ABBYY recently released FineReader version 14 on 2017-1-24.  OCR’ing old newspapers from microfiche is notoriously error-prone, so one could expect significant improvements from the 5 version updates released over the past 9 years.  The question is, how much of an improvement would there be?  As poor text quality is perhaps the biggest limitation in our project, addressing this issue is a high priority.

Fortunately, ABBYY offers a limited 30-day trial download of their newest OCR software (only the Windows version has the newest FineReader OCR engine).  Using this, let’s quantify how much improvement we see using several representative data points across the sample dataset.

First, let’s look at the Sep 12, 1890 issue of The Wooster Voice which is printed in a book-like dual column format of largely long-form editorial content.

0001.jpg

Here is an excerpt of the original ABBYY FineReader version 9 OCR text in XML format:

<String ID="TB.Img0001b.1_0_0" STYLEREFS="TS_10.0" HEIGHT="120.0" WIDTH="92.0" HPOS="952.0" VPOS="648.0" CONTENT="3" WC="0.992"/>
</TextLine>
</TextBlock>
<TextBlock xmlns:ns2="http://www.w3.org/1999/xlink" ID="TB.Img0001b.2" HEIGHT="904" WIDTH="7116" HPOS="208" VPOS="1120" ns2:type="simple" language="en">
<TextLine ID="TB.Img0001b.2_0" HEIGHT="200.0" WIDTH="304.0" HPOS="432.0" VPOS="1120.0">
<String ID="TB.Img0001b.2_0_0" STYLEREFS="TS_10.0" HEIGHT="200.0" WIDTH="304.0" HPOS="432.0" VPOS="1120.0" CONTENT="V" WC="0.992"/>
</TextLine>
<TextLine ID="TB.Img0001b.2_1" HEIGHT="404.0" WIDTH="5636.0" HPOS="928.0" VPOS="1212.0">
<String ID="TB.Img0001b.2_1_0" STYLEREFS="TS_10.0" HEIGHT="376.0" WIDTH="1076.0" HPOS="928.0" VPOS="1220.0" CONTENT="The" WC="0.992"/>
<SP WIDTH="188.0" HPOS="2004.0" VPOS="1212.0"/>
<String ID="TB.Img0001b.2_1_1" STYLEREFS="TS_10.0" HEIGHT="388.0" WIDTH="2528.0" HPOS="2192.0" VPOS="1212.0" CONTENT="Wooster" WC="0.992"/>
<SP WIDTH="180.0" HPOS="4720.0" VPOS="1212.0"/>
<String ID="TB.Img0001b.2_1_2" STYLEREFS="TS_10.0" HEIGHT="388.0" WIDTH="1664.0" HPOS="4900.0" VPOS="1228.0" CONTENT="Voice" WC="0.992"/>
</TextLine>
<TextLine ID="TB.Img0001b.2_2" HEIGHT="132.0" WIDTH="420.0" HPOS="208.0" VPOS="1884.0">
<String ID="TB.Img0001b.2_2_0" STYLEREFS="TS_10.0" HEIGHT="128.0" WIDTH="248.0" HPOS="208.0" VPOS="1888.0" CONTENT="Vol" WC="0.992"/>
<SP WIDTH="116.0" HPOS="456.0" VPOS="1884.0"/>

Running this through the parser BeautifulSoup4 to extract the text embedded in the CONTENT attributes of the <String> tags gives the original FineReader version 9 OCR text as:

3 V The Wooster Voice Vol I WOOSTEII OHIO SEPTEMBER 12 1800 No 1 Tlie WOOSter Voice 9l Irvin Literary Society G G Burns 93 Athletic Association J M Gaston 92 BOARD OF EDITORS Articles f Publication under which t u irruDAw n The Voice is issued provide for the election of 11 II IIERRON EditorinChief i i T nitmr i Importers by each ot the organizations repre K L CAMPBELL Business Manager c T n n V 1 seuted on the Board of Control whose duty ASSOCIATES je jQ jgpyj WCekly the proceedings of Aylette Fullerton Locals and Personals his respective organization Associate Editors W R Newell Religious 0f ability and experience have charge of partic F L Blllakd Miscellaneous ular Departments and they with the strong corps of Reporters provided for insure that the The Woosteii Voice under the supervision of a Board of WU not on y representative but that Control representing I lie Faculty and Students of the Urn 1 1 vcrsity of Woostcr is published every Saturday throughout it will also contain the 11CWS Items of inter t he college year Subscriptions may be left at McClelian est WH1 have a hard time escaping SO many Bros E Liberty St or with the Librarian at the University to I Per Annum in advance l5 diligent Searchers I Six Months In advance 75 w n The Editors solicit communications from Alumni Students W Ust that the changes HOW Operative and friends of the University will meet with the approval of all interested All communications designed for publication should be iiiirp it n n i i addressed to the EditorinChief Correspondence of a busi and tllat ThE VOICE Will receive the hearty ness nature to the Business Manager support of every reader of these lines Ncither v v v r the Editors nor the Board of Control will spare Ilditoril aliv cn01ts to make the paper more worthy of such support with each issue THIS issue begins a new era in Wooster University journalism Most of our We noiE the organizations enumerated above readers are doubtless familiar with fVe will make the election 
of Reporters a part of changes agreed upon but for the benefit of the business of their first meetings It is im those who are not we restate the facts in the portant that good Reporters be chosen and case that very soon The Uiihienuti Voice has been purchased and combined with The Woosler ColleijUtn and A week ago we received a very neat invita the combination will hereafter be known as tion card postmarked at Brookfield Mo which The Woosteii Voice and published every Sat read about as follows Harry C Myers Clara urday of the school term in the form you now C Bradshaw married at Brookfield Mo Tues see it day August twentysixth eighteen hundred The Voice is under the supervision of a and ninety At Home after September tenth Board of Control representing the Faculty and Harry doubtless thought hed surprise us Students The Faculty have two representa but we refused to be surprised Ever since he fives on the Board Drs S J Kirkwood and left here his friends have had their suspicions Vv Z Bennett The Students have six mem of him and they were prepared to hear the hers representing as many organizations as worst Mr Myers spent the Sophomore and follows Y W C A Miss Winona Hughes Junior years with 90 and surrounded himself 91 AVillard Literary Society Miss Luella while in AVooster with a wide circle of friends Wall ace 92 Y M C A S B Linhart 91 who join The Voice in extending hearty con Athenaan Literary Society W E Henderson gratulations

 

And here is an excerpt of the same file using the newer FineReader version 14 OCR engine:

The “Wooster Voice.
Vol. I.
WOOSTER, OHIO, SEPTEMBER 12, 1890.
No. 1.
The Wooster Voice.
BOARD OF EDITORS.
R. IL HERRON, Editob-in-Chief,
R. L. CAMPBELL, Business Manager.
ASSOCIATES.
Aylette Fullerton, – Locals and Personals. W. IL Newell, —– Religions. F. L. Bullard, – – _ Miscellaneous.
The Wooster Voice, under the supervision of a Board of Control representing the Faculty and Students of the University of Wooster, is published every Saturday throughout the college year. Subscriptions may be left nt McClellan Bros , E. Liberty St., or with the Librarian at I ho University.
Per Annum, In advance, – $1.?5 lERMS. j six Months, in advance, – .75
The Editors solicit communications from Alumni Students and friends of ihe University.
All communications designed for publication should be addressed to the Editor-in-Cbief. Correspondence of a business nature to the Business Manager.
Editorial.
THIS issue begins a new era in Wooster University journalism. Most of our readers are doubtless familiar with n:e changes agreed upon, but for the benefit of those who are not we restate the facts in the case.
The Uninersity Voice has been purchased and combined with The Wooster Colleyian and the combination will hereafter be known as The Wooster Voice and published every Saturday of the school term in the form you now see it.
Tiie Voice is under the supervision of a Board of Control representing the Faculty and Students. The Faculty have two representatives on the Board, Drs. S. J. Kirkwood and W. Z. Bennett. The Students have six members, representing as many organizations, as follows: Y. W. C. A., Miss Winona Hughes, ’91; Willard Literary Society, Miss Luella Wallace, ’92; Y. M. C. A., S. B. Linhart, ’91; Athensean Literary Society, W. E. Henderson,
’91; Irving Literary Society, G. G. Burns, ’93; Athletic Association, J. M. Gaston, ’92.
The Articles of Publication, under which The Voice is issued, provide for the election of Reporters, by each of the organizations represented on the Board of Control, whose duty will be to report, weekly, the proceedings of his respective organization. Associate Editors of ability and experience have charge of particular Departments, and they, with the strong corps of Reporters provided for, insure that the paper will not only he representative, hut that it will also contain the news. Items of interest will have a hard time escaping so many diligent searchers.
We trust that the changes now operative will meet with the approval of all interested and that The Voice will receive the hearty support of every reader of these lines. Neither the Editors nor the Board of Control will spare any efforts to make the paper more worthy of such support with each issue.
* * *
We hope the organizations enumerated above will make the election of Reporters a part of the business of their first meetings. It is important that good Reporters he chosen, and that very soon.
* * *
A week ago we received a very neat invitation card post-marked at Brookfield, Mo., which read about as follows: “Harry C. Myers, Clara C. Bradshaw, married at Brookfield, Mo., Tuesday, August twenty-sixth, eighteen hundred and ninety. At Home after September tenth.”
Harry doubtless thought he’d surprise us, but we refused to he surprised. Ever since he left here his friends have had their suspicions of him and they were prepared to hear the worst. Mr. Myers spent the Sophomore and Junior years with ’90, and surrounded himself, while in Wooster, with a wide circle of friends who join The Voice in extending hearty congratulations.

Here is a comparison between the old version 9 and new version 14 FineReader OCR scans:

FineReader Version 9 (10/07):  620 words, 171 Non-English words (171/620 =  28% Error Rate)

FineReader Version 14 (1/17): 591 words, 14 Non-English words (14/591 = 2.4% Error Rate)
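The error rate here is simply the share of extracted words that are not recognized as English.  The sketch below shows one way such a count could be made with a plain vocabulary lookup; this is an assumed method, the tiny VOCAB set is a stand-in for a real English word list, and the function name non_english_rate is hypothetical.

```python
import re


def non_english_rate(text, vocabulary):
    """Return (word count, unknown-word count, unknown share) for `text`."""
    words = re.findall(r"[A-Za-z']+", text)
    unknown = [w for w in words if w.lower() not in vocabulary]
    rate = len(unknown) / len(words) if words else 0.0
    return len(words), len(unknown), rate


# Tiny stand-in vocabulary; a real run would load a full English word list.
VOCAB = {"the", "wooster", "voice", "is", "published", "every", "saturday"}

total, bad, rate = non_english_rate("Tlie WOOSter Voice is published every Saturday", VOCAB)
print(f"{total} words, {bad} non-English words ({rate:.1%} error rate)")

# The arithmetic behind the two figures reported above.
print(f"Version 9:  {171 / 620:.1%}")
print(f"Version 14: {14 / 591:.1%}")
```

To one decimal place the version 9 figure is 27.6%, which the comparison above rounds to 28%.  A lookup like this undercounts errors such as “he” for “be”, where the OCR mistake happens to be a valid English word.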


Second, let’s look at the Mar 28, 1928 issue of the Oberlin Review, where the format is much more complicated, with many columns, irregular title bars and multiple font sizes/types, like a mainstream newspaper.

0041.jpg

Here is the text parsed from an *.xml file created with FineReader version 9:

OBERLIN REVIEW OAVOH VOLUME 55 SENIORS PLAN TO TO COLLEGE DURING NEXT
TEN YEARS AS MEMORIAL TO CLASS CLASS OF 1928 ARRANGES HUGE
GFZTeNTnRsISrT N ANCE POLICIES ALTERNATIVES SUBMITTED Individuals Have
Choice of Three Ways of Giving Quota to Budget lie greatest class
memorial In the history of uborlin college is being cirnted by the class
of 108 for insi il of the familiar type of memor ial arrangements have
been made whorehv a irlft of ubiinl KJI WVt tn be used as the class
stiuplates upon presenrauon 10 uoerun college will he made by this years
graduating clas at their tenth annual reunion The Individual
appropriations which will go to make up the gift are being made in
several days The principal method is That each senior take out with the
Equity Life Insurance Society a 1000 life insur ance polity and that for
a period of ten years he turn over the dividends iiivnnug on said
policies to the treasury of the present graduating class toward the
memorial j hi event or me death ot any one having drawn out n policy t
use is immneii providing lor tnc payment ni one hundred dollars to the
class treasury by t lie beneficiary of the policy upon receipt of which
the bene ficiary will receive a check for all interest that had accrued
on policy to time of the holders death Otherwise the dividends at the
close of the leu year period will be either applied to u i roue uwn iii
t lie annual premiums A 1 11 of this sort is a necessity for the payor
paid directly to the holder of the j f Continue Page 1 j I 13 LEAVES OF
ABSENCE APPROVED BY TRUSTEES Seven College Five Conservatory Fac ulty
Members Granted Furloughs Next Year Ten Professors Will Return From Ab
sences for School Year 192 8to 1929 Seven members of the college fac
iiIm and five members of the Conserj vniory faculty were granted leaves
of sil e nce for next year by the Hoard of IV i tees and one was granted
by the j A evrinSti mIlt feeling of hi ieiitial commutes accord1 to an
t1l v svm realism are the Uiouncement made public mis mom outstanding
characteristics of all at the trustee meeting Spanish art according to
Senora Isa the Wleaves that were granted lo VuhmU Wio lectured Tuesm wus
for 11 Dniester only iiv evpnillg th An building ne list of the leaves
granted is as Snora le ivlemia widely known V as a playwright and author
as well Kuril Gelser professor of political 1S a lecturer illustrated
her lecture i nee one year for study and travel Spinlih raiiting with
actual Continued on puge 2 Spanish costumes which she donned DISCUSSION
GROUPS PLAN DEFINITELY FOR FUTURE Cu ry Life Experiment Units Out line
Work Ahead of Them to Meet Through May The student rllsiisslon groups
which were organized by Dr Bruce nrry to perpetuate the spirit and met
In Hi f tio b0ii invention d here some time airo have been diking
definite plans for the future of types of men and women But during the
last week tnls s1 8111 tho IllilltinS of The educational and
psychological j this country has been unique for il have drawn p
questionnaires i there has been no attempt to tell a Miid which thev
intend to carry on story but only to express a feeling this work The
others as vet have that feeling being sadness clloted no definite
programs The Thus in the end Senora de Palleaders win n fnrrv nextnela
concluded both characteristics Monday evening to complete their plans j
The I inoni fn meet once a k until the last of May when a SeneraJ survey
of the semesters work wil be made The methods ure patterned after those
of similar groups organized by Dr Curry In other col leges These
discussion groups hope to to conclusions which will be of j finite value
to those directly con rned and to the college as a whole Tllv also
promise to be of aid to President Wilkin In his svstematlc 1 GIVE 25000
GROVE PATTERSON WILL SPEAK ATYM MEETING Editor of Toledo Blade to Talk
on Phases of American Politics Sunday Night irove Iattorson 05 oil i tor
of the Toledo Blade is to he the speaker at jtlie weekly meeting of the
Y M C A In 1 lit Mens building Sunday evening His subject will be Some
Phases of American Politics This will serve as an Introduction to the
Mock Convention and open discussion Is to be i bold at the close of the
session 12 FREE SCHOLARSHIPS TO BE GIVEN FRESHMEN Six Men Six Women
Entering Next September to Have Full Term Bills Paid TO INCREASE GRANTS
IN AID Funds Available for Needy Students in Payment of Bills Are Raised
40 Per Cent Authorization was given at the j meeting of the trustees
this morning to grant 12 free scholarships to fresh men entering In
September to cover jthe full amount of the term bills In the college of
arts and sciences iO0 The 12 scholarships are to be divided evenly among
entering men and j women and will be granted to those j who have made
distinguished scholasi tic records in high school and whose i financial
situation is such that aid Th nctl eur Her action of the Hoard
niitirovini f available for scholarship aid in proportion to the rease
in tuition charges which goes into effect next September This increase
is to he approximately 40 per cent over the amount available this past
year NATURE OF SPANISH ART CHARACTERIZED IN TALK Senora Isabel de
Palencia Tells About Latent Feeling of Tragedy Realism of Spain from
time to time to aid in the vis ualization of the masterpieces which were
thrown upon the screen The church she declared has ever been the great
patron of Span Ish art and religious subjects the fav orile of every
painter The second characteristic which Se tiora de Halencia ascribes to
Spanish art Is realism She pointed out that I the warlike unsettled life
of Spain has been unfavorable to landscape painting and so Spanish
gertius has ii its talent toward the portrayal j tunic are merged into
one the portrayal of human types expressing me innate national feeling
or trageuj ERRATUM The statement in Tuesdays issue to the effect that
Charles C Hubbard Jr president of the freshman class will be gone the
remainder of the school year should have read for the remainder of the
present term Hub bard leaving last sunuay frtn nt tinme a lew uays oi
iwuinnuvu Is well on the road to recovery and expects to return to
Oberlin arter OBERLIN OHIO TRUSTEES ENDORSE NEW TYPE OF DIPLOMA TODAY
Board Sanctions Faculty Recommenmendation that Smaller Certificate Be
Given Graduates iS TO HAVE LEATHER CASE Parchment Will be Presented in
Morocco Cover Many Colleges Now Use This Form Endorsement of the rei
imendation of the general faculty to the effect that a new type of
diploma be present ed to graduating classes in the future was made by
the Hoard of Trustees at their meeting this morning The new diplomas are
to be smaller In size presented in a simple Morocco leather case of an
8in by iin size with the words Oberlin College stamped in gold on the
cover replac ing the cumbersome scroll form of diploma which has been
used in the past The Latin Old English script is to be retained The
adoption of this new form is in accordance with a general tendency among
larger colleges and universities to furnish their graduates witli a
handier form of certificate on completion of work Among the schools
which now use this form are jLeland Stanford Universities of Illinjois
and Chicago Smith college and many others In both east and west MME
ONEGIN TO CLOSE ARTIST RECITAL COURSE Swedish Contralto Gives Concert
Monday Night in Finney Chapel at 730 Varied Program of Famous Opera
Singer Holds Much in Store for Listeners Hy M S S The last concert on
this semesters artist course will occur next Monday evening Aprpil 2
when Mine Sigrid Onegln contralto assisted by Franz Dorfmuller pianist
will give a recital in Finney chapel at 730 p m Mine Onegin is an
internationally famous singer on both opera and concert stage who
possesses personal charm as well as a golden voice and profound musical
feeling To quote II A Hollows of Minneapolis I canjhurd workouts such as
blocking and recall no other recital in which a tackling but the time
will be devoted great audience was so profoundly j kicking and passing
and drill In stirred or with such good reason j execution of plays This
will con A beautifully varied program hasj linue for three or four weeks
and will been chosen for Monday evening fr those men who are not
partlcwhich follows in full j ipming in baseball track or tennis Ai ivv
wiiii Mm mi ins and Crvi xt fall the men will be called lug from Orpheus
and Kurydice ni i Onhelin Whv Asks mv Fair One Haydn Shakespeare
Conzonettes this Veilchen Warnung Moart Hast lose Liehe Musensolm Der
Erlkonig Schubert Aria Je ne veux pas chanler Xieolo Isouard From the
Billet de Ioterie ITiOlSOl Sleep My Darling Child Old Swedish Lullaby
Tegner Xon je nirais plus au In is Jeunnes Killettes French Bergeretli
from the lSih Century Fairy Pipers Brewer Vocational Survey Summer
Occupations Among Seniors By Dwigbt Ilanawalt Deckhands camp counsellors
sales man day laborers these are some of the tasks to which the senior
men applied themselves in the last summer vacation Twentysix percent of
the men worked as laborers and deckhands 11 percent were camp
counsellors three men were sulesman and GO worked at unskilled labor
This data has been prepared by the Vocational Information Service of the
psychology department in an effort to find out how much vocational
experience college students get ta the summers labor The aim of this
work Is to advise college students that their summers occupation may be
of other benefit than pecuniary In reply to the question What kind of
knowledge did you acquire 20 men said that they had learned something
about handling men and boys 14 had acquired some knowledge of processes
or organization of business and j five had learned more about human j
nature the psychology of the work FRIDAY MARCH 28 1928 BOARD CREATES TWO
NEW ADMINISTRATIVE OFFICES Trustees Approve Fulltime Director of
Admissions Full Time Personnel Officer TO SPEND HALF TIME IN FIELD
Former Will be in Oberlin Only Part of TimeLatter Office Will be
Cooperative Creation of two new offices those of a fulltime director of
admissions and a fulltime personnel officer as additions to the
administrative force of the college was among the action taken by the
trustees this morning The fulltine director of aduterons is to spend
approximately half his time in the field and the ltmainder in Oberlin
His duties in the former regard will be somewhat along the line of
Professor Shermans activities this semester in creating interest in
Oberlin in outstanding contributory high schools The office of the
personnel officer will be the center for the collection of individual
data about students of all sorts It will contain the employment service
and housing service thoroughly reorganized and will in general be a
coordinating center for distribution and utilization of data as well as
the gathering of it FOOTBALL SEASON PLANS ARE DISCUSSED MONDAY Work for
Spring Early Fall Practice 6 Outlined by Coach MacEachron at Meeting To
Have no Hard Workouts NowDevote Time to Drill in Play Execution Plans
for the coming season and for spring practice were the main topics
discussed at the football meeting which was held in the varsity O room
last Monday evening In a short talk Coach aMcEachron outlined his work
for this spring and I he early part of next fall Starting immediately
after spring vacation light practices will be the program for Wednesday
and Friday afternoons from 4 30 to 5 30 There will be no hack about a
week early and this week will be spent in bard practice in preparation
for the opening games which will be with Heidelberg Akron anil Wooster
in the order named Such an opening will not be an easy one Inasmuch as a
number of men could not get to this meeting and the turnout was not as
large as was hoped for there will be another one this coming Monday
night at 7 30 in the same room Loves old sweet song Buy me I some candy
Wisconsin Daily Cardin al IP Shows Diverse ing men The highest wage
earned was about fifty dollars a week being earned by a riveter the
salesmen averaged 2335 per week while the laborers and deck hands
averaged 22 per week In the miscellaneous occupations the experience
gained varied from insight into hotel management to practical seamanship
Of the 61 senior women employed during the summer 14 Jid clerical work
at an average wage of 1750 per week Eleven women were waitresses for the
summer When asked if they liked the work these women were about neutral
In their attitude Seven were camp counsellors and an equal number were
playground Instructors Miscellaneous employments Included playing In an
orchestra tutoring factory work and bookselling The highest paid
employment was a Chautauqua superintendent It is found from these
statistics that the chance for women to gain vocaContlnued on page 2
BOARD OF TRUSTEES APPROVES MAJOR FACULTY APPOINTMENTS IN ACTION AT
SPECIAL MEETING ALUMNI ASSOCIATION TO INAUGURATE NEW PLAN Organization
Will Report Names of Graduates Leaving College to Its Nearest Chapter
Secretary John G Olmslead of the Alumni association has inaugurated a
new plan into the activities of that organization designed to lielp new
graduates of Oberlin as they leave college As soon as the association
learns where the new member is to be located the nearest chapter is
notified so that it can give the newcomer any possible assistance This
notice is J also exchanged between chapters whenever a member moves from
one region to another DAN BRADLEY TO DELIVER BACCALAUREATE SERMON Member
of Board of Trustees Will Preach on Last Sunday to Graduating Class IS
CLEVELAND PASTOR Received Degree from Oberlin in 1882 Has DD From Three
Institutions Dan F Bradley 82 pastor of the Pilgrim Congregational
church of Cleveland will deliver the baccalaureate sermon at
commencement next June according to an announcement made yesterday in
chapel by President Wilkins Dr Bradley received his A B from Oberlin in
1882 and was a teacher in the preparatory department from 1883 to 1S85
In 1885 he was granted the degree of Bachelor of Divinity by Oberlin His
next service for the college was from 1891 to 1S92 when he was a member
of the board of trustees He was again on the board from 1893 to 1902 and
from 1906 to the present Dr Bradley holds a degree of Doctor of Divinity
from three schools The degree was given him by Yankton in 1892 by
Cornell college in 1904 and by Oberlin in 190S DR REED CITES HEALTH AS
GOOD USE OF LEISURE Health Service Head Shows Importance of Sane
Expenditure of Spare Time Health is principally a matter of the
appropriate expenditure oi isure time according to Dr ljdatey B Iteed 03
head of the Health Service at the University of Chicago and former
colleague of President Wilkins who spoke in chapel yesterday noon Health
was defined by Dr Reed as the condition in which we live best and serve
most In the fulfillment of these conditions leisure time should in
general he apportioned among four fields acquaintance with many types of
people reading sports and art and music The Oberlin which he attended
said Dr Iteed offered many and remarkable opportunities for profit among
all these fields but the Oberlin of today offers still more enviable
opportunities for obtaining this sort of health BETTY L HILL 30 ELECTED
VICEPRESIDENT OF W A A Hetty L Hill 30 was elected vice prestdent of th
W A A last Thurs day in chapel as announced at the basketball banquet
last Friday eve ning contrary to the statement in the Review last
Tuesday which stated that Bitty von Wenck 30 had been elected to that
position ICE STORM DELAYS THIS ISSUE OF REVIEW A DAY This Issue of the
Review although dated Friday March 30 is being published Saturday March
31 due to the failure of the electric power following the Ice storm of
last Thursday night NUMBER 46 NAMES PROMINENT EDUCATOR AMONG NEW
PROFESSORS IN CONSERVATORY AND COLLEGE TO DIRECT INTRAMULALS Dr J
Herbert Nichols Will Be First to Fill New Berth Created Some ten major
appointments to the faculty of Oberlin College for the year 19281929
with three promotions comprised the main business trans acted by the
Board of Trustees of the college this morning at their special meeting
called to convene in the administration buildin gat 930 Chief among
these appointments was that of John Herbert Nichols M D Oberlin 11 of
Ohio State University to the position of Professor of Physical Education
and Director of Intramural Athletics Other appointments were Herbert B
Briggs acting associate pprofessor of political science Raymond Cerf
professor of violin and ensemble Leslie AVebber Jones associate
professor of classics Carroll Brown Malone acting associate professor of
history Miss Hope Hibbard assistant professor of zoology Marie Mathilda
Johnson assistant professor of mathematics Arthur L Williams assistant
professor of wind instruments and director of the college band S L
Wallace instructor in classics and fine arts Continued on Page 3 LIKELY
TO REPEAR 3ACT PLAY TUESDAY NIGHT Dramatic Association May Possibly Give
Second Performance Next Week Ice Storm Prevents Performance Tonight
Tickets Exchangeable or Returnable Owing to the ice storm of last night
which has deprived Oberlin of its electric power supply today the
Dramatic Association is unable to present its first performance of The
Importance of Being Earnest but the second performance of the play will
be given Tuesday night Aprpil 3 in all likelihood Tickets for tonights
performance are good in exchange for ones for tomorrow night the
management an nounces or for the performance on Tuesday night should
there be one as now planned If arrangements for the use of Warner hall
for the play Tuesday night should fart through the probability is that
the second performance of the play will not be given in which case money
will be refunded for all tickets not used GIVE EASTER MUSICAL
SERVICENEXT SUNDAY Special Services to be Held in First Methodist Church
Sunday at 4 p m The Easter Musical service at the First Methodist church
will be held Sunday afternoon April 1 at four oclock The program
includes an thems by the Junior and the Senior choirs and instrumental
numbers by assisting artists Mr Don Morrison will be In charge of the
program and Professor William K Breckenridge will be at the organ Those
assisting will be Professor Reber Johnson violin Mr John Wharton violin
Miss Marjory Waters harp and a brass quartette composed of Mr Donald
Stocker Mr Melvln Burriss Mr Walter Sells and Mr Robert Hubbard The
special numbers on the pro gram are Organ prelude Easter morning on Mt
Roubldoux H Gaul Processional Two Bright Angela Right Reverend Frederic
Lloyd Anthem God Hath Appointed Continued on page 2 9 of camnii nmhlems
slri lton

And here is the same text parsed directly with FineReader version 14:

VOLUME 55
SENIORS PLAN TO GIVE $25,000 TO COLLEGE DURING NEXT TEN YEARS AS MEMORIAL TO CLASS
[class OF 1928 ARRANGES HUGE GROVE PATTERSON WILL gift from interest on $1,000 LIFE INSURANCE POLICIES
alternatives submitted
Individuals Have Choice of Three Ways of Giving Quota to Budget
Hip greatest class memorial in the history of Oberlin college is being (•!< ted by the class of 1928, for in- as tin i s|. I of the familiar type of memorial. arrangements have been made wl oreby a gift of about. $25,000 to hr used as the class stiuplates upon presentation, to Oberlin College, will i hi made by this year’s graduating chi’s at their tenth annual reunion.
flie Individual appropriations which will go to make up the gift are being I made in several days.
The principal method is: That each senior take out with tile Equity Life Insurance Society a $1,000 life insurance polity; and that, for a period of ten years, he turn over the dividends I a< « ruing on said policies to the treas-‘ tirv of the present graduating class I toward the memorial.
held at the close of the session.
12 FREE SCHOLARSHIPS TO BE GIVEN FRESHMEN
Six Men, Six Women Entering Next September to Have Full Term Bills Paid
TO INCREASE GRANTS IN AID
Funds Available for Needy Students in Payment of Bills Are Raised 40 Per Cent
Authorization was given at
meeting of the trustees this morning
to grant 12 free scholarships to fresh-attached providing for tue payment ,
In event of the deatli of any one having drawn out a policy, t ’’•use is
nt’ one hundred dollars to the class tn usury by the beneficiary of the policy. upon receipt of which tlie beneficiary will receive a check for ail in* I terest that had accrued on policy to j time of the holder’s death. Otherwise tin* dividends, at the close of the ten yc.tr period, will he either applied to a reduction of the annual premiums, or paid directly to the holder of the (Continued on Page 2)
13 LEAVESOF’ABSENCE APPROVED BY TRUSTEES
Seven College, Five Conservatory Faculty Members Granted Furloughs
Next Year
Ten Professors Will Return From Absences for School Year 192 8to 1929
entering in September, to cover the full amount of the term hills in the college of arts and sciences, $300.
The 12 scholarships are to be divided evenly among entering men and women, and will be granted to those who have made distinguished scholastic records in high school and whose financial situation is such that aid of tiiis sort is a necessity for the payment of their term hills.
This action supplemented the ear-lier action of the Board approving the increase of funds available for scholarship aid in proportion to the increase in tuition charges which goes info effect next September. This increase is to he approximately 40 per cent over the amount available this past year.
NATURE OF SPANISH ART
CHARACTERIZED IN TALK
—— [stirred or with such good reason.”
Senora Isabel de Palencia Tells About beautifully varied program has
Latent Feeling of Tragedy, ‘been chosen for Monday evening Realism of Spain ! which follows in fall:
—— Aria, Away with Mourning and Cry-
An ever-present, latent feeling of ing, from “Orpheus and Eurydice”. realism are theiGluck>
outstanding characteristics of all 1 Ophelia, Why Asks my Fair One. Spanish art, according to Senora Isa- Haydn. Shakespeare Conzonettes. hel de Palencia, who lectured Tues- i>,ls Veiichen, Warnung, Mozart, day evening in the Art building. Rastlose Liehe, Musensohn, Der Erl-
Senora de Peleneia, widely known konjg Schubert.
as a playwright ami author as well
oi
Soven members of the college fat ui ami five members of the Conservatory faculty were granted leaves of al tee for next year by the Board of Ti tees and one was granted by tin* hi t fentiat committee. accora‘n,r to an trairedv seven
• ’ ’Uiicement made public mis mom-h it the trustee meeting.
the 13 leaves that were granted, one was for a semester only. ie list of tlie leaves granted is as follows:
Karl I,. Geiser, professor of political s’ ice, one year for study amt travel.
(Continued on page 2)
DISCUSSION GROUPS PLAN DEFINITELY FOR FUTURE
as a lecturer, illustrated her lecture on “Spanish Painting” witli actual Spanish costumes which she donned from time to time to aid in the visual izal ion of the masterpieces which were thrown upon the screen.
‘The church,” she declared, “hasI jsth century.
Fairy Pipers, Brewer.
over been Hie great patron of Span-Units Out- Ish art ami religious subjects the fav-
orite of every painter.”
Tlie second characteristic which Se-
nora de Palencia ascribes to Spanisii art is realism. She pointed out that Spain
ami has been unfavorable to landscape painting and so Spanish genius lias turned its talent toward tlie portrayal of types of men ami women. But
even in this, she said the painting of;plied themselves in the last summer Ikh’H unique, for vacation. Twenty-six percent of the
there has been no attempt to tell a men worked as laborers
Ci ry “Life Experiment’
line Work Ahead of Them—to Meet Through May
The student discussion group
” Id. Ii were organized by Dr. Bruce tlie war like, unsettled life of Uiirrv to perpetuate the spirit niethcds of tlie week-end convention 1″ I here some time ago. have been fluking definite plaps for the future during the last week.
The educational and psychological this country has units have drawn 4ip questionnaires armind which they intend to carry on story, but only to this work. The others, as yet. have that feeling being sadness
and deck-
express a feeling, hands, 11 percent were camp counsellors, three men were salesman, and 66 Senora de Pal- worked at unskilled labor.
Tiiis data has been prepared by the d into one ,the portrayal of Vocational Information Service of tlie
completed no definite programs. The “Thus in the end leaders will meet with Dr. Curry next encin concluded, “both characteristics
Momhv evening to complete tbdr wr^g|ng tbe innate psychology department in an effort to
The groups intend to meet once a national feeling of tragedy. w*‘< k until the last of May. when a general survey of the semester’s work w’ll be made. The methods are pat

ERRATUM
Tlie statement In Tuesday’s j
—- ——- . Hin Hl.lt r’.nries C Hubbard their summer’s occupation may be of Seven were camp counsellors, and an
„,ter tIlose of slm„ar groups .frps,imiln class, other benefIt thau I(ecuniary. equal number were playground in-
-mized by Dr. Curry in other col •• _____ of the In reulv to the question “What kind structors. Miscellaneous employ-
legpg.
will “he gone the remainder of the In reply to the question “What kind
J-liese discussion groups hope io men^id that they had learned some
finite con- hard, leaving last Sunday evening, for thing about handling men and boys,M
value to those directly con- “j —^ recuperation at home, had acquired some knowledge of pro-
ned. and to tlie college as a whol^ f and ce8Ses or organization of business, and
5 also promise to be of aid u Oberlin after five had learned more about human the chance for women to gain voca-
l^ldent Wilkins in his systematic rPhirn to
•tudy of campus problems. sprii p.uation.
OBERLIN, OHIO, FRIDAY, MARCH 28, 1928
TRUSTEES ENDORSE NEW TYPE OF DIPLOMA TODAY
lSSB“D «F trustees approves
Board Sanctions Faculty Recommen-mendation that Smaller Certificate
Be Given Graduates
IS TO HAVE LEATHER CASE
Trustee? Approve Full-time Director of Admissions, Full Time Personnel Officer
TO SPEND HALF TIME IN FIELD
Former Will be in Oberlin Only Part of Time—Latter Office Will be Cooperative
SPEAK AT Y. M. MEETING Parchment Will be Presented in Mor
—— occo Cover—Many Colleges Now
Editor of Toledo Blade to Talk on Use This Form
Phases of American Politico —— I ______
Sunday Night Endorsement of the recommendation Creation of two new offices, thosei
—— of the general faculty to the effect!of a full-time director of admissions,
Drove Patterson, ’05, editor of the i that a new type of diploma be present and a full-time personnel officer, as I
I oledo Blade, is to he the speaker at’ed to graduating classes in the futurejadditions to the administrative force! the weekly meeting of the Y. M. C. A.|was made by tin* Board of Trustees of the college, was among the action j new
ill the Mimi’S building Sunday evening, at their meeting this morning. taken by the trustees this morning.
His subject will be “Some Phases The new diplomas are to he smaller. The full-ti*ae director of adUs?<ons
of American Politics.’’ This will serve in size, presented in a simple Morocco is to spend approximately half his
Introduction to the Mock Con- leather case, of an 8-in. by G-in. size, time in the field, and the lemainderj As soon as the association learns vention, and open discussion is to he with the words “Oberlin College” in Oberlin, Ills duties ih the former
where the new member is to be locat-stamped in gold on the cover, replac- regard will he somewhat along the line ed the nearest chapter is notified so ing Hie cumbersome scroll form of of Professor Sherman’s activities this that it can give the newcomer
diploma which has been used in the semester, in creating interest in Ober- possible assistance. This
past. The Latin Old English script Un in outstanding contributory high i aiso exchanged between
is to be retained. schools. ! whenever a member moves
The adoption of this new form is The office of the personnel officer ; region to another.
in accordance with a general tend- will he the center for the collection I _________________
larger colleges and uni- of individual data about students of n i 11 nn l ni mr nniiizm all sorts. It will contain the employ DAN dKADLeY TO DELIVER
r I
ment service and housing service, thoroughly reorganized, and will in general he a coordinating center, for distribution and utilization of data, as j well as the gathering of it.
ency anion versities to furnish their graduates with a handier form of certificate on completion of work. Among the schools which now use tills form are Leland Stanford, Universities of Illinois and Chicago, Smith college, and many others in both east and west.
the
MME. ONEGIN TO CLOSE ARTIST RECITAL COURSE
Swedish Contralto Gives Concert Monday Night in Finney Chapel at 7:30
FOOTBALL SEASON PLANS ARE DISCUSSED MONDAY
Work for Spring, Early Fall Practice; 6 Outlined by Coach Mac-Eachron at Meeting
Varied Program of Famous Opera Singer Holds Much in Store for Listeners
i
To
Have no Hard Workouts Now-Devote Time to Drill in
Play Execution
(By M. S. S.) j Plans for the coming season and I
The Inst concert on this semester’s’ f°1- sl”‘ing Practice were the main top-artist course will occur next Monday !'<” ‘Hwussed at the football meeting evening, Aprpil 2, when Mine. Sigrid “Meh was held in the varsity “O” Onegin, contralto, assisted h,v FranzI room last Monday evening.
Dorfmuller, pianist, will give a recital ‘!l 11 abort talk Coach aMcEachron in Finney chapel at 7:30 p. m. oat lined his work for this spring and
Mme. onegin is an internationally ! 1‘‘“rly part of next fall. Starting famous singer, on both opera and con- immediately after spring vacation cert stage, who possesses personal hglit practices will be the program charm as well as a golden voice and I'”” Wednesday and Friday afternoons profound musical feeling. To quote t from 4:30 to 5:30. There will be no II. A. Bellows, of Minneapolis: “I can hard workouts such as blocking and recall no other recital in which a tackling, hut the time will be devoted great audience was so profoundly
to kicking and passing and drill in tin- execution of plays. Tiiis will continue l’or three or four weeks and will he for those men who are not participating in baseball, track, or tennis Next fall the men will be called hack about a week early, and this week will he spent in hard practice in preparation for the opening games which will he with Heidelberg, Akron and Wooster in tlie order named. Such an opening will not lie an easy one.
appropriate expenditure time, according to Dr. uuaiey B. | Reed, ’03, head of the Health Service at the University of Chicago and former colleague of President Wilkins, who spoke in chapel yesterday noon.
Health was defined by Dr. Reed as the condition in which we live best and serve most. In tlie fulfillment of these conditions leisure time should in general be apportioned- among four fields: acquaintance with many types of people, reading, sports, and art and music.
Tlie Oberlin which he attended, said Dr. Reed, offered many and remarking men. The highest wage earned I able opportunities for profit among all
Inasmuch as a number of men Aria. Je ne venx pas chanler, Nicol., eoul.l not get to tiiis meeting and the Isouard. From tlie “Billet tie Loter- turnout was not as large as was hoped ie” (1750-1801) ^or* ^ere be another one this
‘ Sleep, My Darling Child, (old Swed- c°mlng Monday night at 7:30 in the isl. Lullaby) Tegner. j s,,n,e ri,oni’
Non, je n’irais plus au h- is. Jeunues
Fillettes, French Bergeretb < from the Love’s old sweet song—Buy me
I some candy.—Wisconsin Daily Cardinal (IP).
Yocationai Survey Shows Diverse Summer Occupations Among Seniors
(By Dwight Hanawalt) Deck-hands, camp counsellors, salesman, day laborers—these are some of the tasks to which tlie senior men np*
was about fifty dollars a week, being earned by a riveter; the salesmen av-
eraged $23.35 per week, while the lab-1 ties for obtaining this sort of health. I T,ie Easter Musical service at the orers and deck hands averaged $22 ________________
per week. BETTY L. HILL, ’30, ELECTED
In the miscellaneous occupations VICE-PRESIDENT OF W. A
the experience gained varied from “in- Betty L. Hill, ’30, was elected vice sight into hotel management” to “prac- president of the W. A. A. last Thurs- choirs and instrumental numbers by tical seamanship.” jay chapel, as announced at the assisting artists. Mr. Don Morrison
Of the 61 senior women employed basketball banquet last Friday eve- be in charge of the program and during the summer, 14 did clerical njng ,contrary to the statement in the Professor William K. Breckenridge work at an average wage of $17.50 Review last Tuesday, which stated !win be at the organ. Those assisting per week. Eleven women were wait- that Bitty von Wenck, *30, had been be Professor Reber Johnson, vio-
find out how much vocational exper-
lence college students get in the sum- resses for the summer. When asked elected to that position. I mer’s labor. Tlie aim of this work j if they liked the work, these women issue is to advise college students that were about neutral in their attitude.
ments included playing in an orchestra, tutoring, factory work and bookselling. The highest paid employment was a Chautauqua superintendent.
It is found from these statistics that
nature, the psychology of the work-
(Contlnued on page 2)
NUMBER 45
MAJOR FACULTY APPOINTMENTS IN ACTION AT SPECIAL MEETING
ALUMNI ASSOCIATION TO NAMES PROMINENT
INAUGURATE NEW PLAN
Organization Will Report Names of Graduates Leaving College to its Nearest Chapter
EDUCATOR
AMONG NEW PROFESSORS IN CONSERVATORY AND COLLEGE
TO DIRECT INTRAMULALS
Secretary John G. Olmstead of the Alumni association has inaugurated a plan into the activities of that organization, designed to help new
graduates of Oberlin as they leave college.
any notice is chapters from one
BACCALAUREATE SERMON
Member of Board of Trustees Will Preach on Last Sunday to Graduating Class
IS CLEVELAND PASTOR
Received Degree from Oberlin in 1882 —Has D.D. From Three Institutions
Dan F. Bradley, ’82, pastor of the t Pilgrim Congregational church of ‘Cleveland, will deliver the baccalaureate sermon at commencement next June, according to an announcement made yesterday in chapel by President ! Wilkins.
Dr. Bradley received his A. B. from
Dr. J. Herbert Nichols Will Be First to Fill New Berth
Created
Some ten major appointments to the faculty of Oberlin College for the year 1928-1929, with three promotions, comprised 1 lie main business transacted by tlie Board of Trustees of the college tiiis morning at their special meeting called to convene in the administration buildin gat 9:30.
Chief among these appointments was that of John Herbert Nichols, M. D., Oberlin ’ll, of Ohio State University to the position of Professor of Physical Education and Director of Intramural Athletics.
Other appointments were:
Herbert B. Briggs, acting associate pprofessor of political science.
Raymond Cerf, professor of violin and ensemble.
Leslie Webber Jones, associate professor of classics.
Carroll Brown Malone, acting associate professor of history.
Miss Hope Hibbard, assistant professor of zoology.
Marie Mathilda Johnson, assistant professor of mathematics.
Arthur L. Williams, assistant professor of wind instruments and director of the college band.
S. L. Wallace, instructor in classics and fine arts.
(Continued on Page 3)
Oberlin in 1882, and was a teacher in tlie preparatory department from 1883 to 1885. In 1885 he was granted tlie degree of Bachelor of Divinity by Oberlin.
His next service for the college was from 1891 to 1892, when he was a member of the board of trustees. He was again on the hoard from 1893 to 1902, and from 1906 to the present.
Dr. Bradley holds a degree of Doctor of Divinity from three schools. The degree was given him by Yankton in
1892, by Cornell college in 1904, and Owing to (he ice storm of last night bv Oberlin in 1908. which has deprived Oberlin of its elec-
____________________ trie power supply today, the Dramatic
HD DT7CIY PITUC LH7ATTU Association is unable to present its UK. KUjD vl 1 Lu ilEiALlil (first performance of “The Importance
AS GOOD USE OF LEISURE of r insEarnest ’ but tl,e second
LIKELY TO REPEAR 3-ACT PLAY TUESDAY NIGHT
Dramatic Association May Possibly Give Second Performance Next Week
Ice Storm Prevents Performance Tonight—Tickets Exchangeable or Returnable
performance of the play will be given
Health Service Head Shows Importance of Sane Expenditure of Spare Time
Health is principally a matter of the
Ok
these fields, but the Oberlin of today offers still more enviable opportuni-
! Tuesday night, Aprpil 3, in all likelihood. ’
Tickets for tonight’s performance are good in exchange for ones for to-i morrow night, the management an-
•’usure i
nounces, or for the performance on
(Tuesday night, should there be one, jas now planned.
If arrangements for the use of War-
, ner hall for the play Tuesday night ‘should fail through tlie probability is that the second performance of the play will not be given, in which case money will he refunded for all tickets not used.
GIVE EASTER MUSICAL SERVICENEXT SUNDAY
Special Services to be Held in First Methodist Church Sunday at 4 p. m.
First Methodist church will be held Sunday afternoon, April 1, at four A. o’clock. The program includes anthems by the Junior and the Senior
ICE
STORM DELAYS THIS ISSUE OF REVIEW A DAY
This issue of the Review, although dated Friday. March 30, Is being published Saturday, March 31, due to tlie failure of the electric power following the Ice storm of last Thursday night.
lin, Mr. John Wharton, violin, Miss i Marjory Waters, harp, and a brass quartette composed of Mr. Donald i Stocker, Mr. Melvin Burriss, Mr. Wal-| ter Sells and Mr. Robert Hubbard.
The special numbers on the program are:
Organ prelude—Easter morning on Mt. Roubidoux, H. Gaul.
Processional—Two Bright Angela, Right Reverend Frederic Lloyd.
Anthem—God Hath Appointed a (Continued on page 2)

Results:

FineReader Version 9 (10/07):  3261 words, 234 Non-English words (234/3261 = 7.2% Error Rate)

FineReader Version 14 (1/17):  3283 words, 244 Non-English words (244/3283 = 7.4% Error Rate)
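
The error rates quoted in these results are the number of non-English words divided by the total word count. Below is a minimal sketch of that calculation, not the exact procedure used for the figures above. The function name `ocr_error_rate`, the tiny stand-in vocabulary and the sample string (adapted from the Oberlin headline) are all assumptions for illustration; a real run would load a full English word list.

```python
import re

def ocr_error_rate(text, vocabulary):
    """Return (non_vocab_count, total_count, rate) for alphabetic tokens in text."""
    # Keep only runs of letters so punctuation and digits are not counted as words.
    tokens = re.findall(r"[A-Za-z]+", text)
    # A token counts as an OCR error if its lowercase form is not in the vocabulary.
    misses = [t for t in tokens if t.lower() not in vocabulary]
    total = len(tokens)
    rate = len(misses) / total if total else 0.0
    return len(misses), total, rate

# Hypothetical stand-in vocabulary; replace with a real English word list.
vocab = {"the", "ice", "storm", "delays", "this", "issue", "of", "review", "a", "day"}
# Illustrative string with one OCR slip ("tlie" for "the").
sample = "ICE STORM DELAYS THIS ISSUE OF tlie REVIEW A DAY"

misses, total, rate = ocr_error_rate(sample, vocab)
print(f"{misses}/{total} = {rate:.1%} Error Rate")
```

Note that a dictionary check like this also flags proper nouns and abbreviations, so it overstates the true OCR error rate somewhat for newspaper text.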


Third, a page from the Nov 20, 1964 issue of The Kenyon Collegian, which looks more like a modern newspaper, with various multicolumn and irregular formats. It is less crowded than the 1928 Oberlin paper and comes from better quality microfiche.

0007.jpg

Here is the text parsed from an *.xml file created with FineReader version 9:

Empon Collegian BARRY BERGH ON WOMENS COLLEGE PAGE 2 Vol LXXXXI No 5
Gambier Ohio 43022 November 20 1964 THIRTYFIVE CENTS THE VISIT REVIEWED
PAGE 5 ALO Archon Deke Top in Blood Drive Two hundred and ten people
volunteered to give blood for the thirteenth annual visit of the
Bloodmobile to Kenyon College on Tuesday the seventeenth of November
From these 210 volunteers the Bloodmobile received a total of 168 pints
of blood This figure is an average one for the annual blood drive last
years figures for example being 194 volunteers and 170 pints Mrs H L
Warner was in charge of the drive Assisting her in administrative work
were Mrs Thomas Edwards who ran the canteen that was serving during the
drive and Mrs Paul Titus who was at the registration desk Those helping
Mrs Warner in soliciting for the drive were Mrs Robert Baker for the
Kenyon faculty and staff Mrs Walker Mrs Irish and Mr Belton for Bexley
Dixie Long undergraduate chairman and a staff of students consisting of
one representative from each fraternity two independent representatives
and two representatives from each of the freshman dormatories Also
assisting in the drive iRitcheson Resigns IWill Go to SMU College News
Bureau Charles R Ritcheson chairman of the Kenyon College Department of
History has submitted his resignation He will assume a similar position
at Southern Methodist University Dallas Tex effective Sept 1 1965 At SMU
his major responsibility will be development of a new graduate program
leading to the doctor of philosophy degree President F Edward Lund ac
exciting period in its history cepted the resignation which is A native
of Maysville Oklaeffective June 30 with regret homa Ritcheson received
the BA Describing Professor Ritchesons degree from the University of
service to Kenyon President Oklahoma in 1946 studied at HarLund referred
particularly to his vard the University of Zurich direction of the
Symposium on and received the Doctor of PhiCommunication between the
Arts losophy degree from Oxford Uniand Sciences in 1962 At that versity
in 1951 Prior to coming time such eminent authorities as to Kenyon in
1953 he was asso Marjorie Henshaw Clara studies the townspeoples
reaction to Edward Teller and C P Snow ciate professor at Oklahoma Col
her 100 million dollar proposal while W H Webster Alfred on were brought
to Kenyon He also lege for women right and Edward Hallowell The Mayor on
left consider its ef praised Ritcheson for his leader fects in last
weeks performance of The Visit t v k 4 4k jj W CB i m in Candor Freedom
Praised In New N C A Evaluation ing on its assets They were im were the
Arnold Air Society and accrediting group of which Ken the Chase bociety
J he nurses yon is a charter member were Mrs Frank Bailey Mrs The two
evaluators were Dean James Michael and Mrs Thomas paimer C Pilcher of
Wayne State Greenslade University and Dean Richard On the basis of a
percentage Doney of Northwestern Their recomputed by giving full credit
to port has just been made public donors and people rejected as a in
general they were quite imresult of the onthespot physical preSsed with
Kenyon They cornexamination and V credit to mented favorably on the
candor those volunteers who either were and forthrightness of the recent
ill at the time or failed to obtain seif study They also praised the
permission to give the Alpha atmosphere of full academic freeLombda
Omega fraternity un dom the calibre and achieveseated last years winner
Delta ments of both faculty and adminPhi with a percentage of 397
istration the Colleges relations Archon placed second among the with the
Episcopal church salarfraternities with a percentage of ies and faculty
housing The ex345 followed by Delta Epsilon aminers had particular
praise for 308 and Delta Phi 302 President Lunds reestablish by Charles
Spain Verral In midApril of last year Kenyon was reevaluated by the TVT
1 A 1 A 4 Z1 iNorui oenudi bbuciaumi ui vui provide a vaiuabie
opportunity leges ana secondary acnoois an for quiet detachment On the
critical side the evalu ship in developing a program in NonWestern
Studies at the College In his letter to President Lund Ritcheson said
Gratified as I am by my new appointment I shall always feel regret at
missing the years immediately ahead Professor Ritcheson is a memTurn to
page 8 col 5 pressed with the blending of old for Kenyon During the time
I and new buildings in a dignified have been at Kenyon the College and
spacious campus which can has taken great strides forward until at the
present 1 Delieve it stands on the verge of the most It f ators outlined
four areas where the College is facing difficulties The two major
problems they felt are the unusually high attrition rate of students and
faculty and the large debt which has been allowed to pile up since World
War II Turn to page 4 col 3 As The Collegian went to press we learned of
the resignation of Prof Virgil Aldrich head of Kenyans Department of
Philosophy Prof Aldrich hopes to join the faculty of the University of
North Carolina next September Prof Charles Ritcheson i Senate Takes Up
Regulations to be Drinking Changed by Bryan Perilman For the past month
and onehalf the Campus Senate has been discussing the problems of beer
and liquor consumption at Kenyon Among those donors outside the ment of
initiative of the faculty College The problems center around the fact
that Rules and ReguKenyon student body were twen in matters of
educational policy lations Section II D concerning alcoholic beverages
in its generty three of the college faculty and Kenyon they felt has
overcome ality does not conform to existing state statute 430169 staff
eleven from Bexley and most of the disadvantages of its Statute 430169
states Sale to shall sell intoxicating liquor to a four others from
Gambier isolated location while capitaliz Minors Prohibited No Person
person under the age of twenty 1 one years or sell beer to a person
Kenyon Singers At Cleveland College News Bureau The Kenyon College
Singers presented a joint concert with The Notre Dame College Choir of
Cleveland on Nov 14 at 830 pm in Kulas Auditorium Cleveland The singers
sang selections from Camille SaintSaens and arrangements by Robert Shaw
Roger Wagner and Fenno Heath Jointly with the Notne Dame A Day With Bob
Dyl by John Cocks Wearing high heel boots a tailored peajacket without
lapels under the age of eighteen or buy choir they presented Ijlov Let
intoxicating liquor for or furnish Evry Tongue Adore Thee by J it to a
minor unless given by a S Bach O Sacred Head Sore physician in the
regular line of Wounded by J S Bach and Al practice or by a parent or
legal leluia by Randall Thompson guardian Beer is all malt bev Soloists
for the evening were pegged dungarees of a kind of buffed azure large
sunglasses with erages of less than 32 Section Robert Tait of Lima O
William squared edges his dark curly hair standing straight up on top
and 430173 states further Any room Scar of West Newton Mass spilling
over the upturned collar of his soiled white shirt he caused or building
where beer or intoxi Thomas Lockard of West Lake a small stir when he
got off the plane in Columbus Businessmen eating liquor is manufactured
O and Lowell Gaspar of Fairnodded and smirked the ground crew looked a
little incredulous and sold bartered possessed or kept view Park O Dean
Merrill of 111 a mother put a hand on her childs head and made him turn
away Bob Dylan came into the terminal taking long strides walking hard
on his heels and swaggering just a little He saw us smiled a nervous but
friendly smile and came over to introduce himself and his companion a
lanky unshaven man named Victor who looked like a hip version of Abraham
Lincoln Dave Banks who had organized the concert and who was Dylans
official reception committee led Dylan and Victor to baggage claim Along
the way Victor asked us how far we were from the school and where he and
Dylan would be spending the night Learning that Banks had re Turn to
page 4 col 4 Rockville Md was accompanist English Professors to Hear
Famous Speakers Folksinger Bob Dylan College students interested in be
among the speakers at the con modern literature literary criti vention
served a room for them in a small motel seven miles from Kenyon cism
andor the teaching of Eng More than 6000 teachers from he smiled a
little and said Tryin to keep us as far away from the lish are being
invited to attend throughout the United States are school as you can huh
the meetings of the National expected to attend the convention The trip
back from the airport right before the concert he said Council of
English Teachers Con which this year will focus its at was a quiet one
Both men seem and they all came in sweaty vention in Cleveland Thursday
tention on reevaluation of instruc ed rather tired Dylan especially and
yellin Man the audience Nov 26 through Saturday Nov 28 tion in English
who was pale and nervous He was full of football players foot Saul
Bellow author Malcolm Governor Sanford and Albert said he was right in
the middle ball players Banks mentioned Cowley authorcritic Nancy Hale
Kitzhaber professor at the uni of a big concert tour which had that
Kenyon hadnt won a single author Walter Havighurst au versity of Oregon
and president been on for almost two months football game all year and
both thor and English professor at Mi of the Council will open the gen
and Victor reminisced about one men seemed enthusiastic Yeah ami
University Rod Serling tele eral session Thursday at 8 pm in memorable
engagement in Cam No kidden Dylan said and vision writer and North Caro
the Grand Ballroom of the Shera bridge They had this pep rally Turn to
page 3 col 1 lina Governor Terry Sanford will Turn to page 4 col 5

And here is the same text parsed directly with FineReader version 14:

THE VISIT REVIEWED PAGE 5
®be l\fnpon Collegian
THIRTY-FIVE CENTS
Vol. LXXXXI, No. 5
ALO, Archon,
Deke Top in Blood Drive
Two hundred and ten people volunteered to give blood for the thirteenth annual visit of the Bloodmobile to Kenyon College on Tuesday, the seventeenth of November. From these 210 volunteers, the Bloodmobile received a total of 168 pints of blood. This figure is an average one for the annual blood drive, last year’s figures, for example, being 194 volunteers and 170 pints.
Mrs. H. L. Warner was in charge of the drive. Assisting her in administrative work were Mrs.
Thomas Edwards, (who ran the canteen that was serving during the drive), and Mrs. Paul Titus (who was at the registration desk). Those helping Mrs. Warner in soliciting for the drive were:
Mrs. Robert Baker (for the Kenyon faculty and staff), Mrs. Walker, Mrs. Irish, and Mr. Belton,
(for Bexley), Dixie Long, (undergraduate chairman), and a staff of students consisting of one representative from each fraternity, two independent representatives, and two representatives from each of the freshman dorma-tories. Also assisting in the drive were the Arnold Air Society and
the Chase Society. The nurses yon js a charter member, were Mrs. Frank Bailey, Mrs. The two evaluators were Dean James Michael, and Mrs. Thomas paimer C. Pilcher of Wayne State Greenslade. University and Dean Richard
On the basis of a percentage Doney of Northwestern. Their recomputed by giving full credit to port has just been made public, donors and people rejected as a jn general, they were quite im-result of the on-the-spot physical pressed with Kenyon. They corn-examination, and *4 credit to mented favorably on the “candor those volunteers who either were and forthrightness” of the recent ill at the time or failed to obtain self study. They also praised “the permission to give, the Alpha atmosphere of full academic free-Lombda Omega fraternity un- dom,” the calibre and achieve-seated last year’s winner, Delta ments of both faculty and admin-Phi, with a percentage of 39.7 istration, the College’s relations Archon placed second among the with the Episcopal church, salar-fraternities with a percentage of ies and faculty housing. The ex-34.5, followed by Delta Epsilon aminers had particular praise for (30.8%) and Delta Phi (30.2%). President Lund’s “re-establish-
Gambier, Ohio 43022 — November 20, 1964
Ritcheson Resigns Will Go to S.M.U.
College News Bureau
Charles R. Ritcheson, chairman of the Kenyon College Department of History, has submitted his resignation. He will assume a similar position at Southern Methodist University, Dallas, Tex., effective Sept. 1, 1965. At SMU his major responsibility will be development of a new graduate program leading to the doctor of philosophy degree.
President F. Edward Lund ac- exciting period in its history.” cepted the resignation, which is A native of Maysville, Okla-effective June 30, with regret, homa, Ritcheson received the B.A. Describing Professor Ritcheson’s degree from the University of service to Kenyon, President Oklahoma in 1946, studied at Har-Lund referred particularly to his vard, the University of Zurich direction of the Symposium on and received the Doctor of Phi-Communication between the Arts losophy degree from Oxford Uni-and Sciences in 1962. At that versity in 1951. Prior to coming time, such eminent authorities as to Kenyon in 1953, he was asso-
Marjorie Henshaw (Clara) studies the townspeople’s reaction to Edward Teller and C. P. Snow ciate professor at Oklahoma Collier $100 million ‘dollar proposal while W. H. Webster (Alfred), on were brought to Kenyon. He also lege for women.
right, and Edward Hallowell (The Mayor), on left, consider its ef- praised Ritcheson for his leader-fects in last week’s performance of “The Visit.”
Candor, Freedom Praised In New N. C. A. Evaluation
by Charles Spain Verral ing on its assets. They were im
ship in developing a program in Non-Western Studies at the College.
In his letter to President Lund, Ritcheson said, “Gratified as I am by my new appointment, I shall always feel regret at missing the years immediately ahead
Professor Ritcheson is a mem-Turn to page 8, col 5
In mid-April of last year, Kenyon was re-evaluated by the North Central Association of Colleges and Secondary Schools, an accrediting group of which Ken-
pressed with the blending of old ^or Kenyon. During the time I and new buildings in a dignified have been at Kenyon, the College and spacious campus which can has taken great strides forward,
provide a valuable “opportunity for quiet detachment.”
On the critical side, the evaluators outlined four areas where the College is facing difficulties. The two major problems, they felt, are the unusually high attrition rate of students and faculty and the large debt which has been allowed to pile up since World War II.
Turn to page 4, col. 3
until at the present, I believe it stands on the verge of the most
As The Collegian went to press we learned of the resignation of Prof. Virgil Aldrich, head of Kenyon’s Department of Philosophy. Prof. Aldrich hopes to join the faculty of the University of North Carolina next September.
Senate Takes Up Drinking Regulations to be Changed
by Bryan Perilman
For the past month and one-half the Campus Senate has been discussing the problems of beer and liquor consumption at Kenyon Among those donors outside the ment of initiative of the faculty College. The problems center around the fact that Rules and Regu-Kenyon student body, were twen- in matters of educational policy.” lations, Section II D, concerning alcoholic beverages, in its gener-ty three of the college faculty and Kenyon, they felt, has overcome ality does not conform to existing state statute 4301.69.
staff, eleven from Bexley, and most of the disadvantages of its Statute 4301.69 states, “Sale to shall sell intoxicating liquor to a four others from Gambier. isolated location, while capitaliz- Minors Prohibited. No Person
A Day With Bob Dylan
by John Cocks
Folksinger Bob Dylan
r
Jt ~

— A
a
Prof. Charles Ritcheson
Kenyon Singers
At Cleveland
College News Bureau
The Kenyon College Singers presented a joint concert with The Notre Dame College Choir of Cleveland on Nov. 14 at 8:30 p.m. in Kulas Auditorium, Cleveland.
The singers sang selections from Camille Saint-Saens and arrangements by Robert Shaw,
person under the age of twenty- Roger Wagner and Fenno Heath, one years or sell beer to a person Jointly with the Notne Dame under the age of eighteen, or buy choir they presented “Now Let intoxicating liquor for or furnish Ev’ry Tongue Adore Thee” by J. it to, a minor, unless given by a S. Bach, “O Sacred Head, Sore physician in the regular line of Wounded” by J. S. Bach and “Al-practice, or by a parent or legal leluia” by Randall Thompson.
Wearing high heel boots, a tailored pea-jacket without lapels, guardian. “Beer” is all malt bev- Soloists for the evening were pegged dungarees of a kind.of buffed azure, large sunglasses with erages of less than 3.2%. Section Robert Tait of Lima, O.; William squared edges, his dark, curly hair standing straight up on top and 4301.73 states further, “Any room Scar of West Newton, Mass.; spilling over the upturned collar of his soiled white shirt, he caused or building where beer or intoxi- Thomas Lockard of West Lake, a small stir when he got off the plane in Columbus. Businessmen eating liquor is manufactured, O.; and Lowell Gaspar of Fair-nodded and smirked, the ground crew looked a little incredulous and sold, bartered, possessed or kept view Park, O. Dean Merrill of
Turn to page 4, col. 4
Rockville, Md., was accompanist.
English Professors to Hear Famous Speakers
a mother put a hand on her child’s head and made him turn away.
Bob Dylan came into the terminal taking long strides, walking hard on his heels and swaggering just a little. He saw us, smiled a nervous but friendly smile, and came over to introduce himself and his companion, a lanky, unshaven man named Victor who looked like a hip version of Abraham Lincoln. Dave Banks, who had organized the concert and who was Dylan’s official reception committee, led Dylan and Victor to baggage claim. Along the way, Victor
asked us how far we were from the school and where he and College students interested in be among the speakers at the con-Dylan would be spending the night. Learning that Banks had re- modern literature, literary criti- vention.
served a room for them in a small motel seven miles from Kenyon, cism and/or the teaching of Eng- More than 6,000 teachers from he smiled a little and said “Tryin’ to keep us as far away from the lish are being invited to attend throughout the United States are
school as you can, huh?” the meetings of the National expected to attend the convention
The trip back from the airport right before the concert,” he said, Council of English Teachers Con- which this year will focus its at-
was a quiet one. Both men seem- “and they all came in sweaty vention in Cleveland Thursday, tention on reevaluation of instruc-
ed rather tired, Dylan especially, and yellin’. Man, the audience Nov. 26 through Saturday, Nov. 28. tion in English.
who was pale and nervous. He was full of football players—foot- Saul Bellow, author; Malcolm Governor Sanford and Albert said he was right in the middle ball players.” Banks mentioned Cowley, author-critic; Nancy Hale, Kitzhaber, professor at the uni-
of a big concert tour which had that Kenyon hadn’t won a single author; Walter Havighurst, au- versity of Oregon and president
been on for almost two months, football game all year, and both thor and English professor at Mi- of the Council, will open the gen-
and Victor reminisced about one men seemed enthusiastic. “Yeah? ami University; Rod Serling, tele- eral session Thursday at 8 p.m. in
memorable engagement in Cam- No kidden’?”, Dylan said, and vision writer, and North Caro- the Grand Ballroom of the Shera-
bridge. “They had this pep rally Turn to page 3, col. 1 lina Governor Terry Sanford will Turn to page 4, col. 5

Results:

FineReader Version 9 (10/07):  1699 words, 137 Non-English words (137/1699 = 8.1% Error Rate)

FineReader Version 14 (1/17):  1668 words, 103 Non-English words (103/1668 = 6.2% Error Rate)


 

Fourth and last, here is the Sept 26, 2001 issue of The Transcript of Ohio Wesleyan University, which has the simplest news layout of all the modern-style papers, that is, those from the 1928 sample onward.

0025

 

Here is the text parsed from an *.xml file created with FineReader version 9:

New sculpture built onJAYwalk News Page 5 Glass House breaks thriller
promise Entertainment Page 7 OWU womens rugby buries Wittenberg Sports
Page 8 Ohio Wesleyan University DelawareOhio Volume CXXXVmNo IV
September 26 2001 III V Health Counseling Services understaffed
Editorial Page 2 The oldest independent student newspaper in the nation
Terrorist attack claims another OWU alumnus By Elizabeth Dale The
Transcript He had the unique ability to make each person he was with
feel like they were his best friend Jim Kehoe 84 classmate and friend
said in remembering the life of Ted Luckett Edward Hobbs Luckett II 84
known as Ted is the third OWU graduate who died in the World Trade
Center attacks on Sept 11 While at Ohio Wesleyan Luckett played varsity
soccerand rugby was a brother of Phi Delta Theta and Senior Class
president He graduated with a bachelor of arts degree in history Luckett
went on to become a partner and vice president of Cantor Fitzgerald and
product manager of ESpeed Soccer coach Jay Martin remembered Lucketts
four years on the team fondly Ted represented all that is good about OWU
He was a solid student good in athletics and participated in a lot of
campus activities He was a really good guy Martin said Teds involvement
in athletics continued even after his graduation from college He loved
soccer and at his passing was a coach in the boys and 1 LA AJ Luckett
Ted represented all that is good about OWU He was a solid student good
in athletics and participated in a lot of campus activities He was a
really good guy Soccer coach Jay Martin girls league in their town of
Fair Haven Kehoe said Kehoe said Luckett was also instrumental in
gettingacoaches game going on Sunday afternoons so they could relive a
few of their old glory days Luckett was also an avid sailor who competed
in a race from Newport RI to Bermuda Above sailing and soccer Luckett
has been most remembered for his friendship with many many people Kehoe
said Kehoe said that Luckett held the principles of family friendship
comrriunity and theenjoymentof living life in the moment as core values
He had the perspective to realize that the stuff most of us do from 95
was only a necessary chore to provide security for those values Kehoe
said Luckett is survived by his wife Lisa and three children Jennifer
Grace 7 William Stone II 4 and Timothy Wyatt 4 months A memorial service
was held Monday in Rumson NJ Donations in his memory to help with his
childrens education can also be made to the Luckett Childrens
Educational Trust co Arthur H Tildesley 30 Pine Cove Road Fair Haven NJ
07704 Campus holds peaceful just response to tragedy By Chris Nida The
Transcript As the United States moved ahead with military action against
those responsible for the Sept 1 1 terrorist attacks Ohio Wesleyan
University students and faculty assembled in the HWCC Atrium last
Thursday to support a just and peaceful response to the tragedy
Sponsored by ProgressOWU and Amnesty International along with a variety
of religious groups the rally at OWU was part of a nationally
coordinated effort Students and community members gathered at 146
campuses in 36 states and one Canadian province The nationwide rallies
forpeace were based on the following five principles We unequivocally
condemn the abominable terrorist attacks of Tuesday The nations
political and military leadership must seekjustice rather than revenge
in order to avoid the loss of more innocent lives and to work towards a
lasting peace Americans must resist the scapegoating of people on the
basis of race religion and nationality especially innocent Muslim and
Arab peoples in the US and abroad and take a stand against racism and
xenophobia We urge a consideration of underlying political and economic
causes including an examination of past US actions and foreign policy
that may have contributed to this tragedy The people of the US and their
servants in Government must guard our precious civil liberties with
vigilance and not allow fear and terrorism to undermine our commitment
to freedom Sophomore Liz Magee one of the organizers of the rally at OWU
said that the impact of the rallies is related to the unity amongst
those involved Our strength is in our solidarity Magee said Now more
than ever we must stand united in our quest for peace and justice
ProgressOWU president junior Ryan Sarni urged citizens to i m v r i o
Photo Courtesy Nolan Dutton Students sign their names to the Peace
Banner at Thursdays event which drew over 100 students The rally was
sponsored by ProgressOWU and Amnesty International remember what is at
the heart of American society During this time of crisis we call upon
Americans everywhere to reaffirm their commitment to the principles that
make Americans a great people our respect for freedom and liberty our
embrace of tolerance and diversity and our commitment to due process and
justice Sarni said At OWU green and white ribbons symbolizing life and
peace respectively were distributed to attendees Those interested were
also invited to sign a petition for peaceful justice as well as the
response from Amnesty International Students also had the opportunity to
paint a peace banner while WCSA sponsored a thankyou card for the hard
work put in by local firefighters and policemen Featured speakers at the
rally were Joan McLean associate professor of politics and government
and Martin Hipsky associate professor of English McLean offered three
suggestions as to how to proceed They included making sure any military
action was based on just war principles ensuring that civilians were
never the target of war and to resist the tendency to oversimplify this
conflict and instead address the systemic see RALLY page 5

And here is the same text parsed directly with FineReader version 14:

transcript
The oldest independent student newspaper in the nation
VohuneCXXXVm.No. IV September 26.2001
Terrorist attack claims another OWU alumnus
Campus holds ‘peaceful, just’ response to tragedy
By Elizabeth Dale The Transcript
“He hud the unique ability to make each person he w as with feel like they were his best friend.” Jim Kehoe (’K4), classmate and friend, said in remembering the life ofTcd Luckett.
Edward Hobbs Luckett 11 ( ‘ K4 I, known as “Ted.” is the third OWU
graduate who died in the World Trade Center attacks on Sept.
II.
WhileatOhio Wesleyan.
Luckett played varsity soccer and rugby, was a brother of Phi Delta Theta and Senior Class president He graduated with a
bachelor of aits degree in history. Luckett went on to become a partner and vice president of Cantor Fitzgerald, and product munagerof E-Speed
Soccer coach Jay Martin remembered
Luckett’s ____________________
four years on the team fondly.
“Ted represented all that is good about OWL He was a solid student, good in athletics and participated in a lot of campus
“Ted represented all that is good about OWU. He was a solid student, good in athletics and participated in a lot of campus activities. He was a really good guy.” —Soccer coach Jav Marlin
activities He
was a really good guv,” Martin said.
Ted’s involvement in athletics continued even after his graduation from college.
“He loved soccer and at his passi ng was a coach in the boy s and
girls league in their town of Fair Haven.” Kehoe said.
Kehoe said Luckett was also instnimentul in gcttinga”coacJtes” game going on Sunday afternoons so they could re-liv e a few oltheir old “glory days.”
L uckett was also an a v id sat lor who competed in u race front Newport. R.I.. to Bermuda. Above sailing and soccer,
I utkea has been most remembered for his friendship with many, many people, Kehoe said.
Kehoe said that Luckett held the principles of family, friendship, community, and the enjoyment of living life in the moment os core
values.
“He had the perspective to realize that the stuff most of us do from 9-5 was only a necessary chore to provide security forthose valutas,” Kehoe said.
Luckett
_________________ is survived
by his wife. Lisa, and three children. Jennifer Grace, 7. William Stone II. 4, and Timothy Wyatt. 4 months.
A
memorial service was
held Monday in Rumson, N.J. Donations in his memory to help with his children’s education can also be made to the Luckett Children’s Educational Trust, c/o Arthur H. Tildcslcy, 30 Pine Cove Road, Fair Haven, NJ, 07704,
is
Bv Chris Nida The Transcript
As the United States moved ahead with military action against those responsible lor the Sept 11 icrrorist atlackii. Ohio Wesleyan University students and fucully assembled in the IIWCC Atrium lust Thursdav to support a ‘just and peaceful* response to the tragedy.
Sponsored by ProgressOWU and Amnesty International, along with a variety of religious groups, the rally at OWU was part of a nationally coordinated effort Students and community members gathered at 146 campuses in SO stales and one Canadian province
The nationwide rallies fur peace were based on the following five principles
“We unequivocally condemn the abominable terrorist attacks of Tuesday.
“The nation’s political and military leadership must seek justice rather than revenge in order u» avoid the loss of more innocent lives and to work towards a lasting peace
“Americans must resist the scapegoating of people on the basis Of race, religion, and nationality, especially innocent Muslim and Arab peoples in the U.S. and abroad, and take a stand against rac ism and xenophobia.
“We urge a consideration of underlying political and economic causes, including an exuminationof past U.S. actions and foreign policy, that may have contributed to this tragedy.
“The people of the U.S. and their servants in Government must guard our precious civil liberties with vigilance and not allow fear and terrorism to undermine our commitment to freedom.”
Sophomore Liz Magee, one of the organizers of the rally at OWU, said that the impact of the rallies is related to the unity amongst those involved.
“Our strength is in our solidarity,” Magee said. “Now, more than ever, wc must stand united in our quest for peace and justice.”
ProgressOWU president junior Ryan Sarni urged citizens to
Photo Couwtist Nolan Dutton
Students sign their names to the Peace Banner at Thursday’s event, which drew over 100 students. The rally was sponsored by ProgressOWU and Amnesty International.
remember what is at the heart of American society.
“During this time of crisis we call upon Americans everywhere to reaffirm their commitment to the principles that make Americans a great people—our respect for freedom and liberty, our embrace of tolerance and diversity, and our commitment to due process and justice.” Sami said.
At OWTJ. green and white ribbons symbolizing life and peace, respectively, were distributed to attendees. Those interested were also invited to sign a petition for peaceful justice, as well as the response from Amnesty International.
Students also had the
opportunity to paint a peace banner, while WCSA sponsored a thank-you card for the hard work put in by local firefighters and policemen.
Fcuturcd speakers at the rally were Joan McLean, associate professor of politics and government, and Martin Hipsky, associate professor of English McLean offered three suggestions os to how to proceed. They included making sure any military action was based on just war principles, ensuring that civilians were never the target of war. and to resist the tendency to oversimplify this conflict and instead address the systemic «ce RA1.LV, page 5

Results:

FineReader Version 9 (10/07):  620 words, 171 Non-English words (56/964 =  5.8% Error Rate)

FineReader Version 14 (1/17): 591 words, 14 Non-English words (48/925 = 5.2% Error Rate)


 

OCR Accuracy Summary:

Surprisingly, the nearly decade-older Version 9 ABBYY FineReader OCR engine performed about as well as the just-released Version 14 in most cases (where the papers mimicked the complicated multi-column newspaper format).  In one case the older engine was slightly more accurate, but the difference was so small that it probably falls within the noise of natural variation.

For some reason, the difference was greatest (> 10x) when scanning the simplest layout, the dual-column 1890 paper.  More testing is needed to confirm that this large gap is consistent across similar samples.  Also, the sample scans used were relatively clean, so it would be interesting to run both engines against a wider variety of scans, from low to high quality.

1890 College of Wooster:  28% (Version 9)  -vs- 2.4% (Version 14)

1928 Oberlin College:  7.2% (Version 9)  -vs-  7.4% (Version 14)

1964 Kenyon College:  8% (Version 9)  -vs-  6.2% (Version 14)

2001 Ohio Wesleyan University:  5.8% (Version 9)  -vs-  5.2% (Version 14)
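
The error rates above are quoted as simple fractions (for example 56/964 for the 2001 sample). The post does not spell out what the numerator and denominator count, so the following is only a minimal sketch of the arithmetic; the function name `error_rate` is an illustrative assumption.

```python
def error_rate(errors: int, total: int) -> float:
    """Return errors/total as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * errors / total, 1)

# Fractions quoted in the 2001 Ohio Wesleyan results above
v9_rate = error_rate(56, 964)
v14_rate = error_rate(48, 925)
print(v9_rate, v14_rate)

# Gap between the two engines on the 1890 sample (28% vs 2.4%)
ratio_1890 = 28 / 2.4
print(round(ratio_1890, 1))
```

The 1890 ratio is what the "> 10x" remark in the summary refers to.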

 

HackOH5 Hackathon

Picture-of-Hackathon

Artist Rendition (not to scale)

 

Website:  HackOH5.ohio5.org

Date:  Fri, March 31st – Sat, April 1st

Location:  College of Wooster, Library

Mentors/Workshops on-site:  Text mining, analysis, GitHub, etc.

 

Goal:

Experiment with data, deconstruct it, combine it with other data, visualize it, contextualize it and explore what this tells us about our history.

Data:

  • 5 College Newspapers:  Denison, Kenyon, Oberlin, Ohio Wesleyan, College of Wooster
  • 170,000+ pages
  • Dating from 1856 (over 160 years)
  • Provided as a batch file dump or via online REST API calls
  • File formats:  single/multi-page PDF, jp2, xml and html
  • Varying OCR quality (relatively noisy data)
  • Style, layout and OCR quality vary with microfiche quality, time period, scan software, font, layout, etc.

 

Examples:

See 3 examples posted on HackOH5.ohio5.org website.

 

Guidelines:

  • All Narratives/Stories/Research are based upon Data in some form
  • Find the most interesting story embedded within the Data
  • Don’t force the data in a particular direction, let it lead you in your area of inquiry
  • Ask a progressively narrower sequence of open-minded questions
  • Backtrack, modify questions with preliminary data exploration, iterate
  • Think in multiple dimensions and datasets (external mashups):  compare, contrast and set a baseline for comparison
  • All Data is fundamentally represented, manipulated, analyzed and visualized numerically.

 

Kenyon CWL/IPHS Digital Humanities Team Approach:

We’ll divide our efforts into three distinct, but somewhat overlapping and intimately connected teams:

  1. Text Wrangling Team – Get, clean and normalize dataset.  Minimal programming, mostly setting parameters in calls to various Python Libraries.
  2. Analytics Team – (tech) run various Machine Learning (ML) analyses on the dataset and (non-tech) closely explore/examine the data to find structure and meta-data that lend themselves to exploring various questions in the humanities.
  3. Data Visualization Team – get a good feel for the underlying data, the ML output and the humanities research question, and translate that into the most engaging and intuitive visual experience.

 

Text Wrangling – Part A Clean Data

  1. Performance (languages, data types, idioms, libraries, parallel processing, local, cluster, cloud)
  2. Get physical copies of data files
  3. Selectively rescan?
  4. Sample with statistical rigor?
  5. Identify Titles
  6. Identify Topic Sentences/Summary Sentences
  7. Whitespace Filter
  8. Apostrophe Filter
  9. Remove Stop Words
  10. Remove Punctuation
  11. Stemming
  12. Lemmatization
  13. Spell Correction (Titles?)
  14. Grammar Correction (Titles?)
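
Several of the cleaning steps above (whitespace filter, apostrophe filter, punctuation and stop-word removal, stemming) can be sketched with the Python standard library alone. This is a minimal illustration rather than the pipeline used at the Hackathon: the stop-word list is a tiny placeholder, `naive_stem` is a crude stand-in for a real stemmer, and lemmatization and spell/grammar correction are not covered.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "at", "for"}

def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def normalize_apostrophes(text: str) -> str:
    """Turn curly apostrophes into straight ones, then drop possessive 's."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return re.sub(r"'s\b", "", text)

def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation characters."""
    return text.translate(str.maketrans("", "", string.punctuation))

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token: str) -> str:
    """Very rough suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def clean(text: str):
    """Run the steps in order and return a list of normalized tokens."""
    text = normalize_whitespace(text)
    text = normalize_apostrophes(text)
    text = remove_punctuation(text).lower()
    tokens = remove_stop_words(text.split())
    return [naive_stem(t) for t in tokens]

sample = "Students  sign their names to the\nPeace Banner at Thursday\u2019s event."
print(clean(sample))
# ['student', 'sign', 'their', 'name', 'peace', 'banner', 'thursday', 'event']
```

Keeping each step as its own small function makes it easy to reorder or switch off steps when a filter turns out to hurt noisy OCR text more than it helps.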

Text Wrangling – Part B Analyze Data

  1. Word Count/Frequencies
  2. Vocabularies
  3. Domain Dictionaries (Race, Sex, etc)
  4. Sentiment Analysis (positive, negative, confidence)
  5. Topic Modeling
  6. Self-Organizing Maps
  7. Trends/comparisons across time
  8. Trends/comparisons across subsets of data
  9. Trends/comparisons with external datasets
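
As a rough sketch of the first few analysis steps above (word frequencies, a domain dictionary and a trend across time), the following uses `collections.Counter`. The articles, years and the "conflict" dictionary are made-up placeholders, not data from the newspaper archive.

```python
from collections import Counter

# Hypothetical cleaned articles: (year, list of tokens)
articles = [
    (1890, ["college", "lecture", "peace", "students"]),
    (1964, ["students", "protest", "war", "peace", "war"]),
    (2001, ["rally", "peace", "students", "justice", "peace"]),
]

# Hypothetical domain dictionary for a "conflict" theme
DOMAIN_TERMS = {"war", "peace", "protest", "rally"}

def domain_share(tokens, terms):
    """Fraction of tokens that belong to the domain dictionary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in terms) / len(tokens)

# Word frequencies across all articles
overall = Counter()
for _, tokens in articles:
    overall.update(tokens)
print(overall.most_common(2))
# [('peace', 4), ('students', 3)]

# Trend across time: share of domain terms per year
trend = {year: round(domain_share(tokens, DOMAIN_TERMS), 2)
         for year, tokens in articles}
print(trend)
# {1890: 0.25, 1964: 0.8, 2001: 0.6}
```

Using a share rather than a raw count keeps years with longer articles from dominating the trend.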

Analytics Team

  1. Sample representative articles across college source, time period, topic and other significant dimensions, and read the raw articles to get an intuition for the style, content and structure of the dataset.
  2. Think about what is distinctive or interesting about the dataset that you learned about in step 1.
  3. Think about what data/structure is missing from the dataset that could be augmented or contrasted or baselined with an external dataset.
  4. Formulate research questions that this particular dataset, alone or in mashups with external datasets, can answer most strongly.  Generate at least 3-5 different and potentially interesting research questions that you think this dataset is particularly well suited to answer.
  5. Translate your research question into specific questions we can ask of the dataset.  For example, searching for key terms, vocabularies, pos/neg sentiments, changes over time, comparison with contemporary newspaper sources, etc.
  6. Work with the technical Text Wrangler team to translate these questions into programs that follow an increasingly specific line of questioning.
  7. Throughout the process, feel free to iterate, backtrack and modify your research question based upon prior results.  Better to fail fast and approach the data with a new, more productive/interesting research question than to pour a lot of time into what will eventually be a relatively uninteresting or weakly supported claim.
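
Step 1 above asks for a representative sample across college, time period and other dimensions. One way to approach that is stratified sampling: group the articles by the dimensions of interest and draw a few from each group. The sketch below assumes hypothetical article metadata of the form (college, year, article_id); the function name and record layout are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=0):
    """Pick up to `per_group` records from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sample = []
    for group_key in sorted(groups):
        members = groups[group_key]
        k = min(per_group, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical article metadata: (college, year, article_id)
records = [
    ("Kenyon", 1964, "k1"), ("Kenyon", 1965, "k2"), ("Kenyon", 2001, "k3"),
    ("Oberlin", 1928, "o1"), ("Oberlin", 1929, "o2"),
    ("Wooster", 1890, "w1"),
]

# Group by (college, decade) and read one article from each group
picked = stratified_sample(
    records, key=lambda r: (r[0], r[1] // 10 * 10), per_group=1
)
print(picked)
```

Each (college, decade) group contributes at most `per_group` articles, so sparsely covered periods are not crowded out by decades with many pages.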

Visualization Team

  1. Briefly read over the top-level tasks for both the Data Wrangler team and Analytics Team to get a preview of what data you may be given to visualize and what are the important ideas to convey with the data/results.
  2. In particular, try to get an intuition for both the underlying data and the analysis that will produce the final datasets you’ll need to visualize.
  3. Read about current best practices in Data Visualization.
  4. Quickly review a number of contemporary Data Visualization galleries to get a concrete idea of how data/analysis results are expressed today.
  5. Drill down into our tool of choice to understand how to quickly generate some interesting visualizations with sample code, templates, etc.
  6. Communicate with the other teams, especially the Analytics Team, to reconcile the dataset resulting from the analysis with the main ideas that need to be expressed/emphasized as the research results.
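
One practical detail for the hand-off in step 6 is the file format. Assuming the Analytics Team passes its results along as a flat CSV file (a format Tableau Public can open), a minimal sketch of the export might look like this. The column names and counts are made-up placeholders.

```python
import csv
import io

def rows_to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical analysis output: one row per college, year and term
rows = [
    {"college": "Kenyon", "year": 1964, "term": "peace", "count": 12},
    {"college": "Oberlin", "year": 1928, "term": "peace", "count": 3},
]

csv_text = rows_to_csv(rows, ["college", "year", "term", "count"])
print(csv_text)
```

A "long" layout with one observation per row tends to be easier to pivot and filter in a visualization tool than one wide column per term.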

 
