by Seth Bernstein
I’ve just had a lesson come out on automatic transliteration of Cyrillic sources in The Programming Historian, so I thought I would devote this post to shameless self-promotion. Then I decided I should also write a little about some of the tools I use to build databases from web information and create visualizations. I’ll pay particular attention to resources that I have found useful for Russian/Eurasian history.
I do the bulk of my programming in a language called Python. Compared to languages like Java or C, Python's syntax is closer to natural language, which makes it easier to read and understand. The open-source community has put together many modules for Python, some of which are quite powerful and useful. Udacity offers a great full-length Python course for free (Computer Science 101), and Codecademy has good interactive lessons for mastering syntax (not only for Python, by the way). The Programming Historian has lessons on Python and other tools geared toward humanities scholars who would like to learn specific skills (e.g., counting the words in a set of documents or downloading a set of web pages) without learning an entire language.
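To give a sense of how small such a task can be, here is a minimal sketch of counting the words in a set of documents. It is my own illustration rather than code from any particular lesson: the function name `count_words` and the two sample strings are invented for the example, and real sources would need more careful cleaning.

```python
import re
from collections import Counter


def count_words(documents):
    """Return a Counter of lowercased word frequencies across all documents."""
    counts = Counter()
    for text in documents:
        # In Python 3, \w+ matches runs of Unicode letters and digits,
        # so Cyrillic words are handled the same way as Latin ones.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Invented sample "documents", used only to show the function at work.
SAMPLE_DOCUMENTS = [
    "Протокол собрания комсомола",
    "Протокол заседания бюро комсомола",
]

if __name__ == "__main__":
    for word, n in count_words(SAMPLE_DOCUMENTS).most_common():
        print(f"{word}\t{n}")
```

In practice the list of strings would come from reading files on disk, but the counting step stays the same.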
What makes Python indispensable for me is its ability to extract data easily from web pages. Using a parsing module called Beautiful Soup, you can, for example, go through all three million names in Memorial’s list of gulag victims and generate a table (and then a map and a blog post) of the sources of the entries in about a dozen lines of code. The Programming Historian has two lessons that deal with Beautiful Soup.
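Here is a rough sketch of what that kind of tally looks like with Beautiful Soup. This is not the script I used for the Memorial data. The HTML below is invented for the example (the tag names, the `entry` and `source` classes, and the placeholder entries are all assumptions), and the real pages are structured differently. The sketch only assumes that the `beautifulsoup4` package is installed.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Invented sample markup: the structure and class names are hypothetical
# and do not reflect how any real site lays out its pages.
SAMPLE_HTML = """
<ul>
  <li class="entry"><span class="name">Person One</span>
      <span class="source">Book of Memory, Region A</span></li>
  <li class="entry"><span class="name">Person Two</span>
      <span class="source">Book of Memory, Region B</span></li>
  <li class="entry"><span class="name">Person Three</span>
      <span class="source">Book of Memory, Region A</span></li>
</ul>
"""


def count_sources(html):
    """Tally how many entries cite each source in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for entry in soup.find_all("li", class_="entry"):
        source = entry.find("span", class_="source")
        if source is not None:  # skip entries with no source listed
            counts[source.get_text(strip=True)] += 1
    return counts


if __name__ == "__main__":
    # Print a simple tab-separated table of sources and entry counts.
    for source, n in count_sources(SAMPLE_HTML).most_common():
        print(f"{source}\t{n}")
```

For a real site, the HTML string would come from downloading each page, and the `find_all` arguments would have to match that site's actual markup.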
Read full post here. (Originally posted October 13, 2013)