History 3816G / Digital Humanities 3902G:

Introduction to Digital History

Tuesdays, 6pm

Room UC-222

Contact me

Devon Elliott

delliot8@uwo.ca

Office Hours: Tuesdays, 4:00 - 5:00pm, Lawson Hall Room 1208 or by appointment

Hack Western

March 27-29, 2015
https://hackwestern.com/

Presentation

Jessica
Timeline JS
http://timeline.knightlab.com/

Presentations

Schedule:
http://bit.ly/1KHe4G0

Comparing: Evaluating Historical Websites

Readings:

Comparing: Evaluating Historical Websites

Technology:

  • batch downloading sources with wget and curl

Scarcity or Abundance

  • How have historians worked under conditions of scarcity? What are some of those limitations?
  • What are the promises and perils of working within conditions of abundance?

Copyright

  • Copyright vs. public domain
  • What is the DMCA?
  • What are copyleft, Creative Commons, and other alternatives to copyright?

Unix-like Systems

  • Text-based
  • Command line interface
  • Many tools that each do specific jobs
  • Hierarchical file system
  • Treats processes like files
  • Use of pipes to string tools together
  • Linux is the most popular clone today; Macs are similar to Linux
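To make the "pipes" bullet concrete, here is a minimal sketch for a Mac or Linux terminal. It strings four small tools together to count repeated words. The function name count_words and the word list are made up for this example.

```shell
# Count how often each input line appears, most frequent first.
count_words() {
  sort |                  # put identical lines next to each other
  uniq -c |               # collapse each run into a count and the line
  sort -rn |              # largest count first
  awk '{print $1, $2}'    # tidy the spacing
}

# Made-up sample data, one word per line.
printf '%s\n' archive newspaper archive map archive newspaper | count_words
```

Each tool does one job, and the | symbol hands the output of one tool to the next.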

INTERMISSION

Command Line

Macs: Use the Spotlight Search, and type Terminal. These computers are packaged with a terminal interface.

Windows: Find the Command Prompt. For Win8+, go to Apps, then Windows System, then Command Prompt. You can also try searching for cmd in the Search Box. For earlier versions of Windows, you'll have to click Start, All Programs, Accessories, Command Prompt.

curl and wget

On a Mac, in Terminal, type: man curl

On a PC, curl is not included by default, but you can add it by visiting the cURL website at http://curl.haxx.se/download.html and choosing the correct platform and processor. Download it and add it to your C:/Windows directory, and curl should be executable from your command prompt.
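Once curl is available, a first download might look like the sketch below. The function name fetch_page and the file name papers.html are my own choices for illustration; -L (follow redirects) and -o (name the output file) are standard curl options.

```shell
# Sketch: save one web page to a local file with curl.
#   $1 = the URL to fetch
#   $2 = the file name to save it under
fetch_page() {
  curl -L -o "$2" "$1"
}

# Example call (uncomment to run; needs a network connection):
# fetch_page http://activehistory.ca/papers/ papers.html
```

The function wrapper only works in a Mac or Linux terminal. In the Windows Command Prompt, type the curl line directly: curl -L -o papers.html http://activehistory.ca/papers/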

Package installers for Macs

http://brew.sh/

In your Terminal, enter:
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Now type brew doctor and hit Enter.

Installing Wget

With the package installer added to Macs, we can add wget by typing brew install wget and pressing Enter.

For Windows PCs, you'll have to download and add wget.exe to your C:/Windows directory. Go to:
http://users.ugent.be/~bpuype/wget/

When you think you have wget added, type wget and press Enter. You should get a response that says wget: missing URL. If you don't see that, let me know.
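On a Mac or Linux terminal you can also check for both tools in one go. This is a small sketch, not part of the install steps; the helper name have is made up for the example.

```shell
# Report whether a command-line tool can be found on this machine.
have() {
  command -v "$1" >/dev/null 2>&1
}

for tool in curl wget; do
  if have "$tool"; then
    echo "$tool: installed"
  else
    echo "$tool: not found"
  fi
done
```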

Wget

Enter wget -h to see the options for wget.

Enter mkdir wget-activehistory and press Enter. This should create a directory called wget-activehistory. To change to that directory, type cd wget-activehistory and press Enter.

wget [options] [URL]

wget -r --no-parent -w 2 --limit-rate=20k http://activehistory.ca/papers/
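To see what each option in that command does, here is the same command sketched with wget's long option names, one per line. The function name mirror_papers is made up; the options themselves are the ones used above.

```shell
# Same options as the command above, spelled out:
#   --recursive       (-r)  follow links and download what they point to
#   --no-parent             never climb above the starting directory
#   --wait=2          (-w 2) pause 2 seconds between requests
#   --limit-rate=20k        cap the download speed at 20 KB per second
mirror_papers() {
  wget \
    --recursive \
    --no-parent \
    --wait=2 \
    --limit-rate=20k \
    "$1"
}

# Example call (uncomment to run; needs a network connection):
# mirror_papers http://activehistory.ca/papers/
```

The wait and rate limit are there to be polite to the server you are downloading from.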

Wget with the Internet Archive

Open a new Terminal or Command Prompt. Create a directory called wget-internetarchive.

In your browser, start with the Advanced Search
http://archive.org/advancedsearch.php

Enter new york clipper in the Query box, select identifier for the "Fields to return", and sort by identifier asc. Choose the CSV file format, and click Search. This should save a search.csv file to your computer, likely in your Downloads folder. Drag it into your wget-internetarchive folder.

Rename the search.csv file to identifier.txt and then open it with Notepad or TextEdit. You should see a list starting with "identifier". We want to erase that first line and all the quotation marks. Make sure there isn't an empty line left at the beginning after you erase "identifier". To erase the quotation marks, use find and replace (ctrl-f): put " in the search field, then replace all with nothing. Then save the file.
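If you are on a Mac or Linux terminal, the same cleanup can be done without a text editor. This is a sketch on a made-up stand-in for search.csv: the two identifiers are hypothetical, not real Internet Archive items, and the temporary folder only keeps the example from touching your real files.

```shell
# Build a made-up search.csv in a temporary folder.
workdir=$(mktemp -d)
printf '%s\n' '"identifier"' '"example-item-0001"' '"example-item-0002"' \
  > "$workdir/search.csv"

# tail -n +2 skips the header line; tr -d deletes every quotation
# mark (and any Windows-style carriage returns).
tail -n +2 "$workdir/search.csv" | tr -d '"\r' > "$workdir/identifier.txt"

cat "$workdir/identifier.txt"
```

With your real file, run the tail line inside wget-internetarchive, using search.csv and identifier.txt in place of the temporary paths.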

In Terminal or Command Prompt, change to your wget-internetarchive directory. Now run wget with the following:
wget -r -H -nc -np -nH --cut-dirs=2 -A .pdf,.txt -e robots=off -l1 -i ./identifier.txt -B "http://archive.org/download/"
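That command packs in a lot of options. Here is a sketch of the same command using wget's long option names, with a note on each. The function name fetch_items is made up; it expects identifier.txt in the current directory, as above.

```shell
# The short options from the command above, spelled out:
#   --recursive            (-r)   follow links from each starting page
#   --span-hosts           (-H)   allow downloads from other hosts
#   --no-clobber           (-nc)  skip files you already have
#   --no-parent            (-np)  never climb above the starting directory
#   --no-host-directories  (-nH)  don't make a folder named after the host
#   --cut-dirs=2                  drop the first 2 folder levels when saving
#   --accept .pdf,.txt     (-A)   keep only PDF and text files
#   --execute robots=off   (-e)   ignore robots.txt
#   --level=1              (-l1)  go only one link deep
#   --input-file           (-i)   read the list of items from a file
#   --base                 (-B)   prefix added to each line of that file
fetch_items() {
  wget \
    --recursive \
    --span-hosts \
    --no-clobber \
    --no-parent \
    --no-host-directories \
    --cut-dirs=2 \
    --accept .pdf,.txt \
    --execute robots=off \
    --level=1 \
    --input-file=./identifier.txt \
    --base="http://archive.org/download/"
}

# Example call (uncomment to run; needs a network connection):
# fetch_items
```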

Query Structure

http://blog.archive.org/2011/03/31/how-archive-org-items-are-structured/
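As I understand it, this structure is why the wget command works: each line of identifier.txt gets added to the end of the base URL, and the result is that item's download folder. A minimal sketch of that joining step, using a hypothetical identifier and a made-up helper name item_url:

```shell
# The base URL from the wget command.
base='http://archive.org/download/'

# Join the base URL and one identifier.
item_url() {
  printf '%s%s\n' "$base" "$1"
}

# example-item-0001 is a hypothetical identifier, not a real item.
item_url example-item-0001
# prints http://archive.org/download/example-item-0001
```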

No class next week!

Have a great reading week!

See you on Feb. 24.

Contact me at delliot8@uwo.ca or stop by Lawson Hall Room 1208 on Tuesdays, 4:00-5:00. I'm also available before and after class on Tuesdays, or by appointment.