- This fascinating Pinterest board collects photos, floor plans and other artifacts of Toronto’s Carnegie libraries, most of which are still in service today, although their fireplaces are no longer in use.
- Back to the frozen future with these two vehicles designed to handle snow. The steam-powered Xrot 9213 is still able to free up the Bernina line in eastern Switzerland. Meanwhile, we’re not quite sure where the massive Antarctic Snow Cruiser is.
- I’m not done with trains. The NYT has a great piece on the Baikal-Amur Mainline (BAM) branch of the Trans-Siberian.
- Speaking of Siberia, thanks to an amazingly cool weather trick, this Finnish village briefly saw itself reflected in the night sky. Better than an X-Files episode.
- These photos are why I’m trapped in Tokyo forever now. A grittier version of the future emerges from subtle animated GIFs.
- Le 3e lieu m’a tuer ("The third place killed me"): once again, Mlle Salt delivers a precisely aimed rant. Meanwhile, in Seine-et-Marne, the libraries are empty (and for good reason).
- This knee-jerk response to a WSJ op-ed daring to question the relevance of library schools is a bit beside the point, however.
- Don’t buy $12 chocolate bars from bearded hipsters.
- Binder makes IPython notebooks hosted on GitHub interactive. Looks awesome.
Month: January 2016
Guessing the language of a book based on its title
In the midst of endless report-writing, I was faced with an interesting challenge at work this week. We are trying to aggregate e-book usage data for the members of our consortium, and we were interested in figuring out how well the French-language content is faring compared to the English titles that make up the bulk of the collection.
Unfortunately, one of our vendors does not include language data in either its title lists or its usage reports. Before trying to cross-reference the usage reports with the full e-book metadata I could get from the MARC records, I tried to run the title list through the guess_language library by way of a simple Python script:
```python
import csv

from guess_language import guess_language

# Read the vendor title list and print a language guess next to each title.
with open('2015-01_ProQuest_titles.csv', newline='', encoding='utf-8') as csvfile:
    PQreader = csv.DictReader(csvfile)
    for row in PQreader:
        title = row['Title']
        language = guess_language(title)
        print(language, title)
```
The results were a disaster:
```
pt How to Dotcom : A Step by Step Guide to E-Commerce
en My Numbers, My Friends : Popular Lectures on Number Theory
en Foundations of Differential Calculus
en Language and the Internet
en Hollywood & Anti-Semitism : A Cultural History, 1880-1941
de Agape, Eros, Gender : Towards a Pauline Sexual Ethic
la International Law in Antiquity
fr Delinquent-Prone Communities
en Modernist Writing & Reactionary Politics
```
guess_language works by identifying trigrams, combinations of three characters that are more prevalent in one language than another. While it works reasonably well on whole sentences and short text snippets, the particular construction of a book title seems to throw the method entirely off-kilter.
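To make the trigram idea concrete, here is a minimal sketch of how such a guesser could work. It is not guess_language's actual implementation: the function names, the scoring rule (share of a text's trigrams found in a language profile) and the two tiny training samples are all assumptions made up for illustration.

```python
from collections import Counter


def trigrams(text):
    """Return the overlapping three-character sequences of the lowercased text."""
    cleaned = " ".join(text.lower().split())
    return [cleaned[i:i + 3] for i in range(len(cleaned) - 2)]


def build_profile(sample):
    """Count how often each trigram occurs in a sample of one language."""
    return Counter(trigrams(sample))


def score(text, profile):
    """Fraction of the text's trigrams that also appear in the profile."""
    grams = trigrams(text)
    if not grams:
        return 0.0
    return sum(1 for gram in grams if gram in profile) / len(grams)


def guess(text, profiles):
    """Pick the language whose profile covers the largest share of trigrams."""
    return max(profiles, key=lambda lang: score(text, profiles[lang]))


# Toy training samples (hypothetical, and far too small for real use).
SAMPLES = {
    "en": "the history of the library and the books that the people read in the city",
    "fr": "l'histoire de la bibliothèque et des livres que les gens lisent dans la ville",
}
PROFILES = {lang: build_profile(sample) for lang, sample in SAMPLES.items()}

print(guess("the people and the books", PROFILES))
print(guess("les livres de la ville", PROFILES))
```

A title made of a few words yields only a handful of trigrams, so a couple of unusual character sequences (a proper noun, a Greek or Latin loanword) can be enough to tip a score like this toward the wrong language.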
As I was pondering the next steps, I came to realize that I could also filter titles based on language directly in the vendor database and then export to a CSV file… which solved my issue in seconds but wasn’t half as fun as playing around with computational linguistics. Back to writing reports, I guess.
Link dump 2016/1: Modernist libraries, fiction publishing, podcasts and Noah Webster
While I keep working on the draft of my first actual blog posts, let’s see if I can also use this space to keep track of what I recently enjoyed reading:
- The Tale of Two Modernist Libraries (Architect Magazine, Dec. 16, 2015) on the ongoing transformation of Philip Johnson’s Boston Public Library and Mies van der Rohe’s MLK library in Washington DC. Not convinced about the metal cladding on the BPL building. Mecanoo’s intervention on MLK looks better, although losing that midcentury lobby will be a shame (somebody save those chairs!). They’re repeating the rooftop garden trick that seems to have worked well in Birmingham, so why not, although the rounded curves of the roof extension are out of character in a Mies building.
- Huh. Kodak unveiled a new Super-8 camera. Also its CEO has pretty cool looking business cards. They seem to be doing everything they can to save colour film, but I’m not sure it will be worth the hassle.
- iOS apps for coding, transmitting, displaying and dashboarding your work (Finer Things in Tech).
- Tor.com has a fascinating post about the process of fiction publishing, taking the latest George R.R. Martin title as an example. This infographic sums it up nicely.
- The End of the Dark Ages of Podcasting. Just because everyone knows about Serial (whose second season is kind of disappointing I must say) doesn’t mean podcasts are mainstream yet, at least not until discovery has been improved.
This week, I also learned that most of American English spelling can be traced to Noah Webster. He axed the extra u’s in colour and neighbour, changed offence to offense and cheque to businesslike check. He’s the one who insisted the letter “z” be pronounced “zee” instead of “zed” (he also wanted “y” to be called “yi” and “w” to become “we”). All this, and much more, from the first chapter of Mary Norris’s Between You & Me, which is a true delight to read[1].
[1] Nonrestrictive clause