timtom.ch

Thomas' home on the web

Tag: ebooks

Guessing the language of a book based on its title

In the midst of endless report-writing, I was faced with an interesting challenge at work this week. We are trying to aggregate e-book usage data for the members of our consortium, and we wanted to find out how well the French-language content is faring compared to the English titles that make up the bulk of the collection.

Unfortunately, one of our vendors does not include language data in either its title lists or its usage reports. Before trying to reconcile the usage reports with the full e-book metadata I could get from the MARC records, I tried running the title list through the guess_language library by way of a simple Python script:

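import csv

from guess_language import guess_language

# Python 3: open the file in text mode as UTF-8, so titles need no manual decoding
with open('2015-01_ProQuest_titles.csv', newline='', encoding='utf-8') as csvfile:
    pq_reader = csv.DictReader(csvfile)
    for row in pq_reader:
        title = row['Title']
        # guess_language returns a language code such as 'en' or 'fr'
        language = guess_language(title)
        print(language, title)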

The results were a disaster:

pt How to Dotcom : A Step by Step Guide to E-Commerce
en My Numbers, My Friends : Popular Lectures on Number Theory
en Foundations of Differential Calculus
en Language and the Internet
en Hollywood & Anti-Semitism : A Cultural History, 1880-1941
de Agape, Eros, Gender : Towards a Pauline Sexual Ethic
la International Law in Antiquity
fr Delinquent-Prone Communities
en Modernist Writing & Reactionary Politics

guess_language works by identifying trigrams, combinations of three characters that are more prevalent in one language than another. While it works reasonably well on whole sentences and short text snippets, the particular construction of a book title seems to throw the method entirely off-kilter.
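
To make the trigram idea concrete, here is a minimal sketch of the general technique. It is not the actual guess_language implementation: the two sample sentences, the function names and the simple overlap score are all assumptions made for illustration, and a real library builds its profiles from much larger corpora.

```python
from collections import Counter


def trigrams(text):
    """Count the overlapping three-character sequences in text."""
    # Lowercase, collapse whitespace, and pad so word boundaries count too.
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def build_profile(sample, size=300):
    """Keep the most frequent trigrams of a sample as a toy language profile."""
    return {gram for gram, _ in trigrams(sample).most_common(size)}


# Hypothetical, tiny training samples (illustration only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then "
          "the cat sat on the mat with the other animals",
    "fr": "le renard brun saute par dessus le chien paresseux et puis "
          "le chat est assis sur le tapis avec les autres animaux",
}
PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}


def guess(text):
    """Return the best-scoring language and the per-language scores."""
    grams = trigrams(text)
    scores = {
        lang: sum(count for gram, count in grams.items() if gram in profile)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for phrase in ("the cat and the dog", "le chat et le chien"):
        print(guess(phrase), phrase)
```

A five-word title produces only a couple of dozen trigrams, so a few unusual words or proper nouns can easily tip the score toward the wrong language.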

As I was pondering the next steps, I came to realize that I could also filter titles based on language directly in the vendor database and then export to a CSV file… which solved my issue in seconds but wasn’t half as fun as playing around with computational linguistics. Back to writing reports, I guess.

Posted on January 20, 2016. Categories: Code, Work. Tags: ebooks, language, linguistics, metadata, python.