timtom.ch

Thomas' home on the web

Tag: metadata

Link dump 2016/6: planes over the Atlantic, hidden metadata, name wars and materials

Next time I cross the Atlantic, I shall stop a moment to reflect on everything that’s happening in the background to make this possible. AeroSavvy has a great post explaining how the North Atlantic Tracks system works.

This experimental visualization tool from the Internet Archive is a fun way to explore the popularity of a given concept:

Figure: Dated instances over time of the word “Colonialism” in 82,000 books indexed by the Internet Archive.

The Anatomy of a Tweet. Unsurprisingly, a tweet is composed of much more than 140 characters. There’s a scary amount of metadata coming along with it.

Serialization formats are not toys. Things to watch out for if you are building a web application that takes YAML, XML or JSON input. Watch it even if you don’t: being aware of how easy it is to break software is sobering.

Game of Thrones — the French Baby Boys’ Names Edition. A hilarious take on the evolution of the most popular boys’ names in France. Also from the same excellent Strange Maps blog, I can’t help but love the straightforward honesty of cartographer Jacques-Nicolas Bellin, who admitted to the following on his 1753 map of Australia:

Figure: Excerpt from Bellin’s Carte Réduite des Terres Australes, 1753, showing the words “Ceci est Conjecturale”.

I’m currently reading Mark Miodownik’s Stuff Matters and discovering little snippets of information about materials I wasn’t aware of. For example, the reason reinforced concrete works so well is that “as luck would have it, steel and concrete have almost identical coefficients of expansion” (p. 75). In passing, he also warns against the simplistic equation concrete = ugly:

But the truth is that cheap design is cheap design whatever the material. Steel can be used in good or bad design, as can wood or bricks, but it is only with concrete that the epithet of ‘ugly’ has stuck. There is nothing intrinsically poor about the aesthetics of concrete.

I agree.

Posted on March 14, 2016 (updated March 17, 2016). Categories: Link dumps. Tags: air travel, architecture, atlantic, books, coding, concrete, dataviz, maps, materials, metadata, navigation, python, security, twitter.

Guessing the language of a book based on its title

In the midst of endless report-writing, I was faced with an interesting challenge at work this week. We are trying to aggregate e-book usage data for the members of our consortium, and we were interested in figuring out how well the French-language content is faring compared to the English titles that make up the bulk of the collection.

Unfortunately, one of our vendors does not include language data in either its title lists or its usage reports. Before trying to cross-reference the usage reports with the full e-book metadata I could get from the MARC records, I tried to run the title list through the guess_language library by way of a simple Python script:

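from guess_language import guess_language
import csv

# Read the vendor title list and print a guessed language code next to each title.
# Written for Python 3: the file is opened in text mode, so no per-field decoding is needed.
with open('2015-01_ProQuest_titles.csv', newline='', encoding='utf-8') as csvfile:
    pq_reader = csv.DictReader(csvfile)
    for row in pq_reader:
        title = row['Title']
        language = guess_language(title)
        print(language, title)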

The results were a disaster:

pt How to Dotcom : A Step by Step Guide to E-Commerce
en My Numbers, My Friends : Popular Lectures on Number Theory
en Foundations of Differential Calculus
en Language and the Internet
en Hollywood & Anti-Semitism : A Cultural History, 1880-1941
de Agape, Eros, Gender : Towards a Pauline Sexual Ethic
la International Law in Antiquity
fr Delinquent-Prone Communities
en Modernist Writing & Reactionary Politics

guess_language works by identifying trigrams, combinations of three characters that are more prevalent in one language than another. While it works reasonably well on whole sentences and short text snippets, the particular construction of a book title seems to throw the method entirely off-kilter.
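
To make the trigram idea a bit more concrete, here is a minimal sketch in Python 3. It is not how guess_language is actually implemented: the two tiny sample texts, the function names (trigrams, trigram_profile, guess) and the simple frequency-sum scoring are all assumptions made up for illustration.

```python
from collections import Counter

# Tiny illustrative samples (assumption: a real detector is trained on far larger corpora).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog "
          "runs after the fox through the woods",
    "fr": "le renard brun saute par dessus le chien paresseux et puis le chien "
          "court après le renard dans les bois",
}


def trigrams(text):
    """Return every overlapping three-character sequence of the normalised text."""
    # Lowercase, collapse whitespace, and pad so word boundaries count as characters.
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def trigram_profile(text):
    """Count how often each trigram occurs in the text."""
    return Counter(trigrams(text))


# One trigram frequency profile per language (hypothetical, built from SAMPLES).
PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def guess(text):
    """Return the language whose profile best matches the text, or None if nothing matches."""
    grams = trigrams(text)
    scores = {}
    for lang, profile in PROFILES.items():
        total = sum(profile.values())
        # Sum the relative frequency of each input trigram in this language's profile.
        scores[lang] = sum(profile[g] / total for g in grams)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess("the dog and the fox"))
print(guess("le chien et le renard"))
```

A short title only offers a handful of trigrams to score, and many of them (proper nouns, Latinate vocabulary, punctuation) are not typical of everyday prose in any one language, which is probably why the method struggles here.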

As I was pondering the next steps, I came to realize that I could also filter titles based on language directly in the vendor database and then export to a CSV file… which solved my issue in seconds but wasn’t half as fun as playing around with computational linguistics. Back to writing reports, I guess.

Posted on January 20, 2016. Categories: Code, Work. Tags: ebooks, language, linguistics, metadata, python.