Creating Song Lyrics Graphs

A couple of months ago, I wrote myself a tool which could take a text file of song lyrics and generate an image showing how frequently each word appeared in the song (like a word cloud, where more frequent words were larger), and which words followed which words (unlike a word cloud, since it had arrows between the words).

After trying it on quite a few different songs, I came up with the idea of feeding it a very repetitive song, such as the road trip song 99 Bottles of Beer.

A directed graph of the lyrics of "99 Bottles of Beer," with words as nodes and edges connecting consecutive words.

Yesterday, I decided to post this image to the Reddit r/dataisbeautiful community, and it received a lot of interest. I’ve had some people ask how I created an image like this, which this post will try to answer.

Directed Graphs

While I’ll try to keep from getting too technical, one thing we need to understand is that this song lyric image is a directed graph.

Simplified, a directed graph is a bunch of nodes (the circles, each with a unique word of the song) and edges (the arrows showing how the words are related).

For example, an edge (arrow) from “99” → “bottles” means that “99” comes just before “bottles” in the song lyrics.

I can create a directed graph with a (free!) tool like yEd Graph Editor, which lets me draw nodes (circles) and drag edges (arrows) between them.

The first two lines of the song – “99 bottles of beer on the wall / 99 bottles of beer” – in graph form

So with this alone, I could create an entire song lyrics graph, but it would take a very long time – there are thousands of words in all ninety-nine verses of the song, so I’d have to draw thousands of arrows.
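To put a rough number on that, here is a small Python sketch that generates a simplified version of the lyrics and counts the words and adjacent word pairs (each pair being one arrow you would have to draw by hand). The wording is an assumption: it always says "bottles," writes numbers as digits, and uses "take one down, pass it around" for every verse, which real versions of the song don't quite do. The function name `bottles_words` is hypothetical.

```python
def bottles_words(verses=99):
    """Words of a simplified "99 Bottles of Beer" (assumed wording, see above)."""
    words = []
    for n in range(verses, 0, -1):
        verse = (
            f"{n} bottles of beer on the wall {n} bottles of beer "
            f"take one down pass it around {n - 1} bottles of beer on the wall"
        )
        words.extend(verse.split())
    return words


words = bottles_words()
pairs = list(zip(words, words[1:]))  # each adjacent pair is one arrow to draw

print(len(words))       # 2376 words in total
print(len(set(words)))  # 112 distinct words, i.e. nodes
print(len(pairs))       # 2375 arrows if every pair is drawn separately
print(len(set(pairs)))  # 309 distinct arrows once repeats are merged
```

Even in this simplified form, the song has well over two thousand word-to-word transitions, while the number of distinct words stays small.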

Automatically Generating a yEd Graph

To save time, I want to be able to take a text file of song lyrics and automatically convert it into a yEd document.

yEd files are in a format called GraphML. Here’s a sample of a very simple graph, and the GraphML that describes it:

A simple graph, with the following nodes: 99, bottles, of, beer. Arrows join these nodes in that order.

The first few lines tell us that this is the start of a GraphML document, and the last two lines end the document. What we care about most are the nodes (the <node> elements) and the edges (the <edge> elements).

You can see that each <node> has an id. Each <edge> has a source (where the arrow comes from) and target (where the arrow points to), and they use those same node ids. So, for example, <edge source="99" target="bottles"/> means “draw an arrow from the node with an id of 99 to the node with an id of bottles.”
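To make the id/source/target relationship concrete, here is a Python sketch that reads a minimal GraphML version of the four-word example with the standard library's `xml.etree.ElementTree` and checks that every edge points at a declared node id. The GraphML text is my own minimal reconstruction of such a graph (using the GraphML namespace), not the exact file yEd produces, and `read_edges` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Minimal GraphML for the graph 99 -> bottles -> of -> beer (illustrative only).
GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="99"/>
    <node id="bottles"/>
    <node id="of"/>
    <node id="beer"/>
    <edge source="99" target="bottles"/>
    <edge source="bottles" target="of"/>
    <edge source="of" target="beer"/>
  </graph>
</graphml>"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def read_edges(text):
    """Return (source, target) pairs, verifying each refers to a declared node id."""
    root = ET.fromstring(text)
    node_ids = {n.get("id") for n in root.iterfind(".//g:node", NS)}
    edges = []
    for e in root.iterfind(".//g:edge", NS):
        src, dst = e.get("source"), e.get("target")
        if src not in node_ids or dst not in node_ids:
            raise ValueError(f"edge {src}->{dst} refers to an undeclared node")
        edges.append((src, dst))
    return edges


for src, dst in read_edges(GRAPHML):
    print(f"{src} -> {dst}")
```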

Notice that each node can have multiple edges, so we only need to define each word as a node once – even though “bottles” is used hundreds of times throughout the song, we only need a single node with an id of bottles, and then we can refer to it with as many edges as we need.

Effectively, what I need to do is create a script which will loop through the lyrics text and create a <node> for each unique word. Then I need to go back through the lyrics and, looking at each pair of adjacent words, create an <edge> between them.
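A minimal sketch of those two passes in Python might look like the following. It is not the author's song-lyrics-graph script: the tokenizing rule (lowercase, strip surrounding punctuation) and the decision to merge repeated word pairs into a single edge are my assumptions, and `tokenize` and `lyrics_to_graphml` are hypothetical names. `quoteattr` escapes each word so that apostrophes or ampersands in lyrics still produce valid XML.

```python
import string
from xml.sax.saxutils import quoteattr


def tokenize(text):
    """Lowercase words with surrounding punctuation stripped (an assumed rule)."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


def lyrics_to_graphml(text):
    words = tokenize(text)
    # First pass: one node per unique word, in order of first appearance.
    nodes = list(dict.fromkeys(words))
    # Second pass: one edge per adjacent pair; repeated pairs are merged here.
    edges = list(dict.fromkeys(zip(words, words[1:])))
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">',
        '  <graph id="G" edgedefault="directed">',
    ]
    lines += [f"    <node id={quoteattr(w)}/>" for w in nodes]
    lines += [f"    <edge source={quoteattr(a)} target={quoteattr(b)}/>" for a, b in edges]
    lines += ["  </graph>", "</graphml>"]
    return "\n".join(lines)


print(lyrics_to_graphml("99 bottles of beer on the wall, 99 bottles of beer"))
```

Because the second "99 bottles of beer" repeats pairs already seen, this example produces seven nodes and seven edges, the last one running from "wall" back to "99".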

The resulting code is my song-lyrics-graph Python script. It’s built using the basic concept above, though it has some additional features too – plain vanilla GraphML doesn’t allow things like specifying the size of nodes, but yEd adds extensions to the GraphML document that let me do that.
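As an illustration of those yEd extensions, here is a sketch that sizes each node by how often its word appears. It is based on my understanding of the files yEd saves (a `<key>` declared with `yfiles.type="nodegraphics"`, and a `y:ShapeNode` holding `y:Geometry`, `y:NodeLabel`, and `y:Shape`), so treat the element details as assumptions to check against a file saved from yEd itself. The square-root scaling, the key id `d0`, and the function names `yed_node` and `yed_graphml` are my own choices, not necessarily what the original script does.

```python
import math
from collections import Counter
from xml.sax.saxutils import escape, quoteattr


def yed_node(word, count, base=30.0):
    """One <node> using yEd's ShapeNode extension, sized by word frequency."""
    size = base * math.sqrt(count)  # diameter grows with the square root of the count
    return (
        f"    <node id={quoteattr(word)}>\n"
        '      <data key="d0">\n'
        "        <y:ShapeNode>\n"
        # Every node starts at (0, 0); a yEd layout spreads them out later.
        f'          <y:Geometry x="0.0" y="0.0" width="{size:.1f}" height="{size:.1f}"/>\n'
        f"          <y:NodeLabel>{escape(word)}</y:NodeLabel>\n"
        '          <y:Shape type="ellipse"/>\n'
        "        </y:ShapeNode>\n"
        "      </data>\n"
        "    </node>"
    )


def yed_graphml(words):
    """Build a yEd-flavored GraphML document from an already tokenized word list."""
    counts = Counter(words)  # keeps first-appearance order
    header = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns"',
        '         xmlns:y="http://www.yworks.com/xml/graphml">',
        '  <key for="node" id="d0" yfiles.type="nodegraphics"/>',
        '  <graph id="G" edgedefault="directed">',
    ]
    nodes = [yed_node(w, c) for w, c in counts.items()]
    edges = [
        f"    <edge source={quoteattr(a)} target={quoteattr(b)}/>"
        for a, b in dict.fromkeys(zip(words, words[1:]))
    ]
    return "\n".join(header + nodes + edges + ["  </graph>", "</graphml>"])


print(yed_graphml("99 bottles of beer on the wall 99 bottles of beer".split()))
```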

As long as Python is installed on your computer and you’ve downloaded my script, you can drag and drop a .txt file of song lyrics onto the song_lyrics_graph.py file, and it will generate a .graphml file with a directed graph of your song.
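For the drag-and-drop part, the usual pattern is to read file paths from `sys.argv`. On Windows, dropping a file onto a .py file typically runs the script with the dropped file's path as its first command-line argument, though that depends on how Python was installed and associated with .py files, so treat it as an assumption. In this sketch, `minimal_graphml` is a hypothetical stand-in converter (substitute your own), and `convert_file` writes the output next to the input with a .graphml extension.

```python
import sys
from pathlib import Path
from xml.sax.saxutils import quoteattr


def minimal_graphml(text):
    """Hypothetical stand-in converter: unique words become nodes, adjacent pairs become edges."""
    words = text.lower().split()
    body = [f"<node id={quoteattr(w)}/>" for w in dict.fromkeys(words)]
    body += [
        f"<edge source={quoteattr(a)} target={quoteattr(b)}/>"
        for a, b in dict.fromkeys(zip(words, words[1:]))
    ]
    return (
        '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
        '<graph id="G" edgedefault="directed">' + "".join(body) + "</graph></graphml>"
    )


def convert_file(txt_path, to_graphml=minimal_graphml):
    """Read a lyrics .txt file and write a .graphml file with the same name beside it."""
    src = Path(txt_path)
    out = src.with_suffix(".graphml")
    out.write_text(to_graphml(src.read_text(encoding="utf-8")), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Dropped files are assumed to arrive as command-line arguments.
    if len(sys.argv) < 2:
        print("Usage: drop a .txt lyrics file onto this script")
    for arg in sys.argv[1:]:
        print("Wrote", convert_file(arg))
```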

yEd Layouts

My script does generate all the nodes and edges, but it doesn’t position them in a pretty layout – the file it generates will just have all the nodes on top of each other.

Screenshot of yEd

Fortunately, yEd has a layout engine that will try to figure out a good arrangement of the nodes. Open the Layout menu, and you’ll see a large selection of layouts to choose from.

Screenshot of yEd's Layout menu

For most songs, I’ve found that the Tree / Balloon layout works best, though you can certainly experiment with the others.

When you select Layout / Tree / Balloon, a set of layout settings appears:

Screenshot of the yEd Balloon Layout settings menu. Root Node Policy: Weighted Center Root. Routing Style for Non-Tree Edges: Straight-Line. Preferred Child Wedge: 200. Preferred Root Wedge: 360. Minimal Edge Length: 10. Compactness Factor: 0.5. Place Children Interleaved and Straighten Chains are checked; all other checkboxes are unchecked.

Again, you can play around with the settings to try to make the graph look good, but these are the settings I usually use.

Click OK, and yEd will arrange your nodes as it sees fit.

From there, you can export your graph as a .png image by using the File / Export menu!

Converting GPS Data Between GPX and KML

Part of the GPS Mapping Tutorials series.

GPX (GPS Exchange Format) and KML (Keyhole Markup Language) are both file types used to store GPS data. While many applications can use either file format, Google products (Google Earth, Google My Maps) tend to prefer KML, so it’s often helpful to be able to convert between them.

(Note that both .kml and .kmz file extensions represent KML files; the latter is just a zipped version to reduce file size.)

This tutorial will teach you how to convert between GPX and KML (in both directions) using GPS Visualizer.
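Since a .kmz file is just zipped KML, Python's standard `zipfile` module can create or unpack one. In this sketch the KML is stored as `doc.kml` at the root of the archive, which is the conventional name; `write_kmz` and `read_kmz` are hypothetical helper names, and reading simply takes the first .kml file it finds.

```python
import zipfile


def write_kmz(kml_text, kmz_path):
    """Zip a KML document into a .kmz file, storing it as doc.kml at the archive root."""
    with zipfile.ZipFile(kmz_path, "w", compression=zipfile.ZIP_DEFLATED) as kmz:
        kmz.writestr("doc.kml", kml_text)


def read_kmz(kmz_path):
    """Return the text of the first .kml file found in a .kmz archive."""
    with zipfile.ZipFile(kmz_path) as kmz:
        name = next(n for n in kmz.namelist() if n.lower().endswith(".kml"))
        return kmz.read(name).decode("utf-8")
```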


Extracting GPX Files From a Garmin Automotive GPS

Part of the GPS Mapping Tutorials series.

This tutorial will teach you how to record route data on a Garmin automotive GPS and extract it into a GPX file (which can then be used by mapping software).

I wrote this tutorial using a Garmin DriveSmart 50 LMT. However, I’ve had success using the same steps with other variations of the Garmin nüvi and DriveSmart series.


Generating GPX and KML maps with Ruby on Rails

While working on my various projects, I’ve dealt with several types of maps.

My Flight Historian plots flight data using the Great Circle Mapper tool. These maps are simple to generate from my flight data (I just have to pass it a plain text collection of airport codes) and easy to embed in my website. However, because they are static images, they can’t be easily panned, zoomed, or otherwise manipulated in the way that modern map websites and apps can.

A sample Great Circle Mapper map of my flights in 2018.
(Map generated by Paul Bogard using the Great Circle Mapper – copyright © Karl L. Swartz)

On the other hand, my driving maps require too much detail for a single image, so I create them in Google Earth, which lets me manipulate the view as much as I need to. The driving data is also more complicated than my flight log data: while my flight log represents the abstract shortest-distance line between two airports (and thus only requires specifying the airport at each end), a single drive can involve tens of thousands of coordinates that can be joined together, connect-the-dots style, to show the actual route driven.

A sample driving data map using Google Earth

Fortunately, all those coordinates are automatically generated and saved by my car’s GPS navigation unit in a file format called GPX (GPS Exchange Format). GPX is an XML-based format that contains (among other things) latitude/longitude points, which my particular GPS samples once per second.

Sample from a GPX file from a recent trip
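Because GPX is XML, the standard library can read it directly. This sketch assumes a GPX 1.1 file (namespace `http://www.topografix.com/GPX/1/1`) with `<trkpt>` elements carrying `lat`/`lon` attributes and optional `<ele>` and `<time>` children; the sample track and `read_trackpoints` are made up for illustration. Computing the gaps between timestamps is one way to check a device's sampling rate.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_trackpoints(gpx_text):
    """Yield (lat, lon, elevation, time) for each <trkpt> in a GPX 1.1 document."""
    root = ET.fromstring(gpx_text)
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.findtext("gpx:ele", default=None, namespaces=GPX_NS)
        when = pt.findtext("gpx:time", default=None, namespaces=GPX_NS)
        yield (
            float(pt.get("lat")),
            float(pt.get("lon")),
            float(ele) if ele else None,
            # GPX times are UTC with a trailing "Z"; fromisoformat wants "+00:00".
            datetime.fromisoformat(when.replace("Z", "+00:00")) if when else None,
        )


# Hypothetical two-point track, one second apart.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="47.6062" lon="-122.3321"><ele>52.0</ele><time>2019-05-01T12:00:00Z</time></trkpt>
    <trkpt lat="47.6065" lon="-122.3318"><ele>53.0</ele><time>2019-05-01T12:00:01Z</time></trkpt>
  </trkseg></trk>
</gpx>"""

points = list(read_trackpoints(SAMPLE))
gaps = [(b[3] - a[3]).total_seconds() for a, b in zip(points, points[1:])]
print(gaps)  # [1.0]
```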

Google Earth doesn’t use GPX format natively (though it can import it); instead, it uses a format called KML (Keyhole Markup Language, from back when Google Earth was Keyhole EarthViewer). KML is also XML-based, so conceptually it’s likewise a collection of coordinates that can all be joined together, just written in its own slightly different style.

The same set of latitudes, longitudes, and altitudes in KML format. (Note that KML reverses the order of longitude and latitude.)
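The core of a GPX-to-KML conversion is that reordering: GPX stores `lat` and `lon` as attributes, while a KML `<coordinates>` element lists `longitude,latitude,altitude` tuples separated by whitespace. This sketch turns every GPX 1.1 track point into one KML `LineString`; `gpx_to_kml` is a hypothetical name, the sample point is made up, and giving points without `<ele>` an altitude of 0 is my assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_to_kml(gpx_text, name="Track"):
    """Convert every GPX 1.1 <trkpt> into a single KML LineString."""
    root = ET.fromstring(gpx_text)
    coords = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        # Assumption: a point with no <ele> gets altitude 0.
        ele = (pt.findtext("gpx:ele", default="", namespaces=GPX_NS) or "0").strip()
        # GPX: lat/lon attributes; KML: "lon,lat,alt" (longitude first).
        coords.append(f"{pt.get('lon')},{pt.get('lat')},{ele}")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "  <Placemark>\n"
        f"    <name>{escape(name)}</name>\n"
        "    <LineString>\n"
        f"      <coordinates>{' '.join(coords)}</coordinates>\n"
        "    </LineString>\n"
        "  </Placemark>\n"
        "</kml>\n"
    )


sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="47.6062" lon="-122.3321"><ele>52.0</ele></trkpt>
  </trkseg></trk>
</gpx>"""
print(gpx_to_kml(sample))
```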

But while GPX and KML can be used to represent complicated route shapes, they don’t have to be. These formats are both just as capable of taking a pair of points on the globe and drawing the shortest line between them. With that in mind, I decided to try to have Flight Historian automatically generate KML and GPX versions of my flight map, which would let me show my flight routes in Google Earth and Google Maps.
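As a sketch of that idea, each flight can become a GPX route with just two route points. Flight Historian itself is a Ruby on Rails app, so this Python version only illustrates the shape of the output, not its actual code. The `AIRPORTS` table is hypothetical, with approximate coordinates for illustration only, and `flights_to_gpx` is a made-up name.

```python
from xml.sax.saxutils import escape

# Hypothetical lookup table; approximate coordinates for illustration only.
AIRPORTS = {
    "SEA": (47.449, -122.309),
    "ORD": (41.978, -87.905),
}


def flights_to_gpx(flights, airports):
    """Return a GPX 1.1 document with one two-point <rte> per (origin, destination) pair."""
    routes = []
    for origin, dest in flights:
        points = []
        for code in (origin, dest):
            lat, lon = airports[code]
            points.append(f'<rtept lat="{lat}" lon="{lon}"><name>{escape(code)}</name></rtept>')
        routes.append(f"  <rte><name>{escape(origin)}-{escape(dest)}</name>{''.join(points)}</rte>")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="flight-sketch">\n'
        + "\n".join(routes)
        + "\n</gpx>\n"
    )


print(flights_to_gpx([("SEA", "ORD")], AIRPORTS))
```

How the segment between the two points is drawn (as a curve along the globe or as a straight line on the map projection) is up to the application displaying the file.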


Creating Multiple Flash Messages in Ruby on Rails

On my Flight Historian application, a number of my pages make use of the flash and flash.now session messages capability for errors, warnings, successes, and informational messages. However, some of those pages needed to have multiple messages of the same type (e.g., multiple warnings), which flash didn’t allow me to do. Additionally, I had some views that were generating status messages of their own (for example, if a collection was empty on a page that had multiple collections), and so I ended up with several ways to generate messages that didn’t output consistent HTML.


Finding Your Tail Number (When You Can’t See It)

Every aircraft has a unique registration number, usually printed on or near the tail – think of it like a license plate for an airplane.

Photo of several regional jets with the tail numbers highlighted

I track tail numbers on my flight log in order to let me keep track of whether I’ve been on a particular airplane before, and to let me know which particular airplanes I’ve flown on most often.

If you can see the tail number of your plane, great! But if you can’t see it, you can usually find the tail number with a bit of detective work.
