Parsing Boarding Pass Dates in Ruby

The bar codes on paper or electronic boarding passes contain a good deal of data about a given flight. One of my goals for Flight Historian is to allow me to add a new flight by scanning the bar code, but in order to do that, I need to write a Ruby parser for the data in these boarding passes. This parser will accept bar code data, and return a collection of field names, values, and its interpretation of what those values mean.

One of the more difficult challenges I’m running into, though, is interpreting the date of the flight from the bar code.

The Date Problem

A PDF417 scanner (for paper boarding passes) or an Aztec Code scanner (for electronic boarding passes) will return data that looks something like the following:

M1BOGARD/PAULD        ENRL3YA YYZMUCAC 0846 079Y021A0013A6B2>5321MO6079BAC 0014953170001                          2A016246946044002AC                     0PC *30601    0922       AC 524UO          5                       0F          NNN

The date is a three digit number starting at the 45th character, which contains an ordinal date (the number of days from the start of the year). In the example above, it represents the 79th day of the year.
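As a minimal sketch of that extraction (using a shortened copy of the sample data above; the variable names are only illustrative), Ruby's zero-based string indexing puts the 45th character at index 44:

```ruby
# Shortened copy of the sample bar code data shown above.
raw_data = "M1BOGARD/PAULD        ENRL3YA YYZMUCAC 0846 079Y021A0013A6B2>5321MO6079BAC"

# Ruby strings are zero-indexed, so the 45th character sits at index 44.
raw_date    = raw_data[44, 3]
day_of_year = raw_date.to_i

puts raw_date
puts day_of_year
```

Note that String#to_i always parses as decimal, so the leading zero does not cause the value to be read as octal.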

But what year? The boarding pass doesn’t explicitly encode the year, so we’ll have to guess!

Estimating Sensible Dates

In general, if I’m scanning a boarding pass to add it to my flight log, I can safely assume that the date is no more than a year from now, since I’m not aware of any airlines that issue a boarding pass more than a year in advance (or any that even let you purchase a ticket more than a year in advance). However, it’s entirely possible that the date could be in the past, too, since I could be scanning an old bar code – and if we want to make this parser useful for anyone, it needs to be able to handle old bar codes sensibly.

Case: Ordinal Date and No Other Information

If we don’t have any further information beyond the day of the year, the best we can do is an educated guess. We’ll start with what we know: it’s the 79th day of some year.

def interpret_ordinal_date(raw)
  day_of_year = raw.to_i
  output = "#{day_of_year.ordinalize} day of the year "
  return output
end

We’d like to provide some likely dates as well. We know that the date must be less than a year from now, and to limit our options, I’ll assume that the user is not scanning a boarding pass more than two years old.

Assumption: The flight date is earlier than one year after today.
Assumption: The flight date is not earlier than two years before today.

Thus, if today is 15 January 2017, we want to return a list of all 79th days of the year where the date is greater than or equal to 15 January 2015 and less than 15 January 2018. In more general terms:

search_range = (Date.today-2.years...Date.today+1.year)

Note that we’re not using Ruby’s 2.years.ago notation, since that returns a Time and we want to work with pure Dates. Also, we’re using Ruby’s triple dot range notation, which means the range does not include the second number.
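To see the effect of the triple dot on the range boundaries, here is a small sketch using only Ruby's standard date library. It assumes a fixed "today" so the result is reproducible, and it swaps in Date#<< and Date#>> (which shift by whole months) for the ActiveSupport 2.years and 1.year helpers used in the post:

```ruby
require 'date'

# Fixed "today" so the example is reproducible (the post uses Date.today).
today = Date.new(2017, 1, 15)

# 24 and 12 months stand in for ActiveSupport's 2.years and 1.year.
search_range = (today << 24)...(today >> 12)

puts search_range.cover?(Date.new(2015, 1, 15))  # start of the range is included
puts search_range.cover?(Date.new(2018, 1, 14))  # last day inside the range
puts search_range.cover?(Date.new(2018, 1, 15))  # end of the range is excluded
```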

So we can then create an array of likely dates. We know we’re going to want to use this functionality again, so we’ll wrap it in a lambda so we can call it like a method within a method:

# Return an array of all dates in search_range that match day_of_year.
# If specific_years are given, only return results in those years.
find_matching_date = lambda {|search_range, day_of_year, specific_years: nil|
  likely_dates = Array.new
  specific_years ||= (search_range.begin.year..search_range.end.year)
  specific_years.each do |y|
    begin
      this_date = Date.ordinal(y, day_of_year)
      if search_range.cover?(this_date)
        likely_dates.push(this_date)
      end
    rescue
      # day_of_year does not exist in year y (e.g. day 366 of a non-leap year)
    end
  end
  return likely_dates
}

range = (Date.today-2.years...Date.today+1.year)
matching_dates = find_matching_date.call(range, day_of_year)

Though we’re not using it yet, we’ve added an optional specific_years keyword argument that will let us filter our results by specific years. We use Ruby’s ||= operator to check if it’s set, and if not, we just search all of the years in the provided search_range.

We use a begin/rescue block because it’s possible that this lambda might receive a date that’s not valid in every year (for example, day_of_year = 366 is only valid in leap years). If a date’s not valid, that’s fine, we just won’t include it in the array.
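As an alternative sketch (not the approach used in this post), the validity check can be done up front with Date.valid_ordinal? from the standard library instead of rescuing the error. The method name matching_ordinal_dates is hypothetical, and the fixed "today" and the Date#<< / Date#>> month shifts are stand-ins for Date.today and the ActiveSupport helpers:

```ruby
require 'date'

# Same idea as the lambda above, written as a plain method that checks
# validity with Date.valid_ordinal? instead of rescuing an exception.
def matching_ordinal_dates(search_range, day_of_year)
  years = search_range.begin.year..search_range.end.year
  years.select { |y| Date.valid_ordinal?(y, day_of_year) }
       .map    { |y| Date.ordinal(y, day_of_year) }
       .select { |d| search_range.cover?(d) }
end

today = Date.new(2017, 1, 10)           # fixed date for reproducibility
range = (today << 24)...(today >> 12)   # 10 Jan 2015 up to, not including, 10 Jan 2018

puts matching_ordinal_dates(range, 346)
puts matching_ordinal_dates(range, 366)
```

With this range, day 346 matches once in each of 2015, 2016, and 2017, while day 366 only matches in the leap year 2016.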

Case: 366th Day of Year

So speaking of that 366th day of the year, we have a little more information: this date must be in a leap year. Because of this, we know the date cannot be later than this year (if today were 31 December 2015, the flight date must be less than 31 December 2016, so we can’t match the 366th day of next year). So we’ll make the assumption that day 366 must be the last day of this year (if this year is a leap year) or the last day of the most recent leap year (if this year is not a leap year). Updating our list of assumptions:

Assumption: The flight date is earlier than one year after today.
Assumption: The 366th day of a year occurs in the latest leap year less than or equal to this year.
Assumption: The flight date is not earlier than two years before today.

The new leap year assumption could potentially conflict with the final assumption (since the most recent leap year could be more than two years ago), so if any assumptions conflict, the one that’s earlier in the list takes precedence.

Because we added the begin/rescue block, we can reuse our find_matching_date lambda; it will simply pass over the 366th day of non-leap-years in our range, and only return us leap years.

Because the largest gap between leap years is 8 years (centuries that aren’t divisible by 400 are not leap years), we need to search between December 31 of seven years ago, and December 31 this year, inclusive.

# Find the latest 366th day of the year through the end of this year:
range = (Date.new(Date.today.year-7,12,31)..Date.new(Date.today.year,12,31))
matching_dates = find_matching_date.call(range, day_of_year).last(1)

If we find multiple leap years in this range (which is likely since most 8-year blocks have two leap years), we only want to use the most recent one, so we use .last(1) to return only the last element as a single-element array.
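The same result can be illustrated without a date range. This sketch assumes a hypothetical helper, latest_leap_day, that walks backwards through eight candidate years using Date.leap? from the standard library:

```ruby
require 'date'

# Hypothetical helper: the 366th day of the latest leap year that is not
# later than the given year. Eight consecutive years always contain at
# least one leap year, so find never comes back empty here.
def latest_leap_day(year)
  leap_year = year.downto(year - 7).find { |y| Date.leap?(y) }
  Date.ordinal(leap_year, 366)
end

puts latest_leap_day(2017)  # 2016 is the latest leap year
puts latest_leap_day(1903)  # 1900 was not a leap year, so this falls back to 1896
```

The 1903 case shows why seven years back is the right lower bound: the leap years 1896 and 1904 are eight years apart.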

Case: Boarding Pass Issue Date Available

The boarding pass data could potentially contain a second date: an optional four-digit number starting at the 7th character after the first “>”, which contains the last digit of the year followed by an ordinal date.

M1BOGARD/PAULD        ENRL3YA YYZMUCAC 0846 079Y021A0013A6B2>5321MO6079BAC 0014953170001                          2A016246946044002AC                     0PC *30601    0922       AC 524UO          5                       0F          NNN

In the example above, it represents the 79th day of a year that ends in 6.

This isn’t a full year, so we still have to guess the decade, but it still gives us some additional information.
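A minimal sketch of pulling those two pieces out of the sample data follows. It uses a shortened copy of the sample string and illustrative variable names, and it only checks that the field is four digits; the fuller code later in this post performs an additional length check that is omitted here:

```ruby
# Shortened copy of the sample bar code data shown above.
raw_data = "M1BOGARD/PAULD        ENRL3YA YYZMUCAC 0846 079Y021A0013A6B2>5321MO6079BAC"

conditional_start = raw_data.index(">")
issue_field       = raw_data[conditional_start + 7, 4]   # "6079" in this sample

if issue_field =~ /\A\d{4}\z/
  year_digit  = issue_field[0].to_i      # last digit of the issue year
  day_of_year = issue_field[1, 3].to_i   # ordinal day within that year
end

puts year_digit
puts day_of_year
```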

First, we can assume that the boarding pass was issued on or before the flight date, and that the flight date is less than one year after the boarding pass date.  We also can assume that the boarding pass issue date is not later than today, because if it were issued in the future, we wouldn’t have a boarding pass to scan yet. Finally, we will assume the most recent boarding pass date that matches all these criteria is the correct one. Updating our assumptions:

Assumption: The flight date is earlier than one year after today.
Assumption: The 366th day of a year occurs in the latest leap year less than or equal to this year.
Assumption: The boarding pass date is not later than today.
Assumption: The flight date does not occur before the boarding pass issue date.
Assumption: The flight date is less than one year after the boarding pass issue date.
Assumption: Of all valid boarding pass dates, the most recent is the correct date.
Assumption: The flight date is not earlier than two years before today.

We’ll write a lambda for estimating the boarding pass date as well, since then we’ll also be able to use our parser to parse a boarding pass date.

# Returns the most recent date matching day_of_year in a year ending in
# year_digit. If flight_date is provided, search relative to that instead
# of relative to today.
estimate_issue_date = lambda { |year_digit, day_of_year, flight_date: nil|
  flight_date ||= Date.today # if flight_date not set, set it to today
  search_range = (flight_date-10.years+1.day..flight_date)
  year_this_decade = flight_date.year/10*10 + year_digit
  specific_years = [year_this_decade-10,year_this_decade]
  likely_dates = find_matching_date.call(search_range, day_of_year, specific_years: specific_years)
  return likely_dates.last
}

We’ve also added an optional flight_date, which we don’t need here, but we’ll talk about soon.
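The decade arithmetic in that lambda relies on integer division. As a small sketch (candidate_years is a hypothetical name used only for this illustration), here is how the two candidate years are derived from a reference year and a year digit:

```ruby
# Integer division drops the last digit of the year, so multiplying back by
# 10 gives the first year of the decade; adding the digit gives the matching
# year in this decade, and subtracting 10 gives the one in the decade before.
def candidate_years(reference_year, year_digit)
  year_this_decade = reference_year / 10 * 10 + year_digit
  [year_this_decade - 10, year_this_decade]
end

p candidate_years(2017, 6)
p candidate_years(2017, 9)
```

For a reference year of 2017 and a digit of 9, the later candidate (2019) is still in the future, which is why the lambda also filters the candidates through a search range ending at the reference date.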

So we need to see if a boarding pass issue date is available, split out the year digit and day of year of the boarding pass issue,  and feed that into the above lambda. Then, we take the returned boarding pass issue date, and search within a year from it to get a flight date:

# Find flight date based on boarding pass date:
conditional_start = @raw_data.index(">")
if (conditional_start && @raw_data[conditional_start+2,2].to_i(16)>=7 && @raw_data[conditional_start+7,4] =~ /^\d{4}$/)
  # Boarding pass issue date is present and valid
  bp_year_digit = @raw_data[conditional_start+7].to_i
  bp_day_of_year = @raw_data[conditional_start+8,3].to_i
  bp_date = estimate_issue_date.call(bp_year_digit, bp_day_of_year)
  if bp_date
    # Get first matching date within 1 year of boarding pass date
    # (if boarding pass date is year prior to leap year and has same
    # day of year as flight, two dates could potentially match)
    range = (bp_date...bp_date+1.year)
    matching_dates = find_matching_date.call(range, day_of_year).first(1)
  end
end

This parser is part of a BoardingPass class, which has @raw_data defined as the raw boarding pass data string passed to it on initialization. Thus, our parser method can look at this string and attempt to extract a boarding pass issue year digit and day of year.

We do need to specifically take the first matching flight date, since we could end up with two dates if our boarding pass is issued in the year before a leap year. Let’s assume that we’re currently in the year 2017, and we get a boarding pass issue date of 5346 and flight date of 346. In this case, the boarding pass date would be the 346th day of a recent year ending in 5 – which is 12 December 2015. So we need to search for dates that are the 346th day of the year, greater than or equal to 12 December 2015 and less than 12 December 2016.  The 346th day of 2015 is 12 December 2015, which is a valid date… but because 2016 is a leap year, the 346th day of 2016 is 11 December 2016, which is also less than 12 December 2016. In this case, the earlier date is more likely to be correct, so we’ll always just take the first matching date.
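That double match can be reproduced with the standard date library alone. This sketch hard-codes the issue date from the example above and uses Date#>> (a shift of 12 months) in place of the ActiveSupport 1.year helper:

```ruby
require 'date'

bp_date = Date.ordinal(2015, 346)     # boarding pass issued 12 Dec 2015
range   = bp_date...(bp_date >> 12)   # up to, but not including, 12 Dec 2016

candidates = (range.begin.year..range.end.year)
  .select { |y| Date.valid_ordinal?(y, 346) }
  .map    { |y| Date.ordinal(y, 346) }
  .select { |d| range.cover?(d) }

puts candidates         # both 12 Dec 2015 and 11 Dec 2016 fall inside the range
puts candidates.first   # the earlier date is taken as the flight date
```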

Case: @flight Data Available

Because my boarding pass parser is part of Flight Historian, each Flight object has a field where I can store the raw boarding pass data. (Since boarding passes can contain ticket numbers and frequent flier numbers, the field isn’t shown to site visitors, and is only visible when I’m logged in.)  I want to use this parser on boarding passes stored in Flight objects, and if I have a Flight object (available to the method as @flight), I already know the departure date, and I can assume that it’s the most accurate representation of the flight year – there’s no estimation involved!

Assumption: The flight year in @flight.departure_date.year is accurate.
Assumption: The flight date is earlier than one year after today.
Assumption: The 366th day of a year occurs in the latest leap year less than or equal to this year.
Assumption: The boarding pass date is not later than today.
Assumption: The flight date does not occur before the boarding pass issue date.
Assumption: The flight date is less than one year after the boarding pass issue date.
Assumption: Of all valid boarding pass dates, the most recent is the correct date.
Assumption: The flight date is not earlier than two years before today.

Technically, we could just use the whole @flight.departure_date, but in case there’s an odd edge case where they differ, we’ll just use the known year and calculate the day.

# Find flight dates based on @flight.departure_date.year:
year = @flight.departure_date.year
range = (Date.new(year,1,1)..Date.new(year,12,31))
matching_dates = find_matching_date.call(range, day_of_year)
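Since the year is known, the search collapses to a single lookup. A minimal sketch, assuming a hypothetical helper named flight_date_in_year and a hard-coded year in place of @flight.departure_date.year:

```ruby
require 'date'

# Hypothetical helper: with the year already known, the flight date is simply
# that year's ordinal day, or nil when the day does not exist in that year.
def flight_date_in_year(year, day_of_year)
  Date.valid_ordinal?(year, day_of_year) ? Date.ordinal(year, day_of_year) : nil
end

puts flight_date_in_year(2014, 346)
p    flight_date_in_year(2014, 366)   # 2014 is not a leap year
```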

We can also use this to find the boarding pass year. Because we know the flight date, we will assume that the boarding pass date is not later than it, and is the latest matching date on or prior to it. (This could potentially lead to situations where the boarding pass issue date is more than a year prior to the flight date, but assuming that the year of @flight is accurate is a higher priority assumption.)

The boarding pass issue date is therefore later than 10 years prior to @flight.departure_date, and no later than the flight date. This is then the exact same search we used in estimate_issue_date above, except we’re searching 10 years prior to @flight.departure_date, rather than 10 years prior to today – which is why we included the optional flight_date parameter in that lambda.

# Find boarding pass issue date based on @flight.departure_date:
matching_date = estimate_issue_date.call(year_digit, day_of_year, flight_date: @flight.departure_date)
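To make the effect of the reference date concrete, here is a simplified sketch of the same search using only the standard date library. The method name issue_date_on_or_before is hypothetical, and unlike the lambda above it does not enforce the ten-year lower bound; it simply tries the matching year in the reference date's decade and then the decade before:

```ruby
require 'date'

# Simplified sketch: return the latest date that is the given day of a year
# ending in year_digit and that falls on or before reference_date.
def issue_date_on_or_before(reference_date, year_digit, day_of_year)
  year_this_decade = reference_date.year / 10 * 10 + year_digit
  [year_this_decade, year_this_decade - 10].each do |year|
    next unless Date.valid_ordinal?(year, day_of_year)
    candidate = Date.ordinal(year, day_of_year)
    return candidate if candidate <= reference_date
  end
  nil
end

puts issue_date_on_or_before(Date.new(2014, 12, 12), 5, 345)  # relative to a flight date
puts issue_date_on_or_before(Date.new(2017, 1, 10), 5, 345)   # relative to "today"
```

The same year digit and day of year give different answers depending on the reference: a flight on 12 December 2014 cannot have a boarding pass issued in 2015, so the search falls back to 2005.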

Case: Invalid Dates

Finally, we can do a simple check: if the day of the year is greater than 366, we know it’s invalid and we can just return an invalid date error.

Assumption: The day of the year is not valid if it is larger than 366.
Assumption: The flight year in @flight.departure_date.year is accurate.
Assumption: The flight date is earlier than one year after today.
Assumption: The 366th day of a year occurs in the latest leap year less than or equal to this year.
Assumption: The boarding pass date is not later than today.
Assumption: The flight date does not occur before the boarding pass issue date.
Assumption: The flight date is less than one year after the boarding pass issue date.
Assumption: Of all valid boarding pass dates, the most recent is the correct date.
Assumption: The flight date is not earlier than two years before today.

Since there are a few other situations where we may not have a valid date above, we can initialize a blank matching_dates array for the flight date, and if its length is 0 after we run through the above checks, we know we didn’t find a valid date and can return a string saying so.

error_text = "not a valid date"
matching_dates = Array.new

if day_of_year <= 366
  # Estimate potential flight date(s) and store them in matching_dates
end

output = "#{day_of_year.ordinalize} day of the year "
if matching_dates.length > 0
  output += "(#{matching_dates.join(', ')})"
else
  output += "(#{error_text})"
end
return output

Likewise, we can initialize matching_date to nil for the boarding pass date, and if it never gets updated, we can return the error string.

error_text = "not a valid date"
matching_date = nil

if day_of_year <= 366
 # Estimate potential boarding pass issue date and store it in matching_date
end

output = "#{day_of_year.ordinalize} day of a year ending in #{year_digit} " 
output += matching_date ? "(#{matching_date})" : "(#{error_text})"
return output

Final List of Assumptions

If any assumptions conflict, the one that is earlier in the below list takes precedence.

  • The day of the year is not valid if it is larger than 366.
  • The flight year in @flight.departure_date.year is accurate.
  • The flight date is earlier than one year after today.
  • The 366th day of a year occurs in the latest leap year less than or equal to this year.
  • The boarding pass date is not later than today.
  • The flight date does not occur before the boarding pass issue date.
  • The flight date is less than one year after the boarding pass issue date.
  • Of all valid boarding pass dates, the most recent is the correct date.
  • The flight date is not earlier than than two years before today.

Flowchart

In order to plan for combining all of the above strategies and assumptions into one method, we will create a flowchart for this method:

[Flowchart image: parse-ordinal-date]

We first check whether the input is 3 numeric characters (flight date), 4 numeric characters (boarding pass issue date), or neither (invalid). After that, we run checks based on our assumptions to come up with candidate date(s).
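That first branch can be sketched as a small classifier. The method name ordinal_field_type and the returned symbols are hypothetical; the sketch also uses \A and \z rather than the ^ and $ anchors seen in the final code, because in Ruby ^ and $ match at line boundaries rather than only at the ends of the whole string:

```ruby
# Hypothetical classifier for the raw field. \A and \z anchor the whole
# string; ^ and $ in Ruby anchor individual lines instead.
def ordinal_field_type(raw)
  case raw
  when /\A\d{3}\z/ then :flight_date
  when /\A\d{4}\z/ then :issue_date
  else :invalid
  end
end

p ordinal_field_type("079")
p ordinal_field_type("6079")
p ordinal_field_type("ZZZ")
p ordinal_field_type("079\nabc")   # multi-line input is rejected with \A and \z
```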

Final Code

def interpret_ordinal_date(raw)
  return nil unless raw.present?
  
  error_text = "not a valid date"
  # Return an array of all dates in search_range that match day_of_year.
  # If specific_years are given, only return results in those years.
  find_matching_date = lambda { |search_range, day_of_year, specific_years: nil|
    likely_dates = Array.new
    specific_years ||= (search_range.begin.year..search_range.end.year)
    specific_years.each do |y|
      begin
        this_date = Date.ordinal(y, day_of_year)
        if search_range.cover?(this_date)
          likely_dates.push(this_date)
        end
      rescue
      end
    end
    return likely_dates
  }
  # Returns the most recent date matching day_of_year in a year ending in
  # year_digit. If flight_date is provided, search relative to that instead
  # of relative to today.
  estimate_issue_date = lambda { |year_digit, day_of_year, flight_date: nil|
    flight_date    ||= Date.today # if flight_date not set, set it to today
    search_range     = (flight_date-10.years+1.day..flight_date)
    year_this_decade = flight_date.year/10*10 + year_digit
    specific_years   = [year_this_decade-10,year_this_decade]
    likely_dates = find_matching_date.call(search_range, day_of_year, specific_years: specific_years)
    return likely_dates.last
  }
  
  if raw =~ /^\d{3}$/
    # Raw data format is 3 digits
    # This is a flight date
    day_of_year    = raw.to_i
    matching_dates = Array.new
    
    if day_of_year <= 366
      # day_of_year is valid
      if @flight
        # @flight data is available
        year           = @flight.departure_date.year
        range          = (Date.new(year,1,1)..Date.new(year,12,31))
        matching_dates = find_matching_date.call(range, day_of_year)
      else
        # @flight data is not available
        conditional_start = @raw_data.index(">")
        if (conditional_start && @raw_data[conditional_start+2,2].to_i(16)>=7 && @raw_data[conditional_start+7,4] =~ /^\d{4}$/)
          # Boarding pass issue date is present and valid
          bp_year_digit  = @raw_data[conditional_start+7].to_i
          bp_day_of_year = @raw_data[conditional_start+8,3].to_i
          bp_date        = estimate_issue_date.call(bp_year_digit, bp_day_of_year)
          if bp_date
            # Get first matching date within 1 year of boarding pass date
            # (if boarding pass date is year prior to leap year and has same
            # day of year as flight, two dates could potentially match)
            range          = (bp_date...bp_date+1.year)
            matching_dates = find_matching_date.call(range, day_of_year).first(1)
          end
        else
          # Boarding pass issue date is not available
          if day_of_year < 366
            # Find likely matching dates between 2 years ago and 1 year from now
            range          = (Date.today-2.years...Date.today+1.year)
            matching_dates = find_matching_date.call(range, day_of_year)
          else
            # Find most recent matching date in a leap year.
            # Largest gap between leap years is 8 years (centuries not
            # divisible by 400 are not leap years), so search between 31
            # Dec 7 years ago and 31 Dec 0 years ago (this year), and take
            # the most recent result.
            range = (Date.new(Date.today.year-7,12,31)..Date.new(Date.today.year,12,31))
            matching_dates = find_matching_date.call(range, day_of_year).last(1)
          end
        end
      end
    end

    output = "#{day_of_year.ordinalize} day of the year "
    if matching_dates.length > 0
      output += "(#{matching_dates.join(', ')})"
    else
      output += "(#{error_text})"
    end

    return output
    
  elsif raw =~ /^\d{4}$/
    # Raw data format is 4 digits
    # This is a boarding pass issue date
    year_digit    = raw[0].to_i
    day_of_year   = raw[1..3].to_i
    matching_date = nil
    
    if day_of_year <= 366
      # day_of_year is valid
      if @flight
        # @flight data is available
        matching_date = estimate_issue_date.call(year_digit, day_of_year, flight_date: @flight.departure_date)
      else
        # @flight data is not available
        matching_date = estimate_issue_date.call(year_digit, day_of_year)
      end
    end
    
    output = "#{day_of_year.ordinalize} day of a year ending in #{year_digit} " 
    output += matching_date ? "(#{matching_date.standard_date})" : "(#{error_text})"
    return output
    
  else
    # Raw data format is not 3 or 4 digits
    return nil
  end
end
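The final code calls standard_date, which is not defined in this post and is presumably provided elsewhere in the application. As a purely hypothetical stand-in, assuming the format is the one that appears in the test expectations (such as "12 Dec 2015"), it could be sketched with strftime:

```ruby
require 'date'

# Hypothetical stand-in for standard_date, which is defined elsewhere in the
# application. %-d is the day of the month without zero padding; whether the
# real method pads single-digit days is an assumption.
def standard_date(date)
  date.strftime("%-d %b %Y")
end

puts standard_date(Date.new(2015, 12, 12))
puts [Date.new(2015, 12, 12), Date.new(2016, 12, 11)].map { |d| standard_date(d) }.join(", ")
```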

Test Cases

require 'test_helper'

class BoardingPassTest < ActiveSupport::TestCase
  
  def setup
    # Make all tests assume that it is 10 Jan 2017 at noon local:
    travel_to Time.new(2017, 1, 10, 12)
    
    trip = Trip.new( name: "Public Trip",
                           hidden:  false,
                           purpose: "personal")
    
    @flight = trip.flights.new(origin_airport_id:      1,
                               destination_airport_id: 2,
                               trip_section:           1,
                               departure_date:         "2014-12-12",
                               departure_utc:          "2014-12-12 11:00",
                               airline_id:             1)
  end
  
  def get_field_value(boarding_pass, description)
    interpreted = nil
    boarding_pass.raw_with_metadata.each do |field|
      if field[:description] == description
        interpreted = field[:interpreted]
        break
      end
    end
    return interpreted
  end
  
  class OrdinalTest < BoardingPassTest
    
    test "ordinal flight with blank flight date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163    Y015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      assert_nil interpreted
    end
    
    test "ordinal flight with non-numeric flight date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 ZZZY015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      assert_nil interpreted
    end
    
    test "ordinal flight date without issue date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      # Should return this ordinal date for every year in the search range
      # (from 2 years ago up to 1 year from now): 2015, 2016, and 2017
      assert_equal("346th day of the year (12 Dec 2015, 11 Dec 2016, 12 Dec 2017)", interpreted)
    end
  
    test "ordinal flight date of 366 without issue date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 366Y015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      # Should return most recent valid ordinal date on or prior to today:
      assert_equal("366th day of the year (31 Dec 2016)", interpreted)
    end
  
    test "ordinal flight date with issue date" do
      # Deliberately set the pass date in a year before a leap year, as this
      # could cause two results in pass_date...pass_date+1.year, and we only
      # want one
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM5346BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      # Should return first valid ordinal date on or after the boarding pass issue date:
      assert_equal("346th day of the year (12 Dec 2015)", interpreted)
    end
    
    test "ordinal flight date with @flight available" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM5346BAA              29001001123456732AA AA XXXXXXX             X", flight: @flight)
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      # Test should prioritize @flight over boarding pass date, so we should
      # get 2014 even though the boarding pass issue date is 2015.
      # Should return ordinal date in the year of @flight's departure date:
      assert_equal("346th day of the year (12 Dec 2014)", interpreted)
    end
    
    test "ordinal flight date greater than 366" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 367Y015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "[Leg 1] Date of Flight (Julian Date)")
      # Should return the invalid date error text
      assert_equal("367th day of the year (not a valid date)", interpreted)
    end
    
    test "ordinal pass date with blank pass date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM    BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "Date of Issue of Boarding Pass (Julian Date)")
      assert_nil interpreted
    end
    
    test "ordinal pass date with non-numeric pass date" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MMZZZZBAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "Date of Issue of Boarding Pass (Julian Date)")
      assert_nil interpreted
    end
    
    test "ordinal pass date with @flight available" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM5345BAA              29001001123456732AA AA XXXXXXX             X", flight: @flight)
      interpreted = get_field_value(pass, "Date of Issue of Boarding Pass (Julian Date)")
      # Should return most recent matching date on or before @flight's departure date:
      assert_equal("345th day of a year ending in 5 (11 Dec 2005)", interpreted)
    end
    
    test "ordinal pass date without @flight available" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM5345BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "Date of Issue of Boarding Pass (Julian Date)")
      # Issue date should not be in the future
      # Should return most recent matching date on or before today:
      assert_equal("345th day of a year ending in 5 (11 Dec 2015)", interpreted)
    end
    
    test "ordinal pass date greater than 366" do
      pass = BoardingPass.new("M1DOE/JOHN            EABC123 DAYCLTAA 5163 346Y015D0027 148>218 MM5367BAA              29001001123456732AA AA XXXXXXX             X")
      interpreted = get_field_value(pass, "Date of Issue of Boarding Pass (Julian Date)")
      # Day of year is out of range
      # Should return the invalid date error text:
      assert_equal("367th day of a year ending in 5 (not a valid date)", interpreted)
    end
    
  end
    
end

Conclusion

Obviously, these are all still estimates; without a true year encoded, it’s impossible to know the encoded date with 100% certainty. But for the majority of boarding passes, this should provide a reasonable guess for the flight and boarding pass dates.
