# Creating a Regular Expression for US Tail Numbers

One of the minor features I’ve added to the flight log is country flags for tail numbers. Every aircraft is registered to one country, and each country has its own assigned format for tail numbers, so it’s possible to look at each tail number and determine what country it’s from.

Since this operation is matching a string to a pattern, it made sense to create a regular expression for each country. For most countries, whose tail numbers consist of a unique prefix followed by a dash and three or four letters, this was easy to do. But the United States rules for valid tail numbers are substantially more complicated.
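As a rough illustration of the per-country approach, the lookup might be sketched in Python like this. The three patterns, the country codes, and the `country_for` helper are assumptions made up for this example and are simplified from each country's real rules; they are not the flight log's actual code.

```python
import re

# Hypothetical, simplified patterns: a unique prefix, a dash, then letters.
# Real registration rules carry more restrictions than shown here.
COUNTRY_PATTERNS = {
    "GB": re.compile(r"G-[A-Z]{4}"),   # United Kingdom, e.g. G-ABCD
    "DE": re.compile(r"D-[A-Z]{4}"),   # Germany, e.g. D-ABCD
    "AU": re.compile(r"VH-[A-Z]{3}"),  # Australia, e.g. VH-ABC
}


def country_for(tail_number):
    """Return the code of the first country whose pattern matches the whole string, else None."""
    for country, pattern in COUNTRY_PATTERNS.items():
        if pattern.fullmatch(tail_number):
            return country
    return None
```

A US tail number such as `N12345` falls through all three simple patterns, which is why it needs an expression of its own.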

## Valid US Tail Numbers

US tail number validity is defined by the Federal Aviation Administration (FAA):

What’s a Valid N-Number?

U.S. aircraft registration numbers (i.e. N-Numbers) consist of a series of numbers and letters.

N-Numbers may not exceed five (5) characters in addition to the standard U.S. registration prefix letter “N” (i.e. the prefix of “N” is not considered in the count).

These characters:

• may be one (1) to five (5) numbers (e.g. N12345);
• may be one (1) to four (4) numbers and one (1) suffix letter (examples: N1A and N1234Z);
• may be one (1) to three (3) numbers and two (2) suffix letters (examples: N24BY and N123AZ).
• may not be the letters “I” or “O” to avoid confusion with the numbers one (1) or zero (0).

An N-Number may not begin with zero (0).

The first zero in a number must be preceded by at least one of the numbers one (1) through nine (9) (example: N01Z is not valid).

## Building the Regular Expression

Starting the expression is easy; we know that it must start with N.

`^N`

Following the N, though, we really have three possibilities, as defined in the FAA list:

• One to five digits
• One to four digits followed by one letter
• One to three digits followed by two letters

This means that after the N, there are three valid strings:

```
\d{1,5}
\d{1,4}[A-Z]
\d{1,3}[A-Z]{2}
```
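To see what these first-draft patterns accept, here is a small Python sketch. The `DRAFT_PATTERNS` list and `matches_draft` helper are names invented for this example, and `re.ASCII` is added so that `\d` means only 0 through 9 in Python.

```python
import re

# First-draft patterns for the characters after the "N", before any refinement.
DRAFT_PATTERNS = [
    re.compile(r"\d{1,5}", re.ASCII),
    re.compile(r"\d{1,4}[A-Z]", re.ASCII),
    re.compile(r"\d{1,3}[A-Z]{2}", re.ASCII),
]


def matches_draft(suffix):
    """True if the characters after the N fully match any draft pattern."""
    return any(p.fullmatch(suffix) for p in DRAFT_PATTERNS)


print(matches_draft("12345"))  # one to five digits
print(matches_draft("1234Z"))  # digits plus one suffix letter
print(matches_draft("24BY"))   # digits plus two suffix letters
print(matches_draft("01Z"))    # leading zero: the FAA rejects N01Z, the draft does not
```

All four calls return `True`, including the last one, so the draft is still too permissive.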

However, we’re not done yet. We know that the first digit may not be zero, so we need to modify our expressions:

```
[1-9]\d{0,4}
[1-9]\d{0,3}[A-Z]
[1-9]\d{0,2}[A-Z]{2}
```

Note that because we specified the first digit, we had to decrease the counts of all the remaining digits by one. The FAA also indicated that where letters are used, “I” and “O” are not valid letters. So we need to modify our letter ranges as follows:

```
[1-9]\d{0,4}
[1-9]\d{0,3}[A-HJ-NP-Z]
[1-9]\d{0,2}[A-HJ-NP-Z]{2}
```
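A quick Python check of these refined patterns might look like the following. The `LETTER`, `REFINED_PATTERNS`, and `matches_refined` names are made up for the example.

```python
import re

# Letters A-Z without I and O, as in the character class above.
LETTER = "[A-HJ-NP-Z]"

REFINED_PATTERNS = [
    re.compile(r"[1-9]\d{0,4}", re.ASCII),
    re.compile(r"[1-9]\d{0,3}" + LETTER, re.ASCII),
    re.compile(r"[1-9]\d{0,2}" + LETTER + "{2}", re.ASCII),
]


def matches_refined(suffix):
    """True if the characters after the N fully match any refined pattern."""
    return any(p.fullmatch(suffix) for p in REFINED_PATTERNS)
```

With these patterns, `matches_refined("1234Z")` is true, while `matches_refined("01Z")` (leading zero) and `matches_refined("1I")` (forbidden letter) are both false.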

We’re looking good, but notice that all three of the possibilities start with `[1-9]`. Every valid US tail number thus starts with an N followed by a digit from 1 to 9, so we should move that `[1-9]` range up front with the N:

`^N[1-9]`
```
\d{0,4}
\d{0,3}[A-HJ-NP-Z]
\d{0,2}[A-HJ-NP-Z]{2}
```

Now we join the possibilities with “or” pipes (|) and parentheses, and cap it off with a dollar sign to indicate the end of the string:

`^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$`

And finally, add the leading and trailing slashes to indicate a regular expression:

`/^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$/`

We’re done! I used Rubular to test the regular expression with various valid and invalid US tail numbers, and it behaves as expected.
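For readers who would rather script the check than paste strings into a tester, here is a minimal Python sketch of the same idea. The `is_us_tail_number` helper and the sample lists are assumptions for this example; the valid samples come from the FAA examples quoted above. Python takes the pattern as a string, so the surrounding slashes are dropped.

```python
import re

# The finished expression, written as a Python raw string without the slashes.
US_TAIL_NUMBER = re.compile(
    r"^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$",
    re.ASCII,  # restrict \d to 0-9
)


def is_us_tail_number(tail_number):
    """True if the whole string is a valid N-Number according to the expression."""
    # fullmatch is used because Python's "$" would also match just before a trailing newline.
    return US_TAIL_NUMBER.fullmatch(tail_number) is not None


VALID = ["N12345", "N1A", "N1234Z", "N24BY", "N123AZ"]
INVALID = ["N01Z", "N0", "N123456", "N1I", "N12ABC", "N", "G-ABCD"]

for tail in VALID + INVALID:
    print(tail, is_us_tail_number(tail))
```

Every entry in `VALID` should print `True` and every entry in `INVALID` should print `False`.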

## Notes

This example assumes the tail number string uses all uppercase letters. If you wish to consider tail numbers with lowercase letters valid, you’ll need to add the `/i` case-insensitive option at the end of the regular expression.
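In Python, which has no `/i` suffix, the equivalent is the `re.IGNORECASE` flag. The sketch below assumes the same expression as above; the `US_TAIL_NUMBER_ANY_CASE` and `is_us_tail_number_any_case` names are made up for the example.

```python
import re

# Same expression, compiled with IGNORECASE as the counterpart of the /i option.
US_TAIL_NUMBER_ANY_CASE = re.compile(
    r"^N[1-9]((\d{0,4})|(\d{0,3}[A-HJ-NP-Z])|(\d{0,2}[A-HJ-NP-Z]{2}))$",
    re.ASCII | re.IGNORECASE,
)


def is_us_tail_number_any_case(tail_number):
    """True if the string is a valid N-Number, ignoring letter case."""
    return US_TAIL_NUMBER_ANY_CASE.fullmatch(tail_number) is not None
```

The letters “i” and “o” stay excluded in either case, so `n123az` is accepted while `n1i` is still rejected.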