The start of a new year is the start of everyone’s favorite season: tax season! Every year I use TurboTax to file my taxes, and the online version offers two downloads for archiving a copy of the return:
A PDF of all of the forms.
A .tax file that can later be imported into TurboTax.
The .tax file is an undocumented format and can only be imported by TurboTax. It appears to contain the tax return data in a structured format. This would be useful if you want to read select fields of your tax return programmatically.
I prodded at the .tax file and realized it’s a zip file. The unzip command can list the contents.
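The same listing can also be produced from Python with the standard zipfile module. This is a minimal sketch; the helper name `list_tax_archive` and the file name `return.tax` in the usage comment are placeholders of mine, not anything TurboTax defines.

```python
import zipfile


def list_tax_archive(path):
    """Return (member name, uncompressed size in bytes) for each entry in a .tax archive."""
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Usage (placeholder file name):
# for name, size in list_tax_archive("return.tax"):
#     print(f"{size:>10}  {name}")
```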
However, the manifest.xml file does not appear to be valid XML.
After some prodding of the TurboTax desktop application, it appears to be encrypted with AES in CBC mode. Using the powers of deduction, I have determined that the keys used to encrypt manifest.xml are:
For 2017 tax year files the key is !QAZ2WSX#EDC4RFV with an IV of 5TGB@YHN7UJM(IK(
For 2019 tax year files the key is 7HMT&BGM5KBNFH>< with an IV of #YBU7JLZ*JGL7MAR
With the pycryptodome library, it’s pretty easy to decrypt the manifest.xml file.
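A minimal sketch of that decryption follows. It assumes the key and IV strings are used as raw ASCII bytes (16 bytes each, so AES-128) and that the plaintext is PKCS#7 padded; neither detail is confirmed, so the code falls back to the raw decrypted bytes if unpadding fails. The function name `decrypt_manifest` and the file name in the usage comment are placeholders.

```python
import zipfile

# Key/IV pairs for manifest.xml, copied from the text above.
# Assumption: they are used as raw ASCII bytes (16 bytes each, i.e. AES-128).
MANIFEST_KEYS = {
    2017: (b"!QAZ2WSX#EDC4RFV", b"5TGB@YHN7UJM(IK("),
    2019: (b"7HMT&BGM5KBNFH><", b"#YBU7JLZ*JGL7MAR"),
}


def decrypt_manifest(tax_path, tax_year):
    """Decrypt manifest.xml from a .tax archive and return the plaintext bytes."""
    # Imported lazily so the key table can be inspected without pycryptodome installed.
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import unpad

    key, iv = MANIFEST_KEYS[tax_year]
    with zipfile.ZipFile(tax_path) as archive:
        ciphertext = archive.read("manifest.xml")
    plaintext = AES.new(key, AES.MODE_CBC, iv=iv).decrypt(ciphertext)
    try:
        # Assumption: PKCS#7 padding. If that guess is wrong, keep the raw bytes.
        return unpad(plaintext, AES.block_size)
    except ValueError:
        return plaintext


# Usage (placeholder file name):
# print(decrypt_manifest("return.tax", 2019).decode("utf-8", errors="replace"))
```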
Running the above yields the decrypted manifest.
The manifest contains a list of all returns in the file and some metadata about each return. Unfortunately, the tax return file is encrypted as well, with a different key. Using powers of deduction again, the keys used to encrypt the tax: file are:
For 2017 tax year files the key is 4TGB@YHN7UJM(IK( with an IV of !ASZ2WSX#EDC4RFV
For 2019 tax year files the key is 8NV^RASJVG*(XSCB with an IV of #BMBVVBD$FSZ6LSZ
Adjusting the above script to use these keys on a tax: file results in an XML representation of the tax return.
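One way to sketch that adjustment is a small helper that decrypts any archive member with a given key and IV. As before, treating the keys as raw ASCII bytes is an assumption. The member name is left as a parameter because it should be taken from the archive listing; `decrypt_member` and the names in the usage comment are placeholders. The helper returns the raw decrypted bytes, so trailing padding bytes may still be present and might need to be trimmed before parsing.

```python
import zipfile

# Key/IV pairs for the return data, copied verbatim from the text above.
# Assumption: used as raw ASCII bytes with AES-128 in CBC mode.
RETURN_KEYS = {
    2017: (b"4TGB@YHN7UJM(IK(", b"!ASZ2WSX#EDC4RFV"),
    2019: (b"8NV^RASJVG*(XSCB", b"#BMBVVBD$FSZ6LSZ"),
}


def decrypt_member(tax_path, member_name, key, iv):
    """Decrypt one archive member with AES-CBC and return the raw decrypted bytes."""
    # Imported lazily so the key table can be inspected without pycryptodome installed.
    from Crypto.Cipher import AES

    with zipfile.ZipFile(tax_path) as archive:
        ciphertext = archive.read(member_name)
    return AES.new(key, AES.MODE_CBC, iv=iv).decrypt(ciphertext)


# Usage (placeholder file and member names):
# key, iv = RETURN_KEYS[2019]
# xml_bytes = decrypt_member("return.tax", "<member name from the listing>", key, iv)
```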
Unfortunately, the schema is not documented, but it should be possible to read certain fields out of the tax return with any standard XML parser.
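As a rough illustration of what that could look like with Python's built-in xml.etree.ElementTree, the sketch below collects the text of every element with a given tag name. Since the real schema is unknown, the sample document and its element names (`Return`, `Form`, `Wages`) are entirely made up for the example and are not taken from an actual TurboTax file.

```python
import xml.etree.ElementTree as ET


def field_values(xml_bytes, tag):
    """Return the text of every element whose tag (ignoring any namespace) equals `tag`."""
    root = ET.fromstring(xml_bytes)
    values = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}name"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == tag:
            values.append((element.text or "").strip())
    return values


# Made-up document: these element names are NOT from a real TurboTax file.
sample = b"<Return><Form><Wages>50000</Wages></Form><Form><Wages>1200</Wages></Form></Return>"
print(field_values(sample, "Wages"))  # ['50000', '1200']
```

Matching on the local tag name keeps the helper usable whether or not the real files declare an XML namespace, which is one of the things an undocumented schema leaves open.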