Form Recognizer W-2 form model

This article applies to: Form Recognizer v3.0.

The Form Recognizer W-2 model combines Optical Character Recognition (OCR) with deep learning models to analyze and extract information reported on US Internal Revenue Service (IRS) W-2 tax forms. A W-2 tax form is a multipart form divided into state and federal sections, with more than 14 boxes detailing an employee's income from the previous year. The W-2 is a key document in employees' federal and state tax filings, and in other processes such as mortgage loan applications and Social Security Administration (SSA) benefits. The Form Recognizer W-2 model supports documents containing a single form or multiple forms, in both standard and customized layouts, for forms from 2018 to the present.

Automated W-2 form processing

An employer sends Form W-2, also known as the Wage and Tax Statement, to each employee and to the IRS at the end of the year. A W-2 form reports an employee's annual wages and the amount of taxes withheld from their paychecks. The IRS also uses W-2 forms to track individuals' tax obligations, and the SSA uses the information on this and other forms to compute Social Security benefits for all workers.

Sample W-2 tax form processed using Form Recognizer Studio

Development options

Form Recognizer v3.0 supports the following tools:

Feature      Resources                                Model ID
W-2 model    Form Recognizer Studio, REST API v3.0    prebuilt-tax.us.w2

Try W-2 data extraction

Try extracting data from W-2 forms using the Form Recognizer Studio. You need the following resources:

  • An Azure subscription—you can create one for free

  • A Form Recognizer instance in the Azure portal. You can use the free pricing tier (F0) to try the service. After your resource deploys, select Go to resource to get your key and endpoint.

Screenshot of keys and endpoint location in the Azure portal.
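Besides the Studio, the key and endpoint can be used to call the W-2 model directly over the REST API. The following is a minimal Python sketch that only builds the request. It assumes details this article does not state: the v3.0 GA api-version 2022-08-31, the documentModels/{modelId}:analyze route, and a JSON body whose urlSource points at a publicly reachable document. The function name build_analyze_request is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not stated in this article): the v3.0 GA api-version "2022-08-31"
# and the documentModels/{modelId}:analyze route with a JSON "urlSource" body.
API_VERSION = "2022-08-31"
W2_MODEL_ID = "prebuilt-tax.us.w2"


def build_analyze_request(endpoint: str, key: str, document_url: str,
                          model_id: str = W2_MODEL_ID,
                          api_version: str = API_VERSION) -> urllib.request.Request:
    """Build, but do not send, a request to analyze a publicly reachable document."""
    base = endpoint.rstrip("/")
    query = urllib.parse.urlencode({"api-version": api_version})
    url = f"{base}/formrecognizer/documentModels/{urllib.parse.quote(model_id)}:analyze?{query}"
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Key copied from the resource's Keys and Endpoint page in the portal.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )
```

Usage: build_analyze_request("https://<your-resource>.cognitiveservices.azure.com/", "<your-key>", "https://example.com/w2.pdf") returns a request object. Sending it, for example with urllib.request.urlopen, should return HTTP 202 with an Operation-Location header whose URL is then polled for the result. Keeping construction separate from sending makes the URL and headers easy to inspect.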

Form Recognizer Studio

Note

Form Recognizer Studio is available with the v3.0 API.

  1. On the Form Recognizer Studio home page, select W-2.

  2. You can analyze the sample W-2 document or select the ➕ Add button to upload your own sample.

  3. Select the Analyze button:

    Screenshot: analyze W-2 window in the Form Recognizer Studio.
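Outside the Studio, analysis runs as an asynchronous operation: the analyze POST returns an Operation-Location URL, and the client polls that URL until its status field reads succeeded or failed (it reports notStarted or running while work is in progress). The following is a minimal sketch of that loop under those assumptions. The fetch_status callable is hypothetical; the caller supplies it to perform the GET with the same key header and return the parsed JSON.

```python
import time
from typing import Any, Callable, Dict


def poll_analyze_result(fetch_status: Callable[[], Dict[str, Any]],
                        interval_s: float = 1.0,
                        max_attempts: int = 60,
                        sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll an analyze operation until it succeeds, fails, or runs out of attempts.

    fetch_status is a hypothetical caller-supplied function that GETs the
    Operation-Location URL (with the same key header) and returns parsed JSON.
    """
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result  # extracted fields are under result["analyzeResult"]
        if status == "failed":
            raise RuntimeError(f"analysis failed: {result.get('error')}")
        sleep(interval_s)  # any other status, such as "notStarted" or "running"
    raise TimeoutError("analysis did not finish within the polling budget")
```

Passing fetch_status and sleep in as parameters keeps the loop free of network and timing dependencies, so it can be exercised with canned responses.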

Input requirements

  • For best results, provide one clear photo or high-quality scan per document.

  • Supported file formats:

    Model              PDF   Image (JPEG/JPG, PNG, BMP, TIFF)   Microsoft Office (Word DOCX, Excel XLS, PowerPoint PPT, HTML)
    Read               ✔     ✔                                  ✔ (REST API version 2022-06-30-preview)
    Layout             ✔     ✔                                  ✱
    General Document   ✔     ✔                                  ✱
    Prebuilt           ✔     ✔                                  ✱
    Custom             ✔     ✔                                  ✱

    ✱ Microsoft Office files are currently not supported for other models or versions.

  • For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).

  • The file size for analyzing documents must be less than 500 MB for the paid (S0) tier and 4 MB for the free (F0) tier.

  • Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.

  • PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.

  • If your PDFs are password-locked, you must remove the lock before submission.

  • The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).

  • For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.

  • For custom extraction model training, the total size of training data is 50 MB for the template model and 1 GB for the neural model.

  • For custom classification model training, the total size of training data is 1 GB with a maximum of 10,000 pages.
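A pre-flight check can catch files that exceed the limits above before they are uploaded. The following minimal sketch encodes only the file-size, image-dimension, page-count, and PDF page-size limits listed above. The InputInfo fields and check_input function are hypothetical: the caller fills them in, for example from an image or PDF library. The sketch also assumes MB means 1024 x 1024 bytes.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024  # assumption: "MB" read as binary megabytes

# Limits transcribed from the input requirements listed above.
MAX_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PAGES = 2000
F0_PAGES_PROCESSED = 2
MAX_PDF_SIDE_IN = 17.0


@dataclass
class InputInfo:
    """Hypothetical description of a file, filled in by the caller."""
    size_bytes: int
    tier: str = "S0"                         # "S0" (paid) or "F0" (free)
    width_px: Optional[int] = None           # images only
    height_px: Optional[int] = None
    page_count: Optional[int] = None         # PDF and TIFF only
    page_width_in: Optional[float] = None    # PDF only
    page_height_in: Optional[float] = None


def check_input(info: InputInfo) -> List[str]:
    """Return problems and warnings; an empty list means no listed limit is hit."""
    issues: List[str] = []
    limit = MAX_BYTES[info.tier]
    if info.size_bytes >= limit:
        issues.append(f"file must be smaller than {limit // MB} MB on the {info.tier} tier")
    if info.width_px is not None and info.height_px is not None:
        if not all(MIN_SIDE_PX <= side <= MAX_SIDE_PX for side in (info.width_px, info.height_px)):
            issues.append("image sides must be between 50 and 10,000 pixels")
    if info.page_count is not None:
        if info.page_count > MAX_PAGES:
            issues.append("at most 2,000 pages can be processed")
        elif info.tier == "F0" and info.page_count > F0_PAGES_PROCESSED:
            issues.append("warning: the free tier processes only the first two pages")
    if info.page_width_in is not None and info.page_height_in is not None:
        if max(info.page_width_in, info.page_height_in) > MAX_PDF_SIDE_IN:
            issues.append("PDF pages must be 17 x 17 inches or smaller")
    return issues
```

For example, check_input(InputInfo(size_bytes=5 * MB, tier="F0")) reports the 4 MB free-tier limit, while the same file on S0 passes. The free-tier page rule is reported as a warning rather than an error because the service still processes the first two pages.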

Supported languages and locales

Model                 Language (locale code)            Default
prebuilt-tax.us.w2    English (United States), en-US    English (United States), en-US

Field extraction

Name Box Type Description Standardized output
Employee.SocialSecurityNumber a String Employee's Social Security Number (SSN). 123-45-6789
Employer.IdNumber b String Employer's ID number (EIN), the business equivalent of a social security number 12-1234567
Employer.Name c String Employer's name Contoso
Employer.Address c String Employer's address (with city) 123 Example Street Sample City, CA
Employer.ZipCode c String Employer's zip code 12345
ControlNumber d String A code identifying the unique W-2 in the employer's records R3D1
Employee.Name e String Full name of the employee Henry Ross
Employee.Address f String Employee's address (with city) 123 Example Street Sample City, CA
Employee.ZipCode f String Employee's zip code 12345
WagesTipsAndOtherCompensation 1 Number A summary of your pay, including wages, tips and other compensation 50000
FederalIncomeTaxWithheld 2 Number Federal income tax withheld 1111
SocialSecurityWages 3 Number Social security wages 35000
SocialSecurityTaxWithheld 4 Number Social security tax withheld 1111
MedicareWagesAndTips 5 Number Medicare wages and tips 45000
MedicareTaxWithheld 6 Number Medicare tax withheld 1111
SocialSecurityTips 7 Number Social security tips 1111
AllocatedTips 8 Number Allocated tips 1111
VerificationCode 9 String Verification code on Form W-2 A123-B456-C789-DXYZ
DependentCareBenefits 10 Number Dependent care benefits 1111
NonqualifiedPlans 11 Number The non-qualified plan, a type of retirement savings plan that is employer-sponsored and tax-deferred 1111
AdditionalInfo Array of objects An array of LetterCode and Amount
  LetterCode 12a, 12b, 12c, 12d String Letter code. Refer to IRS/W-2 for the semantics of the code values. D
  Amount 12a, 12b, 12c, 12d Number Amount 1234
IsStatutoryEmployee 13 String Whether the Statutory employee box is checked or not true
IsRetirementPlan 13 String Whether the RetirementPlan box is checked or not true
IsThirdPartySickPay 13 String Whether the ThirdPartySickPay box is checked or not false
Other 14 String Other information that employers may use this box to report
StateTaxInfos Array of objects An array of state tax info including State, EmployerStateIdNumber, StateIncomeTax, StateWagesTipsEtc
  State 15 String State CA
  EmployerStateIdNumber 15 String Employer state number 123-123-1234
  StateWagesTipsEtc 16 Number State wages, tips, etc. 50000
  StateIncomeTax 17 Number State income tax 1535
LocalTaxInfos Array of objects An array of local income tax info including LocalWagesTipsEtc, LocalIncomeTax, LocalityName
  LocalWagesTipsEtc 18 Number Local wages, tips, etc. 50000
  LocalIncomeTax 19 Number Local income tax 750
  LocalityName 20 String Locality name CLEVELAND
W2Copy String Copy of W-2 forms A, B, C, D, 1, or 2 Copy A For Social Security Administration
TaxYear Number Tax year 2020
W2FormVariant String The variants of W-2 forms, including "W-2", "W-2AS", "W-2CM", "W-2GU", "W-2VI" W-2
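The fields above are returned as typed values in the analyze result. The following minimal sketch flattens them into plain Python values. It assumes the v3.0 REST JSON shape, in which analyzeResult.documents[0].fields maps each field name to an object carrying a type and a matching valueString, valueNumber, valueObject, or valueArray entry. It also assumes Employee and Employer are object fields, as the dotted names in the table suggest. The helper names are hypothetical, and SAMPLE_RESULT is hand-written to mirror the table's example values, not real service output.

```python
from typing import Any, Dict

# Maps a field's "type" to the key holding its typed value (assumed v3.0 REST JSON shape).
VALUE_KEYS = {
    "string": "valueString",
    "number": "valueNumber",
    "integer": "valueInteger",
    "date": "valueDate",
}


def field_value(field: Dict[str, Any]) -> Any:
    """Recursively unwrap one typed field into plain Python values."""
    ftype = field.get("type")
    if ftype == "object":
        return {name: field_value(sub) for name, sub in field.get("valueObject", {}).items()}
    if ftype == "array":
        return [field_value(item) for item in field.get("valueArray", [])]
    key = VALUE_KEYS.get(ftype)
    if key and key in field:
        return field[key]
    return field.get("content")  # fall back to the raw recognized text


def summarize_w2(analyze_result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten the first W-2 found in an analyzeResult object."""
    fields = analyze_result["documents"][0]["fields"]
    w2 = {name: field_value(f) for name, f in fields.items()}
    # Box 13 check boxes are typed as String in the table, so compare the text.
    for box13 in ("IsStatutoryEmployee", "IsRetirementPlan", "IsThirdPartySickPay"):
        if box13 in w2:
            w2[box13] = str(w2[box13]).lower() == "true"
    # Hypothetical convenience total across every state listed in boxes 15-17.
    w2["TotalStateIncomeTax"] = sum(
        s.get("StateIncomeTax") or 0 for s in w2.get("StateTaxInfos", [])
    )
    return w2


# Hand-written sample mirroring the table's example values (not real output).
SAMPLE_RESULT = {
    "documents": [{
        "fields": {
            "Employee": {"type": "object", "valueObject": {
                "SocialSecurityNumber": {"type": "string", "valueString": "123-45-6789"},
                "Name": {"type": "string", "valueString": "Henry Ross"},
            }},
            "WagesTipsAndOtherCompensation": {"type": "number", "valueNumber": 50000},
            "IsRetirementPlan": {"type": "string", "valueString": "true"},
            "StateTaxInfos": {"type": "array", "valueArray": [
                {"type": "object", "valueObject": {
                    "State": {"type": "string", "valueString": "CA"},
                    "StateIncomeTax": {"type": "number", "valueNumber": 1535},
                }},
            ]},
        },
    }],
}
```

With this sample, summarize_w2(SAMPLE_RESULT) yields nested dictionaries and lists such as {"State": "CA", "StateIncomeTax": 1535} under StateTaxInfos. Falling back to the raw content text keeps a field usable when no typed value is present.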

Migration guide and REST API v3.0

Next steps