Form Recognizer W-2 form model
This article applies to: Form Recognizer v3.0.
The Form Recognizer W-2 model combines Optical Character Recognition (OCR) with deep learning models to analyze and extract information reported on US Internal Revenue Service (IRS) tax forms. A W-2 tax form is a multipart form divided into state and federal sections consisting of more than 14 boxes detailing an employee's income from the previous year. The W-2 tax form is a key document used in employees' federal and state tax filings, as well as other processes like mortgage loans and Social Security Administration (SSA) benefits. The Form Recognizer W-2 model supports both single and multiple forms, standard and customized, from 2018 to the present.
Automated W-2 form processing
An employer sends Form W-2, also known as the Wage and Tax Statement, to each employee and the Internal Revenue Service (IRS) at the end of the year. A W-2 form reports employees' annual wages and the amount of taxes withheld from their paychecks. The IRS also uses W-2 forms to track individuals' tax obligations. The Social Security Administration (SSA) uses the information on this and other forms to compute the Social Security benefits for all workers.
Sample W-2 tax form processed using Form Recognizer Studio
Development options
Form Recognizer v3.0 supports the following tools:
Feature | Resources | Model ID |
---|---|---|
W-2 model | | prebuilt-tax.us.w2 |
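Outside the Studio, the model ID in the table above is the value a client passes to the service. The following is a minimal sketch of how such a request could be assembled with the Python standard library. It assumes the v3.0 REST route `/formrecognizer/documentModels/{modelId}:analyze` with `api-version=2022-08-31` and key authentication through the `Ocp-Apim-Subscription-Key` header; the endpoint and key shown are placeholders, and `build_analyze_request` is a hypothetical helper name, not part of any SDK. The request is only built here, not sent.

```python
import urllib.request

MODEL_ID = "prebuilt-tax.us.w2"  # model ID from the table above
API_VERSION = "2022-08-31"       # assumption: v3.0 GA REST API version


def build_analyze_request(endpoint: str, key: str, document: bytes,
                          content_type: str = "application/pdf") -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a W-2 analysis."""
    url = (
        f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
        f"{MODEL_ID}:analyze?api-version={API_VERSION}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # resource key from the Azure portal
        "Content-Type": content_type,      # media type of the uploaded file
    }
    return urllib.request.Request(url, data=document, headers=headers, method="POST")


# Placeholder endpoint, key, and document bytes for illustration only.
req = build_analyze_request(
    "https://example.cognitiveservices.azure.com/", "<your-key>", b"%PDF-"
)
print(req.get_method(), req.full_url)
```

Under the same assumptions, sending this request with `urllib.request.urlopen` starts an asynchronous operation, and the response carries an `Operation-Location` header with the URL to poll for the result.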
Try W-2 data extraction
Try extracting data from W-2 forms using the Form Recognizer Studio. You need the following resources:
An Azure subscription—you can create one for free
A Form Recognizer instance in the Azure portal. You can use the free pricing tier (F0) to try the service. After your resource deploys, select Go to resource to get your key and endpoint.
Form Recognizer Studio
Note
Form Recognizer Studio is available with the v3.0 API.
On the Form Recognizer Studio home page, select W-2.
You can analyze the sample W-2 document or select the ➕ Add button to upload your own sample.
Select the Analyze button:
Input requirements
For best results, provide one clear photo or high-quality scan per document.
Supported file formats:
Model | PDF | Image: JPEG/JPG, PNG, BMP, and TIFF | Microsoft Office: Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML |
---|---|---|---|
Read | ✔ | ✔ | ✱ REST API version 2022/06/30-preview |
Layout | ✔ | ✔ | |
General Document | ✔ | ✔ | |
Prebuilt | ✔ | ✔ | |
Custom | ✔ | ✔ | |
✱ Microsoft Office files are currently not supported for other models or versions.
For PDF and TIFF, up to 2,000 pages can be processed (with a free tier subscription, only the first two pages are processed).
The file size for analyzing documents must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
If your PDFs are password-locked, you must remove the lock before submission.
The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).
For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
For custom extraction model training, the total size of training data is 50 MB for the template model and 1 GB for the neural model.
For custom classification model training, the total size of training data is 1 GB with a maximum of 10,000 pages.
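Several of the limits above can be checked on the client before a document is uploaded. The sketch below is one hedged way to do that; `check_input` and its parameters are hypothetical names, the numbers are copied from the list above, and 1 MB is assumed to mean 1,048,576 bytes, which the source does not specify. The caller is expected to supply the file size, image dimensions, and page count from its own tooling.

```python
from typing import List, Optional

MB = 1024 * 1024  # assumption: binary megabytes
MAX_FILE_BYTES = {"F0": 4 * MB, "S0": 500 * MB}  # file size limit per pricing tier
MAX_PAGES = {"F0": 2, "S0": 2000}                # pages processed per PDF or TIFF
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000            # allowed image side length in pixels


def check_input(size_bytes: int, tier: str = "S0",
                width_px: Optional[int] = None,
                height_px: Optional[int] = None,
                page_count: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    # The file must be strictly smaller than the tier limit.
    if size_bytes >= MAX_FILE_BYTES[tier]:
        problems.append(
            f"file must be smaller than {MAX_FILE_BYTES[tier] // MB} MB on tier {tier}"
        )
    # Each image side must fall within the supported pixel range.
    for label, side in (("width", width_px), ("height", height_px)):
        if side is not None and not MIN_SIDE_PX <= side <= MAX_SIDE_PX:
            problems.append(
                f"image {label} {side}px is outside {MIN_SIDE_PX}-{MAX_SIDE_PX}px"
            )
    # Extra pages are not rejected by the service, only ignored, so this is a warning.
    if page_count is not None and page_count > MAX_PAGES[tier]:
        problems.append(
            f"only the first {MAX_PAGES[tier]} pages are processed on tier {tier}"
        )
    return problems


print(check_input(5 * MB, tier="F0", width_px=40, height_px=800))
```

The check does not cover text height, PDF page dimensions, or password protection, which need inspection of the file contents.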
Supported languages and locales
Model | Language—Locale code | Default |
---|---|---|
prebuilt-tax.us.w2 | English (United States) | English (United States)—en-US |
Field extraction
Name | Box | Type | Description | Standardized output |
---|---|---|---|---|
Employee.SocialSecurityNumber | a | String | Employee's Social Security Number (SSN). | 123-45-6789 |
Employer.IdNumber | b | String | Employer's ID number (EIN), the business equivalent of a social security number | 12-1234567 |
Employer.Name | c | String | Employer's name | Contoso |
Employer.Address | c | String | Employer's address (with city) | 123 Example Street Sample City, CA |
Employer.ZipCode | c | String | Employer's zip code | 12345 |
ControlNumber | d | String | A code identifying the unique W-2 in the records of employer | R3D1 |
Employee.Name | e | String | Full name of the employee | Henry Ross |
Employee.Address | f | String | Employee's address (with city) | 123 Example Street Sample City, CA |
Employee.ZipCode | f | String | Employee's zip code | 12345 |
WagesTipsAndOtherCompensation | 1 | Number | A summary of your pay, including wages, tips and other compensation | 50000 |
FederalIncomeTaxWithheld | 2 | Number | Federal income tax withheld | 1111 |
SocialSecurityWages | 3 | Number | Social security wages | 35000 |
SocialSecurityTaxWithheld | 4 | Number | Social security tax withheld | 1111 |
MedicareWagesAndTips | 5 | Number | Medicare wages and tips | 45000 |
MedicareTaxWithheld | 6 | Number | Medicare tax withheld | 1111 |
SocialSecurityTips | 7 | Number | Social security tips | 1111 |
AllocatedTips | 8 | Number | Allocated tips | 1111 |
VerificationCode | 9 | String | Verification Code on Form W-2 | A123-B456-C789-DXYZ |
DependentCareBenefits | 10 | Number | Dependent care benefits | 1111 |
NonqualifiedPlans | 11 | Number | The non-qualified plan, a type of retirement savings plan that is employer-sponsored and tax-deferred | 1111 |
AdditionalInfo | | Array of objects | An array of LetterCode and Amount | |
LetterCode | 12a, 12b, 12c, 12d | String | Letter code. Refer to IRS/W-2 for the semantics of the code values. | D |
Amount | 12a, 12b, 12c, 12d | Number | Amount | 1234 |
IsStatutoryEmployee | 13 | String | Whether the StatutoryEmployee box is checked or not | true |
IsRetirementPlan | 13 | String | Whether the RetirementPlan box is checked or not | true |
IsThirdPartySickPay | 13 | String | Whether the ThirdPartySickPay box is checked or not | false |
Other | 14 | String | Other information that employers may report in this field | |
StateTaxInfos | | Array of objects | An array of state tax info including State, EmployerStateIdNumber, StateIncomeTax, StateWagesTipsEtc | |
State | 15 | String | State | CA |
EmployerStateIdNumber | 15 | String | Employer state number | 123-123-1234 |
StateWagesTipsEtc | 16 | Number | State wages, tips, etc. | 50000 |
StateIncomeTax | 17 | Number | State income tax | 1535 |
LocalTaxInfos | | Array of objects | An array of local income tax info including LocalWagesTipsEtc, LocalIncomeTax, LocalityName | |
LocalWagesTipsEtc | 18 | Number | Local wages, tips, etc. | 50000 |
LocalIncomeTax | 19 | Number | Local income tax | 750 |
LocalityName | 20 | String | Locality name | CLEVELAND |
W2Copy | | String | Copy of W-2 forms A, B, C, D, 1, or 2 | Copy A For Social Security Administration |
TaxYear | | Number | Tax year | 2020 |
W2FormVariant | | String | The variants of W-2 forms, including "W-2", "W-2AS", "W-2CM", "W-2GU", "W-2VI" | W-2 |
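The field names in the table are nested: `Employee` and `Employer` are objects, while `AdditionalInfo`, `StateTaxInfos`, and `LocalTaxInfos` are arrays of objects. The sketch below shows one possible way to flatten such a structure into dotted paths. It assumes the v3.0 REST response layout in which each field carries a `type` plus a matching `valueString`, `valueNumber`, `valueObject`, or `valueArray` key; `flatten_fields` is a hypothetical helper, and `sample_fields` is hand-written from the example values in the table, not real service output.

```python
from typing import Any, Dict


def flatten_fields(fields: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested W-2 fields into {"Employee.Name": "Henry Ross", ...}."""
    flat: Dict[str, Any] = {}
    for name, field in fields.items():
        path = f"{prefix}{name}"
        field_type = field.get("type")
        if field_type == "object":
            # Recurse into sub-fields, extending the dotted path.
            flat.update(flatten_fields(field.get("valueObject", {}), path + "."))
        elif field_type == "array":
            # Index each array element, for example StateTaxInfos[0].State.
            for i, item in enumerate(field.get("valueArray", [])):
                flat.update(flatten_fields({f"{path}[{i}]": item}))
        elif field_type == "number":
            flat[path] = field.get("valueNumber")
        else:
            # Fall back to the raw text if no typed string value is present.
            flat[path] = field.get("valueString", field.get("content"))
    return flat


# Hand-written sample built from the table's example values (an assumption).
sample_fields = {
    "Employee": {"type": "object", "valueObject": {
        "Name": {"type": "string", "valueString": "Henry Ross"},
        "SocialSecurityNumber": {"type": "string", "valueString": "123-45-6789"},
    }},
    "WagesTipsAndOtherCompensation": {"type": "number", "valueNumber": 50000},
    "StateTaxInfos": {"type": "array", "valueArray": [
        {"type": "object", "valueObject": {
            "State": {"type": "string", "valueString": "CA"},
            "StateIncomeTax": {"type": "number", "valueNumber": 1535},
        }},
    ]},
}

flat = flatten_fields(sample_fields)
for path, value in flat.items():
    print(path, "=", value)
```

Flat paths like these are convenient for writing results to a spreadsheet or database row; a real integration would likely also keep each field's confidence score.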
Migration guide and REST API v3.0
Follow our Form Recognizer v3.0 migration guide to learn how to use the v3.0 version in your applications and workflows.
Explore our REST API to learn more about the v3.0 version and new capabilities.
Next steps
Try processing your own forms and documents with the Form Recognizer Studio
Complete a Form Recognizer quickstart and get started creating a document processing app in the development language of your choice.