Yes, you can convert a table extracted by Azure Form Recognizer into a pandas DataFrame using Python. Here's a basic approach:
- Extract the Data: Start with the output dictionary from Form Recognizer, which includes the row and column indices for each cell.
- Initialize an Empty DataFrame: Create an empty DataFrame that you'll populate with the data.
- Populate the DataFrame: Iterate through the dictionary, using the row and column indices to place each cell's content into the correct location in the DataFrame.
import pandas as pd
# Assuming 'data' is the dictionary containing the extracted table information
data = {
    'tables': [
        {
            'rows': [
                {'cells': [{'row_index': 0, 'col_index': 0, 'text': 'Name'},
                           {'row_index': 0, 'col_index': 1, 'text': 'Age'}]},
                {'cells': [{'row_index': 1, 'col_index': 0, 'text': 'Alice'},
                           {'row_index': 1, 'col_index': 1, 'text': '24'}]}
            ]
        }
    ]
}
# Create an empty DataFrame
df = pd.DataFrame()
# Populate the DataFrame
for table in data['tables']:
    for row in table['rows']:
        for cell in row['cells']:
            row_index = cell['row_index']
            col_index = cell['col_index']
            text = cell['text']
            df.at[row_index, col_index] = text
# Adjust column names if necessary
df.columns = df.iloc[0].tolist()  # Set the first row as the column header
df = df[1:].reset_index(drop=True)  # Remove the first row as it's now the header
print(df)
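If the output contains more than one table, writing every cell into a single DataFrame would let later tables overwrite earlier ones. As a rough sketch of one alternative, the helper below (the name `table_to_dataframe` is hypothetical, and it assumes the same `tables` / `rows` / `cells` dictionary shape as the example above) fills a pre-sized grid and builds one DataFrame per table in a single step:

```python
import pandas as pd

def table_to_dataframe(table, header=True):
    """Build one DataFrame from a table dict shaped like the example data."""
    cells = [cell for row in table['rows'] for cell in row['cells']]
    if not cells:
        return pd.DataFrame()
    n_rows = max(c['row_index'] for c in cells) + 1
    n_cols = max(c['col_index'] for c in cells) + 1
    # Pre-size the grid so cells missing from the output stay as None
    grid = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        grid[c['row_index']][c['col_index']] = c['text']
    if header:
        # Treat the first row as the column header
        return pd.DataFrame(grid[1:], columns=grid[0])
    return pd.DataFrame(grid)

# Same sample structure as above (an assumed shape, not the raw SDK output)
data = {
    'tables': [
        {'rows': [
            {'cells': [{'row_index': 0, 'col_index': 0, 'text': 'Name'},
                       {'row_index': 0, 'col_index': 1, 'text': 'Age'}]},
            {'cells': [{'row_index': 1, 'col_index': 0, 'text': 'Alice'},
                       {'row_index': 1, 'col_index': 1, 'text': '24'}]},
        ]}
    ]
}

# One DataFrame per extracted table
frames = [table_to_dataframe(t) for t in data['tables']]
print(frames[0])
```

Every value comes back as a string, so a numeric column such as Age can be converted afterwards with `pd.to_numeric` if needed.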