I am an Apache Tika committer (https://github.com/apache/tika)
and I am working on parsing OneNote 365 files, specifically .one files.
I have already written parsers for 2013 and 2016 versions of OneNote.
I am trying to see whether my existing code can parse OneNote 365 files (OneNote files generated from SharePoint Online). At this time it cannot.
I opened up the MS-ONE and MS-ONESTORE specifications and am trying to figure out if I can get this to work.
As far as I can tell, the specification does not apply to the OneNote 365 version documents.
I am looking at the header, and comparing with the MS-ONESTORE specification.
Here are the first three fields of the header:
guidFileType - 16 bytes
{7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}
guidFile - 16 bytes
{D276BC67-FD7F-4D5C-65CB-9DDBA52A90D1}
guidLegacyFileVersion - 16 bytes
{D276BC67-FD7F-4D5C-65CB-9DDBA52A90D1}
The guidFileType is correct: the spec says this GUID identifies a .one file, and this is a .one file.
The guidFile is supposed to be a random GUID, so there is no way to know whether it is correct, and we move on.
But then there is guidLegacyFileVersion.
The spec says: MUST be "{00000000-0000-0000-0000-000000000000}" and
MUST be ignored.
But the OneNote 365 file has the exact same GUID in that position as in the guidFile field.
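For anyone who wants to reproduce the comparison, here is a minimal sketch of reading those three fields. It is not the Tika parser code. It assumes the usual Windows GUID byte layout (the first three components little-endian, the last 8 bytes in file order) and that the three GUIDs sit back to back at offset 0, as described above. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Apache Tika.
final class OneNoteHeaderSketch {

    // Values quoted from the post / the spec text cited in the post.
    static final String ONE_FILE_TYPE = "{7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}";
    static final String NIL_GUID = "{00000000-0000-0000-0000-000000000000}";

    // Reads one 16-byte GUID at the buffer's current position.
    // Assumption: Data1 (4 bytes), Data2 (2) and Data3 (2) are little-endian;
    // the remaining 8 bytes are printed in the order they appear in the file.
    static String readGuid(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        int data1 = buf.getInt();
        int data2 = buf.getShort() & 0xFFFF;
        int data3 = buf.getShort() & 0xFFFF;
        byte[] data4 = new byte[8];
        buf.get(data4);

        StringBuilder sb = new StringBuilder();
        sb.append(String.format("{%08X-%04X-%04X-", data1, data2, data3));
        for (int i = 0; i < data4.length; i++) {
            if (i == 2) {
                sb.append('-');
            }
            sb.append(String.format("%02X", data4[i] & 0xFF));
        }
        return sb.append('}').toString();
    }

    // Returns { guidFileType, guidFile, guidLegacyFileVersion } from the first 48 bytes.
    static String[] readFirstThreeGuids(byte[] header) {
        if (header.length < 48) {
            throw new IllegalArgumentException("need at least 48 header bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        return new String[] { readGuid(buf), readGuid(buf), readGuid(buf) };
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a .one file.
        byte[] header;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            header = in.readNBytes(48);
        }
        String[] guids = readFirstThreeGuids(header);
        System.out.println("guidFileType          = " + guids[0]
                + (ONE_FILE_TYPE.equals(guids[0]) ? " (.one)" : " (unexpected)"));
        System.out.println("guidFile              = " + guids[1]);
        System.out.println("guidLegacyFileVersion = " + guids[2]
                + (NIL_GUID.equals(guids[2]) ? " (nil, as the spec requires)" : " (NOT nil)"));
    }
}
```

Run against the SharePoint Online file, a check like this should flag guidLegacyFileVersion as not nil, which is the discrepancy described above.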
Is there a specification specifically for this version of OneNote?