Tutorial: Generate and analyze version usage report for SharePoint site
By understanding version storage on a site, you can better optimize the Version history settings to meet your organization’s recovery objectives and manage storage costs.
This tutorial shows how you can generate a version storage usage report and analyze it to better understand the version storage footprint of the site. The report can also be used to run ‘what-if’ analysis of applying different version limits or trimming existing versions.
In this tutorial we cover how to:
- Generate version storage usage report file for site or library.
- Check progress of report generation.
- Understand the report file.
- Analyze version storage use using Excel or PowerShell.
In later tutorials, review how you can run impact analysis on the generated CSV report.
Before you begin
- Identify the SharePoint site, OneDrive account, or document library whose version storage usage you want to understand.
- Choose a location within the SharePoint document library that you want to save the report to.
- Download the latest SharePoint Online Management Shell.
Note
- The report file is generated within the report location specified.
- The report location must be within a SharePoint document library itself.
- There cannot be a file with the same name as the report in the document library.
Generate version usage report for sites or library
You can generate a report on the current version storage use on a site by running the New-SPOSiteFileVersionExpirationReportJob
command or on a library by running the New-SPOListFileVersionBatchDeleteJob
command.
In the following example, a job is queued to generate a site-scoped report at the report location, https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv
.
New-SPOSiteFileVersionExpirationReportJob -Identity https://contoso.sharepoint.com/sites/site1 -ReportUrl "https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv"
In the following example, a job is queued to generate a library-scoped report at the report location, https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv
.
New-SPOListFileVersionExpirationReportJob -Site https://contoso.sharepoint.com/sites/site1 -List "Documents" -ReportUrl "https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv"
Check progress on the report generation
Use the Get-SPOListFileVersionExpirationReportJobProgress
command to track the progress of report generation request.
The example below shows how you can check if your site scoped report is fully populated and ready to be analyzed.
Get-SPOSiteFileVersionExpirationReportJobProgress -Identity https://contoso.sharepoint.com/sites/site1 -ReportUrl "https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv"
The example below shows how you can check if your library scoped report is fully populated and ready to be analyzed.
Get-SPOListFileVersionExpirationReportJobProgress -Site https://contoso.sharepoint.com/sites/site1 -List "Documents" -ReportUrl "https://contoso.sharepoint.com/sites/sites1/reports/MyReports/VersionReport.csv"
The cmdlet will return a response in JSON format. The returned json response has a key named status. Refer to the table below for one of the following expected values:
JSON Status Response | Explanation |
---|---|
“completed” | The job has successfully completed and the report is fully populated. |
“in_progress” | There is an active job. The report is partially populated. |
“no_report_found” | There are no active jobs populating the specified file. |
“failed” | The job to generate the report has failed due to the error message. Check “error_message” for the error message from the failure. |
Understand version report file
The generated report is in CSV format with each row corresponds to a file version. Here’s an example of file version expiration report and its column breakdown.
The first row is the header with the column identifiers containing File Version Identifiers, Version Metadata information, and expiration timestamp. Compact columns are denoted with .Compact
post-fix that won’t repeat values if two consecutive rows have the same value. The other rows represent file versions, where each row represents a single version.
Let’s go through the first file version displayed in this report.
File version identifiers:
WebId
,DocId
,MajorVersion
, andMinorVersion
uniquely identify each version in your SharePoint site.Version metadata identifiers:
WebUrl
indicates the version inhttps://contoso.sharepoint.com
, andFileUrl
indicates that the file for this version is located at DocLib/MyDocument.docx. In other words, it is in a Document Library calledDocLib
, while the file is in the root folder ofDocLib
and is named MyDocument.docx.Size
indicates that the version takes 92,246 bytes of storage.The next two columns,
ModifiedBy_UserId
andModifiedBy_DisplayName
indicate that the user Michelle Harris (with user ID 6) has created this version.LastModifiedDate
indicates that the version’s content was last modified on March 13, 2023, at 22:36:09 UTC.SnapshotDate
displays that the version became a historical version on March 20, 2023, at 16:56:51 UTC.IsSnapshotDateEstimated
shows thatSnapshotDate
is the actual snapshot date.Expiration schedule identifiers:
CurrentExpirationDate
indicates that this version is currently set to never expire.AutomaticPolicyExpirationDate
shows that under the automatically expire policy, this version is also set to never expire.TargetExpirationDate
indicates that if we follow this schedule for trimming, we would set this version to never expire.
Note
File versions stored in the preservation hold library will be included in this report as well.
Let’s look at the third version.
The WebId
and DocId
values are empty because these columns are compact columns, denoted by .Compact post-fix, it means they should have values. If we look for the last nonempty above that row, we find WebId
as 4c7a58c1-01f2-4fa3-a730-44081a44f689
, and DocId
as 18c3e09c-b5be-48e7-a754-7a2ce53e0999
.
We can also see that the TargetExpirationDate
is set for April 19, 2023, at 18:08:53 UTC. It means if we trim based on this schedule, we would be setting the expiration date for this version to that time.
Note
All date times are represented in the round-trip format. For more information, see Standard date and time format strings - .NET | Microsoft Learn
Analyze version storage for sites
Before you start your analysis, you should update the TargetExpirationDate
column in your report file to the dates that you want, like the versions to be expired. Again, if you choose a date that is in the past for a version, then that version will be treated as "version that had already expired" and will be deleted right away after you start the trim.
You can manually update the dates in TargetExpirationDate
by editing the csv file. However, you may have too many rows to update by hand. To bulk update the column, you can use Excel formulas, or you can also use one of the PowerShell scripts we have provided in Tutorial: Run 'What-If' analysis. Specifically, you may choose a trimming mode, run the corresponding script to get a updated csv file with TargetExpirationDate
filled out based on that trimming mode, and continue from there.
Option one: Analyze the report using Excel
Open the shared Excel workbook AnalyzeReportFile_Template.xlsx. You can find the following worksheets in it.
- Configuration: Use this worksheet to set the date range for generating the different report views.
- Dataset: This worksheet is the raw dataset imported from the report file. Various reporting summary views are constructed from this dataset.
- Preset Reports: Here's a list of preset views that can be used to understand the impact of applying the selected setting on versions stored in the site:
- Summary: Analyze the current state of version storage for this site and deleted version distribution under the new settings.
- Impacted Users: Review the users whose versions would be impacted under the new settings.
- Version Count: A table and graph showing the numbers of versions that will be available over time under the current schedule and the number of versions that will be available under the new schedule.
- Size of Versions Expired: Compare the size of versions that will be deleted over time under the current schedule and the number of versions that will be available under the new schedule.
- File Level Analysis: Review file level version deletions under the new settings.
Populate the workbook by following these steps:
On the Configuration worksheet, enter the full path to the What-If report file in Cell B3.
If you want to change the date range of graphs in Number of Versions Available worksheet, or Size of Versions Expired worksheet, change the corresponding values in Cells B6, B7, B10, and/or B11. It's optional.
At the top of Excel, select the Data tab, and in the Ribbon, select the Refresh All button.
On the Calculations worksheet, autofill the Number of Versions and Number of Versions Remaining After Deletion columns.
On the Impacted Users worksheet, autofill the Number of Versions Will be Deleted column.
All worksheets should now be up to date. You can check the information you're interested in.
Option two: Analyze the report using PowerShell
- Save the script as a file named AnalyzeReportFile.ps1.
# save this file as AnalyzeReportFile.ps1
Param(
[Parameter(Mandatory=$true)][string] $ReportLocalFilePath,
[Parameter(Mandatory=$false)][int]$ShowFilesWithFewerThanNVersions=10,
[Parameter(Mandatory=$false)][DateTime]$TimelineStartDate=[DateTime]::Now,
[Parameter(Mandatory=$false)][int]$TimelineStepDays=10,
[Parameter(Mandatory=$false)][int]$TimelineNumSteps=10
)
function Import-Dataset($DatasetFilePath)
{
$Dataset = Import-CSV $DatasetFilePath
$Columns = $Dataset `
| Get-Member -MemberType 'NoteProperty' `
| Select-Object -ExpandProperty Name
$CompactColumns = $Columns | Where-Object { $_ -Match ".Compact" }
$Timer = [Diagnostics.Stopwatch]::StartNew()
for ($RowIndex = 0; $RowIndex -lt $Dataset.Count; $RowIndex++)
{
if ($RowIndex -gt 0)
{
$PrevRow = $Dataset[$RowIndex-1]
}
$Row = $Dataset[$RowIndex]
foreach ($ColName in $Columns)
{
if ([string]::IsNullOrEmpty($Row.$ColName))
{
if (($ColName -in $CompactColumns) -and ($RowIndex -gt 0))
{
$Row.$ColName = $PrevRow.$ColName
}
else
{
$Row.$ColName = $null
}
}
}
$Row."WebId.Compact" = [Guid]$Row."WebId.Compact"
$Row."DocId.Compact" = [Guid]$Row."DocId.Compact"
$Row."MajorVersion" = [Int32]$Row."MajorVersion"
$Row."MinorVersion" = [Int32]$Row."MinorVersion"
$Row."WebUrl.Compact" = [String]$Row."WebUrl.Compact"
$Row."FileUrl.Compact" = [String]$Row."FileUrl.Compact"
$Row."Size" = [Int64]$Row."Size"
$Row."ModifiedBy_UserId.Compact" = [String]$Row."ModifiedBy_UserId.Compact"
$Row."ModifiedBy_DisplayName.Compact" = [String]$Row."ModifiedBy_DisplayName.Compact"
$Row."LastModifiedDate" = [DateTime]$Row."LastModifiedDate"
$Row."SnapshotDate" = [DateTime]$Row."SnapshotDate"
$Row."IsSnapshotDateEstimated" = [bool]$Row."IsSnapshotDateEstimated"
$Row."CurrentExpirationDate" = [System.Nullable[DateTime]]$Row."CurrentExpirationDate"
$Row."AutomaticPolicyExpirationDate" = [System.Nullable[DateTime]]$Row."AutomaticPolicyExpirationDate"
$Row."TargetExpirationDate" = [System.Nullable[DateTime]]$Row."TargetExpirationDate"
$Percent = [Math]::Ceiling(100 * $RowIndex / $Dataset.Count)
Write-Progress `
-Activity "Reading dataset" `
-Status "$Percent% Complete ($($RowIndex + 1) / $($Dataset.Count) rows):" `
-PercentComplete $Percent `
-SecondsRemaining $(($Dataset.Count - ($RowIndex + 1)) / (($RowIndex + 1) / $Timer.Elapsed.Totalseconds))
}
$Timer.Stop()
return $Dataset
}
function Get-NumVersionExpiresByDate($Dataset, $ColName, $DateCutoff)
{
$VersionsExpired = $Dataset | Where-Object { ($null -ne $_.$ColName) -and ($_.$ColName -le $DateCutoff) }
$IsTodayStr = ""
If ((Get-Date).Date -eq ($DateCutoff).Date)
{
$IsTodayStr = "*"
}
return [PSCustomObject]@{
Today = $IsTodayStr
Date = $DateCutoff
NumberOfVersionsAvailable = $Dataset.Count - $VersionsExpired.Count
NumberOfVersionsExpired = $VersionsExpired.Count
SizeOfVersionsExpiredInBytes = ($VersionsExpired | Measure-Object Size -Sum).Sum
}
}
function Get-FilesWithFewerThanNVersions($Dataset, $NumVersions)
{
$AvailableVersionsByFile = $Dataset `
| Where-Object { ($null -eq $_.TargetExpirationDate) -or ($_.TargetExpirationDate -gt [DateTime]::Now) } `
| Group-Object -Property WebId.Compact, DocId.Compact
$AvailableFilesWithNotEnoughVersions = @{}
# Files with some versions left but not enough
$AvailableVersionsByFile `
| Where-Object Count -lt $NumVersions `
| ForEach-Object { $AvailableFilesWithNotEnoughVersions[$_.Name] = $_.Count }
# Files with 0 versions left
$Dataset `
| Group-Object -Property WebId.Compact, DocId.Compact `
| Where-Object { $AvailableVersionsByFile.Name -notcontains $_.Name } `
| ForEach-Object { $AvailableFilesWithNotEnoughVersions[$_.Name] = 0 }
# Stitch all of the data together
return $Dataset `
| Group-Object -Property WebId.Compact, DocId.Compact `
| Where-Object Count -ge $NumVersions `
| Where-Object { $AvailableFilesWithNotEnoughVersions.Contains($_.Name) } `
| ForEach-Object `
{
$fileUrl = $_.Group[0]."WebUrl.Compact" + "/" + $_.Group[0]."FileUrl.Compact"
$numberOfVersionsAvailableBeforeTrim = $_.Count
$numberOfVersionsAvailableAfterTrim = $AvailableFilesWithNotEnoughVersions[$_.Name]
$numberOfVersionsTrimmed = $numberOfVersionsAvailableBeforeTrim - $numberOfVersionsAvailableAfterTrim
[PSObject]::new() |
Add-Member -PassThru -NotePropertyMembers ([Ordered]@{
FileUrl = $fileUrl
NumberOfVersionsAvailableBeforeTrim = $numberOfVersionsAvailableBeforeTrim
NumberOfVersionsAvailableAfterTrim = $numberOfVersionsAvailableAfterTrim
NumberOfVersionsTrimmed = $numberOfVersionsTrimmed
})
} `
| Sort-Object -Property NumberOfVersionsAvailableAfterTrim
}
function Get-MostImpactedUsers($Dataset)
{
$VersionsExpired = $Dataset | Where-Object { ($null -ne $_.TargetExpirationDate) -and ($_.TargetExpirationDate -le [DateTime]::Now) }
return $VersionsExpired `
| Group-Object -Property ModifiedBy_UserId.Compact `
| Select-Object `
@{ L = "UserId"; E = { $_.Group[0]."ModifiedBy_UserId.Compact" } }, `
@{ L = "UserDisplayName"; E = { $_.Group[0]."ModifiedBy_DisplayName.Compact" } },
@{ L = "NumberOfVersionsTrimmed"; E = { $_.Count } } `
| Sort-Object -Property NumberOfVersionsTrimmed -Descending
}
$Dataset = Import-Dataset -DatasetFilePath $ReportLocalFilePath
$CurrentExpirationSummaryTable = @()
$TargetExpirationSummaryTable = @()
$Timer = [Diagnostics.Stopwatch]::StartNew()
for ($Step = 0; $Step -lt $TimelineNumSteps; $Step++)
{
$DateCutOff = $TimelineStartDate.AddDays($TimelineStepDays * $Step)
$CurrentExpirationSummaryTable += `
Get-NumVersionExpiresByDate -Dataset $Dataset -ColName CurrentExpirationDate -DateCutoff $DateCutOff
$TargetExpirationSummaryTable += `
Get-NumVersionExpiresByDate -Dataset $Dataset -ColName TargetExpirationDate -DateCutoff $DateCutOff
}
$Timer.Stop()
Write-Host "===========================" -ForegroundColor Yellow
Write-Host "Current Expiration Schedule" -ForegroundColor Yellow
Write-Host "===========================" -ForegroundColor Yellow
$CurrentExpirationSummaryTable | Format-Table -Autosize | Out-String | Write-Host
Write-Host "Total elapsed seconds: $($Timer.Elapsed.TotalSeconds / 2)" -ForegroundColor Green
Write-Host
Write-Host "==========================" -ForegroundColor Yellow
Write-Host "Target Expiration Schedule" -ForegroundColor Yellow
Write-Host "==========================" -ForegroundColor Yellow
$TargetExpirationSummaryTable | Format-Table -Autosize | Out-String | Write-Host
Write-Host "Total elapsed seconds: $($Timer.Elapsed.TotalSeconds / 2)" -ForegroundColor Green
Write-Host
Write-Host "================================" -ForegroundColor Yellow
Write-Host "Files with Fewer Than $ShowFilesWithFewerThanNVersions Versions" -ForegroundColor Yellow
Write-Host "================================" -ForegroundColor Yellow
$Timer = [Diagnostics.Stopwatch]::StartNew()
Get-FilesWithFewerThanNVersions -Dataset $Dataset -NumVersions $ShowFilesWithFewerThanNVersions | Format-Table -Autosize | Out-String | Write-Host
$Timer.Stop()
Write-Host "Total elapsed seconds: $($Timer.Elapsed.TotalSeconds)" -ForegroundColor Green
Write-Host
Write-Host "==============" -ForegroundColor Yellow
Write-Host "Users Impacted" -ForegroundColor Yellow
Write-Host "==============" -ForegroundColor Yellow
$Timer = [Diagnostics.Stopwatch]::StartNew()
Get-MostImpactedUsers -Dataset $Dataset | Format-Table -Autosize | Out-String | Write-Host
$Timer.Stop()
Write-Host "Total elapsed seconds: $($Timer.Elapsed.TotalSeconds)" -ForegroundColor Green
Write-Host
- Open PowerShell and run the following command, replacing the placeholder values with the appropriate values.
Note
Use PowerShell 7 to run the commands. You can install PowerShell 7 by following these instructions: Installing PowerShell on Windows - PowerShell | Microsoft Learn.
. “<path to AnalyzeReportFile.ps1>” –ReportLocalFilePath “<path to the file version expiration What-If report .csv file>”
- The output displays four tables:
Current Expiration Schedule: this table contains a time-series summary for your versions as they are. It has the following columns:
- Date: the first column represents the date.
- NumberOfVersionsAvailable: the number of versions available on that date under the current schedule.
- NumberOfVersionsExpired: the number of versions expired on that date under the current schedule.
- SizeOfVersionsExpiredMB: the size of versions expired on that date under the current schedule.
Target Expiration Schedule: this table is the same as Current Expiration Schedule but reflects the updated schedule instead. This table is only helpful if you want to test out different expiration scenarios by changing the TargetExpirationDate column in the file version expiration report.
Files with Fewer Than 10 Versions: a list of the URLs, and the number of versions before and after the deletion for those files whose number of versions is fewer than 10 after immediate deletion (but was more than 10 before the immediate deletion).
Users Impacted: the users whose versions would be immediately deleted.
Optionally, you can adjust the parameters:
TimelineStartDate
: the starting date for Table 1 and 2 above.TimelineStepDays
: the number of days in between rows for Table 1 and 2 above.TimelineNumSteps
: the number of rows to calculate for Table 1 and 2 above.ShowFilesWithFewerThanNVersions
: the threshold for the number of versions in Table 3 above.