A nightly Veeam Backup & Replication job started failing on a single VM in a two-node Hyper-V Server 2019 failover cluster. The cluster uses shared SAS JBOD storage with Storage Spaces and an external witness disk, and had been running cleanly for years.
The night before, a prolonged power outage hard-cut both cluster nodes. When power returned, the VMs restarted on their own and the cluster came back online, with one exception: one node was reporting a failed cluster volume. Restarting that node cleared the volume report and the cluster looked healthy again. The client had also reset the cluster logs during recovery, which removed any record of which I/Os had been in flight when power dropped, but at that point everything appeared fine.
Later that night, the Veeam job for one VM started failing on every retry with:
Failed to create VM recovery checkpoint (mode: Veeam application-aware processing)
Details: Failed to create VM (ID: 8F2E5A91-4D3B-4F1C-9A22-9A0B12C34D56) recovery checkpoint.
Job failed ('Checkpoint operation for 'APPSRV' failed.
'APPSRV' could not initiate a checkpoint operation: The process cannot access the file
because it is being used by another process. (0x80070020).
'APPSRV' could not create auto virtual hard disk
C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_1E5B4C9F-3A82-4D60-8B14-2C7F9E8D6A53.avhdx:
The process cannot access the file because it is being used by another process. (0x80070020).
Error code: '32774'.
A manual production checkpoint from Hyper-V Manager failed with the same error, so this was not a Veeam problem. Veeam was just the messenger. Hyper-V itself could not create the differencing disk.
The standard checks for 0x80070020 (open process handle, VSS shadow copies, mounted disk images) all returned empty, which made the lock harder to attribute than the error message suggests. The full diagnostic walkthrough below covers each check, the output it produced, and the reasoning that pointed at the actual cause.
The fix
Short version up front. The full diagnostic walkthrough and the reasoning behind each step are in the sections that follow.
Run all PowerShell from an elevated session on a node that is part of the cluster. Replace <VMName> with your VM’s name throughout (or set $VMName once, as the snippets below do).
1. Confirm Hyper-V sees zero checkpoints, then list the orphans
Open Hyper-V Manager on the node currently owning the VM. The VM’s checkpoint pane must show zero checkpoints. If any checkpoint is listed, stop here: the AVHDX files are part of a live chain and you must merge or delete them through Hyper-V Manager, not by deleting files.
Then enumerate every orphan .avhdx in any folder this VM has a disk in:
$VMName = "<VMName>"
# Every folder that holds a disk for this VM
$DiskFolders = Get-VMHardDiskDrive -VMName $VMName |
Select-Object -ExpandProperty Path |
Split-Path -Parent | Sort-Object -Unique
# Reusable filter: ONLY files whose extension is exactly .avhdx.
# Excludes .vhdx, .vhdx.mrt, .vhdx.rct, .avhdx.mrt, .avhdx.rct.
$Orphans = $DiskFolders | ForEach-Object {
Get-ChildItem $_ | Where-Object { $_.Extension -eq '.avhdx' }
}
# Show what was found. Confirm each one is a `<DiskName>_<GUID>.avhdx`,
# not a parent .vhdx (which this filter cannot return anyway).
$Orphans | Select FullName, Length, LastWriteTime
Inspect the list. Every entry should match the <DiskName>_<GUID>.avhdx pattern. They are the orphans you will delete in step 4.
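If the list is long, the pattern check can be scripted instead of eyeballed. The following is a minimal sketch: Test-OrphanAvhdxName is a hypothetical helper name, not something Hyper-V or Veeam ships, and it only looks at the file name, not at the file contents.

```powershell
# Hypothetical helper: returns $true only for names shaped like
# <DiskName>_<GUID>.avhdx. It inspects the name only.
function Test-OrphanAvhdxName {
    param([Parameter(Mandatory)][string]$Name)

    # 8-4-4-4-12 hex digits, the usual GUID text layout
    $guid = '[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}'
    $pattern = '^.+_' + $guid + '\.avhdx$'

    # -match is case-insensitive, so .AVHDX is accepted as well
    return ($Name -match $pattern)
}

# Usage with the $Orphans list from step 1 (prints anything that does NOT fit):
# $Orphans | Where-Object { -not (Test-OrphanAvhdxName $_.Name) } | Select FullName
```

An empty result from the usage line means every file in the list has the expected shape. Anything it prints deserves a closer look before step 4.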
2. Confirm the VM is on the parent VHDX, not the AVHDX
Get-VMHardDiskDrive -VMName $VMName |
Select VMName, Path, ControllerType, ControllerNumber, ControllerLocation
Every Path value should end in .vhdx, not .avhdx. If any Path ends in .avhdx, the chain is live and you must not delete the AVHDX. If they all end in .vhdx, the AVHDX files are dead leftovers and the rest of this procedure applies.
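The same go/no-go decision can be expressed as a guard that returns a single boolean. This is a sketch under assumptions: Test-VMOnParentDisk is a hypothetical helper name, and it deliberately treats anything that is not a plain .vhdx (including formats this article does not cover) as a reason to stop.

```powershell
# Hypothetical guard: $true only when every attached disk path ends in .vhdx.
function Test-VMOnParentDisk {
    [CmdletBinding()]
    param([Parameter(Mandatory)][string[]]$Path)

    # Collect every path whose extension is anything other than .vhdx
    $notParent = @($Path | Where-Object {
        [System.IO.Path]::GetExtension($_) -ne '.vhdx'
    })

    foreach ($p in $notParent) {
        Write-Warning "Disk is not a plain .vhdx, do not delete anything: $p"
    }
    return ($notParent.Count -eq 0)
}

# Usage on a cluster node:
# Test-VMOnParentDisk -Path (Get-VMHardDiskDrive -VMName $VMName).Path
```

Only continue to step 3 when the guard returns True.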
3. Live-migrate the VM to another cluster node
List the cluster nodes and pick any one (other than the current owner) that shows State = Up:
Get-ClusterNode | Select Name, State
Get-ClusterGroup -Name $VMName | Select Name, OwnerNode, State
Then migrate the VM to a target node. Replace <TargetNode> with the chosen node’s name:
$TargetNode = "<TargetNode>"
Move-ClusterVirtualMachineRole -Name $VMName `
-Node $TargetNode -MigrationType Live
# Confirm the migration completed
Get-ClusterGroup -Name $VMName | Select Name, OwnerNode, State
This is the step that does the work. It tears down the VM’s worker process on the original node, which releases the kernel-level lock on the orphan AVHDX. The CSV path stays the same from any node, so no file paths change.
4. Delete the orphan AVHDX files and their .mrt / .rct companions
The block below iterates over the orphans found in step 1 and deletes each .avhdx along with the .mrt / .rct files that share its exact base name. It cannot affect the parent .vhdx or its companions: the loop only ever sees files matched by Extension -eq '.avhdx' in step 1, and constructs delete paths from those exact filenames.
foreach ($f in $Orphans) {
# Strip the trailing .avhdx; everything we delete keys off this base path.
$base = $f.FullName -replace '\.avhdx$',''
Remove-Item "$base.avhdx" -Force
Remove-Item "$base.avhdx.mrt" -Force -ErrorAction SilentlyContinue
Remove-Item "$base.avhdx.rct" -Force -ErrorAction SilentlyContinue
}
# Verify what remains. Should be only <DiskName>.vhdx + .vhdx.mrt + .vhdx.rct.
$DiskFolders | ForEach-Object { Get-ChildItem $_ }
If $Orphans or $DiskFolders are no longer set in your session (for example if you opened a new PowerShell window between step 1 and step 4), re-run the discovery from step 1 first. The deletion loop is intentionally separated from the discovery so the list of files about to be deleted is visible (and reviewable) before any Remove-Item runs.
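For an extra review pass before the loop runs, the exact delete targets can be listed without touching anything. A minimal sketch, assuming the same naming rule as the loop above (each orphan plus its .mrt and .rct companions); Get-OrphanDeleteTarget is a hypothetical helper name.

```powershell
# Hypothetical dry-run helper: lists the paths the deletion loop would
# target for each orphan, and whether each one currently exists.
function Get-OrphanDeleteTarget {
    [CmdletBinding()]
    param([Parameter(Mandatory)][string[]]$OrphanPath)

    foreach ($p in $OrphanPath) {
        # Refuse anything that is not an .avhdx, mirroring the step 1 filter
        if ($p -notmatch '\.avhdx$') {
            Write-Warning "Skipping non-AVHDX path: $p"
            continue
        }
        foreach ($candidate in @($p, "$p.mrt", "$p.rct")) {
            [pscustomobject]@{
                Path   = $candidate
                Exists = (Test-Path -LiteralPath $candidate)
            }
        }
    }
}

# Usage with the $Orphans list from step 1:
# Get-OrphanDeleteTarget -OrphanPath $Orphans.FullName | Format-Table -AutoSize
```

Remove-Item also accepts -WhatIf, which prints what it would delete without deleting it. Adding it to the three Remove-Item lines for a first pass gives a similar preview.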
5. Test a manual production checkpoint, then retry the Veeam job
In Hyper-V Manager on the new owning node, right-click the VM and choose Checkpoint. The checkpoint should appear in the Checkpoints pane within seconds and a new <DiskName>_<GUID>.avhdx should appear in the VM’s disk folder. If the create step fails with 0x80070020 again, the cleanup did not fully release the locks; revisit the diagnostic walkthrough below before retrying Veeam.
Manual checkpoints do not auto-merge. Once the create succeeds, right-click the checkpoint in Hyper-V Manager and choose Delete Checkpoint. That triggers the merge: the AVHDX disappears from the disk folder, the chain returns to the parent .vhdx, and the VM is back to a clean state. Wait until the AVHDX is gone before doing anything else (small VMs merge in a few seconds; busy ones can take a minute or two).
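The same create-and-delete cycle can be driven from PowerShell, which helps when Hyper-V Manager is not at hand. This is a sketch rather than a tested runbook: Test-VMCheckpointCycle, the checkpoint name, and the timeout are assumptions, and the merge is judged complete simply when every attached disk path ends in .vhdx again.

```powershell
# Hypothetical wrapper around the manual test: create a checkpoint, delete it,
# then wait for the merge back into the parent .vhdx.
function Test-VMCheckpointCycle {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$VMName,
        [string]$SnapshotName = 'orphan-cleanup-test',  # arbitrary test name
        [int]$MergeTimeoutSeconds = 300                 # assumed upper bound
    )

    # A lingering lock would surface here as 0x80070020
    Checkpoint-VM -Name $VMName -SnapshotName $SnapshotName -ErrorAction Stop

    # Deleting the checkpoint is what starts the merge
    Remove-VMSnapshot -VMName $VMName -Name $SnapshotName -ErrorAction Stop

    # Poll until no attached disk points at an .avhdx any more
    $deadline = (Get-Date).AddSeconds($MergeTimeoutSeconds)
    do {
        $notMerged = @(Get-VMHardDiskDrive -VMName $VMName |
            Where-Object { $_.Path -notmatch '\.vhdx$' })
        if ($notMerged.Count -eq 0) { return $true }
        Start-Sleep -Seconds 2
    } while ((Get-Date) -lt $deadline)

    return $false
}

# Usage on the node that now owns the VM:
# Test-VMCheckpointCycle -VMName $VMName
```

A return value of True corresponds to the clean cycle described above. False means the merge did not finish within the timeout and the disk folder should be inspected by hand.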
With the manual create-and-delete cycle clean, retry the Veeam job. It should run green.
Why the standard diagnostics returned empty
The error code 0x80070020 is ERROR_SHARING_VIOLATION, the file-in-use error. The natural reaction is to find the process holding the file and either restart it or terminate it. On Hyper-V, that usually means a stuck vmwp.exe worker process, an antivirus scanner, or a Veeam agent service. None of those applied here.
Here is the full diagnostic walkthrough, with the actual command output, in the order the checks were run.
Get-VMSnapshot returns empty
Get-VMSnapshot -VMName APPSRV
Get-VM APPSRV | Select Name, State, Status, Generation, CheckpointType
Output:
Name : APPSRV
State : Running
Status : Operating normally
Generation : 2
CheckpointType : Production
No checkpoints. The VM is healthy and running production-checkpoint mode (the only mode Veeam uses for application-aware processing). At this point Hyper-V is convinced there is nothing to merge or clean up. But the AVHDX files are sitting on the CSV.
Get-VHD shows a valid differencing chain
Get-VHD "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx" |
Format-List Path, ParentPath, VhdFormat, VhdType
Output:
Path : C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx
ParentPath : C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01.vhdx
VhdFormat : VHDX
VhdType : Differencing
The orphan AVHDX is a structurally valid differencing disk pointing at the right parent. The Veeam-style chain looks intact on disk, but Hyper-V’s runtime view of the VM (Get-VMHardDiskDrive) reports the VM running directly off the parent APPSRV-01.vhdx. The chain is ghosted: it exists on disk, but not as far as Hyper-V’s checkpoint metadata is concerned.
handle.exe finds no user-mode locks
Sysinternals Handle is the standard tool for finding which process holds a file open. Elevated:
Invoke-WebRequest -Uri "https://download.sysinternals.com/files/Handle.zip" `
-OutFile "$env:TEMP\Handle.zip"
Expand-Archive -Path "$env:TEMP\Handle.zip" -DestinationPath "C:\Tools\Handle" -Force
cd C:\Tools\Handle
.\handle64.exe -accepteula -nobanner | Out-Null
.\handle64.exe -a -nobanner A4B2D8C1
.\handle64.exe -a -nobanner D9F1E27C
.\handle64.exe -a -nobanner ".avhdx"
Output:
No matching handles found.
No matching handles found.
No matching handles found.
Nothing. No process is holding either AVHDX file open according to user-mode handle enumeration. To rule out a privilege issue:
([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole(
[Security.Principal.WindowsBuiltInRole]::Administrator)
True
Elevated. Empty result confirmed.
vssadmin shows no shadow copies
A common cause of invisible AVHDX locks is a leftover VSS shadow copy from the failed backup. The shadow holds a kernel reference even after the user-facing job has exited.
vssadmin list shadows /for=C:\ClusterStorage\ssd_vdisk01
Get-WmiObject Win32_ShadowCopy | Format-List ID, InstallDate, VolumeName, DeviceObject
Output:
No items found that satisfy the query.
No shadows on the CSV, no Win32_ShadowCopy objects anywhere on the host. Clean.
Get-DiskImage shows the AVHDX is not mounted
If something had attached the AVHDX as a disk (for inspection, recovery, or because a tool forgot to dismount), it would explain the lock.
Get-DiskImage "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx"
Output:
Attached : False
ImagePath : C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx
FileSize : 5005901824
StorageType : 3
Not mounted. Same for the second AVHDX.
Remove-Item and Rename-Item still fail
After the previous checks, the file is reportedly not held by any user-mode process, has no shadow copies, and is not disk-mounted. By every standard diagnostic the file should be free, but both write operations fail:
Remove-Item "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx" -Force
Remove-Item : Cannot remove item ... : The process cannot access the file
... because it is being used by another process.
Rename-Item "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx" "test.avhdx"
Rename-Item : The process cannot access the file because it is being used by another process.
Both fail with the same error. The lock is real. Something has the file open. It just is not anything handle.exe can see.
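A less intrusive way to ask the same question is to request exclusive access to the file and see whether the open is refused, which avoids attempting a delete or rename on something that might turn out to matter. A minimal sketch, assuming the sharing violation surfaces as a .NET IOException; Test-FileLocked is a hypothetical helper name, and it should be given a full path because .NET does not follow PowerShell's current location.

```powershell
# Hypothetical probe: $true if the file cannot be opened exclusively.
function Test-FileLocked {
    param([Parameter(Mandatory)][string]$Path)

    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "File not found: $Path"
    }
    try {
        # FileShare 'None' asks for exclusive access, so any other open
        # reference to the file makes this call fail.
        $stream = [System.IO.File]::Open(
            $Path,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::None)
        $stream.Dispose()
        return $false
    }
    catch [System.IO.IOException] {
        return $true
    }
}

# Usage:
# Test-FileLocked -Path "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx"
```

The probe only reports that something holds the file. Like Remove-Item and Rename-Item, it says nothing about who the holder is.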
The takeaway: kernel-mode locks are invisible to handle.exe
handle.exe enumerates user-mode handles: file handles owned by user-mode processes via the Win32 API. It cannot see handles held inside kernel-mode drivers.
Hyper-V’s storage stack (storvsp.sys, vhdmp.sys) opens VHDX and AVHDX files at the kernel level and tracks the chain across migrations, snapshots, and merge operations. When a checkpoint creation crashes mid-flight, the user-mode vmwp.exe worker process exits cleanly, but the kernel-mode chain tracking can be left referencing the orphan AVHDX. From that point on:
- Get-VMHardDiskDrive reports the VM running off the parent VHDX (the user-mode chain has reset).
- Get-VMSnapshot reports zero checkpoints (the metadata was never committed).
- handle.exe shows no user-mode handles on the AVHDX (because there are none).
- The kernel still holds a reference, so any attempt to delete, rename, or open the AVHDX returns 0x80070020.
- New AVHDX creation also fails, because Hyper-V cannot get a clean handle on the chain it thinks it might still need.
The fix is to evict the kernel state for that VM. On a multi-node cluster, the cleanest way to do that without taking the VM offline is live migration.
Why live migration works
Live migration tears down the source node’s worker process and the kernel storage stack references along with it, then rebuilds them on the target node from current on-disk state. The orphan AVHDX has no role in current state, so the new node’s storage stack does not reference it. The file is now unlocked from every node in the cluster (a CSV is visible to all of them).
The full sequence in this case:
Move-ClusterVirtualMachineRole -Name "APPSRV" `
-Node "HV-NODE-02" -MigrationType Live
Get-ClusterGroup -Name "APPSRV" | Select Name, OwnerNode, State
Name OwnerNode State
---- --------- -----
APPSRV HV-NODE-02 Online
Migration completed in under a minute, the VM stayed up, no users disconnected. Then, from either node:
Rename-Item "C:\ClusterStorage\ssd_vdisk01\VMs\APPSRV\Virtual Hard Disks\APPSRV-01_A4B2D8C1-7E3F-4592-BC81-3D7A6F9C2E10.avhdx" "test.avhdx"
The rename succeeded silently. Lock released. The orphan and its companion .mrt and .rct files were deleted, a manual production checkpoint completed cleanly within seconds, and the Veeam job ran green on the next attempt.
Why the .mrt and .rct companions matter
Each AVHDX has two companion files:
- <name>.avhdx.mrt: Modified Region Table
- <name>.avhdx.rct: Resilient Change Tracking metadata
These are tracking files Hyper-V uses to support incremental backups. With their parent AVHDX gone, they are stale. They will not block checkpoints, but stale RCT can confuse incremental backup logic and force a full backup on the next Veeam run. Step 4 of the fix removes them alongside each orphan AVHDX. The rationale is worth knowing in case you encounter these files in isolation, or write your own cleanup tooling.
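For the "in isolation" case, where the AVHDX is already gone and only companions remain, a small scan can list them. This is a sketch, assuming a companion is stale exactly when the .avhdx it is named after no longer exists in the same folder; Get-StaleAvhdxCompanion is a hypothetical helper name.

```powershell
# Hypothetical helper: lists .avhdx.mrt / .avhdx.rct files whose
# matching .avhdx is missing from the same folder.
function Get-StaleAvhdxCompanion {
    param([Parameter(Mandatory)][string]$Folder)

    Get-ChildItem -LiteralPath $Folder -File |
        Where-Object { $_.Name -match '\.avhdx\.(mrt|rct)$' } |
        Where-Object {
            # Drop the final .mrt/.rct to get the AVHDX this file belongs to
            $avhdx = $_.FullName -replace '\.(mrt|rct)$', ''
            -not (Test-Path -LiteralPath $avhdx)
        }
}

# Usage with the $DiskFolders list from step 1:
# $DiskFolders | ForEach-Object { Get-StaleAvhdxCompanion -Folder $_ } | Select FullName
```

The name filter requires .avhdx immediately before the final extension, so the parent's .vhdx.mrt and .vhdx.rct files are never returned.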
Single-node fallback: brief downtime
On a single-node Hyper-V host, live migration is not an option. The kernel reference only releases when the worker process exits, which means stopping the VM. The procedure mirrors the multi-node fix, except step 3 (live migrate) is replaced by Stop-VM. The same Extension -eq '.avhdx' filter applies, so the parent .vhdx and its companions cannot be touched:
$VMName = "<VMName>"
# Discover disk folders (same as step 1 of the multi-node fix)
$DiskFolders = Get-VMHardDiskDrive -VMName $VMName |
Select-Object -ExpandProperty Path |
Split-Path -Parent | Sort-Object -Unique
# Identify orphan AVHDX files. Excludes .vhdx, .vhdx.mrt/.rct, .avhdx.mrt/.rct.
$Orphans = $DiskFolders | ForEach-Object {
Get-ChildItem $_ | Where-Object { $_.Extension -eq '.avhdx' }
}
$Orphans | Select FullName, Length, LastWriteTime # Review before deleting
# Stop the VM (graceful shutdown via integration services)
Stop-VM -Name $VMName
# Wait until Get-VM $VMName reports State = Off before continuing
# Delete each orphan and its matching companions
foreach ($f in $Orphans) {
$base = $f.FullName -replace '\.avhdx$',''
Remove-Item "$base.avhdx" -Force
Remove-Item "$base.avhdx.mrt" -Force -ErrorAction SilentlyContinue
Remove-Item "$base.avhdx.rct" -Force -ErrorAction SilentlyContinue
}
Start-VM -Name $VMName
Total downtime is whatever your VM’s clean shutdown and boot take, typically one to two minutes. A reboot of the host node also clears the kernel state and is worth knowing about if you are already planning maintenance, but a Stop-VM is the targeted action.
What caused the orphan in the first place
In this incident, the trigger was the prolonged power outage described at the top of the article. Both nodes lost power without warning while a Veeam job was in flight, which left the checkpoint chain partially committed on disk: the AVHDX existed, but the metadata was never finalized. The cluster recovery process restored running state for the VMs and Storage Spaces eventually surfaced the failed-volume report, but neither step inspects checkpoint chains, so the orphan AVHDX sat undetected until the next backup window.
A practical lesson worth flagging: in this case the client cleared the cluster event log during their own recovery work before we were brought in, which removed the historical record of what was happening on each node when power dropped. On any incident involving an unplanned cluster shutdown, capture the cluster log (Get-ClusterLog -Destination C:\Logs) before any cleanup or service restarts, and brief end-customers not to clear logs themselves. It is the only authoritative record of which I/Os were in flight, which writers were active, and which storage operations had committed at the moment of the power loss. Without it, root-cause analysis on the resulting backup failure becomes guesswork.
That specific failure is one instance of a more general pattern. Three conditions, in some combination, are responsible for almost every recurring case of orphan AVHDX from Veeam jobs on Hyper-V:
- Free space on the CSV. A checkpoint creation that runs out of space mid-write leaves the AVHDX behind. If your CSV is consistently above 85 percent full, that alone can cause occasional checkpoint failures of exactly this kind. Watch the parent VHDX sizes plus the expected change rate during the backup window.
- Antivirus on the cluster nodes. Defender on Server 2019 and 2022 in particular has been known to grab a handle on a fresh AVHDX during creation, racing with vmms.exe. Required exclusions on every cluster node (the VM migrates, the problem migrates with it):
  - Path: C:\ClusterStorage\ (recursive)
  - Processes: vmms.exe, vmwp.exe, and the Veeam Hyper-V agent processes
- Backup process killed mid-flight. Anything that abruptly ends the Veeam job mid-snapshot (host crash, network blip, manual cancellation at the wrong moment) can leave the VM’s checkpoint chain partially committed. The AVHDX exists, the metadata never finalized.
If you find yourself running this cleanup more than once, work through the list above before chalking it up to a one-off.
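The free-space condition is the easiest of the three to check routinely. The sketch below assumes the objects returned by Get-ClusterSharedVolume, whose SharedVolumeInfo entries expose FriendlyVolumeName and Partition.PercentFree; Get-CsvUsage is a hypothetical helper name, and the 85 percent default comes from the rule of thumb in the list above, not from a documented limit.

```powershell
# Hypothetical helper: flags CSVs that are fuller than a threshold.
function Get-CsvUsage {
    param(
        [Parameter(Mandatory)][object[]]$SharedVolume,
        [double]$WarnPercentUsed = 85
    )

    foreach ($csv in $SharedVolume) {
        foreach ($info in $csv.SharedVolumeInfo) {
            $used = 100 - $info.Partition.PercentFree
            [pscustomobject]@{
                Name          = $csv.Name
                Path          = $info.FriendlyVolumeName
                PercentUsed   = [math]::Round($used, 1)
                OverThreshold = ($used -gt $WarnPercentUsed)
            }
        }
    }
}

# Usage on a cluster node:
# Get-CsvUsage -SharedVolume (Get-ClusterSharedVolume) | Format-Table -AutoSize
```

A single reading says little about "consistently". Running this before each backup window and keeping the output gives a trend to compare against the VM's change rate.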
Diagnostic flowchart
Compressed for the next time:
Symptom
Veeam fails with 0x80070020 “file in use” creating recovery checkpoint. Manual production checkpoint from Hyper-V Manager fails with the same error.
Confirm orphan state
Get-VMSnapshot -VMName <vm> returns empty. Get-VMHardDiskDrive -VMName <vm> shows the VM on the parent VHDX. Files named <disk>_<GUID>.avhdx exist next to the parent on disk.
Rule out the easy stuff
handle64.exe -a -nobanner <GUID-fragment>, vssadmin list shadows /for=<csv>, Get-DiskImage <avhdx>. If all return empty and Rename-Item on the AVHDX still fails with 0x80070020, you have a kernel lock.
Resolve the kernel lock
Multi-node cluster: Move-ClusterVirtualMachineRole -MigrationType Live. Single-node: Stop-VM then Start-VM.
Clean up
Delete the orphan AVHDX, plus its .mrt and .rct companions. Leave the live VHDX’s companions alone.
Verify
In Hyper-V Manager, create a manual production checkpoint, confirm the AVHDX appears, then delete the checkpoint to trigger the merge. If the AVHDX disappears cleanly, retry the Veeam job.
The kernel-lock signature (handle.exe empty, no shadows, no disk mounts, but rename still fails) is the diagnostic worth remembering. On Hyper-V, when user-mode tools come back clean and the file is still locked, evicting the worker process by migrating the VM to another node is usually the resolution.