Diffing: tracking changes between objects
Diffing is the process of determining what changed between two sets of objects.
Typically, the two sets of objects are two versions of the same thing (a pulled Revit model, a Structural Model that we want to Push to an Adapter, etc.), in which case Diffing can effectively be used as a Version Control tool.
🤖 Developers: also check out the Diffing and Hash: Guide for developers.
The Diffing_Engine gives many ways to perform diffing on sets of objects. Let's see them.
IDiffing method
The most versatile method for diffing is the BH.Engine.Diffing.Compute.Diffing() method, also called IDiffing. Ideally, you should always use this Diffing method, although other alternatives exist for specific cases (see Other diffing methods below). A detailed technical explanation of the IDiffing can be found in the guide for developers.
This method can be found in any UI by simply looking for diffing. See below for an example file:
Diffing main method
Example file (right click -> download): DiffingExample-00-RevitDiffing.zip
Example file (right click -> download): DiffingInExcel.xlsx
The method takes three inputs:
- pastObject: objects belonging to a past version, a version that precedes the followingObjects's version.
- followingObjects: objects belonging to a following version, a version that was created after the pastObject's version.
- diffingConfig: configurations for the diffing, where you can set your ComparisonConfig object, see below.
The output of every diffing method is always a diff object, which we will describe in a section below.
How diffing works: identifiers
The IDiffing, like all diffing methods, relies on an identifier assigned to each object, which can be used to match objects, so it knows which to compare to which.
The identifier is generally a unique "signature" assigned to each object, and this signature is assumed to always remain the same even if the object is modified.
The identifier is typically stored on objects after they have been Pulled from an Adapter. This means that the IDiffing works best with objects pulled from a BHoM Adapter that stores the object Id on the object (most of them do).
If no Identifier can be found on the objects, the IDiffing attempts to use alternative methods, e.g. comparing the objects one-by-one; it will give you a note if this happens.
(Technical sidenote: the identifier object is of a type called IPersistentAdapterId, searched in the object's Fragments. More on this in the diffing guide for developers.)
The Diffing output: the Diff object
The output of any Diffing method is an object of type Diff. The diff output can be Exploded to reveal all the available outputs:
the Diff object
- AddedObjects: objects present in the second set that are not present in the first set.
- RemovedObjects: objects not present in the second set that were present in the first set.
- ModifiedObjects: objects that are recognised as present both in the first set and the second set, but that have some property that is different. The rules that were used to recognise modification are in the DiffingConfig.ComparisonConfig.
- UnchangedObjects: objects that are recognised as the same in the first and second set.
- ModifiedObjectsDifferences: all the differences found between the two input sets of objects.
- DiffingConfig: the specific instance of DiffingConfig that was used to calculate this Diff. Useful in scenarios where a Diff is stored and later inspected.
The ModifiedObjectsDifferences output contains a List of ObjectDifferences objects, one for each modified object, each holding information about that modified object. These can be further Exploded:
The Diff object's properties
- PastObject: the object in the pastObjs set that was identified as modified (i.e., a different version of the same object was found in the followingObjs set).
- FollowingObject: the object in the followingObjs set that was identified as modified (i.e., a different version of the same object was found in the pastObjs set).
- Differences: all the differences found between the two versions of the modified object. This is a List of PropertyDifference objects, one for each difference found on the modified object.
Finally, exploding the Differences object, we find:
The Differences property
(A more accurate screenshot is missing here; just keep exploding as in the Grasshopper example.)
- DisplayName: name given to the difference found. This is generally the PropertyName (name of the property that changed), but it can also indicate other things. For example, if a ComparisonInclusion() extension method is defined for some of the input objects (as happens for Revit's RevitParameters), then the DisplayName may also contain some specific naming useful to identify the difference (in the case of RevitParameter, this is the name of the RevitParameter that changed in the modified object). An example of a DisplayName could be StartNode.Position.X (given a modified object of type BH.oM.Structure.Elements.Bar).
- PastValue: the modified value in the PastObject.
- FollowingValue: the modified value in the FollowingObject.
- FullName: the Full Name of the modified property. An object difference can always be linked to a precise object property that is different; this is given in the Full Name form, which includes the namespace. An example of this could be BH.oM.Structure.Elements.Bar.StartNode.Position.X. Note that this FullName can be significantly different from DisplayName (as happens for RevitParameters, where the Full Name will be something like BH.oM.Adapters.Revit.Parameters[3].RevitParameter.Value).
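To make the relationship between DisplayName, PastValue, FollowingValue and FullName concrete, here is a minimal Python sketch. It is not the BHoM implementation: objects are modelled as nested dictionaries, and the PropertyDifference class and property_differences function are hypothetical stand-ins written only for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class PropertyDifference:
    """Hypothetical stand-in for one entry of the Differences list."""
    display_name: str
    past_value: Any
    following_value: Any
    full_name: str


def property_differences(past: dict, following: dict,
                         type_name: str, path: str = "") -> List[PropertyDifference]:
    """Walk two nested dicts and record every leaf value that differs."""
    found = []
    for key in sorted(set(past) | set(following)):
        name = f"{path}.{key}" if path else key
        old, new = past.get(key), following.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested properties, extending the dotted path.
            found.extend(property_differences(old, new, type_name, name))
        elif old != new:
            # The full name is the dotted path prefixed with the type name.
            found.append(PropertyDifference(name, old, new, f"{type_name}.{name}"))
    return found


past_bar = {"Name": "B1", "StartNode": {"Position": {"X": 0.0, "Y": 0.0}}}
following_bar = {"Name": "B1", "StartNode": {"Position": {"X": 1.5, "Y": 0.0}}}

differences = property_differences(past_bar, following_bar, "BH.oM.Structure.Elements.Bar")
for d in differences:
    print(d.display_name, d.past_value, "->", d.following_value, "|", d.full_name)
```

In this sketch the only difference reported is on StartNode.Position.X, with the full name obtained by prefixing the type name.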
Options for the diffing: DiffingConfig (and ComparisonConfig)
The DiffingConfig object can be attached to any Diffing method and allows you to specify options for the Diffing comparison.
The Diffing config has the following inputs:
- ComparisonConfig: allows you to specify all the object comparison options; it has many settings, please see its dedicated page.
- EnablePropertyDiffing: optional, defaults to true. If disabled, Diffing does not check the property-level differences, which runs much faster but may miss important changes.
- IncludedUnchangedObjects: optional, defaults to true. When diffing large sets of objects, you may want to exclude the objects that did not change from the diffing output, to save RAM.
- AllowDuplicateIds: optional, defaults to false. The diffing generally uses identifiers to track "who is who" and decide which objects to compare; in such operations, duplicates should never be allowed, but there could be edge cases where it is useful to keep them.
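As a rough illustration of how two of these flags could shape the output, here is a small Python sketch. The DiffingOptions class and build_output function are hypothetical and only mirror the EnablePropertyDiffing and IncludedUnchangedObjects options described above; AllowDuplicateIds is not modelled, and the real DiffingConfig may behave differently.

```python
from dataclasses import dataclass


@dataclass
class DiffingOptions:
    """Hypothetical Python mirror of two DiffingConfig options and their defaults."""
    enable_property_diffing: bool = True
    include_unchanged_objects: bool = True


def build_output(unchanged, modified_pairs, options):
    """Assemble a diff-like dict, honouring the two flags above."""
    output = {"modified": [following for _, following in modified_pairs]}
    if options.enable_property_diffing:
        # Only pay for the property-by-property comparison when it is enabled.
        output["differences"] = [
            {key: (past.get(key), following.get(key))
             for key in sorted(set(past) | set(following))
             if past.get(key) != following.get(key)}
            for past, following in modified_pairs
        ]
    if options.include_unchanged_objects:
        # Dropping this list is what saves memory on large inputs.
        output["unchanged"] = list(unchanged)
    return output


unchanged = [{"Name": "A"}]
modified_pairs = [({"Name": "B", "Height": 3.0}, {"Name": "B", "Height": 3.5})]

full = build_output(unchanged, modified_pairs, DiffingOptions())
lean = build_output(
    unchanged, modified_pairs,
    DiffingOptions(enable_property_diffing=False, include_unchanged_objects=False),
)
print(full)
print(lean)
```

With the defaults, the output carries the modified objects, their property differences and the unchanged objects; with both flags off, only the modified objects remain.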
Other Diffing methods
In addition to the main Diffing method IDiffing(), there are several other methods that can be used to perform Diffing. These are a bit more advanced and should be used only for specific cases. The additional diffing methods can be found in the Compute folder of Diffing_Engine.
Other than these, Toolkit-specific diffing methods exist to deal with the subtleties of comparing Objects defined in a Toolkit. Users do not generally need to know about these, as Toolkit-specific diffing methods will be automatically called for you if needed by the generic IDiffing method. Just for reference, a Toolkit-specific Diffing method is RevitDiffing().
DiffWithFragmentId() and DiffWithCustomDataKeyId()
These two methods are "ID-based" diffing methods. They simply retrieve an Identifier associated with the input objects, and use it to match objects from the pastObjs set to objects in the followingObjs set, deciding which objects should be compared with which.
- DiffWithFragmentId() retrieves object identifiers from the objects' Fragments. You can specify which Fragment you want to get the ID from, and which property of the fragment is the ID.
- DiffWithCustomDataKeyId() retrieves object identifiers from the objects' CustomData dictionary. You can specify which dictionary Key you want to get the ID from.
Both methods then call DiffWithCustomIds() to perform the comparison with the extracted Ids, see below.
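The two ways of extracting identifiers can be sketched in Python as follows. This is only an illustration under simplifying assumptions: objects are plain dictionaries with "Fragments" and "CustomData" entries, and the function names, the "ExampleIdFragment" type, the "PersistentId" property and the "ExampleKey" key are all hypothetical, not BHoM names.

```python
def ids_from_fragments(objs, fragment_type, id_property):
    """Return one Id per object, read from the first fragment of the given type."""
    ids = []
    for obj in objs:
        fragment = next(
            (f for f in obj.get("Fragments", []) if f.get("Type") == fragment_type),
            None,
        )
        # Objects without a matching fragment get a None Id.
        ids.append(fragment.get(id_property) if fragment else None)
    return ids


def ids_from_custom_data(objs, key):
    """Return one Id per object, read from the CustomData dictionary."""
    return [obj.get("CustomData", {}).get(key) for obj in objs]


wall = {
    "Name": "Wall",
    "Fragments": [{"Type": "ExampleIdFragment", "PersistentId": "abc-1"}],
    "CustomData": {"ExampleKey": 42},
}
slab = {"Name": "Slab", "Fragments": [], "CustomData": {}}

fragment_ids = ids_from_fragments([wall, slab], "ExampleIdFragment", "PersistentId")
custom_data_ids = ids_from_custom_data([wall, slab], "ExampleKey")
print(fragment_ids)
print(custom_data_ids)
```

Either list of Ids has one entry per input object, which is the shape an ID-based comparison needs.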
DiffWithCustomIds()
The DiffWithCustomIds() method allows you to provide:
- Two input object sets that you want to compare, pastObjs and followingObjs;
- Two input identifier sets, pastObjsIds and followingObjsIds, with the Ids associated with the pastObjs and followingObjs.
You can specify some null Ids in the pastObjsIds and followingObjsIds; however, these two lists must have the same number of elements as pastObjs and followingObjs, respectively.
The IDs are then used to match the objects from the pastObjs set to objects in the followingObjs set, to decide which objects should be compared with which:
- If an object in pastObjs does not have a corresponding object in the followingObjs set, it means that it has been deleted in the following version, so it is identified as "Removed" (old).
- If an object in followingObjs does not have a corresponding object in the pastObjs set, it means that it did not exist in the past version and was created in the following one, so it is identified as "Added" (new).
- If an object in pastObjs matches by ID an object in followingObjs, the two objects are compared and all their differences are found; if any difference is found, the object is identified as "Modified" (it changed between the two versions). The comparison is done by invoking the ObjectDifferences() method, which is explained in detail here.
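The matching rules above can be sketched in a few lines of Python. This is an illustration, not the BHoM code: the function name and the dictionary-based output are hypothetical, plain equality stands in for the ObjectDifferences() comparison, and skipping objects with a null Id is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


def diff_with_custom_ids(past_objs: List[Any], past_ids: List[Optional[str]],
                         following_objs: List[Any],
                         following_ids: List[Optional[str]]) -> Dict[str, list]:
    """Classify objects as added, removed, modified or unchanged by matching Ids."""
    if len(past_objs) != len(past_ids) or len(following_objs) != len(following_ids):
        raise ValueError("Each Id list must have as many entries as its object list.")

    # Assumption: objects with a null Id cannot be matched, so this sketch skips them.
    past_by_id = {i: o for o, i in zip(past_objs, past_ids) if i is not None}
    following_by_id = {i: o for o, i in zip(following_objs, following_ids) if i is not None}

    result = {"added": [], "removed": [], "modified": [], "unchanged": []}
    for obj_id, past in past_by_id.items():
        if obj_id not in following_by_id:
            result["removed"].append(past)       # no counterpart in the following set
        elif past != following_by_id[obj_id]:
            result["modified"].append((past, following_by_id[obj_id]))
        else:
            result["unchanged"].append(past)
    for obj_id, following in following_by_id.items():
        if obj_id not in past_by_id:
            result["added"].append(following)    # no counterpart in the past set
    return result


past = [{"Name": "A"}, {"Name": "B"}, {"Name": "C"}]
following = [{"Name": "A"}, {"Name": "B2"}, {"Name": "D"}]
diff = diff_with_custom_ids(past, ["1", "2", "3"], following, ["1", "2", "4"])
print(diff)
```

Here Id "3" exists only in the past set (Removed), Id "4" only in the following set (Added), Id "2" matches but the objects differ (Modified), and Id "1" matches with identical objects (Unchanged).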
DiffOneByOne()
The DiffOneByOne() method simply takes two input lists, pastObjs and followingObjects, which must contain the objects in exactly the same order. It then compares the objects one-by-one, pairing them by position. If two paired objects are equal, they are "Unchanged"; otherwise, they are "Modified" and their property differences are returned.
For this reason, this method is not able to discover "Added" (new) or "Removed" (old) objects.
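A positional comparison of this kind can be sketched in Python as below. The function name is hypothetical, plain equality stands in for the property-level comparison, and rejecting lists of different lengths is an assumption of this sketch rather than documented behaviour.

```python
def diff_one_by_one(past_objs, following_objs):
    """Pair objects by position and split them into unchanged and modified."""
    # Assumption: a positional comparison only makes sense for equal-length lists.
    if len(past_objs) != len(following_objs):
        raise ValueError("Both lists must have the same number of objects.")

    unchanged, modified = [], []
    for past, following in zip(past_objs, following_objs):
        if past == following:
            unchanged.append(following)
        else:
            modified.append((past, following))
    return unchanged, modified


past = [{"Name": "A", "Height": 3.0}, {"Name": "B", "Height": 2.5}]
following = [{"Name": "A", "Height": 3.0}, {"Name": "B", "Height": 2.8}]

unchanged_objs, modified_objs = diff_one_by_one(past, following)
print(unchanged_objs)
print(modified_objs)
```

Because every object is paired with the one at the same index, nothing can ever fall into an "Added" or "Removed" bucket.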
DiffWithHash()
The DiffWithHash() method simply does a Venn Diagram of the input objects' Hashes.
The Venn Diagram is computed by means of a HashComparer, which simply means that the Hash of all input objects gets computed.
If objects with the same hash are found, they are identified as "Unchanged"; otherwise, objects are either "Added" (new) or "Removed" (old), depending on whether their hash exists exclusively in the following or in the past set. For this reason, this method is not able to discover "Modified" objects.
Because this method relies on the Hash, you can customise how the diffing behaves by specifying ComparisonConfig options in the DiffingConfig.
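The hash-based Venn Diagram can be sketched in Python as follows. This is not the BHoM hashing algorithm: SHA-256 over a JSON dump is swapped in for illustration, and the excluded_properties argument is a hypothetical stand-in for the kind of option a ComparisonConfig might carry.

```python
import hashlib
import json


def object_hash(obj: dict, excluded_properties=()) -> str:
    """Hash an object, ignoring any property listed in excluded_properties."""
    relevant = {k: v for k, v in obj.items() if k not in excluded_properties}
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_with_hash(past_objs, following_objs, excluded_properties=()):
    """Split objects into added, removed and unchanged by comparing hash sets."""
    past = {object_hash(o, excluded_properties): o for o in past_objs}
    following = {object_hash(o, excluded_properties): o for o in following_objs}
    return {
        "added": [o for h, o in following.items() if h not in past],
        "removed": [o for h, o in past.items() if h not in following],
        "unchanged": [o for h, o in following.items() if h in past],
    }


past = [{"Name": "A", "Tag": "old"}, {"Name": "B", "Tag": "x"}]
following = [{"Name": "A", "Tag": "new"}, {"Name": "C", "Tag": "x"}]

strict = diff_with_hash(past, following)
relaxed = diff_with_hash(past, following, excluded_properties=("Tag",))
print(strict)
print(relaxed)
```

In the strict run, the changed Tag on object A gives it a new hash, so it shows up as one "Removed" and one "Added" object rather than as "Modified". Excluding Tag from the hash makes A count as "Unchanged", which is the kind of customisation the comparison options allow.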
DiffRevisions
This method was designed for the AECDeltas workflow and is currently not widely used.
It essentially expects the input objects to be wrapped into a Revision object, which is useful to attach additional Versioning properties to them.
The Revisions can then be provided as an input to DiffRevisions(), and the logic works very similarly to the other diffing methods seen above.