BHoM Datasets
Datasets are a way to store and distribute BHoMObjects for use by others. Examples include lists of standard structural materials or section properties, as well as global warming potential values for various materials.
The data should be serialised in a Dataset object, and the relevant .csproj file in the repo in which the Dataset is stored should have a post-build event implemented that ensures that the Dataset is copied to the C:\ProgramData\BHoM\Datasets folder. This will allow it to be picked up by the Library_Engine.
Generate a new dataset
To generate a new dataset to be used with the BHoM the following steps should be taken.
- Generate the objects to be stored in the new Dataset. This means creating BHoMObjects of the correct type in any of the supported UIs. See below for an example of how to create a handful of standard European steel materials in Grasshopper. Give the created objects easily identifiable names, as the name is what will show up when using the data in the dropdowns. Remember that all BHoM objects should be defined in SI units.
- Store the created objects in a Dataset object and give the dataset an appropriate name. This is the name of the dataset itself; the name that appears in the UI is taken from the file name, as described in the conversion step below.
- Populate the source object and assign it to the dataset. See guidance below regarding the source.
- Convert the dataset object and store it in a single-line json file. This is most easily done using the FileAdapter. The library engine relies on the json files containing a single line per object, while the default json output from the FileAdapter spreads the json over multiple lines. To make sure the produced json file is in the correct format for the library engine, provide a File.PushConfig with UseDatasetSerialization set to true and BeautifyJson set to false to the push command. Give the file a clearly identifiable name, as the file name is what the library engine uses to identify the dataset and what the dataset will be called in the UI menu.
- For personal use, do one of the following:
  - Place the file in the relevant subfolder of the C:\ProgramData\BHoM\Datasets folder. If no relevant subfolder exists, a new one can be added. The folder is used to generate the menus through which the dataset is found in the menu system, and also makes a whole folder searchable using the Library method. Remember that running an installer will reset the datasets folder, so with this option keep a backup of the json file, or use the second option instead.
  - Place the json file in a subfolder of a folder of your own choice and use the custom dataset folder outlined below.
- For distribution of the Dataset to the BHoM community, do the following:
  - Store the dataset in the appropriate repository folder:
    - For a general dataset, such as standard materials, place the json file in an appropriate subfolder in BHoM_Datasets.
    - For a toolkit-specific dataset, put the json file in a Dataset folder in the root folder of the toolkit that hosts the dataset. If no such folder exists, it should be created. Make sure that the oM project in the toolkit has the following post-build event code, which ensures that the dataset is copied over to the C:\ProgramData\BHoM\Datasets folder:
      xcopy "$(SolutionDir)DataSets\*.*" "C:\ProgramData\BHoM\DataSets" /Y /I /E
  - Raise a pull request on GitHub and ask for review from relevant parties.
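Because the library engine expects one json object per line, it can be worth checking a file before placing it in the Datasets folder. The following is a minimal sketch of such a check, not part of the BHoM itself: `DatasetFileCheck`, `InvalidLines` and `InvalidLinesInFile` are hypothetical names, and the sketch assumes that the file contains plain json and that System.Text.Json is available (built into .NET Core 3.0 and later, a NuGet package on .NET Framework).

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper, not part of the BHoM: checks that every non-empty
// line of a dataset file is one complete json object.
public static class DatasetFileCheck
{
    // Returns the 1-based numbers of all lines that are not a complete json object.
    public static List<int> InvalidLines(IEnumerable<string> lines)
    {
        var invalid = new List<int>();
        int number = 0;
        foreach (string line in lines)
        {
            number++;
            if (string.IsNullOrWhiteSpace(line))
                continue; // ignore blank lines, such as a trailing newline

            try
            {
                using (JsonDocument doc = JsonDocument.Parse(line))
                {
                    // The line parses, but must also be an object rather than a bare value
                    if (doc.RootElement.ValueKind != JsonValueKind.Object)
                        invalid.Add(number);
                }
            }
            catch (JsonException)
            {
                // Incomplete or malformed json, typical for beautified multi-line output
                invalid.Add(number);
            }
        }
        return invalid;
    }

    // Convenience overload of the same check that reads the lines from a file on disk.
    public static List<int> InvalidLinesInFile(string filePath)
    {
        return InvalidLines(File.ReadLines(filePath));
    }
}
```

An empty result suggests the file has the single-line layout described above. A beautified file will typically report most of its lines, since a line such as `{` is not a complete object on its own.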
Custom dataset folder
By default, the Library_Engine scans the C:\ProgramData\BHoM\Datasets folder for all json files and loads them so that they can be queried by the UI and by the methods in the library engine. This location is reset with each BHoM install to make sure all datasets are up to date and that any modifications or fixes are correctly applied to the data. In some cases it can also be useful to store your own datasets in a folder of your own, for example on a network drive, to share them during work on a particular project.
For these reasons it is possible to get the Library_Engine to scan other folders for datasets as well. This can easily be controlled via the AddUserPath and RemoveUserPath commands, which can be called from any UI. After the AddUserPath command has been run once for a particular folder, the library engine will store the information about this folder in its settings and will keep looking in subfolders of that location for any json files to be used as datasets.
To stop the Library_Engine from looking in this particular folder, use the RemoveUserPath command, providing the path to the folder you no longer want to be scanned by the Library_Engine.
Remember that the menu system of the Dataset dropdown components is built up using the subfolders, so even if only a single dataset is placed in this custom folder, it might be a good idea to still put your json file in an appropriate subfolder.
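To preview how a custom folder is laid out before registering it, the folder tree can be listed with plain .NET file APIs. The sketch below is only an illustration of the subfolder-to-menu idea described above; it is not the Library_Engine's implementation, and `DatasetFolderPreview` and `MenuEntries` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the BHoM: lists every json file below a
// root folder as "SubFolder/.../FileName", mirroring the folder structure.
public static class DatasetFolderPreview
{
    public static List<string> MenuEntries(string rootFolder)
    {
        string root = Path.GetFullPath(rootFolder);
        var entries = new List<string>();

        foreach (string file in Directory.EnumerateFiles(root, "*.json", SearchOption.AllDirectories))
        {
            // Strip the root folder and any leading separator to get the relative path
            string relative = file.Substring(root.Length)
                .TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar);

            // Drop the .json extension, since the file name is what identifies the dataset
            string withoutExtension = Path.ChangeExtension(relative, null);

            entries.Add(withoutExtension.Replace(Path.DirectorySeparatorChar, '/'));
        }

        entries.Sort(StringComparer.Ordinal);
        return entries;
    }
}
```

A json file placed directly in the root folder shows up without any folder prefix in this listing, which is the situation the advice above about using a subfolder is meant to avoid.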
How to access BHoM Datasets programmatically
Accessing various datasets, such as material or section datasets, can be useful when coding for BHoM. For example, you may need datasets when coding C# Unit Tests, or when programming some particular Engine function.
To access BHoM Datasets from a C# program, you need to ensure the correct dependencies are added to your project. The following steps will guide you through the process of adding the appropriate dependencies and demonstrate a few methods for accessing your desired dataset.
Step 1: Access Reference Manager
Access the Reference Manager in the C# project where you want to add the dependency.
Step 2: Browse for the DLL
Go to the "Browse" tab and click the "Browse" button in the bottom-right corner.
Navigate to the BHoM assemblies folder using the File Explorer window. The folder is usually located at C:\ProgramData\BHoM\Assemblies. Select Data_oM.dll and press "Add."
Step 3: Add Dependency
Make sure to check the box next to Data_oM.dll in the Reference Manager window and press "OK."
Step 4: Modify File Path
Open the project file of your specific C# project by double-clicking it with the left mouse button. Locate the line responsible for loading Data_oM.dll and modify the file path as shown in the image below.
Step 5: Get the Dataset data
The following example demonstrates how to access the Section Library from BHoM, specifically the .
To access the library, use the Match method as shown in the example below. This returns the HE1000M section defined in the EU_SteelSectionLibrary dataset.
var steelSection = BH.Engine.Library.Query.Match("EU_SteelSections", "HE1000M", true, true) as ISteelSection;
The Match method takes four arguments:
- Library Name: "EU_SteelSections"
- Object Name: "HE1000M"
- Case Sensitivity: true or false
- Consider Spaces: true or false
The boolean values allow you to specify whether your search should be case-sensitive and whether to consider spaces within the object name.
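To make the two boolean flags more concrete, the sketch below shows one way a name comparison with these two options could behave. It is an illustration only: `NameMatchSketch` and `NamesMatch` are hypothetical names, and the actual matching logic inside the Library_Engine may differ.

```csharp
using System;

// Hypothetical illustration, not the Library_Engine implementation:
// compares two object names with optional case and space sensitivity.
public static class NameMatchSketch
{
    public static bool NamesMatch(string candidate, string query, bool caseSensitive, bool considerSpaces)
    {
        if (candidate == null || query == null)
            return false;

        if (!considerSpaces)
        {
            // Ignore spaces, so that "HE 1000 M" and "HE1000M" compare as equal
            candidate = candidate.Replace(" ", "");
            query = query.Replace(" ", "");
        }

        StringComparison comparison = caseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;

        return string.Equals(candidate, query, comparison);
    }
}
```

Under these assumptions, a query for "he 1000 m" would only match an object named "HE1000M" when both flags are set to false.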
Find Existing Libraries
If you're unsure about the available datasets, check the BHoM_Datasets repository.
Under BHoM_Datasets\DataSets, you'll find multiple folders and subfolders containing numerous json files. Each json is a dataset, and each folder acts as a dataset library.
For example, in the folder [BHoM_Datasets repo folder]\BHoM_Datasets\DataSets\Structure\SectionProperties\EU_SteelSections
you will find the following json files:
These .json files contain multiple objects. To extract objects from these datasets, you'll need the name of the desired object. This can be found as an attribute within the .json file. To locate these names, you can open the .json file in an editor like Visual Studio Code and search for the object name you need.
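As an alternative to searching by hand, the names can be collected with a few lines of code. The sketch below is a rough illustration under two assumptions that are not confirmed by this page: that the name is stored under a json property called "Name", and that the file content is plain json readable by System.Text.Json. `DatasetNameFinder` and `FindNames` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the BHoM: collects every string value
// stored under a property called "Name" (assumed key) in a json text.
public static class DatasetNameFinder
{
    public static List<string> FindNames(string json)
    {
        var names = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            Collect(doc.RootElement, names);
        }
        return names;
    }

    // Walks objects and arrays recursively, in document order.
    private static void Collect(JsonElement element, List<string> names)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in element.EnumerateObject())
            {
                if (property.Name == "Name" && property.Value.ValueKind == JsonValueKind.String)
                    names.Add(property.Value.GetString());
                else
                    Collect(property.Value, names);
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in element.EnumerateArray())
                Collect(item, names);
        }
    }
}
```

Since the search is purely by property key, the result may also include names that do not belong to the stored objects, such as the name of the dataset itself, so the list is best treated as a starting point.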
Compliance
Compliance regulations for Datasets are outlined in IsValidDataset.
Source
For users of the data to be able to verify where it is coming from, it is important to populate the Source object for the dataset. In general, as many properties of the source as are available should be populated, with an emphasis on the following:
Title
The title of the publication/paper/website/... from which the data has been taken.
SourceLink
An HTTP link to the source. Important to allow users of the data to easily identify where the data is coming from.
Confidence
Level of confidence both in the data source and in how well the serialised data in the BHoM dataset has been ensured to match the source. It should be noted that, independent of the confidence level on the Dataset, all Datasets distributed with the BHoM are subject to the General Disclaimer.
The confidence is split into 5 distinct categories, and the creator/distributor/maintainer of the dataset should always aim for the highest level of confidence achievable.
Undefined
Default value - assume no fidelity and no source.
Should generally be avoided when adding a new Dataset for distribution with the BHoM - one of the levels below should be explicitly defined.
None
The Dataset may not have a reliable source and/or fidelity to the source has not been tested.
To be used for prototype Datasets where no reliable data is available, and not for general distribution within the BHoM.
Low
The Dataset comes from an unreliable source, but the data matches the source based on initial checks.
For cases where no reliable source for the data type is available. Can be allowed to be distributed with the BHoM in circumstances where no reliable source can be found and the data still can be deemed useful.
Medium
The Dataset comes from a reliable source and matches the source based on initial checks.
For most cases the minimum required level of confidence for distribution of a Dataset with the BHoM. To reach this level of confidence, the Source object should be properly filled in, and substantial spot checks of the data should have been carried out. If at all possible, maintainers of a Medium confidence level Dataset should strive to fulfil the requirements of High confidence.
High
The Dataset comes from a reliable source and matches the source based on extensive review and testing.
The highest level of confidence for BHoM datasets, which should generally be the aspiration for all Datasets included with the BHoM.
To achieve this, a clear testing procedure should generally be in place, which outlines how all of the data points in the Dataset have been checked against the source data and/or verified by other means to be correct.