Core Concepts

Datasets are a core component of PureML for organizing user data. A Dataset starts as an empty container that stores the elements of a dataset: its lineage, dataset-related graphs, and dataset files.

There are two types of datasets in PureML. Private Datasets can be accessed and viewed only by their owner. Public Datasets are accessible to all PureML users.
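To make this structure concrete, the sketch below models a Dataset as a plain Python dataclass. It is purely illustrative: `DatasetRecord`, `Visibility`, `can_view`, and the field names are hypothetical and are not part of the PureML package, which may represent datasets quite differently. It mirrors only the two ideas above: a Dataset starts empty and holds files, lineage, and graphs, and its visibility determines who may view it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"  # only the owner can access and view the content
    PUBLIC = "public"    # any PureML user can access it


@dataclass
class DatasetRecord:
    """Hypothetical in-memory model of a PureML Dataset (not the package's real class)."""
    name: str
    owner: str
    visibility: Visibility
    # A new Dataset is an empty container; these start out empty.
    files: list[str] = field(default_factory=list)           # registered dataset files
    lineage: list[str] = field(default_factory=list)         # e.g. upstream dataset labels
    graphs: dict[str, object] = field(default_factory=dict)  # dataset-related graphs

    def can_view(self, user: str) -> bool:
        # Public datasets are open to everyone; private ones only to their owner.
        return self.visibility is Visibility.PUBLIC or user == self.owner
```

For example, `DatasetRecord("FirstDataset", owner="alice", visibility=Visibility.PRIVATE).can_view("bob")` returns `False`, while the same call on a record created with `Visibility.PUBLIC` returns `True`.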

Before you can register dataset files and add their content to a Dataset, you must initialize an empty Dataset. You can do this with the PureML Python package.

Creating a Dataset

With the PureML dataset module, you can perform a variety of actions related to creating and managing datasets and branches. Here’s an overview of the available methods:

To create a new dataset, import the pureml module and use the dataset.init method:

import pureml

pureml.dataset.init(label='FirstDataset:dev', readme='')

The name of the dataset and the branch to be created are required parameters. You can also provide an optional readme file path.

The label parameter combines the dataset name, branch, and version in the format <name>:<branch>:<version>.

Initializing a dataset does not require a version, so the label takes the form <name>:<branch>.

A label must not contain spaces, and the only special characters allowed are "-" and "_".
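These naming rules can be checked locally before calling the package. The following is a minimal sketch, assuming that ":" is used only as the separator between name, branch, and version, and that "letters and digits" means ASCII letters and digits. `parse_label` and `Label` are hypothetical helpers, not PureML functions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, not part of the PureML package.
# Each label component may contain only letters, digits, "-" and "_";
# ":" is assumed to be used solely as the separator between components.
_COMPONENT = re.compile(r"[A-Za-z0-9_-]+")


class Label(NamedTuple):
    name: str
    branch: Optional[str] = None
    version: Optional[str] = None


def parse_label(label: str) -> Label:
    """Split a '<name>[:<branch>[:<version>]]' label and validate each part."""
    parts = label.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':'-separated parts in {label!r}")
    for part in parts:
        # fullmatch rejects empty parts, spaces, and any other special character
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid label component {part!r} in {label!r}")
    return Label(*parts)
```

With this sketch, `parse_label("FirstDataset:dev")` returns `Label(name='FirstDataset', branch='dev', version=None)`, while `parse_label("First Dataset:dev")` raises `ValueError` because of the space.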

Listing Datasets

To list all available datasets, use the dataset.list method:

import pureml

pureml.dataset.list()

Creating a Branch

To create a new branch for a dataset, use the dataset.init_branch method:

import pureml


The branch name and the name of the dataset in which the branch will be created are required parameters.

Listing Branches

To list all available branches of a dataset, use the dataset.branch_list method:

import pureml

pureml.dataset.branch_list(label='FirstDataset')

Here the label parameter contains only the dataset name. Listing the branches of a dataset requires neither a branch nor a version, so the label takes the form <name>.
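The two label shapes used on this page can be captured in a small lookup table. This is a hypothetical helper (`label_for` and `REQUIRED_PARTS` are not part of PureML) that covers only the operations whose label format is stated above: dataset.init takes <name>:<branch> and dataset.branch_list takes <name>.

```python
# Hypothetical helper, not part of the PureML package: builds the label
# that each operation described on this page expects.
REQUIRED_PARTS = {
    "init": ("name", "branch"),  # pureml.dataset.init        -> <name>:<branch>
    "branch_list": ("name",),    # pureml.dataset.branch_list -> <name>
}


def label_for(operation: str, **parts: str) -> str:
    try:
        required = REQUIRED_PARTS[operation]
    except KeyError:
        raise ValueError(f"unknown operation {operation!r}") from None
    missing = [p for p in required if not parts.get(p)]
    if missing:
        raise ValueError(f"{operation} needs {missing}")
    # Join only the parts this operation uses, in <name>:<branch> order;
    # any extra keyword arguments are ignored.
    return ":".join(parts[p] for p in required)
```

For instance, `label_for("init", name="FirstDataset", branch="dev")` returns `"FirstDataset:dev"`, and `label_for("branch_list", name="FirstDataset", branch="dev")` returns `"FirstDataset"` because the branch is not used for that operation. The sketch does not check the character rules for labels.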

These methods make it easy to create and manage datasets and their branches in PureML, helping you streamline your dataset management workflows and collaborate with team members.