Version: 8.1


Datasets and PyDatasets

A Dataset can be thought of as a two-dimensional list: a list in which each item is itself a list of values. Datasets are not native to Python, but they are built into Ignition because they are so useful when working with data from a database. Datasets come up often in scripting, since they power many of the interesting features in Ignition, such as charts and tables.
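As a rough mental model only, the two-dimensional structure described above can be sketched with plain Python lists. This is not an Ignition Dataset: the names headers and rows are illustrative, and the sketch uses print() so that it runs in plain Python outside Ignition.

```python
# Plain-Python sketch of the "list of lists" idea; not an Ignition Dataset.
headers = ["City", "Population"]
rows = [
    ["New York", 8363710],
    ["Los Angeles", 3833995],
]

# The outer index selects a row, the inner index selects a column.
second_city = rows[1][0]
print(second_city)

# Every row shares the same column positions, so a header lookup
# gives the inner index for a named column.
population_col = headers.index("Population")
print(rows[0][population_col])
```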

The main source of confusion when dealing with datasets is the difference between the Dataset object and the PyDataset object. Dataset is the kind of object that Ignition uses internally to represent datasets. When you get the data property out of a component like a Table, you will get a Dataset. The PyDataset is a wrapper type that makes Datasets more accessible in Python. The biggest difference between the two is in how you access their data. You can easily convert between them with system.dataset.toDataSet and system.dataset.toPyDataSet, so you can work with whichever object you find easier to use.

Creating Datasets

Because Datasets are not native to Python, there is no built-in Python syntax for creating them. Instead, they must be created using the system.dataset.toDataSet function, which can also convert a PyDataset to a Dataset. To build a new Dataset, the function requires a list of headers and a list of rows, where each row is itself a list of values. Each row must be the same length as the headers list.

Python - Creating a Dataset
# First create a list that contains the headers, in this case there are 4 headers.
headers = ["City", "Population", "Timezone", "GMTOffset"]

# Then create an empty list, this will house our data.
data = []

# Then add each row to the list. Note that each row is also a list object.
data.append(["New York", 8363710, "EST", -5])
data.append(["Los Angeles", 3833995, "PST", -8])
data.append(["Chicago", 2853114, "CST", -6])
data.append(["Houston", 2242193, "CST", -6])
data.append(["Phoenix", 1567924, "MST", -7])

# Finally, both the headers and data lists are used in the function to create a Dataset object
cities = system.dataset.toDataSet(headers, data)

All code snippets on this page will reference the cities dataset we created above, so place that code at the beginning of every code snippet.
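Because each row must be the same length as the headers list, it can help to check the rows before building a Dataset. The following is a minimal plain-Python sketch of such a check. The helper name find_bad_rows is hypothetical and is not part of Ignition, and the second row is deliberately shortened for illustration.

```python
def find_bad_rows(headers, data):
    """Return the indexes of rows whose length differs from the header count.

    Hypothetical helper for illustration; it is not an Ignition function.
    """
    expected = len(headers)
    return [i for i, row in enumerate(data) if len(row) != expected]


headers = ["City", "Population", "Timezone", "GMTOffset"]
data = [
    ["New York", 8363710, "EST", -5],
    ["Los Angeles", 3833995, "PST"],  # Deliberately missing GMTOffset.
]

bad_rows = find_bad_rows(headers, data)
print(bad_rows)
```

An empty result means every row matches the headers, so the lists should be safe to pass along.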

Accessing Data in a Dataset

Each dataset provides several functions for accessing different parts of its data. These are listed in the table below, with examples that use the cities dataset.

| Function | Description | Example | Output |
| --- | --- | --- | --- |
| data.getColumnAsList(colIndex) | Returns the column at the specified index as a list. | print cities.getColumnAsList(0) | [New York, Los Angeles, Chicago, Houston, Phoenix] |
| data.getColumnCount() | Returns the number of columns in the dataset. | print cities.getColumnCount() | 4 |
| data.getColumnIndex(colName) | Returns the index of the column with the name colName. | print cities.getColumnIndex("Timezone") | 2 |
| data.getColumnName(colIndex) | Returns the name of the column at the index colIndex. | print cities.getColumnName(1) | Population |
| data.getColumnNames() | Returns a list with the names of all the columns. | print cities.getColumnNames() | [City, Population, Timezone, GMTOffset] |
| data.getColumnType(colIndex) | Returns the type of the column at the index colIndex. | print cities.getColumnType(3) | <type 'java.lang.Integer'> |
| data.getColumnTypes() | Returns a list with the types of all the columns. | print cities.getColumnTypes() | [class java.lang.String, class java.lang.Integer, class java.lang.String, class java.lang.Integer] |
| data.getRowCount() | Returns the number of rows in the dataset. | print cities.getRowCount() | 5 |
| data.getValueAt(rowIndex, colIndex) | Returns the value at the specified row and column indexes. | print cities.getValueAt(1, 2) | PST |
| data.getValueAt(rowIndex, colName) | Returns the value at the specified row index and column name. | print cities.getValueAt(2, "Population") | 2853114 |

Looping Through a Dataset

Oftentimes you need to loop through the items in a dataset similar to how you would loop through a list of items. You can use the functions above to do this.

Python - Looping Through a Dataset
# We use the same cities dataset from above. Using the range function, we get one index per row (outer loop) and one index per column (inner loop).
for row in range(cities.getRowCount()):
    for col in range(cities.getColumnCount()):
        # Prints every item in the cities dataset, starting on the first row and moving left to right.
        print cities.getValueAt(row, col)
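To try the same nested loop outside of Ignition, a small stand-in class can expose the three accessor names used above. This is only a sketch under that assumption: ListBackedDataset is a hypothetical class, it supports column indexes only (not column names), and it is not how Ignition implements Datasets.

```python
class ListBackedDataset(object):
    """Hypothetical stand-in exposing three of the accessor names used above."""

    def __init__(self, headers, rows):
        self._headers = list(headers)
        self._rows = [list(row) for row in rows]

    def getRowCount(self):
        return len(self._rows)

    def getColumnCount(self):
        return len(self._headers)

    def getValueAt(self, row, col):
        # Only integer column indexes are handled in this sketch.
        return self._rows[row][col]


sample = ListBackedDataset(
    ["City", "Timezone"],
    [["New York", "EST"], ["Chicago", "CST"]],
)

# Record the visiting order: each row in turn, left to right within the row.
visited = []
for row in range(sample.getRowCount()):
    for col in range(sample.getColumnCount()):
        visited.append(sample.getValueAt(row, col))

print(visited)
```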

Accessing Data in a PyDataset


Any Dataset can be converted to a PyDataset using the system.dataset.toPyDataSet function. All of the functions listed above (getColumnCount(), getValueAt(), and so on) can be used on a PyDataset as well. PyDatasets are special in that they can also be handled like other Python sequences, so the data can be accessed much more easily, similar to how you would access the items in a list.

Python - Accessing Data in a PyDataset
# First convert the cities Dataset to a PyDataset.
pyData = system.dataset.toPyDataSet(cities)

# The data can then be accessed using two sets of brackets: the row index first, then the column index. This will print "PST".
print pyData[1][2]

Looping Through a PyDataset

Looping through a PyDataset is also a bit easier, and works similarly to looping through other sequences. The first for loop pulls out each row, which acts like a list and can be used in a second for loop to extract the values.

Python - Looping Through a PyDataset
# Convert to a PyDataset
pyData = system.dataset.toPyDataSet(cities)

# The for loop pulls out the whole row, so typically the variable row is used.
for row in pyData:
    # Now that we have a single row, we can loop through the columns just like a list.
    for value in row:
        print value

Additionally, a single column of data can be extracted by looping through the PyDataset.

Python - Extract a Column of Data by Looping Through a PyDataset
# Convert to a PyDataset
pyData = system.dataset.toPyDataSet(cities)

# Use a for loop to extract out a single row at a time
for row in pyData:
    # Use either the column index or the column name to extract a single value from that row.
    city = row[0]
    population = row["Population"]
    print city, population
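The same row-by-row pattern can be used to collect a column into a list. The sketch below uses plain Python lists as stand-ins for PyDataset rows, so only index access is shown, and the variable names are illustrative.

```python
# Plain lists standing in for the rows of the cities data.
rows = [
    ["New York", 8363710, "EST", -5],
    ["Los Angeles", 3833995, "PST", -8],
    ["Chicago", 2853114, "CST", -6],
]

# Collect one value per row to build the column.
populations = [row[1] for row in rows]
print(populations)

# The same loop shape can compute a summary, such as the row with the largest population.
largest = max(rows, key=lambda row: row[1])
print(largest[0])
```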


A PyRow is a row in a PyDataset. It works similarly to a Python list.

The examples and outputs below assume a PyDataset named pyDataset with three rows: ["Apple", "Orange"], ["Banana", "Orange"], and ["Apple", "Apple"]. In addition, "print" commands are used, but they should be replaced by appropriate logging methods (such as system.util.getLogger) depending on the scope of the script.

index(element)
Returns the index of the first occurrence of the element. Raises a ValueError if the element isn't present in the row.
Example:
for row in pyDataset:
    try:
        print row.index("Apple")
    except ValueError:
        print "No apples in this row"
Output:
0
No apples in this row
0

count(element)
Returns the total number of occurrences of the given element in the row.
Example:
for row in pyDataset:
    print row.count("Apple")
Output:
1
0
2

Repeating Elements

You can also repeat the elements of a row by multiplying the row by an integer:

for row in pyDataset:
    print row * 2

[u'Apple', u'Orange', u'Apple', u'Orange']
[u'Banana', u'Orange', u'Banana', u'Orange']
[u'Apple', u'Apple', u'Apple', u'Apple']
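Since a PyRow works similarly to a Python list, the behavior of index, count, and repetition can be explored with ordinary lists. The sketch below uses plain lists rather than PyRow objects, so treat it as an analogy for experimenting outside of Ignition. The sample rows mirror the fruit values shown in the output above.

```python
# Plain lists standing in for PyRow objects.
rows = [
    ["Apple", "Orange"],
    ["Banana", "Orange"],
    ["Apple", "Apple"],
]

first_apple = []
for row in rows:
    try:
        first_apple.append(row.index("Apple"))
    except ValueError:
        # list.index raises ValueError when the element is absent.
        first_apple.append(None)

# Count the occurrences of "Apple" in each row.
apple_counts = [row.count("Apple") for row in rows]

# Multiplying a list by 2 repeats its elements in order.
doubled = [row * 2 for row in rows]

print(first_apple)
print(apple_counts)
print(doubled[0])
```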

Altering a Dataset

Technically, you cannot alter a dataset. Datasets are immutable, meaning they cannot change. You can, however, create new datasets. To change a dataset, you create a new one and then replace the old one with it. The functions in the system.dataset section, such as system.dataset.setValue, can be used to manipulate datasets in this way.

The important thing to realize about all of these functions is that, again, they do not actually alter the input dataset. They return a new dataset. You need to use that returned dataset to do anything useful.

For example, the following code uses the setValue function to change the population value for Los Angeles.

Python - Altering a Dataset Using the setValue Function
# Create a new dataset with the new value.
newData = system.dataset.setValue(cities, 1, "Population", 5000000)

# The cities dataset remains unchanged, and we can see this by looping through both datasets.
for row in range(cities.getRowCount()):
    for col in range(cities.getColumnCount()):
        print cities.getValueAt(row, col)

for row in range(newData.getRowCount()):
    for col in range(newData.getColumnCount()):
        print newData.getValueAt(row, col)
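The "return a new dataset" pattern can be illustrated in plain Python with tuples, which are also immutable. This is only a sketch of the idea: the set_value helper below is hypothetical and is not system.dataset.setValue, and it takes a column index rather than a column name.

```python
def set_value(rows, row_index, col_index, value):
    """Return a new tuple of row tuples with one cell replaced.

    Hypothetical helper that mimics the copy-on-change pattern; it is not
    system.dataset.setValue.
    """
    new_rows = [list(row) for row in rows]
    new_rows[row_index][col_index] = value
    return tuple(tuple(row) for row in new_rows)


original = (
    ("New York", 8363710),
    ("Los Angeles", 3833995),
)

updated = set_value(original, 1, 1, 5000000)

# The original is untouched; only the returned copy holds the new value.
print(original[1][1])
print(updated[1][1])
```

As with the Ignition functions, the result has to be assigned to a variable (or written back to a property) for the change to be visible anywhere.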