First Look at Haystack

I recently saw a few people mentioning Haystack in a few BACnet circles and decided it was time for a little exploration.

Little did I know how weird it would be…

First contact

Project Haystack

Here’s the short intro from haystack.org:

“Project Haystack is an open source initiative to streamline working with data from the Internet of Things. We standardize semantic data models and web services with the goal of making it easier to unlock value from the vast quantity of data being generated by the smart devices that permeate our homes, buildings, factories, and cities. Applications include automation, control, energy, HVAC, lighting, and other environmental systems.”

Let’s start with the mention of the Internet of Things. IoT can be summed up like this: eventually, all our things (fridges, lights, toilets, toasters…) will be accessible on the Internet. How can we make use of that?

The next thing to notice is that they clearly mention working on two different things at once:

(…) We standardize semantic data models and web services (…)

For a project this young to already tackle 2 huge endeavours at once is a little scary. Standardizing the semantic data models for everything is… well, complicated. On top of that, it also wants to define the web services for those objects. The danger with this approach is that if either of those is suboptimal, or simply bad, it will drag down the other with it. The more you try to include in a single standard, the more bloated it will be, quickly becoming scary and unwelcoming. Rather than being drawn to your standard for its wisdom, people might try to avoid it completely because of its bureaucracy.

Nevertheless, chance (and experience) could be on their side and they might achieve both of these goals at the same time.

And yet, I must say I was not impressed by what I saw. If Haystack had been designed 30 years ago, I could have understood most of its choices. But considering it wants to deal with the Internet of Things, it should rather be focused on the next 50 years.

Here are some of my thoughts on Project Haystack.

Tags System (Which Are Not Tags)

Haystack can look a little weird at first, but it’s mostly because they use the term tag where people would usually use property.

From the doc:

A tag is a name/value pair applied to an entity. A tag defines a fact or attribute about an entity.

On some websites, you can tag something with a given keyword, but you won’t add an additional value to the keyword. The tag itself is the value. Think of the tag outside or sunny on a photograph.

In Haystack, the tags (properties) can have different predefined values. Here are a few examples:

speed : Speed point of a vfd measured in “%” where 0% is off and 100% is the fastest speed.
unit : Unit of measurement identifier from unit database.
sunrise : Boolean point associated with a weather station for historizing sunrise and sunset times.
rooftop : Used with ahu to mark an AHU as a packaged rooftop unit (RTU).

What quickly becomes clear when you read through the tags list is that this isn’t a project for the Internet of Things, but rather a project for the things that happen to be used in the HVAC world, which is a really, really, REALLY small subset of all the things.

While this might not look like a big deal for those in our industry, it will be a huge deal in the next few years. The reason is that a bunch of things that we don’t consider worthwhile will still find their way onto the Internet. Will my TV remote be connected to the Internet? Will my computer keyboard be connected? Will… my shoes?

Tools will be built to analyse data from all these devices, which means that if your standard doesn’t even consider them, the tools and ecosystem developed won’t be for your standard.

Tag Kinds

Here again, words have a slightly different meaning.

A kind is one of the permitted value types of a tag (…)

In other words, a kind is a data type.

I will now comment on a few kinds (types) for which I think additional thought is required:

Marker

The tag is merely a marker annotation and has no meaningful value. Marker tags are used to indicate a “type” or “is-a” relationship.

The marker type is exactly what tag should have meant. That’s what happens when you start using words without sticking with their meaning: you end up having to come up with other words for the meaning you just threw away.

Number

Number: integer or floating point number annotated with an optional unit of measurement.

Why only those? Why no imaginary numbers? Why no ratios? Why no uncertainty?

This matters all the more because there are currency units (more on this later), and for those you absolutely don’t want to deal with floating point numbers!

Again, we should have the next 50 years in sight when designing the standard.
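To make the floating point concern concrete, here is a minimal Python sketch. It is my own illustration, not something from the Haystack docs, and the variable names are made up for the example. It compares binary floats with the standard library's decimal type on a simple sum.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly,
# so a simple sum of two amounts already drifts.
float_total = 0.1 + 0.2
print(float_total == 0.3)

# Decimal keeps exact decimal digits, which is what billing code needs.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))
```

The first comparison fails and the second succeeds, which is why a Number kind limited to integers and floats is a poor fit for money.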

Str (because string was too long?)

Str: a string of Unicode characters.

Unicode is great!

Does the encoding matter? If I send you a UTF-16 encoded text instead of UTF-8, is it okay?
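Here is a small Python sketch of why the question matters. The sample string is my own choice and nothing here comes from the Haystack docs. The same Unicode string produces different bytes depending on the encoding, and a receiver that assumes the wrong one cannot read it.

```python
text = "Température"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Same characters, different bytes on the wire.
print(len(text), len(utf8_bytes), len(utf16_bytes))

# A receiver expecting UTF-8 cannot decode the UTF-16 payload.
try:
    utf16_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```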

Ref

Ref: reference to another entity. Haystack doesn’t prescribe a specific identity or reference mechanism, but they should be some way to cross link entities. Also see Containment. We format refs with a leading “@” and require a specific subset of ASCII characters be used: a-z, A-Z, 0-9, underbar, colon, dash, dot, or tilde.

So let me make this clear: Haystack wants to be a modern standard for the Internet of Things, but doesn’t specify a reference mechanism? Wouldn’t that be the hard part, the part where you need a standard?

Even worse; it still manages to limit the reference to a subset of ASCII!

It doesn’t explain how it works, nor how it could be compatible with other implementations, but it makes sure to limit the characters you use.

This looks like someone was using some kind of reference property in his data scheme and decided to include it into the standard in a way that would be compatible with his own code.

Coord

Coord: geographic coordinate in latitude/longitude formatted as C(lat,lng)

Finally! Yes! Coordinates are a good way to drop all the street address stuff.

I could easily see devices that are aware of their own location in the near future. Planning for that is nice!

Though I would be a little happier with an elevation.

Time stuff

Date: an ISO 8601 date as year, month, day: 2011-06-07.

Time: an ISO 8601 time as hour, minute, seconds: 09:51:27.354.

DateTime: an ISO 8601 timestamp followed by timezone name:
2011-06-07T09:51:27-04:00 New_York
2012-09-29T14:56:18.277Z UTC

Why different time types? Why not everything in a unix-like timestamp?

If you want to use ISO 8601, use ISO 8601. Don’t append an additional timezone to it.

ISO 8601 already provides a way to encode the UTC offset in the datetime.

In Haystack, we use the term timezone to encapsulate two concepts: offset from UTC and daylight saving time rules. For example, US Eastern Standard Time is -5hrs from UTC. But between 2am on the second Sunday of March and 2am on the first Sunday in November is daylight savings time (DST) and is -4hrs from UTC.

Why would you even do that? If I have an ISO 8601 timestamp, I already know the offset from UTC, and I also know what the local time was at the given location (DST or not). In other words, time series recorded with ISO 8601 are already, at recording time, formatted to correctly represent the legal hour at that instant.
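As a quick illustration, here is a Python sketch using only the standard library. The variable names are mine. It parses the first example timestamp from the docs without the trailing timezone name, and the offset alone is enough to recover both the local time and the UTC instant.

```python
from datetime import datetime, timezone

# The Haystack example, minus the appended " New_York".
stamp = datetime.fromisoformat("2011-06-07T09:51:27-04:00")

# The offset from UTC is already part of the parsed value.
offset_hours = stamp.utcoffset().total_seconds() / 3600
print(offset_hours)

# Converting to UTC needs no extra timezone information.
utc_stamp = stamp.astimezone(timezone.utc)
print(utc_stamp.isoformat())
```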

Tag Names

Tag names are restricted to the following characters: (…)

Must start with ASCII lower case letter (a-z)
Must contain only ASCII letters, digits, or underbar (a-z, A-Z, 0-9, _)
By convention use camel case (fooBarBaz)

And now, allow me to ask: why? Why? WHY?!?!

Why should it start with an ASCII character? (1960s USA here we come!)

Why should it be a lower case character?

Why should it contain only ASCII letters, digits, or underscore?

Restricting tag names, ensures they may be easily used as identifiers in programming languages and databases.

This looks like someone is trying to make the standard fit a particular existing codebase.

You are making a new standard, at least make sure it’s compatible with the Internet TODAY (UTF-8), and try to make it compatible for the next 50 years.

Unicode in programming languages is not a problem anymore. Perl, Python, Java, .NET, Go, JavaScript and even Scheme and Common Lisp support Unicode.
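A small Python sketch of the contrast. The regular expression is my own transcription of the naming rule quoted above, and the helper name is hypothetical. It compares what Haystack accepts as a tag name with what Python itself accepts as an identifier.

```python
import re

# The quoted rule as a regex: lower case ASCII letter first,
# then ASCII letters, digits, or underscore.
HAYSTACK_TAG_NAME = re.compile(r"[a-z][a-zA-Z0-9_]*")

def is_valid_tag_name(name: str) -> bool:
    """Return True if the name satisfies the quoted Haystack rule."""
    return HAYSTACK_TAG_NAME.fullmatch(name) is not None

for name in ["geoCity", "température", "Speed"]:
    # Second column: Haystack rule. Third column: valid Python identifier.
    print(name, is_valid_tag_name(name), name.isidentifier())
```

All three names are valid Python identifiers, but only the first passes the Haystack rule.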

Id

The id tag is used model the unique identifier of an entity in system using a Ref value type. The scope of an entity is undefined, but must be unique with a given system or project. This identifier may be used by other entities to cross-reference using tags such as siteRef, ahuRef, etc.

No. No. NOOOO!

If we have learned anything from past ID mechanisms, it is that systems will always grow/merge in ways we didn’t anticipate. Remember when BACnet IDs were stupidly big numbers? Ah! Remember when IPv4 had more addresses than we could ever hope to use?

If you make an ID, make sure you can really use it to identify something. If you intend to touch the Internet in any way, make sure it’s a worldwide identifier.
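One possible way to get there, sketched in Python. UUIDs are my own suggestion here, not something Haystack prescribes, and the function name is hypothetical. The regular expression is my transcription of the Ref character subset quoted earlier, so the sketch also shows that such an identifier would still fit inside a Ref.

```python
import re
import uuid

# Character subset allowed in a Ref, per the quoted documentation:
# a-z, A-Z, 0-9, underbar, colon, dash, dot, or tilde.
REF_CHARS = re.compile(r"[a-zA-Z0-9_:\-.~]+")

def new_entity_id() -> str:
    """Generate a random identifier that needs no central registry."""
    return str(uuid.uuid4())

first, second = new_entity_id(), new_entity_id()
print(first)
print(REF_CHARS.fullmatch(first) is not None)
```

Two systems generating IDs this way can be merged later without renumbering anything, which is the property a project-scoped ID cannot give you.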

Units

Haystack says that each numeric point must come with a unit from the standard unit database. So far, so good.

But it immediately goes downhill from there: Units can be SI (obviously) or US/imperial.

Seriously? You don’t start to design a standard for the next 50 years by including anything other than SI. (There are only 3 countries left that aren’t on SI; you can see them on this map.)

I sure hope there won’t be more than one in the next 15 years.

If you want to work in Imperial, fine! But do your conversions in your own application, don’t burden the standard with it.

This is from their units documentation page:

Pressure:

kilopascal, kPa

pounds_per_square_inch, psi

inches_of_water, inH₂O

inches_of_mercury, inHg

There’s only 1 of those required; all the others should be derived from it.
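Here is a rough Python sketch of what I mean by derived. The table and function names are my own, and the conversion factors are approximate conventional values that should be checked against a proper reference before real use. The application keeps kilopascals as the single stored unit and converts only for display.

```python
# Approximate kilopascals per one unit (conventional values, assumed here).
KPA_PER_UNIT = {
    "kPa": 1.0,
    "psi": 6.894757,
    "inH2O": 0.2490889,
    "inHg": 3.386389,
}

def from_kpa(value_kpa: float, unit: str) -> float:
    """Convert a stored kPa value to a display unit."""
    return value_kpa / KPA_PER_UNIT[unit]

def to_kpa(value: float, unit: str) -> float:
    """Convert a user-entered value back to kPa for storage."""
    return value * KPA_PER_UNIT[unit]

# Standard atmosphere expressed in each display unit.
for unit in KPA_PER_UNIT:
    print(unit, round(from_kpa(101.325, unit), 3))
```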

Previously, I’ve shown the example of the speed tag. Let’s revisit it:

Speed point of a vfd measured in “%” where 0% is off and 100% is the fastest speed.

Here, the unit of a tag/property called speed is in percentage. But why? That’s not a unit of speed! How about calling it usage, or anything else where someone could expect to get percentages? Note that it also specifies it’s for VFD. Anything else that could use the tag speed is now screwed.

Now take a look at this:

– temperature differential –

fahrenheit_degrees,Δ°F

celsius_degrees,Δ°C

kelvin_degrees,ΔK

There’s a unit for temperature differential?! That’s not even a unit! 10°C - 5°C is still in °C. How the hell is this even in the standard?!

This is scary stuff.

Money

Interestingly, the currencies are included in the units. One could wonder why… they don’t really represent anything physical. It’s not energy, not temperature, not pressure, not flow…

Why would currencies be included into the standard? For billing?

If we conclude that currencies are indeed needed in this, we should spend a long time wondering how to represent them.

Here are some of the currencies available in the units database, with their three-letter codes and symbols.

australian_dollar, AUD

british_pound, GBP, £

canadian_dollar, CAD

chinese_yuan, CNY, 元

euro, EUR, €

us_dollar, USD, $

Where are those three-letter codes coming from? ISO 4217? The same standard that isn’t consistent with itself (hello, EUR!) and which uses troy ounces for precious metals?

Why accept symbols at all? Why is $ reserved for the USA? Why is it “us_dollar”, and not “usa_dollar”? (It’s ‘United States of America’, not ‘United States’.)

Countries change, regimes rise and fall, currencies are inflated into oblivion or are re-evaluated… those are little details you have to think of when trying to insert currencies into a standard.

What about nationless currencies? In 50 years, could Bitcoin or another cryptocurrency be widespread?

This all smells like someone saying “Oh, I could use some price value in this situation…” without thinking any further.

In addition, like I said in an earlier section, you don’t want to deal with money if you are using floating point numbers!

Bloating

Haystack is supposed to be brand new, but it’s already bloated with useless or duplicate data.

Let’s take a look at an example given in the Haystack docs:

id: @whitehouse

dis: “White House”

site

area: 55000ft²

tz: “New_York”

weatherRef: @weather.washington

geoAddr: “1600 Pennsylvania Avenue NW, Washington, DC”

geoStreet: “1600 Pennsylvania Ave NW”

geoCity: “Washington D.C.”

geoCountry: “US”

geoPostalCode: “20500”

geoCoord: C(38.898, -77.037)

The essential information:

geoCoord

The redundant information:

tz
geoAddr
geoStreet
geoCity
geoCountry
geoPostalCode

Every single tag in that second list is redundant information that can be derived from the coordinates.

Worse, it’s unstable data.

What if a street is renamed? Street number changed? Timezone subdivided? Not only is it highly unlikely that someone will go update all the devices already installed, but those fields are now worthless in any kind of logs. (You’d have to update them by using timestamps and historical data.)

Point Min/Max

The following tags may be used to define a minimum and/or maximum for the point:

minVal: minimum point value
maxVal: maximum point value

When these tags are applied to a sensor point, they model the range of values the sensor can read and report. Values outside of these range might indicate a fault condition in the sensor.

Which means that for each data point you get, you then have to check if the min/max were set and if the data is within the range. It would be much more efficient and less prone to error to simply send back an error value.
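Here is roughly what that burden looks like on the consumer side, as a Python sketch. The function name and the returned messages are hypothetical, and I am assuming the tags arrive as a plain dictionary. Every reader of the data has to repeat this check for every value.

```python
from typing import Optional

def check_reading(value: float, tags: dict) -> Optional[str]:
    """Return a fault description, or None if the value looks plausible."""
    min_val = tags.get("minVal")
    max_val = tags.get("maxVal")

    # Both tags are optional, so each one has to be tested for presence.
    if min_val is not None and value < min_val:
        return "below minVal"
    if max_val is not None and value > max_val:
        return "above maxVal"
    return None

sensor_tags = {"minVal": -40.0, "maxVal": 125.0}
for reading in [21.5, 300.0, -55.0]:
    print(reading, check_reading(reading, sensor_tags))
```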

Final Thoughts

What I just talked about are the most obvious flaws I saw while reading the Haystack documentation.

Like I said at the beginning, 30 years ago I could have understood many of the characteristics I just criticized. But this is a new standard; it should be able to offer something new and deal with the modern ecosystem.

From where I stand, if we discard the various flaws, it still looks like more of the same. BACnet, Lon, Haystack? Meh. If you try to touch it from the Internet, it still is painful.

Haystack doesn’t really deal with the Internet of Things. It just tries to make the current controls world data a tiny bit easier to deal with. (And I’m not sure they succeed…)

In addition, there’s a clear influence of an existing codebase over the standard. It appears that questionable choices were made to keep things compatible with a particular player.

On the plus side, Haystack does look much less bureaucratic than BACnet. It also has an active forum and documentation available on the web, without the need to ‘purchase’ the specs. (I’m looking at you, ASHRAE. Haystack doesn’t claim to be ‘open’ and then stick a copyright stamp on itself…)

Haystack Forum

For a new hacker, Haystack is probably much less overwhelming and easier to approach.

Were I new in the field and looking for some fun, I would probably try Haystack. For robustness and technical soundness, however, I would still side with BACnet.