My views on type systems

As my programming experience has diversified, from traditional command-line programs to web and app development, I have used many programming languages, some of which I like and some of which I don't. One particular aspect of a programming language, its type system, contributes significantly to how I feel about it.

Data types are a kind of constraint bound to data: they define what we can and cannot do with it, and how operations on it behave. A type system can be analysed along several dimensions:

Strong vs Weak

A strong type system allows no, or very few, automatic type conversions; in particular, it never loses data by converting types automatically. In a weak type system, by contrast, most things can be converted implicitly, sometimes in an unintuitive or even undefined way. In particular, if a language provides an identity operator (===) in addition to an equality operator (==), its type system is almost certainly weak. A weak type system alone is enough for me to dislike a programming language, because it lets my errors go uncaught and produces unexpected behaviour. Examples of strong type systems are Java, Haskell and Python, and examples of weak type systems are C++, Javascript and PHP. The notion of strong or weak is not absolute, though. For example, the type system of C++ is slightly stronger than C's with regard to automatic pointer conversions, but not as strong as Java's with regard to automatic number conversions.
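The following minimal Javascript sketch makes the "data loss" point concrete. It runs in any modern engine, for example Node.js, and the variable names are only illustrative. The results follow from the language's conversion rules. Bitwise operators convert their operands to 32-bit integers. The + operator prefers string concatenation whenever one operand is a string, while * always converts to numbers. A strong type system would reject these operations or at least refuse to perform them silently.

```javascript
// Bitwise operators convert operands to 32-bit integers (ToInt32), so the
// fractional part and any bits above 32 are discarded without an error.
const truncated = 1.9 | 0;          // 1: fractional part dropped
const wrapped = (2 ** 32 + 5) | 0;  // 5: high bits dropped

// + concatenates if either operand is a string; * always converts to numbers.
const concatenated = "5" + 2;       // "52"
const multiplied = "5" * "2";       // 10

console.log(truncated, wrapped, concatenated, multiplied);
```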

For example, the type system of Javascript is extremely weak, which has led to a lot of bugs circulating on the World Wide Web. By the language definition, all of the following evaluate to true:

0 == "0"
0 == []
"0" != []
0 == [[]]
-1 != true
-1 ? true : false
null != true
null != false
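These claims are easy to doubt, so here is a minimal script that evaluates each expression and prints the result. The `claims` array is just a hypothetical way to pair each expression's source text with its value. Every entry should print true.

```javascript
// Pair the source text of each expression with its evaluated result.
const claims = [
  ['0 == "0"', 0 == "0"],
  ["0 == []", 0 == []],
  ['"0" != []', "0" != []],
  ["0 == [[]]", 0 == [[]]],
  ["-1 != true", -1 != true],
  ["-1 ? true : false", -1 ? true : false],
  ["null != true", null != true],
  ["null != false", null != false],
];

for (const [source, result] of claims) {
  console.log(`${source} -> ${result}`);
}

// Note that == is not even transitive: 0 == "0" and 0 == [] hold, yet "0" != [].
```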

This is already confusing enough that I can't be bothered to type any more. Google "Javascript equality table" and you will find a lot of messy tables. Moreover, the rules of implicit conversion differ between programming languages. For example, the following comparisons give different results in PHP and Javascript, although the code reads the same in both languages:

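Code              PHP     Javascript
-1 == true        true    false
"16" == "0x10"    true*   false
null == false     true    false
null == []        true    false
0 == [0]          false   true

* In PHP 5. Since PHP 7, hexadecimal strings are no longer treated as numeric, so this comparison is false in PHP as well.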

From the code alone, it is impossible to know whether a comparison behaves as you expect unless you read the documentation. Because this behaviour is brain-damaging, both languages provide an identity operator, but even that sometimes gives different results in the two languages:

Code              PHP     Javascript
[] === []         true    false

What the fuck? An empty array is not identical to an empty array in Javascript? Oh no: in Javascript, arrays are objects, not values. [] actually creates a new Array object, and what you get is only a reference to it. In PHP, however, an array is a first-class value which represents itself. The Javascript comparison is false because you are comparing references to two different objects, even though both objects behave the same. (This is a little off-topic, though.)
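A short sketch shows the reference semantics in action. The helper `shallowArrayIdentical` is hypothetical and not a built-in. It roughly mirrors PHP's === for flat arrays of primitive values, where PHP requires the same elements in the same order with the same types. It does not handle nested arrays or PHP's string keys.

```javascript
// [] creates a new Array object each time; === on objects compares references.
const a = [];
const b = [];
const c = a;

console.log(a === b); // false: two distinct objects
console.log(a === c); // true: the same object

// Hypothetical helper: element-wise identity for flat arrays of primitives.
function shallowArrayIdentical(x, y) {
  return x.length === y.length && x.every((item, i) => item === y[i]);
}

console.log(shallowArrayIdentical([], []));           // true
console.log(shallowArrayIdentical([1, "2"], [1, 2])); // false: "2" !== 2
```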

The unpredictability of operations between different data types is confusing enough that I always use the identity operator for comparisons, and explicit type casts when operating on values of different types. It gets even worse when a weak type system is combined with a dynamic type system:

Static vs Dynamic

In static type systems, all expressions have types known before the program runs. In dynamic type systems, only values have types. Expressions do not have a type until the program runs, and you can assign values of different types to the same variable according to program logic. For example, C++, Java and Haskell are static, while PHP, Javascript and Python are dynamic. Traditionally, static type systems are associated with compiled languages and dynamic type systems with interpreted languages, but this is not always the case, as some languages can be both compiled and interpreted.
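In Javascript, the variable itself carries no type; only the value it currently holds does. The minimal sketch below shows this, with `value` and `observedTypes` as illustrative names. Javascript's `typeof` reports arrays as "object", so the sketch uses `Array.isArray` to tell them apart.

```javascript
// Record the run-time type of whatever `value` holds after each assignment.
const observedTypes = [];

let value = 42;
observedTypes.push(typeof value); // "number"

value = "forty-two";
observedTypes.push(typeof value); // "string"

value = [4, 2];
// typeof would say "object" here, so check for arrays explicitly.
observedTypes.push(Array.isArray(value) ? "array" : typeof value); // "array"

console.log(observedTypes.join(", ")); // number, string, array
```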

A weak type system combined with a dynamic type system, in the hands of an incompetent programmer, is like explosives handed to an idiot. It basically makes program analysis impossible, because you cannot even know how an operation behaves without running the program itself! In particular, if the programmer "takes advantage" of this by assigning values of different types to the same variable according to program logic, the whole thing is doomed in a weak type system. It is like taking the express train to hell.
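The following hypothetical example shows how the two properties combine. It sums quantities where one value happens to arrive as a string, for instance from a form field. Nothing fails; the result is just wrong. The functions `total` and `totalStrict` are illustrative names. The second one applies the explicit-conversion habit described above.

```javascript
// Weak + dynamic: once `sum` meets a string, + switches to concatenation.
function total(quantities) {
  let sum = 0;
  for (const q of quantities) {
    sum = sum + q;
  }
  return sum;
}

console.log(total([1, 2, 3]));   // 6
console.log(total([1, "2", 3])); // "123" (a string, not a number)

// Converting explicitly and rejecting non-numbers turns bad input into an
// error instead of a silently wrong result.
function totalStrict(quantities) {
  let sum = 0;
  for (const q of quantities) {
    const n = Number(q);
    if (Number.isNaN(n)) {
      throw new TypeError(`not a number: ${String(q)}`);
    }
    sum += n;
  }
  return sum;
}

console.log(totalStrict([1, "2", 3])); // 6
```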

Safe vs Unsafe

In a safe type system, there is no way to circumvent the type system. In an unsafe type system, the language provides ways to circumvent it, typically with undefined or implementation-defined behaviour. reinterpret_cast in C++ is probably the best example, as it is a mechanism provided precisely for circumventing the type system. Most programming languages are safe, even PHP and Javascript: all their type conversions have well-defined behaviour, whether implicit or explicit. The notable exception is C/C++, where this ability is useful for hardware-level programming.
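As a small illustration of "weak but safe", every conversion below has a result fixed by the ECMAScript standard, however surprising. The `conversions` object is just an illustrative way to print them. The last part uses `DataView` to read the bytes of a float as an unsigned integer. This is the closest Javascript gets to a reinterpret_cast. Because the byte order is specified (big-endian by default), the result is well-defined rather than implementation-dependent.

```javascript
// Surprising but fully specified conversions.
const conversions = {
  'Number("")': Number(""),               // 0
  'Number("abc")': Number("abc"),         // NaN
  "Number(null)": Number(null),           // 0
  "Number(undefined)": Number(undefined), // NaN
  "Number([])": Number([]),               // 0
  "Number([7])": Number([7]),             // 7
  "String([1, 2])": String([1, 2]),       // "1,2"
  "String({})": String({}),               // "[object Object]"
};

for (const [expr, result] of Object.entries(conversions)) {
  console.log(`${expr} -> ${result}`);
}

// Reading a 32-bit float's bytes as an unsigned integer, with explicit
// (default big-endian) byte order on both the write and the read.
const view = new DataView(new ArrayBuffer(4));
view.setFloat32(0, 1.0);
const bits = view.getUint32(0);
console.log(bits.toString(16)); // 3f800000 (IEEE 754 encoding of 1.0)
```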

An unsafe type system is not necessarily evil, but an unsafe type system combined with a weak one is a sure path to unnoticed disaster. One example is the implicit integer-to-pointer conversion that C compilers have traditionally accepted with only a warning.

Explicit vs Implicit

In an explicit type system, types have to be specified for all identifiers, so if one is initialised with a value of the wrong type, the error is caught immediately. In an implicit type system, however, type signatures are optional and types are deduced when not specified; it is an error if a type cannot be deduced. This notion only applies to static type systems, because in dynamic type systems identifiers don't have types.

I prefer the ability to deduce types because I am lazy and don’t want to type a VeryLongContainer::Iterator for a variable which is used only in a few lines.

Nominative vs Structural

In a nominative type system, two types are compatible (i.e. one can be substituted for the other) only when declared so. In a structural type system, two types are compatible if their structures are the same; duck typing is the closely related dynamic counterpart. One of the incompatibilities between C and C++ is how they treat structs. In C, as long as two struct types satisfy some common criteria, their pointers are routinely cast to each other, whereas C++ encourages proper OOP instead.

I prefer a nominative type system, to avoid accidentally substituting an incompatible type merely because it has a method of the same name.
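The following hypothetical Javascript example shows the accidental substitution described above. `Shape`, `Lottery`, `renderDuck` and `renderNominal` are invented for illustration. Javascript has no nominal type checking as such, so an `instanceof` check, which tests the prototype chain, stands in as an approximation of a nominal check.

```javascript
// Two unrelated classes that happen to share a method name.
class Shape {
  draw() { return "rendered on screen"; }
}

class Lottery {
  draw() { return "winning number picked"; }
}

// Duck typing: anything with a draw() method is accepted.
function renderDuck(obj) {
  if (typeof obj.draw !== "function") throw new TypeError("cannot draw");
  return obj.draw();
}

// Nominal-style check: only values declared as Shape (or a subclass) pass.
function renderNominal(obj) {
  if (!(obj instanceof Shape)) throw new TypeError("not a Shape");
  return obj.draw();
}

console.log(renderDuck(new Lottery())); // accepted by accident
try {
  renderNominal(new Lottery());
} catch (e) {
  console.log(e.message); // not a Shape
}
```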

Conclusion

The type system of a programming language directly affects the experience of developing with that language. What I hate most is the combination of weak and dynamic type systems, because I always make mistakes with them. Unfortunately, this combination includes the two most popular languages for web development: PHP on the server side and Javascript on the client side, one of which I don't really have a choice about.
