Programming is the skill of telling a computer precisely what it has to do. This chapter gives an introduction to the Pharo programming language. Although we tried to make this chapter smooth and easy to read, some basic programming knowledge is expected.
Pharo is a class-based, dynamically typed, object-oriented programming language. The chapter therefore begins with a brief introduction to what programming with objects is all about. The focus then moves to Pharo.
Instead of giving a long theoretical description of object orientation, let us pick a simple example. The following code snippet opens a Roassal view with a label showing Hello World.
To execute the script, type it in a playground and press the green triangle (Figure 1.1). The script displays the message Hello World. The word RTView refers to a class. A class is an object factory, and its name is easy to recognize because its first letter is always a capital letter. A class is like a baking pan for cakes: creating an object is like baking a cake. All the cakes produced by a pan have the same shape, but other attributes, such as the ingredients, may vary.
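A script matching the line-by-line description that follows could look like the sketch below. It assumes the Roassal 2 classes RTView and RTLabel named in the text, and that ending the script with v, so that the view is the result of the script, lets the playground display it.

```smalltalk
"Create an empty Roassal view and keep it in the variable v"
v := RTView new.
"Build an element whose shape is a label showing Hello World, and add it to the view"
v add: (RTLabel elementOn: 'Hello World').
"Answer the view so that the playground displays it"
v
```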
The first line creates an object, a view. An object is created using the message new. For example, the expression RTView new creates a view, String new creates an empty string, and Color new creates a black color. The view produced by executing RTView new is said to be an instance of the class RTView.
Messages are sent by writing them after their receiver: the expression object asString sends the message asString to the object referenced by the variable object.
In Pharo, a class is also an object, so objects are created by sending a message to a class. In the first line, the message new is sent to the class RTView, which creates an object. This object is assigned to the variable v using the assignment operator :=.
In the second line, the message elementOn: is sent to the class RTLabel, with the string 'Hello World' as argument. This Roassal instruction creates an element whose shape is a label. That element is passed as argument to the message add:, whose effect is to add the element to the view. The view, referenced by v, understands the message add: because the class RTView defines a method add:.
Consider this script (Figure 1.2):
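A script consistent with the explanation that follows could look like the sketch below. It assumes the Roassal 2 classes named in the text (RTView, RTEllipse, RTLabeled); the variable names values and elements come from the text, whereas the hypothetical variable shape, the messages elementsOn: and addAll:, and the use of RTHorizontalLineLayout as the layout that lines the circles up are assumptions.

```smalltalk
v := RTView new.
"The numbers to represent, one circle per number"
values := #(20 40 35 42 10 25).
"An ellipse shape whose size is obtained by sending #yourself to each value"
shape := RTEllipse new size: #yourself.
"Assumed API: one element per value, then added to the view"
elements := shape elementsOn: values.
v addAll: elements.
"Assumed layout class: lines the circles up horizontally"
RTHorizontalLineLayout new on: elements.
"Label each element"
elements @ RTLabeled.
v
```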
This example renders 6 circles, each having its own size. The expression #(20 40 35 42 10 25) defines an array containing a few numbers. The expression RTEllipse new size: #yourself creates an object of the class RTEllipse by sending the message new. The message size: is then sent to that ellipse object, with the symbol #yourself as argument. This message configures the size of the ellipses: the size of each circle is computed from its model object when the element is created. In particular, the message #yourself is sent to each element of the array values, and each number simply answers itself. For example, the circle representing the value 35 has a diameter of 35 pixels.
Circles are then lined up using a dedicated layout, invoked by sending the message on: to it with the elements as argument. The expression elements @ RTLabeled labels each element.
Most visualization engines and data analysis environments operate on the principle illustrated above: scripts are typed in a workspace or a webpage, and executed to produce a visualization. This approach to building a visualization or a small application is appealing because it is self-contained: all one needs to know is within the linear script, and the expressed logic is made explicit. However, this way of developing software artifacts has serious limitations: such scripts are hard to maintain and to evolve. For example, a 200-line script is painful to modify and confusing to look at. If it is not properly structured, adapting a complex visualization may consume a ridiculously large amount of time. This situation is well known to journalists, data scientists, and software engineers alike. Fortunately, a few decades ago the software engineering research community produced a way of programming that copes with the inherent complexity of developing software artifacts. Object-oriented programming is the most successful way to handle large and complex developments.
Object-oriented programming simplifies the programming activity. Handling objects, instead of functions or code snippets, relies on a metaphor that is familiar to us humans: an object may react to actions, has its own behavior, and may hide the details of how it is physically built.
Let us bring a bit of theory into all this. There are five essential ingredients to an object-oriented system:
These five pillars are not particularly tied to a programming language. So, in theory, it is perfectly doable to have an object-oriented design in a procedural language such as C. However, having a programming language that enforces these principles greatly alleviates the task of the programmer.
There are numerous object-oriented languages around, and Pharo is one of them. Pharo differs from other languages by offering a homogeneous way of expressing computation: everything is an object, and computation therefore happens by sending messages. When objects are taken seriously, there is no need for primitive types (e.g., float), language operators, or even external compilable files! Considering only message sending significantly reduces the amount of technological detail associated with program execution that most mainstream programming languages unnecessarily expose.
Sending a message is the elementary unit of computation. Understanding how to send a message is key to feeling comfortable in Pharo. Consider the expression 'the quick brown fox jumped over the lazy dog' replaceAllRegex: 'fox' with: 'cat'. This expression sends to the string object 'the quick brown fox jumped over the lazy dog' a message named #replaceAllRegex:with: with two arguments, 'fox' and 'cat', which are themselves two string objects. The result of sending this message is 'the quick brown cat jumped over the lazy dog', another string.
In Pharo, a character string (often simply called a string) is a collection of characters written between two single quotes (e.g., 'fox'). A string is a plain object, which means one can send messages to it. A message is composed of two essential ingredients: a name and a collection of arguments, which may be empty. For example, the expression 'fox' asUppercase, which evaluates to 'FOX', sends the message #asUppercase to the string 'fox'. No arguments are involved here.
Message sending is at the heart of the Pharo language and is therefore directly expressed in its syntax. There are three kinds of messages:
Unary messages have no argument. The expression 'fox' asUppercase sends a unary message to the string 'fox'.
Binary messages have exactly one argument and a name made of special characters, such as +, *, or >>. The expression 2 + 3 sends to the object 2 a binary message named + with the argument 3. Note that this expression therefore has different semantics from 3 + 2, in which 3 is the receiver and 2 the argument, although the result is obviously the same. Binary messages are evaluated from left to right, without any mathematical precedence: the expression 3 + 2 * 2 returns 10, and not 7 as one may expect. If you wish to enforce mathematical priorities in arithmetic operations, use parentheses, as in 3 + (2 * 2).
Keyword messages have one or more arguments, each preceded by a keyword. The expression 'the quick brown fox jumped over the lazy dog' includesSubstring: 'fox' sends a keyword message and evaluates to true. The name of the keyword message is #includesSubstring: and the argument is 'fox'. Similarly, the message replaceAllRegex: 'fox' with: 'cat' contains two keywords and therefore two arguments: the arguments are interleaved with the keywords of the message name.
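The three kinds of messages, and the order in which they are evaluated, can be checked in a playground. Besides the examples discussed above, the sketch adds two expressions (using size and raisedTo:) showing that unary messages are sent before binary ones, and binary ones before keyword ones.

```smalltalk
"Unary message: no argument"
'fox' asUppercase.                 "'FOX'"

"Binary messages are evaluated from left to right, without arithmetic precedence"
3 + 2 * 2.                         "(3 + 2) * 2 = 10"
3 + (2 * 2).                       "parentheses come first: 7"

"Keyword message: one keyword, one argument"
'the quick brown fox jumped over the lazy dog' includesSubstring: 'fox'.   "true"

"Unary before binary, binary before keyword"
'fox' asUppercase size + 1.        "('FOX' size) + 1 = 4"
2 raisedTo: 3 + 1.                 "2 raisedTo: 4 = 16"
```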
Sending a message triggers a mechanism that searches for the method to execute. This mechanism, often called "method lookup", starts in the class of the receiver; if no method with the message name is found there, the search continues in its superclass, and so on up the class hierarchy.
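The method lookup can be observed with reflective messages. In the sketch below, the integer 3 is an instance of SmallInteger, which does not define yourself itself, so the lookup has to climb the superclass chain up to Object. This assumes, as far as we know for standard Pharo images, that no class between SmallInteger and Object overrides yourself.

```smalltalk
"The class of the receiver"
3 class.                                                "SmallInteger"
"SmallInteger does not define yourself itself..."
SmallInteger includesSelector: #yourself.              "false"
"...but the lookup walks up the superclass chain and finds it in Object"
(SmallInteger lookupSelector: #yourself) methodClass.  "Object"
"Therefore 3 understands the message"
3 respondsTo: #yourself.                                "true"
```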
An object is a bundle of data to which messages can be sent. Most of the time, an object is created by sending the message new to a class. This reveals the true nature of classes: they are object factories. A class produces a new object each time new is sent to it. Objects produced from the same class are different, but they understand the same set of messages and have the same variables; what distinguishes them are the values given to these variables. For example, an expression that creates two objects from the same class and compares them sends three messages: twice the message new and once the message ==, used to compare object identities. Such an expression evaluates to false, since the two objects are different, i.e., located at different memory locations.
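A minimal sketch of such a comparison, using Object as an arbitrary class, together with the equality message = for contrast: = compares values, whereas == compares identities.

```smalltalk
"Two objects created from the same class are two distinct objects"
Object new == Object new.      "false"

"An object is always identical to itself"
o := Object new.
o == o.                         "true"

"Two separate strings with the same characters are equal..."
'fox' copy = 'fox'.             "true"
"...but they are not the same object"
'fox' copy == 'fox'.            "false"
```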
Point new creates a point by sending the message new to the class Point. There are several ways to create a point:
Point new creates a point (0, 0). All classes in Pharo understand the message new: except when explicitly prohibited, an object is created by sending new to its class.
Point x: 5 y: 10 creates a point (5, 10). This expression sends the message x:y: to the class Point, with 5 and 10 as arguments. The class Point defines the class method x:y:. The difference between new and x:y: is that the latter allows one to create a point and initialize it with given values for x and y.
2 @ 3 sends to the object 2 the message named @ with the argument 3. The effect is the same as Point x: 2 y: 3, which is to create the point (2, 3).
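The two ways of creating an initialized point can be compared in a playground; the sketch below checks that they produce equal points.

```smalltalk
"Keyword message x:y: sent to the class Point"
p1 := Point x: 5 y: 10.
"Binary message @ sent to the integer 2"
p2 := 2 @ 3.
p2 = (Point x: 2 y: 3).    "true: both describe the point (2, 3)"
p1 x.                      "5"
p2 class.                  "Point"
```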
Each class has its own way to create objects. For example, a point is not created the same way as a color. Creating an object is also commonly referred to as "instantiating a class", and an object is often called an "instance".
A class is an object factory, and an object is necessarily created from a class. An object associates values with the attributes defined by its class. As discussed above, objects interact by sending messages. An object understands the messages corresponding to methods defined in its class and in the chain of its superclasses.
A class is a factory of objects, often regarded as an abstraction of objects. You need to create classes as soon as you wish to bundle logic and data together (i.e., when doing hands-on work).
A class belongs to a package. You may want to create a dedicated package to contain the classes you will define. A package is created by right-clicking on the package list in a system browser. We will define a class named Tweet. Classes are created by filling in the following template in a code browser:
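The exact template shown by the browser depends on the Pharo version. The sketch below assumes the form used by Pharo 6 to 9 (older versions use category: instead of package:), with the package name TweetsAnalysis taken from the text.

```smalltalk
Object subclass: #NameOfSubclass
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'TweetsAnalysis'
```

Instance variable names are listed in a single string and separated by spaces, for example instanceVariableNames: 'a b c'.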
NameOfSubclass has to be replaced by the name of the class you wish to create. After the keyword instanceVariableNames: you need to provide the instance variables, and after classVariableNames: the class variables. Right-click on the code and select Accept to actually create the class.
You should obtain something similar to Figure 5.1. We have defined the class Tweet, contained in the package TweetsAnalysis. The class contains three instance variables, one of which is date. No methods have been defined so far. Note that in Pharo, an instance variable name begins with a lowercase letter.
A method is an executable piece of code, made of statements that typically carry out a computation. We will define a small mathematical example to illustrate the creation of a method, leaving our Twitter example aside for a short while.
The Fibonacci sequence is a well-known sequence of numbers obtained with the formula F(n) = F(n-1) + F(n-2). The terminal cases are F(0) = 0 and F(1) = 1.
We will implement the Fibonacci formula as a method defined on the class Integer. This class describes all the integer numbers in Pharo. First, let us open a system browser on this class. Spotter is a tool for searching in Pharo (Figure 6.1); we will use it to search for the Integer class and open a system browser on it. Type Integer in Spotter and select the corresponding class by pressing the Enter key or by clicking on it. Select the arithmetic protocol (third list panel) and enter the following code in the lower text pane:
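The method can be reconstructed from the explanation given after it (the pseudo-variable self, the test self <= 1, the early return, and the two recursive sends); a sketch of it reads:

```smalltalk
fibonacci
	"Answer the Fibonacci number of the receiver, with F(0) = 0 and F(1) = 1"
	self <= 1 ifTrue: [ ^ self ].
	^ (self - 1) fibonacci + (self - 2) fibonacci
```

Because this naive recursion recomputes the same values many times, its running time grows exponentially with the receiver; it is meant to illustrate methods, not to compute large Fibonacci numbers.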
After entering the code, right-click on it and select Accept. Accepting a method compiles it and makes it executable.
Open a playground, then type and execute 10 fibonacci. You will see its result, 55 (Figure 6.2).
The word self refers to a pseudo-variable that designates the object that received the message. When executing the expression 10 fibonacci, self refers to the object 10. The expression self <= 1 is true if self is 1 or smaller. If this is the case, we exit the method with ifTrue: [ ^ self ]. The caret character (^) is a return statement: it exits the method and returns a value. If self is greater than or equal to 2, the result is the sum of (self - 1) fibonacci and (self - 2) fibonacci.
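Because self designates the receiver, a message sent to self is looked up starting from the class of the receiver, which may be a subclass overriding the method. The sketch below makes this visible with two hypothetical classes, DemoShape and DemoCircle, created from a playground, and also contrasts self with the pseudo-variable super discussed next. It assumes the Pharo 6 to 9 class-creation message subclass:instanceVariableNames:classVariableNames:package:, and uses compile: to install small methods without a browser.

```smalltalk
"Two hypothetical classes, DemoCircle being a subclass of DemoShape"
shapeClass := Object subclass: #DemoShape
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'DemoSelfSuper'.
circleClass := shapeClass subclass: #DemoCircle
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'DemoSelfSuper'.

"DemoShape defines kind, and describe, which sends kind to self"
shapeClass compile: 'kind
	^ ''shape'''.
shapeClass compile: 'describe
	^ ''I am a '', self kind'.

"DemoCircle overrides kind, and defines superKind, which sends kind to super"
circleClass compile: 'kind
	^ ''circle'''.
circleClass compile: 'superKind
	^ super kind'.

"self: the lookup of kind starts in DemoCircle, the class of the receiver"
circleClass new describe.     "'I am a circle'"
"super: the lookup of kind starts in DemoShape, the superclass of DemoCircle"
circleClass new superKind.    "'shape'"
```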
Another common pseudo-variable is super. The two pseudo-variables self and super reference the same object: the object that has received the message. The only difference between self and super appears when a message is sent to them:
sending a message to self triggers the method lookup from the class of the object;
sending a message to super triggers the method lookup from the superclass of the class that contains the method using super.
Let us come back to our Tweet example. Define the following six methods on the class Tweet:
These methods make it possible to set the content of a tweet and to query it.
Click on the Class button in the system browser. This switches the system browser to the class side: methods defined on that side are class methods of the class Tweet. Define the method (Figure 6.3):
The method createFromURL: fetches a CSV file we have prepared for this example. The file contains 1000 random tweets. The method parses the content in a simple way, by splitting it at the commas.
Next, you can define the method:
The provided URL points to an example we have prepared to illustrate our purpose. Open it in a web browser to see what it looks like. At this stage, evaluating the expression Tweet createFromExample returns a list of 1000 tweet objects, each describing an entry of the online CSV file.
We will define two new methods on the class Tweet. Switch to the instance side (i.e., unselect the Class button in the system browser), and define the following two instance methods:
The method words simply returns all the words making up the content of a tweet. It uses substrings, which returns a list of words from a string. For example, the expression 'fox and dog' substrings returns #('fox' 'and' 'dog'). The method isSimilarTo: takes another tweet as argument and returns a boolean indicating whether that tweet is similar to the tweet that receives the message isSimilarTo:. The notion of similarity we use here is simple: two tweets are similar if they have at least 6 words in common.
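A possible implementation of these two instance methods follows. It assumes, hypothetically, that the text of a tweet is stored in an instance variable named content; the variable name actually used in the Tweet class may differ. Words are compared as a set, so a word repeated in a tweet is counted once.

```smalltalk
words
	"Answer the words of the tweet, split on whitespace (hypothetical variable: content)"
	^ content substrings
```

```smalltalk
isSimilarTo: aTweet
	"Answer true if the receiver and aTweet have at least 6 distinct words in common"
	| otherWords |
	otherWords := aTweet words.
	^ (self words asSet select: [ :each | otherWords includes: each ]) size >= 6
```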
So, we have some objects and a way to establish a relation between them. This is more than enough to start to visualize them. Open a playground and type (Figure 6.4):
We see that only a few of the tweets actually have words in common, and that most of them are negative.
A block closure (also simply called a "block") is a piece of code associated with an environment. Like any object, a block can be manipulated: it may be provided as a message argument and be assigned to a variable. The expression [ :value | value + 5 ] is a block closure that takes one parameter and returns the sum of that argument and 5. This block may be evaluated with an argument using the message value:. Consider the following code snippet:
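A snippet along these lines evaluates the block above and a few variants: the number of value: keywords matches the number of block parameters, and a block can use a variable defined outside of it.

```smalltalk
"A block taking one argument, stored in a variable like any other object"
addFive := [ :value | value + 5 ].
addFive value: 10.                       "15"

"A block without argument is evaluated with value"
[ 3 + 4 ] value.                         "7"

"A block with two arguments is evaluated with value:value:"
[ :a :b | a * b ] value: 3 value: 4.     "12"

"A block can use a variable defined in its outer scope"
offset := 100.
[ :x | x + offset ] value: 1.            "101"
```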
Recall the definition of the fibonacci method, defined on the class Integer. The message ifTrue: takes the block [ ^ self ] as argument. If self <= 1 evaluates to true, the block is evaluated and triggers an early exit from the method: the expression ^ self exits the method. The block uses the pseudo-variable self. More generally, a block may access variables defined in its outer lexical scope: temporary variables, instance variables, and argument variables.
As illustrated in the Fibonacci example, a condition is expressed by sending a message such as ifTrue:ifFalse: to a boolean. The message ifTrue:ifFalse: takes two blocks as arguments: the first one is evaluated if the boolean receiver is true, and the second one if the receiver is false. Variants exist, such as ifTrue: and ifFalse:. For example, true ifTrue: [ 5 ] evaluates to 5. The receiver can naturally be any boolean expression, such as in (5 < 1) ifFalse: [ '5 is not less than 1' ], which evaluates to '5 is not less than 1'.
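The conditional messages can be tried in a playground. The sketch below also shows that a one-branch conditional answers nil when its block is not evaluated, and uses ifTrue:ifFalse: to pick the larger of two numbers.

```smalltalk
"ifTrue:ifFalse: evaluates exactly one of its two blocks"
(5 < 1) ifTrue: [ 'smaller' ] ifFalse: [ 'not smaller' ].   "'not smaller'"

"With a single branch, the result is nil when the block is not evaluated"
true ifTrue: [ 5 ].      "5"
false ifTrue: [ 5 ].     "nil"

"Conditions are ordinary expressions: here, the larger of two numbers"
x := 7.
y := 3.
(x > y) ifTrue: [ x ] ifFalse: [ y ].   "7"
```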
Collections are very common data structures. As previously illustrated, the expression #(23 42 51) defines an array, an instance of the class Array. This class and its superclasses define a large number of methods. Two operations are very common in Pharo: transforming and filtering collections.
A transformation is typically realized using collect:. For example, #(23 42 51) collect: [ :v | v > 30 ] returns #(false true true): the initial array of numbers is transformed into an array of booleans.
Filtering is carried out using select:. For example, #(23 42 51) select: [ :v | v > 30 ] returns #(42 51). Both collect: and select: take a block as argument. In the case of select:, the block has to evaluate to a boolean.
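Transformation and filtering can be combined. The sketch below reuses the array from the text, adds reject:, the complement of select:, and chains select: with collect:.

```smalltalk
numbers := #(23 42 51).
"collect: transforms each element"
numbers collect: [ :v | v > 30 ].     "#(false true true)"
numbers collect: [ :v | v * 2 ].      "#(46 84 102)"
"select: keeps the elements for which the block answers true"
numbers select: [ :v | v > 30 ].      "#(42 51)"
"reject: keeps the elements for which the block answers false"
numbers reject: [ :v | v > 30 ].      "#(23)"
"The two operations can be chained"
(numbers select: [ :v | v > 30 ]) collect: [ :v | v * 2 ].   "#(84 102)"
```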
Pharo's collections are rooted in the Smalltalk programming language, whose collection library has often inspired other programming languages. Pharo's collections are rich and expressive. We have just seen the example of Array. Another collection is OrderedCollection, which represents an expandable collection: elements may be added and removed during program execution. For example:
This small script shows three squares.
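Independently of Roassal, the expandable nature of an OrderedCollection, unlike an Array whose size is fixed once created, can be sketched with plain numbers.

```smalltalk
c := OrderedCollection new.
"The collection grows as elements are added"
c add: 3.
c add: 4.
c add: 5.
c size.            "3"
"Elements can also be removed"
c removeFirst.     "3"
c remove: 5.
c asArray.         "#(4)"
```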
Another useful collection is Dictionary. A dictionary stores pairs of keys and values. For example, consider the following code snippet:
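A snippet along these lines builds such a dictionary; apart from the pair associating #two with 2, the keys and values are illustrative.

```smalltalk
d := Dictionary new.
"at:put: stores a value under a key"
d at: #one put: 1.
d at: #two put: 2.
d at: #three put: 3.
"at:ifAbsent: answers the value of the block when the key is missing"
d at: #four ifAbsent: [ 0 ].     "0"
```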
d at: #two returns the value 2.
The last bit of syntax is yet to be described. A cascade allows one to send several messages to the same receiver without repeating it. The cascade, noted ;, is a syntactic construction that makes code more concise by avoiding text duplication. It is frequently used in this book.
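For instance, filling a collection can be written with or without a cascade. The value of a cascade is the result of its last message, which is why cascades often end with yourself, a message that answers its receiver.

```smalltalk
"Without a cascade, the receiver c is repeated for each message"
c := OrderedCollection new.
c add: 1.
c add: 2.
c add: 3.

"With a cascade, each message after a ; goes to the receiver of the first message"
c := OrderedCollection new.
c add: 1; add: 2; add: 3.

"Ending with yourself makes the cascade answer the collection itself"
(OrderedCollection new add: 1; add: 2; yourself) size.   "2"
```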
Pharo provides an expressive reflective API, which means one can programmatically get data about how Pharo code is structured and defined. Consider the expression RTShape methods size, which returns the number of methods that the class RTShape defines. The message methods is sent to the class RTShape, which is also an object in Pharo. This message returns a collection of the methods defined on the class RTShape.
Many examples contained in Agile Visualization visualize software source code and therefore use the reflective API. Source code is convenient to illustrate visualization because it is already available (no need to rely on external data) and is complex enough to deserve to be visualized.
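The reflective messages are not specific to Roassal classes. A short sketch on the core class Point follows; the number of methods is not shown because it varies between Pharo versions.

```smalltalk
"Number of methods defined in the class Point itself"
Point methods size.
"Does Point define a method named x?"
Point includesSelector: #x.     "true"
"The superclass of Point"
Point superclass.                "Object"
"The class of an object is obtained with the message class"
(3 @ 4) class.                   "Point"
```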
This chapter gave a brief introduction to object-oriented programming. You should now be able to understand Pharo syntax. We recommend a number of books to further discover the world of Pharo:
Pharo is a beautiful, elegant, and simple language. Pharo has a small and concise syntax, which makes it easy to learn. Its programming environment is also highly customizable.
Building a sophisticated visualization, or any non-trivial software artifact, often involves complex development. Mastering object orientation is not strictly necessary to use Roassal. However, a good command of object-oriented programming will considerably reduce the development and maintenance effort.
Pharo offers a powerful meta architecture. Do you remember that an object is created by sending the message new to a class? In Pharo, a class is also an object, since we send new to it, as in the expression Color new. A class is therefore an object, itself an instance of another class, called a metaclass. And it does not stop here: a metaclass is also an object. Methods are objects too, essentially collections of bytecodes. Many parts of Pharo are truly beautiful, but going into more detail is out of the scope of this book.
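Without going into more detail, these claims can be checked in a playground. A short sketch using the class Point:

```smalltalk
"A class is an object: its class is its metaclass"
Point class.                             "Point class"
"Every metaclass is an instance of Metaclass"
Point class class == Metaclass.          "true"
"Methods are objects as well: >> answers the compiled method for a selector"
(Point >> #x) class == CompiledMethod.   "true"
(Point >> #x) methodClass == Point.      "true"
```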