Data Structures

What Are They?

It was mentioned in the introduction that programs are like recipes with ingredients and procedures. Procedures are the control structures and simple variables make ingredients, but sometimes simple ingredients aren't enough to make a recipe.

Eventually, programs you'll write will want more than just dummy variables holding little bits of data. There will come a need for records or collections of data that ars correlated. Lists and arrays will emerge. Fortunately, Lisp is really good at list processing.

There are two kinds of data structures really worth mentioning: lists and arrays. Lists are variable-length collections that can grow or shrink. Arrays are fixed-length lists.

"Now, why would I ever ever want to use an array, then?"

Suppose you wanted a record: player's name, X-location, Y-location, and health. If you know the length of this record is four, pulling the first value out of it would grant you the player's name. We'll see this later when we look at arrays. First, let's see some lists

Lists and cons

In any good Lisp book, you'll see a couple of really anachronistic functions. In the days of old, the first element of a list was its head (its "car"), and the rest of it was its body (its "cdr", pronounced "coulder"). The car of the list is a single value, and its cdr is a list of all items except the first. These lists are generated with the cons (short for "construct") and they look a little like this:

(setq ye-olde-cons (cons '"Klesk" '("Hossman" "Bitterman" "Orbb")))

That generates a list of the four coolest-named bots in Quake 3. The same thing can be done a lot easier (and without the apostrophes) using list:

(setq newe-liste (list "Klesk" "Hossman" "Bitterman" "Orbb"))

But, while this is easier in practice, let's look at the other way which makes it obvious with regards to this car / cdr situation...

(car ye-olde-cons) ; Returns "Klesk"
(cdr ye-olde-cons) ; Returns ("Hossman" "Bitterman" "Orbb")

"What? Why are these the functions for traversing lists?!?"

Lists aren't inherently editable. It takes a bit of finesse to do more than adding elements to the beginning or end of a list, so functions like this make things easier. Also, consider combining these functions; they get mighty powerful mighty quickly. The cdr of the cdr of the above list returns just the last two names, so consecutive calls to cdr on the results of the previous call can be used to peel elements off of the front of the list, and car can isolate them.

To make things a wee easier, Elisp provides nthcdr— it takes two arguments, a number and a list. It returns the result of calling cdr on the list n times.

(nthcdr 3 ye-olde-cons) ; Returns a one-element list containing "Orbb"
(car (nthcdr 3 ye-olde-cons)) ; The element "Orbb".

In the above example, it's important to remember that cdr returns a list, even if it has only one element. If it's only one element we're after, and we know its index (counting from zero), then Elisp gives us a shortcut: nth. This is shorthand for taking the car of the nthcdr of a list.

(nth 3 ye-olde-cons) ; Returns "Orbb" as a value.

Just to make things clear, in the above code, I used the variable "ye-olde-cons" a lot. It can be replaced with a list generated with list with absolutely no difference. Lists generated either way are the same internally to Elisp.

Operating on Lists

Modifying individual elements on a list isn't as straightforward as you think. One can't simply setq the nth element... and it kinda makes sense. If you call nth on a list, you're pulling the value at that index of the list. You can't set one string to another, so setq falls apart. Good thing there's an ace in the hole: push.

If you're familiar with other programming languages or theory in general, the words "push" and "pop" mean something special: stacks. The concept is simple enough: add or subtract elements to the list on one side only. pushing an element onto a list will add that element to the beginning (car) of the list. popping the list will remove that element from the top of the list and return it. The difference between pop and car is that pop will remove the element from the list. It is a "destructive" function.

Back to the point, modifying lists requires the list be taken apart and reassembled in the desired order (or with the desired element modified). Fortunately, we're talking Elisp, and this is only a few lines of code. Unsurprisingly, length will return the number of elements in a list.

Before I modify an element of a list arbitrarily, I'd like to spoil a little magic. push may seem like a magical little way to add something to a list, but it's a macro, and in the background, it does something like this:

(defun push (elt l)
"Push element elt onto list l"
(setq l (cons elt l))
)

That's the fancy, list-equivalent of incrementing something by saying "x = 1 + x". But, let's modify an element on a list. The following example can be placed into its own defun and generalized easily:

; List of planets from my favorite ZX Spectrum game...
(setq planets (list "Lave" "Diso" "Reorte" "Leesti" "Zaonce" "Orerve"))

; Replace "Reorte" with "Uszaa"
(let ((temp nil))
(dotimes (i (length planets))
(cond
((= i 2) (push "Uszaa" temp))
(t (push (nth i planets) temp))
)
)
(setq planets temp)
)

Or, using something less formal (this should probably use let), let's just remove that element altogether:

(setq temp nil)
(dotimes (i (length planets))
(when (not (= i 2))
(push (pop planets) temp)))
(setq planets temp)

Arrays

Let's switch gears to arrays. In terms of building and altering elements, arrays are much easier. The downside is that push and the like don't work here. They're best used for records of data... something like this:

(setq humanJ ["Joshua" "Rydekker" 14 185])

Notice that arrays get their own magical syntax with square-brackets. Arrays are their own special data type. The functions to access and modify elements are simple enough:

(aref humanJ 0); Returns "Joshua"
(aset humanJ 0 "Mike") ; Changes "Joshua" to "Mike"

aref takes the array as the first element, which is weird since it's the opposite of how nth works. Unfortunately, this is one of the inconsistencies of Elisp. aset takes the list, then the index, then the new value. Even I don't have that one memorized (which is alright, since I put my point over aset and press C-h f then Enter and the context help drives it home).

The most powerful thing imaginable is mixing arrays with lists. We're probably far enough in this tutorial that I can dump an entire, minimal program and see what happens. Meet "pres-mode": a script for making simple presentations. An iteration variable runs down a list of arbitrary length. Each element of the list is an array. The first element of the array's going to be a quick-and-dirty string describing the format rules, and the second element of that array's going to be the actual text. Pressing space will advance the next line. There is a simple regular expression in here (re-search-forward) which I'll talk more about in Section 6, but let's take a look at some data structures:

(defun pres-output()
(let ( (fmt (downcase (aref (nth pres-i pres-presentation) 0)))
(txt (aref (nth pres-i pres-presentation) 1))
(func nil)
(r 0)
)

; When first char in the fmt elemnt is "c", clear the buffer...
(when (> (length fmt) 0)
(when (string= (substring fmt 0 1) "c") (erase-buffer) )
)

(with-temp-buffer
(insert fmt)
(goto-char 0)
(setq r (re-search-forward ".*h\\(.\\)" nil t))

(cond
( (not r) (setq func (lambda () (insert (propertize
(concat "* " txt) 'face '(:height 180)))) ) )
( (string= (match-string 1) "1") (setq func (lambda ()
(insert (propertize txt 'face
'(:height 300 :family "Arial" )))) ) )
( (string= (match-string 1) "2") (setq func (lambda ()
(insert (propertize txt 'face
'(:height 260 :family "Arial" )))) ) ) )


)
; Get outta the temp buffer and do our business...
(when (functionp func) (funcall func))
(insert "\n") )
)

(defun pres-advance ()
"Increment the presentation index, and present from pres-presentation list.
Usually defined as SPACE."
(interactive)
(message (concat (number-to-string pres-i ) " / "
(number-to-string (- (length pres-presentation) 1))) )
(cond
( (< pres-i (length pres-presentation))
(progn (pres-output)(setq pres-i (+ 1 pres-i))) )
( t (message "End.") )
)
)


(defun pres-mode ()
"Start up pres-mode. Requires a data structure defined
called pres-presentation where the first element of
pres-presentation is a string and the remaining elements
are arrays. These arrays should have two elements-- each
strings. First element is the format, and the second element
is the content. If the format begins with a 'c', the screen
is cleared. If it contains h1 or h2, a heading face is used,
if supported. When running, press SPACE to advance presentation.
An example presentation would look like this--

\(setq pres-presentation
(list
\"Example Presentation\"
[\"ch1\" \"Example\"]
[\"h2\" \"Ex2\"]
[\"\" \"Ex3\"]
[\"c\" \"Ex4\"]
)
)
\(pres-mode)
"

(interactive)
(setq pres-i 1)
(cond
( (not (boundp 'pres-presentation))
(message (concat "List pres-presentation missing... "
"see C-h f pres-mode for details.")) )
( t (progn (switch-to-buffer (nth 0 pres-presentation) )
(local-set-key (kbd "SPC") 'pres-advance) ) )
)

)

; THIS FOLLOWING part won't get bundled up with pres-mode.el... it's
; expected to be implemented by the end user.
(setq pres-presentation
(list "Emacs Lisp"
["ch1" "Emacs Lisp"]
["h2" "Mark Burger"]
["h2" "March 11, 2014"]
["ch2" "Introduction"]
["" "Emacs predates many conventions held by modern text editors"]
["" "Invented by Richard Stallman, it was originally editing macros for TECO"]
)
)

(pres-mode) ; Do it.

That is... a lot to digest. Let's start with that magnificent data structure at the end... the line that starts (setq pres-presentation.... As mentioned, the end user makes a list. The first element is a string ("Emacs Lisp", the title and name of the buffer created to hold the presentation) and the second through last elements are arrays. pres-output is the first function defined. It's not intended for end users— just this script itself. As such, it's not declared interactive, and there's no docstring. After the declaration, we immediately are confronted with a bunch of functions we know: let, downcase, aref, and nth. So, to be fair, pres-i is initialized in the function pres-mode. "fmt" is the first element of the array. It's made to lower-case for easier digestion. It describes the format of the text to be entered. The when statement after the variables are declared will clear the screen if the format text begins with "c".

The with-temp-buffer statement is done to extract (with a tool called a regular expression) the number following the "h", if it's present, in that format string. This I borrowed from HTML where "h1" is the main heading and "h2" is secondary. The cond statement in there is worth inspecting. It sets func to a lambda expression that actually writes the text to the screen. Then, functionp is a predicate function that, if func is a function, will execute it. Since it was set inside the cond, the function could do or be anything, so this script is extend with new format strings. We'll discuss buffers and management, and regular expressions (briefly) in the next chapter.

The next function is pres-advance. This is interactive and end-user accessible, so there's a brief docstring. The main point of this is to increment pres-i inside the first line of the cond statement. We do some checking to make sure pres-i is operating inside the bounds of the presentation data structure. It's hard to display the 13th element of a ten-element list. The message statement in this function is to help the presenter understand where he is, in general, in the presentation. Handy for doing a youtube video, perhaps.

Finally, there's pres-mode. The most impressive and important thing here is its docstring. For the end user, this is what they want to see: "How do I use this stupid script?" the answer is all there. Notice that the text is around 80 characters wide. This is done as a courtesy to our terminal friends. Quotes are "escaped"— that is, they don't terminate the docstring, so Elisp has to know that those quotes are just for presentation. Beyond this, pres-mode doesn't do much more than set pres-i, check the availability of pres-presentation, and bind the space bar to pres-advance.

A Word on Symbols

Lisp sports a pretty weird data type that's pretty unique: the symbol. A symbol can be treated as a string that gets just an opening quote (remember set from the first section?). Variables in Lisp are really symbols, but as interesting as that is, it doesn't really affect implementation of basic variables.

Symbols and Their Properties

Symbols can also be used as data structures.

Lists and arrays are really cool for a lot of purposes, but we live in a world of key-value pairs. While it's cool to say the 4th element of an array is a particular value, it'd be even cooler to give that number "four" a name. The data structure here is called a symbol. A key like "firstname" on a symbol might have a value of "John". For Lisp, symbols live in something called (in the manual) an obarray, sort of a bucket of symbols. Your Elisp program can have its own obarray or just use the global one.

An important note: using symbols for key-value pairs is a bit archaic. Things are a bit more simple and intuitive with hash tables, which we'll see in a little bit.

The easiest way to create a new symbol is to specify it with intern. A symbol here is just a string. intern takes an optional second argument— the obarray. If you're just starting out, it might be a good idea to leave this out. If you're feeling adventurous, creating an obarray looks like this:

(setq myObArray (make-vector 15 0))
...where "5" in the above call to make-vector is the maximum number of symbols your obarray will hold initially. Here's fair warning: Elisp's main obarray is called "obarray", so don't call setq to it, or you'll ruin everything. I'm serious.

put and get are the functions for setting and getting properties on symbols. Let's look at something concrete:

(intern "anacreon")
(put 'anacreon 'firepower 80)

A nasty surprise here is the use of the single quote. The keys for anacreon are traditionally also symbols (though they can also be integers), but their values can be of any type. Getting values from a symbol is similarly easy:

(get 'anacreon 'firepower) ; returns 80

Remember that note above? Variables themselves are symbols? Symbols that are variables (created with setq or the like) are automatically interned. That means the following is completely permissible:

(setq titanic "R. M. S. Titanic")
(put 'titanic 'length 882.6)
(message titanic) ; Prints the string
(message (number-to-string (get 'titanic 'length))) ; Prints the number

But, what if we don't know the keys? What if we wanted all the data from the symbol's property list? The answer is easy enough: symbol-plist returns a list of keys and values associated with the symbol. A little snag here: this returns all the keys and values in a single list. Let's look at an example:

(intern "mars") ; Wikipedia!
(put 'mars 'radius "3,400 km")
(put 'mars 'day-length "24h 37m 22s")
(put 'mars 'year-length "686 days")
; (symbol-plist 'mars) returns: (radius "3,400 km" day-length "24h 37m 22s" year-length "686 days")

As you can see, it takes a modest amount of fishing to get data out of a property list this way.

Another warning here that further detracts from the usefulness of symbols: calling get and put to undefined symbols tentatively works, but can rather randomly fail at runtime. The results are undefined which makes error-checking really tough.

Also, like we'll see later with hashes, you can technically call put with any data type as the key. Numbers, strings, symbols... they all work. Internally, though, with get, Elisp uses the = operator, so if you call put with a string as the key, you won't get the results you expect. Fortunately, casting between symbols and strings are pretty easy:

(intern "destroyer")

(put 'destroyer "width" 80)
(get 'destroyer "width") ; Returns nil

(put 'destroyer (intern "width") 80) ; Switch string to symbol
(get 'destroyer (intern "width")) ; Returns 80

What if I want to turn a symbol into a string? Use symbol-name.

(symbol-name 'destroyer) ; Returns "destroyer"

A last word on symbols: it'll be tempting to use symbols as statuses of things, and there's yet another equality function to learn for checking equality.

(setq gamestate :in-space)
; THIS FAILS:
(when (= gamestate :in-space) (message "We're in space!"))
; This is what you want:
(when (eql gamestate :in-space) (message "We're really in space!"))

Hashes

But, what if we wanted keys and values both to be of any type? What if we didn't want to dink with obarrays? What if we, for real, just wanted a symbol that had associated key-value pairs? I've got the answer: hash maps!

They're made simply enough:

(setq browns (make-hash-table))

This is a bit more typing than simply interning a symbol. And while the rest of this talk of hash tables will be really easy, there are some points of complexity. If you look in the Elisp manual, there are a number of parameters that can be passed to make-hash-table to optimize Elisp's behavior. I'll skip it here since this section's already pretty heavy.

Similar to the get and put in dealing with symbols, we have gethash and puthash— where puthash takes a key, then a value, and thirdly, the table. With everything else in Elisp, this function takes its arguments in a different order than regular get and put. I'm sorry.

(puthash 1 "pizza" browns)
(puthash 'glick 9.2 browns)
(puthash :dabadase "database" browns)

You'll notice in the above example, keys and values, unlike with symbols, can be of any data type. Numbers are cool if you're doing complicated array or list things and would like a hash table instead. Likewise, strings or symbols make for intuitive later use. In this regard, hash tables are remarkably superior to symbols. Getting data back from a hash is simple enough, too:

(message (gethash 1 browns)) ; Prints the word pizza
(message (number-to-string (gethash 'glick browns))) ; Prints 9.2

There are some other hash-related functions:

(remhash 1 browns) ; Removes the mapping 1 → "pizza" from the hash
(clrhash browns) ; Empty the hash completely.

If you're following along closely, the question may arise: "if variables are symbols, and hash tables are symbols, can't I use properties on symbols, too?" Yes! I have no idea why this would be useful, but go for it!

Lastly, I want to present the ultimate reason to use hashes instead of symbols and their properties: maphash. Remember all that talk about functions being able to be treated as variables in Section 4? That's relevant here. maphash takes two arguments: a function and a hash. The function will take two values and be called for each key-value pair in the hash.

Let's look at this in action with a lambda function that simply prints out all the keys and values at the end of the current buffer (though the function would work just as well with one defined with defun):

; Make some data...
(setq titanic (make-hash-table))

(puthash "name" "R. M. S. Titanic" titanic)
(puthash "length" "882 ft, 6 in." titanic)
(puthash "height" "175 ft." titanic)
(puthash "launch" "May 31, 1911" titanic)

; Print it out:
(switch-to-buffer "Titanic")
(maphash
(lambda (k v)
(insert (concat " | " (format "%-15s" k) " | "
(format "%-20s" v) " |\n")))
titanic)

This uses switch-to-buffer and really sports a lot of things from Section 2 with insert as opposed to the message that pervades these examples and format to pretty-print a table.

There's one last thing I want to say about hash maps... a lesson I learned the hard way. They work like symbols with key names. Using strings isn't advised since gethash doesn't work as described above when strings are used for keys. Fortunately, simply throwing intern and symbol-name around the instances of the strings solves all the problems by casting the strings to symbols.