





















































In this article by Eduardo Díaz, author of the book Clojure for Java Developers, we will learn how to build a command-line tool in Clojure.
We'll also look at some newer features in the Clojure world: core.async and transducers. core.async is a library for writing asynchronous programs; it lets you create thousands of lightweight threads and gives you the means to coordinate them.
Transducers are a way to separate computation from the source of data; you can use transducers with the data flowing between the light threads or use them with a vector of data.
First, let's take the time to understand the requirements fully. Let's try to summarize our requirement in a single statement:
We need to know all the places in a set of pages that use the same CSS selector.
This seems to be a well-defined problem, but let's not forget to specify some things in order to make the best possible decisions.
How can we solve the previous requirements? Java and Clojure already have a wide variety of libraries that we can use. Let's have a look at a couple of them, which we can use in this example.
The biggest issue seems to be finding a simple way to build a cross-browser tool. Even automating a single browser sounds like a complex task; how could we automate different browsers?
You have probably heard about Selenium, a library that enables us to automate a browser. It is normally used for testing, but it also lets us take screenshots, look up certain elements, and run custom JavaScript in the browser, and its architecture allows it to run on different browsers. It does seem like a great fit.
Nowadays, you can use Selenium from almost any language you want; however, it is written in Java, so you can expect first-class support if you are running on the JVM.
We are using Clojure and we can expect better integration with the Clojure language; for this particular project we will rely on clj-webdriver (https://github.com/semperos/clj-webdriver).
It is an open source project that features an idiomatic Clojure API for Selenium, called Taxi.
You can find the documentation for Taxi at https://github.com/semperos/clj-webdriver/wiki/Introduction%3A-Taxi
We want to build a command-line app, and if we want to do it in the best possible way, it is important to think of our users. Command-line users are comfortable using their apps in a standard way. One of the most used and best-known conventions for passing arguments to command-line apps is the GNU standard.
There are libraries in most of the languages that help you parse command-line arguments and Clojure is no exception.
Let's use the tools.cli library (https://github.com/clojure/tools.cli). The tools.cli library is a parser for command-line arguments; it adheres to the GNU standard and makes it very easy to create a command-line interface that is safe and familiar to use.
Some very interesting features that tools.cli gives you are:
The README file on GitHub is very helpful and is kept up to date with the latest version. Therefore, I recommend that you have a look at it to understand all the possibilities of this awesome library.
We have everything we need in order to build our app, so let's write it now.
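As a rough sketch of what parsing looks like in practice, the snippet below feeds a made-up argument vector to parse-opts. The option spec has the same shape as the one used later in this article; the sketch.cli namespace, the parse helper, and the sample arguments are assumptions made purely for illustration.

```clojure
(ns sketch.cli
  (:require [clojure.tools.cli :refer [parse-opts]]))

;; Same shape as the option spec used later in this article.
(def cli-options
  [["-s" "--selector SELECTOR" "CSS Selector"]
   ["-b" "--browser BROWSER" "Browser"
    :default :chrome
    :parse-fn keyword]
   ["-h" "--help"]])

;; Hypothetical helper: keeps only the keys we want to inspect here.
(defn parse [args]
  (select-keys (parse-opts args cli-options)
               [:options :arguments :errors]))

;; Options are parsed into a map; anything left over ends up in :arguments.
(parse ["-s" "input" "-b" "firefox" "http://example.com"])
```

The :errors entry is worth checking before doing any work: it holds a vector of messages when an option is unknown or malformed, and nil otherwise.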
This project has the following three main namespaces:
Let's check the responsibilities of each of these namespaces.
(ns check-css.browser
  (:require [clj-webdriver.taxi :as taxi]
            [clojure.string :as s]
            [clojure.java.io :as io]))

(defn exec-site-fn [urls f & {:keys [driver]
                              :or {driver {:browser :chrome}}}]
  (taxi/with-driver driver
    (doseq [url urls]
      (taxi/to url)
      (f url))))
This is very simple code. It includes the function exec-site-fn, which receives a list of URLs, a function to run on each of them, and an optional driver configuration. If you don't specify a driver, Chrome is used by default.
Taxi includes a macro with-driver, which allows you to execute a procedure with a single browser in a sequential manner. We get the following benefits from this:
So this function just executes something for some URLs using a single browser; we can think of it as a helper function.
(ns check-css.page
  (:require [clj-webdriver.taxi :as taxi]
            [clojure.string :as s]
            [clojure.java.io :as io]))

(defn execute-script-fn [base-path js-src selector url]
  (let [path-url (s/replace url #"/" "_")
        path (str base-path "/" path-url ".png")]
    ;; Substitute the selector for the #selector# placeholder in the script
    (taxi/execute-script (s/replace js-src #"#selector#" selector))
    (println (str "Checking site " url))
    (taxi/take-screenshot :file path)))
This is again a helper function, which does two things:
How can we use this to our advantage? It is very easy to use JavaScript to mark the elements you are interested in.
You can use this script as shown:
var els = document.querySelectorAll('#selector#');
for (var i = 0; i < els.length; i++) {
  els[i].style.border = '2px solid red';
}
Therefore, we just need to use everything together for this to work and we'll see that in the check-css.core namespace.
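The two string manipulations involved here (filling in the placeholder and deriving a file name from the URL) can be tried without a browser. The following is only a sketch: the sketch.script namespace and the fill-selector and screenshot-path functions are hypothetical names, and it makes two choices that differ from the code above. It matches the placeholder as a plain string, so characters such as $ in a selector are inserted literally, and it replaces every character outside a small safe set in the URL, not only the slashes.

```clojure
(ns sketch.script
  (:require [clojure.string :as s]))

;; One line of the script shown above, with the #selector# placeholder.
(def js-template
  "var els = document.querySelectorAll('#selector#');")

;; String/string replacement is literal: no regex group references
;; are interpreted in the selector.
(defn fill-selector [js-src selector]
  (s/replace js-src "#selector#" selector))

;; Anything that is not a letter, digit, dot, or dash becomes an underscore.
(defn screenshot-path [base-path url]
  (str base-path "/" (s/replace url #"[^A-Za-z0-9.-]" "_") ".png"))

(fill-selector js-template "a[href$=x]")
(screenshot-path "." "http://www.google.com")
```

A selector that itself contains a single quote would still break the generated JavaScript string; the sketch does not attempt to escape it.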
(ns check-css.core …)
(def cli-options
  [["-s" "--selector SELECTOR" "CSS Selector"]
   ["-p" "--path PATH" "The base folder for images"]
   ["-b" "--browser BROWSER" "Browser"
    :default :chrome
    :parse-fn keyword]
   ["-h" "--help"]])
(defn find-css-usages [browser selector output-path urls]
  (let [js-src (-> (io/resource "script.js") slurp)
        apply-script-fn (partial p/execute-script-fn
                                 output-path
                                 js-src
                                 selector)]
    ;; exec-site-fn already iterates over urls, so a single call is enough
    (b/exec-site-fn urls apply-script-fn
                    :driver {:browser browser})))
(defn -main [& args]
  (let [{:keys [options arguments summary]} (parse-opts args cli-options)
        {:keys [browser selector path help]} options
        urls arguments]
    (if-not help
      (find-css-usages browser selector path urls)
      (exit 0 summary))))
This code looks very simple; here we can see the usage of tools.cli and the function that ties everything together, find-css-usages. This function:
This is all that is needed to execute our program. Now we can do the following from the command line:
# lein uberjar
# java -jar target/uberjar/check-css-0.1.0-SNAPSHOT-standalone.jar -p . -s "input" -b chrome http://www.google.com http://www.facebook.com
It creates a couple of screenshots of Google and Facebook, pointing out the elements that are inputs.
Granted, we can do something more interesting with our app, but for now, let's focus on the code.
There are a couple of things we want to do to this code. The first thing is that we want to have some sort of statistical record of how many elements were found, not just the screenshots.
The second important thing has to do with an opportunity to learn about core.async and what's coming up next in the Clojure world.
core.async is yet another way of programming concurrently; it uses the idea of lightweight threads, with channels to communicate between them.
Lightweight threads are used in languages like Go and Erlang, which pride themselves on being able to run thousands of them in a single process.
What is the difference between the lightweight threads and traditional threads?
The traditional threads need to reserve memory and this also takes some time. If you want to create a couple thousand threads, you will be using a noticeable amount of memory for each thread and asking the kernel to do that also takes time.
What difference do lightweight threads make? To have a couple hundred lightweight threads, you only need to create a couple of real threads; there is no need to reserve a thread's worth of memory for each one. The lightweight threads are merely a software idea.
This can be achieved in most languages, and Clojure adds first-class support (without changing the language; this is part of the power of Lisp) using core.async! Let's have a look at how it works.
There are two concepts that you need to keep in mind:
Now, let's play a little with each of them so you can understand how to use them for our program.
You will find goblocks in the clojure.core.async namespace.
Goblocks are extremely easy to use; you need the go macro and you will do something similar to this:
(ns test
  (:require [clojure.core.async :refer [go]]))

(go
  (println "Running in a goblock!"))
They are similar to threads; you just need to remember that you can create goblocks freely. There can be thousands of running goblocks in a single JVM.
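To get a feel for how cheap goblocks are, the sketch below starts ten thousand of them and adds up their results. It relies on the fact that every go call returns a channel that receives the value of the block's body. The sketch.goblocks namespace and the sum-of-squares function are made-up names for this illustration, not part of the check-css project.

```clojure
(ns sketch.goblocks
  (:require [clojure.core.async :refer [go <!!]]))

;; Hypothetical helper: starts n goblocks, each computing the square of
;; its index. Each go call returns a channel carrying the block's result.
(defn sum-of-squares [n]
  (let [result-chans (mapv (fn [i] (go (* i i))) (range n))]
    ;; <!! blocks the calling (real) thread until each result arrives.
    (reduce + (map <!! result-chans))))

;; Ten thousand goblocks, multiplexed over a small pool of real threads.
(sum-of-squares 10000)
```

Reading each block's own result channel, instead of having all blocks put onto one shared unbuffered channel, avoids piling up a large number of pending puts on a single channel.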
You can actually use anything you like to communicate between goblocks, but it is recommended that you use channels.
Channels have two main operations, namely putting and getting. Let's see how to do it:
(ns test
  (:require [clojure.core.async :refer [go chan >! <!]]))

(let [c (chan)]
  (go (println (str "The data in the channel is " (<! c))))
  (go (>! c 6)))
That's it! It looks pretty simple; as you can see, there are three main functions that we are using with channels:
There are lots of other functions that you can use with channels; for now, let's add two related functions that you will probably use soon:
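As a small illustration, the sketch below uses the blocking counterparts >!! and <!!, which work on ordinary threads outside a goblock, together with close!. The sketch.channels namespace and the put-close-take function are hypothetical names chosen for this example.

```clojure
(ns sketch.channels
  (:require [clojure.core.async :refer [chan >!! <!! close!]]))

(defn put-close-take []
  (let [c (chan 2)]          ; buffer of 2, so the two puts below do not block
    (>!! c :a)
    (>!! c :b)
    (close! c)               ; buffered values stay available after closing
    ;; The third take finds the channel closed and empty, and yields nil.
    [(<!! c) (<!! c) (<!! c)]))

(put-close-take) ;; => [:a :b nil]
```

Because a take on a closed, drained channel returns nil, nil itself can never be put on a channel; it is reserved as the end-of-data signal.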
If you look into the core.async API docs (http://clojure.github.io/core.async/), you will find a fair number of functions.
Some of them give you functionality similar to queues; let's look at the broadcast function.
(ns test
  (:require [clojure.core.async.lab :refer [broadcast]]
            [clojure.core.async :refer [chan <! >!! go-loop]]))

(let [c1 (chan 5)
      c2 (chan 5)
      bc (broadcast c1 c2)]
  (go-loop []
    (println "Getting from the first channel" (<! c1))
    (recur))
  (go-loop []
    (println "Getting from the second channel" (<! c2))
    (recur))
  (>!! bc 5)
  (>!! bc 9))
With this, you can publish to several channels at the same time. This is helpful for subscribing multiple processes to a single source of events, with a great amount of separation of concerns.
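The same fan-out idea can also be expressed with mult and tap from the main clojure.core.async namespace. This is a different technique from the lab broadcast function used above, shown here only as a sketch; the sketch.fanout namespace and the fan-out function are made-up names.

```clojure
(ns sketch.fanout
  (:require [clojure.core.async :as a :refer [chan mult tap >!! <!! close!]]))

(defn fan-out [values]
  (let [source (chan)
        m (mult source)              ; copies every value to all taps
        c1 (chan)
        c2 (chan)
        ;; Start collecting right away so that neither tap stalls the mult.
        collected-1 (a/into [] c1)
        collected-2 (a/into [] c2)]
    (tap m c1)
    (tap m c2)
    (doseq [v values]
      (>!! source v))
    (close! source)                  ; closing the source closes both taps
    [(<!! collected-1) (<!! collected-2)]))

(fan-out [5 9]) ;; => [[5 9] [5 9]]
```

Every tap has to be consumed: a mult waits for all taps to accept a value before taking the next one from the source, which is why the collectors are started before anything is put.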
If you take a good look, you will also find familiar functions over there: map, filter, and reduce.
Depending on the version of core.async, some of these functions may no longer be there.
Why are these functions there? Those functions are for modifying collections of data, right?
The reason is that there has been a good amount of effort towards using channels as higher-level abstractions.
The idea is to see channels as collections of events. If you think of them that way, it's easy to see that you can create a new channel by mapping every element of an old channel, or you can create a new channel by filtering away some elements.
In recent versions of Clojure, the abstraction has become even more noticeable with transducers.
Transducers are a way to separate computations from the input source. Put simply, they are a way to apply a series of steps to a sequence or a channel.
Let's look at an example for a sequence.
(let [odd-counts (comp (map count)
                       (filter odd?))
      vs [[1 2 3 4 5 6]
          [:a :c :d :e]
          [:test]]]
  (sequence odd-counts vs))
comp feels similar to the threading macros; it composes functions and stores the steps of the computation.
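Because the transformation is independent of where the data comes from, the same transducer can be handed to several functions in clojure.core. The sketch below applies it in three ways; the sketch.transducers namespace, the three-ways function, and the sample data are assumptions made for this illustration.

```clojure
(ns sketch.transducers)

;; With transducers, comp reads in data-flow order: count first, then filter.
(def odd-counts (comp (map count) (filter odd?)))

;; Sample data with counts 3, 2, 1, and 0.
(def vs [[1 2 3] [:a :b] [:x] []])

(defn three-ways []
  {:lazy-seq (sequence odd-counts vs)       ; lazy sequence of results
   :vector   (into [] odd-counts vs)        ; poured into a vector
   :sum      (transduce odd-counts + 0 vs)}) ; reduced to a single number

(three-ways)
;; => {:lazy-seq (3 1), :vector [3 1], :sum 4}
```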
The interesting part is that we can use this same odd-counts transformation with a channel, as shown:
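(let [odd-counts (comp (map count)
                       (filter odd?))
      input (chan)
      output (chan 5 odd-counts)]
  ;; Connect the channels: everything put on input flows into output,
  ;; where the transducer is applied (pipe lives in clojure.core.async)
  (pipe input output)
  (go-loop []
    (let [x (<! output)]
      (println x))
    (recur))
  (>!! input [1 2 3 4 5 6])
  (>!! input [:a :c :d :e])
  (>!! input [:test]))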
This is quite interesting and now you can use this to understand how to improve the code of the check-css program. The main thing we'll gain (besides learning core.async and how to use transducers) is the visibility and separation of concerns; if we have channels publishing events, then it becomes extremely simple to add new subscribers without changing anything.
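If you want the transformed values back as data instead of printing them, one option is to close the channel when you are done and collect whatever came through. The following is a minimal sketch under that assumption; the sketch.xf-channel namespace and the odd-counts-via-channel function are hypothetical names.

```clojure
(ns sketch.xf-channel
  (:require [clojure.core.async :as a :refer [chan >!! <!! close!]]))

(def odd-counts (comp (map count) (filter odd?)))

(defn odd-counts-via-channel [colls]
  (let [c (chan 10 odd-counts)       ; a transducer requires a buffered channel
        collected (a/into [] c)]     ; gathers values until c is closed
    (doseq [coll colls]
      (>!! c coll))
    (close! c)
    (<!! collected)))

(odd-counts-via-channel [[1 2 3] [:a :b] [:x] []]) ;; => [3 1]
```

The transducer runs as values are put on the channel, so consumers only ever see the counts that passed the filter.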
In this article, we have learned how to build a command-line tool, what requirements we needed to meet in order to build it, and how we can use the same approach in different projects.