
How-To Tutorials - Programming

1081 Articles

Web Scraping with Python

Packt
17 Feb 2010
5 min read
To perform this task, three basic steps are usually followed:

1. Explore the website to find out where the desired information is located in the HTML DOM tree.
2. Download as many web pages as needed.
3. Parse the downloaded web pages and extract the information from the places found in the exploration step.

The exploration step is performed manually with the aid of some tools that make it easier to locate the information and reduce the development time of the following steps. The download and parsing steps are usually performed in an iterative cycle, since they are interrelated: the next page to download may depend on a link or similar in the current page, so not every web page can be downloaded without previously looking into an earlier one.

This article will show an example covering the three steps mentioned and how they can be implemented in Python. The code shown works at the time of writing; however, it should be taken into account that it may stop working in the future if the presentation format of the site changes. The reason is that web scraping depends on the DOM tree being reasonably stable. As with regular expressions, the code will keep working through slight changes in the information being parsed, but when the presentation format changes completely, the web scraping scripts have to be modified to match the new DOM tree.

Explore

Let's say you are a fan of the Packt Publishing article network and that you want to keep a list of the titles of all the articles that have been published until now, together with the links to them. First of all, you will need to connect to the main article network page (http://www.packtpub.com/article-network) and start exploring the web page to get an idea of where the information you want to extract is located. Many ways are available to perform this task, such as viewing the source code directly in your browser, or downloading it and inspecting it with your favorite editor. However, HTML pages often contain auto-generated code and are not as readable as they should be, so using a specialized tool can be quite helpful. In my opinion, the best one for this task is the Firebug add-on for the Firefox browser. With this add-on, instead of combing through the code looking for some string, all you have to do is press the Inspect button, move the pointer to the area in which you are interested, and click. After that, the HTML code for the marked area and the location of the tag in the DOM tree are clearly displayed.

For example, the links to the different pages containing all the articles are located inside a "right" tag, and, in every page, the links to the articles are contained as list items in an unordered list. In addition to this, the link URLs, as you have probably noticed while reading other articles, start with http://www.packtpub.com/article/

So, our scraping strategy will be:

1. Get the list of links to all pages containing articles.
2. Follow all the links so as to extract the article information from every page.

One small optimization here is that the main article network page is the same as the one pointed to by the first page link, so we will take this into account to avoid loading the same page twice when we develop the code.

Download

Before parsing any web page, the contents of that page must be downloaded.
As usual, there are many ways to do this:

- Creating your own HTTP requests using the urllib2 standard Python library.
- Using a more advanced library that provides the capability to navigate through a website simulating a browser, such as mechanize.

In this article mechanize will be covered, as it is the easiest choice. mechanize is a library that provides a Browser class that lets the developer interact with a website in a similar way a real browser would. In particular, it provides methods to open pages, follow links, change form data and submit forms.

Recalling our scraping strategy, the first thing we would like to do is to download the main article network web page. To do that, we will create a Browser class instance and then open the main article network page:

    >>> import mechanize
    >>> BASE_URL = "http://www.packtpub.com/article-network"
    >>> br = mechanize.Browser()
    >>> data = br.open(BASE_URL).get_data()
    >>> links = scrape_links(BASE_URL, data)

The result of the open method is an HTTP response object, and its get_data method returns the contents of the web page. The scrape_links function will be explained later. For now, as pointed out in the introduction, bear in mind that the downloading and parsing steps are usually performed iteratively, since some of the content to be downloaded depends on parsing content that was downloaded earlier, as is the case here.
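The scrape_links helper itself is not shown in this excerpt. Purely as an illustration of what such a helper could look like (an assumption on my part, not the article's actual implementation), the article links could be pulled out of the downloaded HTML with a regular expression and resolved against the base URL, using the same Python 2 environment as the rest of the article:

    import re
    from urlparse import urljoin

    # Assumed pattern: article links appear as relative hrefs under /article/.
    ARTICLE_LINK_RE = re.compile(r'href="(/article/[^"]+)"')

    def scrape_links(base_url, data):
        # Find every href pointing at an article page and make it absolute.
        relative_links = ARTICLE_LINK_RE.findall(data)
        return [urljoin(base_url, link) for link in relative_links]

A real implementation would more likely walk the DOM tree, as the exploration step suggests, rather than use a regular expression; the part that matters for the surrounding code is the shape of the function, which takes the base URL and the raw page data and returns absolute links.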


Normalizing Dimensional Model

Packt
08 Feb 2010
3 min read
First Normal Form Violation in a Dimension table

Let's revisit the author problem in the book dimension. The AUTHOR column contains multiple authors, a first-normal-form violation, which prevents us from querying books or sales by author.

BOOK dimension table

  BOOK_SK  TITLE                 AUTHOR                         PUBLISHER  CATEGORY     SUB-CATEGORY
  1        Programming in Java   King, Chan                     Pac        Programming  Java
  2        Learning Python       Simpson                        Pac        Programming  Python
  3        Introduction to BIRT  Chan, Gupta, Simpson (Editor)  Apes       Reporting    BIRT
  4        Advanced Java         King, Chan                     Apes       Programming  Java

Normalizing and spinning off the authors into a dimension and adding an artificial BOOK AUTHOR fact solves the problem; it is an artificial fact as it does not contain any real business measure. Note that Editor, which is an author's role rather than an author's name, is also "normalized" out of the AUTHOR column and into its own ROLE column in the BOOK AUTHOR fact.

AUTHOR table

  AUTHOR_SK  AUTHOR_NAME
  1          King
  2          Chan
  3          Simpson
  4          Gupta

BOOK AUTHOR table

  BOOK_SK  AUTHOR_SK  ROLE       COUNT
  1        1          Co-author  1
  1        2          Co-author  1
  2        3          Author     1
  3        2          Co-author  1
  3        3          Editor     1
  3        4          Co-author  1
  4        1          Co-author  1
  4        2          Co-author  1

Note the artificial COUNT measure, which facilitates aggregation; it always has a value of 1.

    SELECT name, SUM(count)
    FROM book_author ba, book_dim b, author_dim a
    WHERE ba.book_sk = b.book_sk
      AND ba.author_sk = a.author_sk
    GROUP BY name

You might need to query sales by author, which you can do by combining the queries of each of the two stars (the two facts) on their common dimension (the BOOK dimension), producing daily book sales by author.

    SELECT dt, title, name, role, sales_amt
    FROM
      (SELECT book_sk, dt, title, sales_amt
       FROM sales_fact s, date_dim d, book_dim b
       WHERE s.book_sk = b.book_sk
         AND s.date_sk = d.date_sk) sales,
      (SELECT b.book_sk, name, role
       FROM book_author ba, book_dim b, author_dim a
       WHERE ba.book_sk = b.book_sk
         AND ba.author_sk = a.author_sk) author
    WHERE sales.book_sk = author.book_sk

Single Column with Repeating Value in a Dimension table

A column like PUBLISHER, though it does not violate any normal form, is also a good candidate for normalization, which we accomplish by adding an artificial fact, the PUBLISHED BOOK fact, and its own dimension, the PUBLISHER dimension. This normalization is not exactly the same as normalizing a first-normal-form violation; the PUBLISHER dimension can correctly be linked to the SALES fact, though the publisher surrogate key must then be added to the SALES fact.

PUBLISHER table

  PUBLISHER_SK  PUBLISHER
  1             Pac
  2             Apes

BOOK PUBLISHER table

  BOOK_SK  PUBLISHER_SK  COUNT
  1        1             1
  2        1             1
  3        2             1
  4        2             1

Related Columns with Repeating Value in a Dimension table

The CATEGORY and SUB-CATEGORY columns are related: they form a hierarchy. Each of them can be normalized into its own dimension, but they then all need to be linked into one artificial fact.

Non-Measure Column in a Fact table

The ROLE column inside the BOOK AUTHOR fact is not a measure; it violates the dimensional modeling norm. To resolve this, we just need to spin it off into its own dimension, effectively normalizing the fact table.

ROLE dimension table and sample rows

  ROLE_SK  ROLE
  1        Author
  2        Co-Author
  3        Editor

BOOK AUTHOR table with normalized ROLE

  BOOK_SK  AUTHOR_SK  ROLE_SK  COUNT
  1        1          2        1
  1        2          2        1
  2        3          1        1
  3        2          2        1
  3        3          3        1
  3        4          2        1
  4        1          2        1
  4        2          2        1

Summary

This article shows that both the dimension tables and the fact table in a dimensional model can be normalized without violating its modeling norms.
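To make the aggregation concrete, here is a small self-contained sketch using Python's sqlite3 module (purely illustrative; the article does not prescribe any particular database or language) that loads the AUTHOR and BOOK AUTHOR rows shown above and runs the contributions-per-author query:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute("CREATE TABLE author_dim (author_sk INTEGER, name TEXT)")
    conn.execute("CREATE TABLE book_author "
                 "(book_sk INTEGER, author_sk INTEGER, role TEXT, count INTEGER)")
    conn.executemany("INSERT INTO author_dim VALUES (?, ?)",
                     [(1, 'King'), (2, 'Chan'), (3, 'Simpson'), (4, 'Gupta')])
    conn.executemany("INSERT INTO book_author VALUES (?, ?, ?, ?)",
                     [(1, 1, 'Co-author', 1), (1, 2, 'Co-author', 1),
                      (2, 3, 'Author', 1), (3, 2, 'Co-author', 1),
                      (3, 3, 'Editor', 1), (3, 4, 'Co-author', 1),
                      (4, 1, 'Co-author', 1), (4, 2, 'Co-author', 1)])

    # Contributions per author, driven entirely by the artificial COUNT measure.
    query = ("SELECT name, SUM(count) FROM book_author ba, author_dim a "
             "WHERE ba.author_sk = a.author_sk GROUP BY name")
    for name, total in conn.execute(query):
        print("%s: %s" % (name, total))

The artificial COUNT column is what makes a plain SUM behave like a row count here, which is exactly the role the article assigns to it.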
If you have read this article, you may be interested to view : Solving Many-to-Many Relationship in Dimensional Modeling


Packaging a Python Project using doit

Packt
05 Feb 2010
6 min read
The article won't attempt to reproduce the doit documentation, but will explain how doit can be used to solve a specific problem in a practical way. For a complete introduction to doit, and a description of all its features, please refer to the project documentation. Knowledge of Debian packaging or Bazaar isn't required to follow the discussion, but it would be helpful.

Background

When working on a project's source code, a developer usually needs to perform different repetitive administrative tasks that are required to compile, test and distribute the source code. In general, those tasks are pretty similar from project to project, although the details may vary greatly depending on the application type, target platform, software development cycle, and so on. As a consequence, implementing custom scripts that automate them becomes part of maintaining the source code. Given that this is a very common problem, many task automation tools have been created; make is one of the most well-known among them and is used as a reference to compare with other similar tools.

As the reader probably knows, make provides an imperative way to automate small tasks by defining, in a file (a makefile), a series of rules that have a target file, multiple dependency files and a set of commands. To reach a given target, make must ensure that the target file isn't outdated and that all the dependency files are present before running the commands that will generate the target file. During this process, the evaluation of other rules might be needed to fulfill the required dependencies. Although this approach may look simple, it has been really successful in many projects for years. However, since it tries to solve a general problem, it doesn't perfectly fit every situation. This fact has led to the creation of similar tools that attempt to address some of the drawbacks of make:

- The makefile format forces the developer to learn a new mini-language.
- Rules are statically defined.
- Just one target file per rule is allowed.

With the advent of dynamic programming languages, a new generation of make-like tools that solved those issues was designed. For example, rake did a really good job in providing a familiar environment for Ruby developers who wanted to use an advanced task automation tool without having to learn anything new other than an API. With regard to Python developers, many of these tools are currently available, with different goals in their designs. One that I find particularly interesting is doit, because it doesn't have any of the make problems listed above. In particular:

- It's really simple to use, because it uses Python itself as the language in which the configuration statements are written.
- Tasks, the equivalent of make's rules, may have as many targets as needed, which makes things simpler when the execution of a command entails the creation of multiple files.
- Tasks themselves aren't defined in the configuration; task generators are. This is really flexible when dependencies and/or targets depend on variables that need to be evaluated at run time.

The problem

Let's imagine that we are working on checkbox-editor, a simple Python project hosted on Launchpad that provides an easy GTK interface for writing test cases for checkbox. The application is delivered to users by means of .deb packages for the latest Ubuntu distributions in a Personal Package Archive (PPA), so we'd like to be able to:

- Package the application at any time.
- Install the package locally for testing.
- Upload the package automatically to a PPA.

Fortunately, the project's trunk branch already has the configuration files needed to generate a .deb package using the usual set of tools, so we're going to focus on the process of writing the file needed to generate and upload the desired packages. Of course, since we don't like to waste our time, we only want to generate the files needed for packaging when necessary; that is to say, we're going to follow make's approach of generating target files only when they aren't up to date.

Tasks

In this section, a file that contains the task generators required to automate the package generation using doit will be created step by step. Just as a makefile is created with all the rules for make, in doit the default file name for the task generators is dodo.py. Of course, another file name can be used by passing an argument to doit, but we'll stick to the usual name in this example. In the code snippets displayed in the following sections, some global variables will be used, mainly to get the names of some files. For now, just assume that they're available to the task generators. The code that calculates those variables' values will be shown at the end of the article.

Identification

There are two different classes of packages: source and binary ones. Binary packages are the ones that are compiled for a specific architecture, and that are expected to be installed directly on the destination machine without any problem. These are the type of packages that we need to generate to accomplish the goal of installing a package locally for testing purposes. Hence, two of the tasks that we need to automate are the generation of the binary package and its installation.

Source packages are useful to distribute the source code of an application in a platform-independent way, so that anyone can take a look at the code, fix it or compile it for another architecture if needed. This is also the kind of package that must be uploaded to a Launchpad PPA, since Launchpad will take care of compiling it for different architectures and publishing the binary packages for them. Consequently, two more tasks that should be automated are the generation of a source package and the upload to the Launchpad PPA.

Before any package is generated, we also need to make a copy of the source code with the latest changes. This is not absolutely needed, but it's advisable since the package generation process creates some temporary files. The diagram of the tasks that have just been identified is the following (figure: Tasks that should be automated).

Code

The first task before any package generation is copying the source code to a new directory (for example, pkg), to keep the development directory clean from the temporary files created during the packaging process.
The code that implements this task is as follows:

    import os
    import shutil

    def task_code():
        """
        Create package directory and copy source files
        (bzr branch not used to take into account uncommitted changes)
        """
        def copy_file(from_file, to_file):
            dir = os.path.dirname(to_file)
            if dir and not os.path.isdir(dir):
                os.makedirs(dir)
            print from_file, '=>', to_file
            shutil.copyfile(from_file, to_file)
            shutil.copystat(from_file, to_file)
            return True

        yield {'name': 'clean',
               'actions': None,
               'clean': ['rm -rf pkg']}

        # PKG_FILES and SRC_FILES are module-level globals, defined later in the article.
        for target, dependency in zip(PKG_FILES, SRC_FILES):
            yield {'name': dependency,
                   'actions': [(copy_file, (dependency, target))],
                   'dependencies': [dependency],
                   'targets': [target]}
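The SRC_FILES and PKG_FILES globals are only defined at the end of the original article, so they are not shown here. Purely as an illustration of the idea (an assumption on my part, not the article's actual code), they could be built by walking the working tree and mapping every source file to a counterpart under pkg/:

    import os

    # Hypothetical definitions; the article defers the real ones to its end.
    SRC_FILES = []
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'pkg' in dirpath.split(os.sep):
            continue  # skip the packaging directory itself
        for filename in filenames:
            SRC_FILES.append(os.path.normpath(os.path.join(dirpath, filename)))

    PKG_FILES = [os.path.join('pkg', path) for path in SRC_FILES]

Whatever their exact definition, the important property is that the two lists are parallel: zipping them together in task_code pairs each source file with its destination inside the pkg directory.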


Build your own Application to access Twitter using Java and NetBeans: Part 1

Packt
05 Feb 2010
6 min read
Because writing a Java app to control your Twitter account is quite a long process and requires several features, I intend to divide this article into several sections, so you can see in detail all the bells and whistles involved in writing Java applications.

Downloading and installing NetBeans for your development platform

To download NetBeans, open a web browser window and go to the NetBeans website. Then click on the Download button and select the All IDE download bundle. After downloading NetBeans, install it with the default options.

Creating your SwingAndTweet project

Open NetBeans and select File | New Project to open the New Project dialog. Now select Java from the Categories panel and Java Application from the Projects panel. Click on Next to continue. The New Java Application dialog will show up next. Type SwingAndTweet in the Project Name field, mark the Use Dedicated Folder for Storing Libraries option, deselect the Create Main Class box (we'll deal with that later), make sure the Set as Main Project box is enabled and click on Next to continue. NetBeans will create the SwingAndTweet project and show it under the Projects tab in the NetBeans main window. Right-click on the project's name and select JFrame Form... in the pop-up menu. The New JFrame Form window will appear next. Type SwingAndTweetUI in the Class Name field, type swingandtweet in the Package field and click on Finish to continue. NetBeans will open the SwingAndTweetUI frame in the center panel of the main screen. Now you're ready to assemble your Twitter Java application!

Now let me explain a little bit about what we did in the previous exercise. First, we created a new Java application called SwingAndTweet. Then we created a Swing JFrame component and named it SwingAndTweetUI, because this is going to act as the foundation where we're going to put all the other Swing components required to interact with Twitter. Next, I'm going to show you how to download the Twitter4J API and integrate it into your SwingAndTweet Java application.

Downloading and integrating the Twitter4J API into your NetBeans environment

For us to be able to use the powerful classes and methods from the Twitter4J API, we need to tell NetBeans where to find them and integrate them into our Java applications. Open a web browser window, go to http://repo1.maven.org/maven2/net/homeip/yusuke/twitter4j/ and search for the latest twitter4j-2.X.X.jar file, or download the most recent version at the time of this writing from here: http://repo1.maven.org/maven2/net/homeip/yusuke/twitter4j/2.0.9/twitter4j-2.0.9.jar. Once you have downloaded it to your computer, go to NetBeans, right-click on the SwingAndTweet project and select Properties from the context menu. Once at the project properties screen, select the Libraries category under the Categories panel, click on the Add JAR/Folder... button at the middle-right part of the screen to open the Add JAR/Folder dialog, navigate to the directory where you downloaded the twitter4j-2.X.X.jar file and double-click on it to add it to your project's library path. Click on OK to close the Project Properties dialog and return to the NetBeans main screen. You have now integrated the Twitter4J API into your SwingAndTweet application. Now, let's see how to log into your Twitter account from our Java application...
Logging into Twitter from Java and seeing your last Tweet

In the following exercise, I'll show you how easy it is to start communicating with Twitter from a Java application, thanks to the Twitter class from the Twitter4J API. You'll also learn how to check your last tweet through your Java application. Let's see how to log into a Twitter account.

Go to the Palette window and locate the JLabel component under the Swing Controls section; then drag and drop it into the SwingAndTweetUI JFrame component. Now drag a Button and a Text Field, too. Once you have the three controls inside the SwingAndTweetUI JFrame control, arrange them as shown below:

The next step is to change their names and captions, to make our application look more professional. Right-click on the jLabel1 control, select Edit from the context menu, type My Last Tweet and hit Enter. Do the same with the other two controls: erase the text in the jTextField1 control and type Login in the jButton1 control. Rearrange the jLabel1 and jTextField1 controls, and drag one of the ends of jTextField1 to increase its length as much as you can. Once done, your application will look like this:

And now, let's inject some life into our application! Double-click on the jButton1 control to open your application's code window. You'll be inside a Java method called jButton1ActionPerformed. This method will execute every time you click on the Login button, and this is where we're going to put all the code for logging into your Twitter account. Delete the // TODO add your handling code here: line and type the following code inside the jButton1ActionPerformed method:

Remember to replace username and password with your real Twitter username and password. If you look closely at the line numbers, you'll notice there are five error icons on lines 82, 84, 85, 88 and 89. That's because we need to add some import lines at the beginning of your code, to indicate to NetBeans where to find the Twitter and JOptionPane classes, and the TwitterException. Scroll up until you locate the package swingandtweet; line, then add the following lines:

Now all the errors will disappear from your code. To see your Java application in action, press F6 or select Run | Run Main Project from the NetBeans main menu. The Run Project window will pop up, asking you to select the main class for your project. The swingandtweet.SwingAndTweetUI class will already be selected, so just click on OK to continue. Your SwingAndTweetUI application window will appear next, showing the three controls you created. Click on the Login button and wait for the SwingAndTweet application to validate your Twitter username and password. If they're correct, the following dialog will pop up:

Click on OK to return to your SwingAndTweet application. Now you will see your last tweet in the textbox control. If you want to be really sure it's working, go to your Twitter account and update your status through the web interface; for example, type Testing my Java app. Then return to your SwingAndTweet application and click on the Login button again to see your last tweet. The textbox control will now reflect your latest tweet. As you can see, your SwingAndTweet Java application can now communicate with your Twitter account! Click on the X button to close the window and exit your SwingAndTweet application.


Basic Doctest in Python

Packt
29 Jan 2010
9 min read
Doctest will be the mainstay of your testing toolkit. You'll be using it for tests, of course, but also for things that you may not think of as tests right now. For example, program specifications and API documentation both benefit from being written as doctests and checked alongside your other tests. Like program source code, doctest tests are written in plain text. Doctest extracts the tests and ignores the rest of the text, which means that the tests can be embedded in human-readable explanations or discussions. This is the feature that makes doctest so suitable for non-classical uses such as program specifications.

Time for action – creating and running your first doctest

We'll create a simple doctest, to demonstrate the fundamentals of using doctest.

Open a new text file in your editor, and name it test.txt. Insert the following text into the file:

    This is a simple doctest that checks some of Python's arithmetic
    operations.

    >>> 2 + 2
    4
    >>> 3 * 3
    10

We can now run the doctest. The details of how we do that depend on which version of Python we're using. At the command prompt, change to the directory where you saved test.txt. If you are using Python 2.6 or higher, type:

    $ python -m doctest test.txt

If you are using Python 2.5 or lower, the above command may seem to work, but it won't produce the expected result. This is because Python 2.6 is the first version in which doctest looks for test file names on the command line when you invoke it this way. If you're using an older version of Python, you can run your doctest by typing:

    $ python -c "__import__('doctest').testfile('test.txt')"

When the test is run, you should see output as shown in the following screen:

What just happened?

You wrote a doctest file that describes a couple of arithmetic operations, and executed it to check whether Python behaved as the tests said it should. You ran the tests by telling Python to execute doctest on the files that contained the tests. In this case, Python's behavior differed from the tests because, according to the tests, three times three equals ten! However, Python disagrees on that. As doctest expected one thing and Python did something different, doctest presented you with a nice little error report showing where to find the failed test, and how the actual result differed from the expected result. At the bottom of the report is a summary showing how many tests failed in each file tested, which is helpful when you have more than one file containing tests.

Remember, doctest files are for computer and human consumption. Try to write the test code in a way that human readers can easily understand, and add in plenty of plain language commentary.

The syntax of doctests

You might have guessed from looking at the previous example: doctest recognizes tests by looking for sections of text that look like they've been copied and pasted from a Python interactive session. Anything that can be expressed in Python is valid within a doctest. Lines that start with a >>> prompt are sent to a Python interpreter. Lines that start with a ... prompt are sent as continuations of the code from the previous line, allowing you to embed complex block statements into your doctests. Finally, any lines that don't start with >>> or ..., up to the next blank line or >>> prompt, represent the output expected from the statement. The output appears as it would in an interactive Python session, including both the return value and anything printed to the console.
If you don't have any output lines, doctest takes that to mean the statement is expected to have no visible result on the console. Doctest ignores anything in the file that isn't part of a test, which means that you can place explanatory text, HTML, line-art diagrams, or whatever else strikes your fancy in between your tests. We took advantage of that in the previous doctest, to add an explanatory sentence before the test itself.

Time for action – writing a more complex test

We'll write another test (you can add it to test.txt if you like) which shows off most of the details of doctest syntax.

Insert the following text into your doctest file (test.txt), separated from the existing tests by at least one blank line:

    Now we're going to take some more of doctest's syntax for a spin.

    >>> import sys
    >>> def test_write():
    ...     sys.stdout.write("Hello\n")
    ...     return True
    >>> test_write()
    Hello
    True

Think about it for a moment: What does this do? Do you expect the test to pass, or to fail?

Run doctest on the test file, just as we discussed before. Because we added the new tests to the same file containing the tests from before, we still see the notification that three times three does not equal ten. Now, though, we also see that five tests were run, which means our new tests ran and succeeded.

What just happened?

As far as doctest is concerned, we added three tests to the file. The first one says that when we import sys, nothing visible should happen. The second test says that when we define the test_write function, nothing visible should happen. The third test says that when we call the test_write function, Hello and True should appear on the console, in that order, on separate lines. Since all three of these tests pass, doctest doesn't bother to say much about them. All it did was increase the number of tests reported at the bottom from two to five.

Expecting exceptions

That's all well and good for testing that things work as expected, but it is just as important to make sure that things fail when they're supposed to fail. Put another way: sometimes your code is supposed to raise an exception, and you need to be able to write tests that check that behavior as well. Fortunately, doctest follows nearly the same principle in dealing with exceptions as it does with everything else; it looks for text that looks like a Python interactive session. That means it looks for text that looks like a Python exception report and traceback, matching it against any exception that gets raised.

Doctest does handle exceptions a little differently from other tools. It doesn't just match the text precisely and report a failure if it doesn't match. Exception tracebacks tend to contain many details that are not relevant to the test, but which can change unexpectedly. Doctest deals with this by ignoring the traceback entirely: it's only concerned with the first line, Traceback (most recent call last):, which tells it that you expect an exception, and the part after the traceback, which tells it which exception you expect. Doctest only reports a failure if one of these parts does not match. That's helpful for a second reason as well: manually figuring out what the traceback would look like when you're writing your tests would require a significant amount of effort, and would gain you nothing. It's better to simply omit them.

Time for action – expecting an exception

This is yet another test that you can add to test.txt, this time testing some code that ought to raise an exception.
Insert the following text into your doctest file (please note that the last line of this text has been wrapped due to the constraints of the article's format, and should be a single line):

    Here we use doctest's exception syntax to check that Python is
    correctly enforcing its grammar.

    >>> def faulty():
    ...     yield 5
    ...     return 7
    Traceback (most recent call last):
    SyntaxError: 'return' with argument inside generator
    (<doctest test.txt[5]>, line 3)

The test is supposed to raise an exception, so it will fail if it doesn't raise the exception, or if it raises the wrong exception. Make sure you have your mind wrapped around that: if the test code executes successfully, the test fails, because it expected an exception. Run the tests using doctest and the following screen will be displayed:

What just happened?

Since Python doesn't allow a function to contain both yield statements and return statements with values, having the test define such a function caused an exception. In this case, the exception was a SyntaxError with the expected value. As a result, doctest considered it a match with the expected output, and thus the test passed. When dealing with exceptions, it is often desirable to be able to use a wildcard matching mechanism. Doctest provides this facility through its ellipsis directive, which we'll discuss later.

Expecting blank lines in the output

Doctest uses the first blank line to identify the end of the expected output. So what do you do when the expected output actually contains a blank line? Doctest handles this situation by matching a line that contains only the text <BLANKLINE> in the expected output with a real blank line in the actual output.

Using directives to control doctest

Sometimes, the default behavior of doctest makes writing a particular test inconvenient. That's where doctest directives come to our rescue. Directives are specially formatted comments that you place after the source code of a test, which tell doctest to alter its default behavior in some way. A directive comment begins with # doctest:, after which comes a comma-separated list of options that either enable or disable various behaviors. To enable a behavior, write a + (plus symbol) followed by the behavior name. To disable a behavior, write a - (minus symbol) followed by the behavior name.

Ignoring part of the result

It's fairly common that only part of the output of a test is actually relevant to determining whether the test passes. By using the +ELLIPSIS directive, you can make doctest treat the text ... (called an ellipsis) in the expected output as a wildcard, which will match any text in the output. When you use an ellipsis, doctest will scan ahead until it finds text matching whatever comes after the ellipsis in the expected output, and continue matching from there. This can lead to surprising results, such as an ellipsis matching against a 0-length section of the actual output, or against multiple lines, so it needs to be used thoughtfully.
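Neither <BLANKLINE> nor +ELLIPSIS is demonstrated in this excerpt, so here is a small illustrative addition (my own example, not the article's) that you could paste into test.txt to see both in action; the first test has a genuinely blank line in its output, and the second wildcards away a platform-dependent value:

    >>> print "one\n\ntwo"
    one
    <BLANKLINE>
    two

    >>> import sys
    >>> sys.platform  # doctest: +ELLIPSIS
    '...'

When run through python -m doctest as before, both of these should pass on any platform, because the ellipsis matches whatever string sys.platform happens to be.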


Embedding Doctests in Python Docstrings

Packt
29 Jan 2010
12 min read
Doctests aren't confined to simple text files. You can put doctests into Python's docstrings. Why would you want to do that? There are a couple of reasons. First of all, docstrings are an important part of the usability of Python code (but only if they tell the truth). If the behavior of a function, method, or module changes and the docstring doesn't get updated, then the docstring becomes misinformation, and a hindrance rather than a help. If the docstring contains a couple of doctest examples, then out-of-date docstrings can be located automatically. Another reason for placing doctest examples into docstrings is simply that it can be very convenient. This practice keeps the tests, documentation and code all in the same place, where they can all be located easily. If the docstring becomes home to too many tests, this can destroy its utility as documentation. This should be avoided; if you find yourself with so many tests in the docstrings that they aren't useful as a quick reference, move most of them to a separate file.

Time for action – embedding a doctest in a docstring

We'll embed a test right inside the Python source file that it tests, by placing it inside a docstring.

Create a file called test.py with the following contents:

    def testable(x):
        r"""
        The `testable` function returns the square root of its
        parameter, or 3, whichever is larger.

        >>> testable(7)
        3.0
        >>> testable(16)
        4.0
        >>> testable(9)
        3.0
        >>> testable(10) == 10 ** 0.5
        True
        """
        if x < 9:
            return 3.0
        return x ** 0.5

At the command prompt, change to the directory where you saved test.py and then run the tests by typing:

    $ python -m doctest test.py

As mentioned earlier, if you have an older version of Python, this isn't going to work for you. Instead, you need to type:

    $ python -c "__import__('doctest').testmod(__import__('test'))"

If everything worked, you shouldn't see anything at all. If you want some confirmation that doctest is doing something, turn on verbose reporting by changing the command to:

    $ python -m doctest -v test.py

For older versions of Python, instead use:

    $ python -c "__import__('doctest').testmod(__import__('test'), verbose=True)"

What just happened?

You put the doctest right inside the docstring of the function it was testing. This is a good place for tests that also show a user how to do something. It's not a good place for detailed, low-level tests (the above example, which was quite detailed for illustrative purposes, is skirting the edge of being too detailed), because docstrings need to serve as API documentation. You can see the reason for this just by looking back at the example, where the doctests take up most of the room in the docstring, without telling the readers any more than they would have learned from a single test. Any test that will serve as good API documentation is a good candidate for including in the docstrings.

Notice the use of a raw string for the docstring (denoted by the r character before the first triple-quote). Using raw strings for your docstrings is a good habit to get into, because you usually don't want escape sequences (e.g. \n for newline) to be interpreted by the Python interpreter. You want them to be treated as text, so that they are correctly passed on to doctest.

Doctest directives

Embedded doctests can accept exactly the same directives as doctests in text files can, using exactly the same syntax. Because of this, all of the doctest directives that we discussed before can also be used to affect the way embedded doctests are evaluated.
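The excerpt doesn't show a directive being used inside a docstring, so here is a quick, purely illustrative sketch (my own, not from the article) of how one would look; the +ELLIPSIS directive hides the part of the output that changes from run to run:

    def greet(name):
        """
        Return a greeting that includes the current process id, which
        differs from run to run, so the test wildcards it away.

        >>> greet('doctest')  # doctest: +ELLIPSIS
        'Hello doctest (pid ...)'
        """
        import os
        return 'Hello doctest (pid %d)' % os.getpid()

The directive comment sits on the >>> line exactly as it would in a text file doctest, and the test is run with the same python -m doctest invocation shown above.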
Execution scope

Doctests embedded in docstrings have a somewhat different execution scope than doctests in text files do. Instead of having a single scope for all of the tests in the file, doctest creates a single scope for each docstring. All of the tests that share a docstring also share an execution scope, but they're isolated from tests in other docstrings. The separation of each docstring into its own execution scope often means that we don't need to put much thought into isolating doctests when they're embedded in docstrings. That is fortunate, since docstrings are primarily intended for documentation, and the tricks needed to isolate the tests might obscure the meaning.

Putting it in practice: an AVL tree

We'll walk step by step through the process of using doctest to create a testable specification for a data structure called an AVL tree. An AVL tree is a way to organize key-value pairs so that they can be quickly located by key. In other words, it's a lot like Python's built-in dictionary type. The name AVL references the initials of the people who invented this data structure. As its name suggests, an AVL tree organizes the keys that are stored in it into a tree structure, with each key having up to two child keys: one child key that is less than the parent key by comparison, and one that is more. In the following picture, the key Elephant has two child keys, Goose has one, and Aardvark and Frog both have none.

The AVL tree is special because it keeps one side of the tree from getting much taller than the other, which means that users can expect it to perform reliably and efficiently no matter what. In the previous image, an AVL tree would reorganize to stay balanced if Frog gained a child. We'll write tests for an AVL tree implementation here, rather than writing the implementation itself. Therefore, we'll gloss over the details of how an AVL tree works, in favor of looking at what it should do when it works right. If you want to know more about AVL trees, you will find many good references on the Internet. Wikipedia's entry on the subject is a good place to start: http://en.wikipedia.org/wiki/AVL_tree.

We'll start with a plain language specification, and then interject tests between the paragraphs. You don't have to actually type all of this into a text file; it is here for you to read and to think about.

English specification

The first step is to describe what the desired result should be, in normal language. This might be something that you do for yourself, or something that somebody else does for you. If you're working for somebody, hopefully you and your employer can sit down together and work this part out. In this case, there's not much to work out, because AVL trees have been fully described for decades. Even so, the description here isn't quite like one you'd find anywhere else. This capacity for ambiguity is exactly the reason why a plain language specification isn't good enough. We need an unambiguous specification, and that's exactly what the tests in a doctest file can give us.

The following text goes in a file called AVL.txt (which you can find in its final form in the accompanying code archive; at this stage of the process, the file contains only the normal language specification):

    An AVL Tree consists of a collection of nodes organized in a binary
    tree structure. Each node has left and right children, each of which
    may be either None or another tree node. Each node has a key, which
    must be comparable via the less-than operator.
    Each node has a value. Each node also has a height number, measuring
    how far the node is from being a leaf of the tree -- a node with
    height 0 is a leaf.

    The binary tree structure is maintained in ordered form, meaning that
    of a node's two children, the left child has a key that compares less
    than the node's key and the right child has a key that compares
    greater than the node's key.

    The binary tree structure is maintained in a balanced form, meaning
    that for any given node, the heights of its children are either the
    same or only differ by 1.

    The node constructor takes either a pair of parameters representing
    a key and a value, or a dict object representing the key-value pairs
    with which to initialize a new tree.

    The following methods target the node on which they are called, and
    can be considered part of the internal mechanism of the tree:

    Each node has a recalculate_height method, which correctly sets the
    height number.

    Each node has a make_deletable method, which exchanges the positions
    of the node and one of its leaf descendants, such that the tree
    ordering of the nodes remains correct.

    Each node has rotate_clockwise and rotate_counterclockwise methods.
    Rotate_clockwise takes the node's right child and places it where
    the node was, making the node into the left child of its own former
    child. Other nodes in the vicinity are moved so as to maintain the
    tree ordering. The opposite operation is performed by
    rotate_counterclockwise.

    Each node has a locate method, taking a key as a parameter, which
    searches the node and its descendants for a node with the specified
    key, and either returns that node or raises a KeyError.

    The following methods target the whole tree rooted at the current
    node. The intent is that they will be called on the root node:

    Each node has a get method taking a key as a parameter, which locates
    the value associated with the specified key and returns it, or raises
    KeyError if the key is not associated with any value in the tree.

    Each node has a set method taking a key and a value as parameters,
    and associating the key and value within the tree.

    Each node has a remove method taking a key as a parameter, and
    removing the key and its associated value from the tree. It raises
    KeyError if no value was associated with that key.

Node data

The first three paragraphs of the specification describe the member variables of an AVL tree node, and tell us what the valid values for the variables are. They also tell us how tree height should be measured and define what a balanced tree means. It's our job now to take up those ideas and encode them into tests that the computer can eventually use to check our code. We could check these specifications by creating a node and then testing the values, but that would really just be a test of the constructor. It's important to test the constructor, but what we really want to do is to incorporate checks that the node variables are left in a valid state into our tests of each member function. To that end, we'll define a function that our tests can call to check that the state of a node is valid. We'll define that function just after the third paragraph:

Notice that this test is written as if the AVL tree implementation already existed. It tries to import an avl_tree module containing an AVL class, and it tries to use the AVL class in specific ways. Of course, at the moment there is no avl_tree module, so the test will fail. That's as it should be.
All that the failure means is that, when the time comes to implement the tree, we should do so in a module called avl_tree, with contents that function as our test assumes. Part of the benefit of testing like this is being able to test-drive your code before you even write it.

    >>> from avl_tree import AVL

    >>> def valid_state(node):
    ...     if node is None:
    ...         return
    ...     if node.left is not None:
    ...         assert isinstance(node.left, AVL)
    ...         assert node.left.key < node.key
    ...         left_height = node.left.height + 1
    ...     else:
    ...         left_height = 0
    ...
    ...     if node.right is not None:
    ...         assert isinstance(node.right, AVL)
    ...         assert node.right.key > node.key
    ...         right_height = node.right.height + 1
    ...     else:
    ...         right_height = 0
    ...
    ...     assert abs(left_height - right_height) < 2
    ...     node.key < node.key
    ...     node.value

    >>> def valid_tree(node):
    ...     if node is None:
    ...         return
    ...     valid_state(node)
    ...     valid_tree(node.left)
    ...     valid_tree(node.right)

Notice that we didn't actually call those functions yet. They aren't tests, per se, but tools that we'll use to simplify writing tests. We define them here, rather than in the Python module that we're going to test, because they aren't conceptually part of the tested code, and because anyone who reads the tests will need to be able to see what the helper functions do.

Constructor

The fourth paragraph describes the constructor for an AVL node: the node constructor takes either a pair of parameters representing a key and a value, or a dict object representing the key-value pairs with which to initialize a new tree.

The constructor has two possible modes of operation: it can either create a single initialized node, or it can create and initialize a whole tree of nodes. The test for the single node mode is easy:

    >>> valid_state(AVL(2, 'Testing is fun'))

The other mode of the constructor is a problem, because it is almost certain that it will be implemented by creating an initial tree node and then calling its set method to add the rest of the nodes. Why is that a problem? Because we don't want to test the set method here: this test should be focused entirely on whether the constructor works correctly when everything it depends on works. In other words, the tests should be able to assume that everything outside of the specific chunk of code being tested works correctly. However, that's not always a valid assumption. So, how can we write tests for things that call on code outside of what's being tested? There is a solution for this problem. For now, we'll just leave the second mode of operation of the constructor untested.
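The excerpt stops here, but to give a feel for how the interleaving of specification paragraphs and tests continues, here is one more check of the kind that might follow the recalculate_height paragraph. This is my own sketch, not the book's text; it relies only on behavior the specification above already promises (a lone node is a leaf, and a leaf has height 0):

    >>> node = AVL(2, 'Testing is fun')
    >>> node.height = 17          # deliberately wrong
    >>> node.recalculate_height()
    >>> node.height
    0
    >>> valid_state(node)

Like the constructor test, this would fail until an avl_tree module satisfying the specification actually exists, which is exactly the point of writing the specification as a doctest.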

Unittest in Python

Packt
21 Jan 2010
11 min read
So let's get on with it!

Basic unittest

Before we start talking about new concepts and features, let's take a look at how to use unittest to express the ideas that we've already learned about. That way, we'll have something solid to ground our new understanding in.

Time for action – testing PID with unittest

We'll revisit the PID class (or at least the tests for the PID class), and rewrite the tests so that they operate within the unittest framework.

Create a new file called test_pid.py in the same directory as pid.py. Notice that this is a .py file: unittest tests are pure Python source code, rather than plain text with source code embedded in it. That means the tests will be less useful from a documentary point of view, but it grants other benefits in exchange.

Insert the following code into your newly created test_pid.py:

    from unittest import TestCase, main
    from mocker import Mocker

    import pid

    class test_pid_constructor(TestCase):
        def test_without_when(self):
            mocker = Mocker()
            mock_time = mocker.replace('time.time')
            mock_time()
            mocker.result(1.0)
            mocker.replay()

            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12)

            mocker.restore()
            mocker.verify()

            self.assertEqual(controller.gains, (0.5, 0.5, 0.5))
            self.assertAlmostEqual(controller.setpoint[0], 0.0)
            self.assertEqual(len(controller.setpoint), 1)
            self.assertAlmostEqual(controller.previous_time, 1.0)
            self.assertAlmostEqual(controller.previous_error, -12.0)
            self.assertAlmostEqual(controller.integrated_error, 0)

        def test_with_when(self):
            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=1, initial=12,
                                 when=43)

            self.assertEqual(controller.gains, (0.5, 0.5, 0.5))
            self.assertAlmostEqual(controller.setpoint[0], 1.0)
            self.assertEqual(len(controller.setpoint), 1)
            self.assertAlmostEqual(controller.previous_time, 43.0)
            self.assertAlmostEqual(controller.previous_error, -11.0)
            self.assertAlmostEqual(controller.integrated_error, 0)

    class test_calculate_response(TestCase):
        def test_without_when(self):
            mocker = Mocker()
            mock_time = mocker.replace('time.time')
            mock_time()
            mocker.result(1.0)
            mock_time()
            mocker.result(2.0)
            mock_time()
            mocker.result(3.0)
            mock_time()
            mocker.result(4.0)
            mock_time()
            mocker.result(5.0)
            mocker.replay()

            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12)

            self.assertEqual(controller.calculate_response(6), -3)
            self.assertEqual(controller.calculate_response(3), -4.5)
            self.assertEqual(controller.calculate_response(-1.5), -0.75)
            self.assertEqual(controller.calculate_response(-2.25), -1.125)

            mocker.restore()
            mocker.verify()

        def test_with_when(self):
            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12,
                                 when=1)

            self.assertEqual(controller.calculate_response(6, 2), -3)
            self.assertEqual(controller.calculate_response(3, 3), -4.5)
            self.assertEqual(controller.calculate_response(-1.5, 4), -0.75)
            self.assertEqual(controller.calculate_response(-2.25, 5), -1.125)

    if __name__ == '__main__':
        main()

Run the tests by typing:

    $ python test_pid.py

What just happened?

Let's go through the code and see what each part does. After that, we'll talk about what it all means when put together.
    from unittest import TestCase, main
    from mocker import Mocker

    import pid

    class test_pid_constructor(TestCase):
        def test_without_when(self):
            mocker = Mocker()
            mock_time = mocker.replace('time.time')
            mock_time()
            mocker.result(1.0)
            mocker.replay()

            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12)

            mocker.restore()
            mocker.verify()

            self.assertEqual(controller.gains, (0.5, 0.5, 0.5))
            self.assertAlmostEqual(controller.setpoint[0], 0.0)
            self.assertEqual(len(controller.setpoint), 1)
            self.assertAlmostEqual(controller.previous_time, 1.0)
            self.assertAlmostEqual(controller.previous_error, -12.0)
            self.assertAlmostEqual(controller.integrated_error, 0)

After a little bit of setup code, we have a test that the PID controller works correctly when not given a when parameter. Mocker is used to replace time.time with a mock that always returns a predictable value, and then we use several assertions to confirm that the attributes of the controller have been initialized to the expected values.

        def test_with_when(self):
            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=1, initial=12,
                                 when=43)

            self.assertEqual(controller.gains, (0.5, 0.5, 0.5))
            self.assertAlmostEqual(controller.setpoint[0], 1.0)
            self.assertEqual(len(controller.setpoint), 1)
            self.assertAlmostEqual(controller.previous_time, 43.0)
            self.assertAlmostEqual(controller.previous_error, -11.0)
            self.assertAlmostEqual(controller.integrated_error, 0)

This test confirms that the PID constructor works correctly when the when parameter is supplied. Unlike the previous test, there's no need to use Mocker, because the outcome of the test is not supposed to be dependent on anything except the parameter values; the current time is irrelevant.

    class test_calculate_response(TestCase):
        def test_without_when(self):
            mocker = Mocker()
            mock_time = mocker.replace('time.time')
            mock_time()
            mocker.result(1.0)
            mock_time()
            mocker.result(2.0)
            mock_time()
            mocker.result(3.0)
            mock_time()
            mocker.result(4.0)
            mock_time()
            mocker.result(5.0)
            mocker.replay()

            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12)

            self.assertEqual(controller.calculate_response(6), -3)
            self.assertEqual(controller.calculate_response(3), -4.5)
            self.assertEqual(controller.calculate_response(-1.5), -0.75)
            self.assertEqual(controller.calculate_response(-2.25), -1.125)

            mocker.restore()
            mocker.verify()

The tests in this class describe the intended behavior of the calculate_response method. This first test checks the behavior when the optional when parameter is not supplied, and mocks time.time to make that behavior predictable.

        def test_with_when(self):
            controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0, initial=12,
                                 when=1)

            self.assertEqual(controller.calculate_response(6, 2), -3)
            self.assertEqual(controller.calculate_response(3, 3), -4.5)
            self.assertEqual(controller.calculate_response(-1.5, 4), -0.75)
            self.assertEqual(controller.calculate_response(-2.25, 5), -1.125)

In this test, the when parameter is supplied, so there is no need to mock time.time. We just have to check that the result is what we expected. The actual tests that we performed are the same ones that were written in the doctest; so far, all that we see is a different way of expressing them.

The first thing to notice is that the test file is divided up into classes that inherit from unittest.TestCase, each of which contains one or more test methods. The name of each test method begins with the word test, which is how unittest recognizes that they are tests. Each test method embodies a single test of a single unit.
This gives us a convenient way to structure our tests, grouping related tests together in the same class so that they're easier to find. Putting each test into its own method means that each test executes in an isolated namespace, which makes it somewhat easier to keep unittest-style tests from interfering with each other, relative to doctest-style tests. It also means that unittest knows how many unit tests are in your test file, instead of simply knowing how many expressions there are (you may have noticed that doctest counts each >>> line as a separate test). Finally, putting each test in its own method means that each test has a name, which can be a valuable feature.

Tests in unittest don't directly care about anything that isn't part of a call to one of the assert methods of TestCase. That means that when we're using Mocker, we don't have to be bothered about the mock objects that get returned from demonstration expressions, unless we want to use them. It also means that we need to remember to write an assert describing every aspect of the test that we want to have checked. We'll go over the various assertion methods of TestCase shortly.

Tests aren't of much use if you can't execute them. For the moment, the way we'll be doing that is by calling unittest.main when our test file is executed as a program by the Python interpreter. That's about the simplest way to run unittest code, but it's cumbersome when you have lots of tests spread across lots of files. The if __name__ == '__main__': line might look strange to you, but its meaning is fairly straightforward. When Python loads any module, it stores that module's name in a variable called __name__ within the module, unless the module is the one passed to the interpreter on the command line; that module always gets the string '__main__' bound to its __name__ variable. So, if __name__ == '__main__': means "if this module was executed directly from the command line".

Assertions

Assertions are the mechanism that we use to tell unittest what the important outcomes of the test are. By using appropriate assertions, we can tell unittest exactly what to expect from each test.

assertTrue

When we call self.assertTrue(expression), we're telling unittest that the expression must be true in order for the test to be a success. This is a very flexible assertion, since you can check for nearly anything by writing the appropriate boolean expression. It's also one of the last assertions you should consider using, because it doesn't tell unittest anything about the kind of comparison you're making, which means that unittest can't tell you as clearly what's gone wrong if the test fails. For an example of this, consider the following test code, which contains two tests that are guaranteed to fail:

    from unittest import TestCase, main

    class two_failing_tests(TestCase):
        def test_assertTrue(self):
            self.assertTrue(1 == 1 + 1)

        def test_assertEqual(self):
            self.assertEqual(1, 1 + 1)

    if __name__ == '__main__':
        main()

It might seem like the two tests are interchangeable, since both test the same thing. Certainly they'll both fail (or, in the unlikely event that one equals two, they'll both pass), so why prefer one over the other? Take a look at what happens when we run the tests (and also notice that the tests were not executed in the same order as they were written; tests are totally independent of each other, so that's okay, right?):

Do you see the difference?
The assertTrue test was able to correctly determine that the test should fail, but it didn't know enough to report any useful information about why it failed. The assertEqual test, on the other hand, knew first of all that it was checking that two expressions were equal, and second it knew how to present the results so that they would be most useful: by evaluating each of the expressions that it was comparing and placing a != symbol between the results. It tells us both which expectation failed and what the relevant expressions evaluate to.

assertFalse

The assertFalse method will succeed when the assertTrue method would fail, and vice versa. It has the same limits in terms of producing useful output that assertTrue has, and the same flexibility in terms of being able to test nearly any condition.

assertEqual

As mentioned in the assertTrue discussion, the assertEqual assertion checks that its two parameters are in fact equal, and reports a failure if they are not, along with the actual values of the parameters.

assertNotEqual

The assertNotEqual assertion fails whenever the assertEqual assertion would have succeeded, and vice versa. When it reports a failure, its output indicates that the values of the two expressions are equal, and provides you with those values.

assertAlmostEqual

As we've seen before, comparing floating point numbers can be troublesome. In particular, checking that two floating point numbers are equal is problematic, because things that you might expect to be equal (things that, mathematically, are equal) may still end up differing down among the least significant bits. Floating point numbers only compare equal when every bit is the same. To address that problem, unittest provides assertAlmostEqual, which checks that two floating point values are almost the same; a small amount of difference between them is tolerated.

Let's look at this problem in action. If you take the square root of 7, and then square it, the result should be 7. Here's a pair of tests that check that fact:

from unittest import TestCase, main


class floating_point_problems(TestCase):
    def test_assertEqual(self):
        self.assertEqual((7.0 ** 0.5) ** 2.0, 7.0)

    def test_assertAlmostEqual(self):
        self.assertAlmostEqual((7.0 ** 0.5) ** 2.0, 7.0)


if __name__ == '__main__':
    main()

The test_assertEqual method checks that (7.0 ** 0.5) ** 2.0 is exactly equal to 7.0, in other words that squaring the square root of 7 gives back exactly 7, which is true in reality. In the more specialized number system available to computers, though, taking the square root of 7 and then squaring it doesn't quite get us back to 7, so this test will fail. More on that in a moment. The test_assertAlmostEqual method checks that (7.0 ** 0.5) ** 2.0 is almost equal to 7.0, which even the computer will agree is true, so this test should pass.

Running those tests produces the following, although the specific number that you get back instead of 7 may vary depending on the details of the computer the tests are being run on:

Unfortunately, floating point numbers are not precise, because the majority of numbers on the real number line cannot be represented with a finite, non-repeating sequence of digits, much less a mere 64 bits. Consequently, what you get back from evaluating the mathematical expression is not quite 7. It's close enough for government work though (or practically any other sort of work as well), so we don't want our test to quibble over that tiny difference. Because of that, we should use assertAlmostEqual and assertNotAlmostEqual when we're comparing floating point numbers for equality.

This problem doesn't generally carry over into other comparison operators.
Checking that one floating point number is less than the other, for example, is very unlikely to produce the wrong result due to insignificant errors. It's only in cases of equality that this problem bites us.

Developing Applications with JBoss and Hibernate: Part 1

Packt
19 Jan 2010
4 min read
Introducing Hibernate

Hibernate provides a bridge between the database and the application by persisting application objects in the database, rather than requiring the developer to write and maintain lots of code to store and retrieve objects. The main configuration file, hibernate.cfg.xml, specifies how Hibernate obtains database connections, either from a JNDI DataSource or from a JDBC connection pool. Additionally, the configuration file defines the persistent classes, which are backed by mapping definition files.

This is a sample hibernate.cfg.xml configuration file that is used to handle connections to a MySQL database, mapping the com.sample.MyClass class:

<hibernate-configuration>
  <session-factory>
    <property name="connection.username">user</property>
    <property name="connection.password">password</property>
    <property name="connection.url">jdbc:mysql://localhost/database</property>
    <property name="connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="dialect">org.hibernate.dialect.MySQLDialect</property>
    <mapping resource="com/sample/MyClass.hbm.xml"/>
  </session-factory>
</hibernate-configuration>

From our point of view, it is important to know that Hibernate applications can coexist in both managed and non-managed environments. An application server is a typical example of a managed environment that provides services to hosted applications, such as connection pooling and transactions. On the other hand, a non-managed application refers to a standalone application, such as a Swing Java client, that typically lacks any built-in services. In this article, we will focus on managed environment applications, installed on JBoss Application Server. You will not need to download any library into your JBoss installation. As a matter of fact, the JBoss persistence layer is designed around the Hibernate API, so it already contains all the core libraries.

Creating a Hibernate application

You can choose different strategies for building a Hibernate application. For example, you could start building Java classes and map files from scratch, and then let Hibernate generate the database schema accordingly. You can also start from a database schema and reverse engineer it into Java classes and Hibernate mapping files. We will choose the latter option, which is also the fastest.

Here's an overview of our application. In this example, we will design an employee agenda divided into departments. The persistence model will be developed with Hibernate, using the reverse engineering facet of JBoss Tools. We will then need an interface for recording our employees and departments, and for querying them as well. The web interface will be developed using a simple Model-View-Controller (MVC) pattern and basic JSP 2.0 and servlet features.

The overall architecture of this system resembles the AppStore application that has been used to introduce JPA. As a matter of fact, this example can be used to compare the two persistence models and to decide which option best suits your project needs. We have added a short section at the end of this example to stress a few important points about this choice.

Setting up the database schema

The first step is creating the schema and the tables that will back our application:
CREATE SCHEMA hibernate;

GRANT ALL PRIVILEGES ON hibernate.* TO 'jboss'@'localhost' WITH GRANT OPTION;

CREATE TABLE `hibernate`.`department` (
  `department_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `department_name` VARCHAR(45) NOT NULL,
  PRIMARY KEY (`department_id`)
) ENGINE = InnoDB;

CREATE TABLE `hibernate`.`employee` (
  `employee_id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  `employee_name` VARCHAR(45) NOT NULL,
  `employee_salary` INTEGER UNSIGNED NOT NULL,
  `employee_department_id` INTEGER UNSIGNED NOT NULL,
  PRIMARY KEY (`employee_id`),
  CONSTRAINT `FK_employee_1` FOREIGN KEY `FK_employee_1` (`employee_department_id`)
    REFERENCES `department` (`department_id`)
    ON DELETE CASCADE
    ON UPDATE CASCADE
) ENGINE = InnoDB;

With the first Data Definition Language (DDL) command, we have created a schema named hibernate that will be used to store our tables. Then, we have assigned the necessary privileges on the hibernate schema to the user jboss. Finally, we created a table named department that contains the list of company units, and another table named employee that contains the list of workers. The employee table references the department with a foreign key constraint.
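For orientation, here is a minimal sketch of the kind of POJOs that the reverse engineering step can be expected to produce from this schema. This is an illustration only: the classes actually generated by JBoss Tools may differ in naming and detail, and each class would normally live in its own source file together with a corresponding .hbm.xml mapping file. The property names simply mirror the columns defined above and match the accessors used later in the web client (setEmployeeName, setDepartment, and so on).

// Department.java - hypothetical sketch of a reverse-engineered POJO
public class Department {
    private Integer departmentId;    // maps to department.department_id
    private String departmentName;   // maps to department.department_name

    public Integer getDepartmentId() { return departmentId; }
    public void setDepartmentId(Integer departmentId) { this.departmentId = departmentId; }

    public String getDepartmentName() { return departmentName; }
    public void setDepartmentName(String departmentName) { this.departmentName = departmentName; }
}

// Employee.java - hypothetical sketch of a reverse-engineered POJO
public class Employee {
    private Integer employeeId;      // maps to employee.employee_id
    private String employeeName;     // maps to employee.employee_name
    private Integer employeeSalary;  // maps to employee.employee_salary
    private Department department;   // many-to-one on employee.employee_department_id

    public Integer getEmployeeId() { return employeeId; }
    public void setEmployeeId(Integer employeeId) { this.employeeId = employeeId; }

    public String getEmployeeName() { return employeeName; }
    public void setEmployeeName(String employeeName) { this.employeeName = employeeName; }

    public Integer getEmployeeSalary() { return employeeSalary; }
    public void setEmployeeSalary(Integer employeeSalary) { this.employeeSalary = employeeSalary; }

    public Department getDepartment() { return department; }
    public void setDepartment(Department department) { this.department = department; }
}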

Developing Applications with JBoss and Hibernate: Part 2

Packt
19 Jan 2010
6 min read
Adding a web client to your project

There are several ways to test our Hibernate application. The simplest of all is adding a web application, which is packaged in an enterprise application along with the Hibernate application. Create a new dynamic web project named HibernateWeb. The first step, before adding servlets and JSPs, is linking the HibernateProject libraries to your web application; otherwise, you will not be able to reference the Hibernate POJOs. Right-click on your project and select Properties. Reach the Java Build Path option and select the Projects tab. From there, add HibernateProject.

Let's move on. This project will contain a main servlet that acts as a controller, and a few JSPs for the client view. We will start by adding com.packtpub.hibernateWeb.HibernateServlet to our project. In the following snippet, you can see the core section of the servlet. Here, we will not detail the controller logic, which is straightforward if you have some rudiments of the MVC pattern; rather, we want to highlight the most interesting part of it, which is how to query and persist Hibernate objects.

public class HibernateServlet extends HttpServlet {

    private SessionFactory getSessionFactory() {
        return (SessionFactory) getServletContext().getAttribute("sessionFactory");
    }

    public void init() {                                              // [1]
        if (getSessionFactory() != null) {
            return;
        }
        InitialContext ctx;
        try {
            ctx = new InitialContext();
            SessionFactory factory = (SessionFactory)
                    ctx.lookup("java:/hibernate/SessionFactory");
            getServletContext().setAttribute("sessionFactory", factory);
        } catch (NamingException e) {
            e.printStackTrace();
        }
    }

    private String saveEmployee(HttpServletRequest request) {
        Session hsession = null;
        String name = request.getParameter("name");
        String salary = request.getParameter("salary");
        String departmentId = request.getParameter("departmentId");
        try {
            hsession = getSessionFactory().openSession();
            hsession.beginTransaction();
            Query query = hsession.createQuery(
                    "from Department d where d.departmentId = :departmentId"); // [2]
            query.setInteger("departmentId", new Integer(departmentId));
            Department dep = (Department) query.uniqueResult();
            Employee emp = new Employee();
            emp.setDepartment(dep);
            emp.setEmployeeName(name);
            emp.setEmployeeSalary(Integer.parseInt(salary));
            hsession.save(emp);                                        // [3]
            hsession.getTransaction().commit();
        } catch (Exception e) {
            e.printStackTrace();
            hsession.getTransaction().rollback();
        } finally {
            if (hsession.isOpen())
                hsession.close();
        }
        return employeeList(request);
    }

    private String employeeList(HttpServletRequest request) {
        Session hsession = null;
        try {
            hsession = getSessionFactory().openSession();
            Query query = hsession.createQuery(
                    "select p from Employee p join fetch p.department c");     // [4]
            List<Employee> list = query.list();
            request.setAttribute("employee", list);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (hsession.isOpen())
                hsession.close();
        }
        return "/listEmployees.jsp";
    }

    private String saveDepartment(HttpServletRequest request) {
        String depName = request.getParameter("depName");
        Session hsession = null;
        Department dep;
        try {
            hsession = getSessionFactory().openSession();
            hsession.beginTransaction();
            dep = new Department();
            dep.setDepartmentName(depName);
            hsession.save(dep);                                         // [5]
            hsession.getTransaction().commit();
        } catch (Exception e) {
            e.printStackTrace();
            hsession.getTransaction().rollback();
        } finally {
            if (hsession.isOpen())
                hsession.close();
        }
        return employeeList(request);
    }
}

As you can see from the preceding code, we recover the SessionFactory from the JNDI tree in the init() [1] method of the servlet. Instances of SessionFactory are thread-safe and typically shared throughout an application, so we store it in the ServletContext and share it among all servlet instances. The SessionFactory is subsequently used to open a Hibernate Session, which is not thread-safe and should only be used for a single transaction or unit of work in an application.

In order to store our Employee, in the saveEmployee method we first retrieve the corresponding Department from our schema [2], and finally the Employee is saved [3] and the transaction is committed.

The list of employees is fetched by the employeeList method. Notice that we are using a join fetch statement to retrieve all the employees [4], which will be routed to the listEmployees.jsp view. Why? The answer is that with the default fetch mode (lazy), once the Hibernate session is closed, the client will not be able to navigate through the department field of the Employee. The common solution to this issue is switching to the EAGER fetch mode, which reads the related fields (in our case, department) into memory as soon as we query the Employee table. You have more than one option to achieve this. One possible solution, if you don't want to change the default fetch mode for the Employee table, is to build an ad hoc query that forces Hibernate to also read the fields in relation with the Employee table:

select p from Employee p join fetch p.department c

If you prefer to use the XML mapping files to configure the fetch mode, you can also change the lazy="true" attribute in the employee-department relationship. (A short sketch illustrating this pitfall follows at the end of this section.)

The last method, saveDepartment [5], takes care of persisting a new Department in the corresponding table.

We complete our excursus on the web tier with listEmployees.jsp, which is used to display a tabular view of the employees:

<%@ taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c" %>
<html>
<script language="JavaScript">
function doSubmit(url) {
  document.module.action = url;
  document.module.submit();
}
</script>
<body>
<table border="1">
  <tr>
    <th>Name</th>
    <th>Salary</th>
    <th>Department</th>
  </tr>
  <c:forEach items="${employee}" var="emp">
  <tr>
    <td><c:out value="${emp.employeeName}"/></td>
    <td><c:out value="${emp.employeeSalary}"/></td>
    <td><c:out value="${emp.department.departmentName}"/></td>
  </tr>
  </c:forEach>
</table>
<form name="module" method="POST">
  <input type="button" value="New Employee"
         onClick="doSubmit('actionServlet?op=newEmployee')">
  <input type="button" value="New Department"
         onClick="doSubmit('actionServlet?op=newDepartment')">
</form>
</body>
</html>

This page uses JSP 2.0 Expression Language (EL) to iterate through the list of employees, as highlighted in the last code snippet. We have also highlighted the taglib directive at the beginning of the page. This directive will be used to resolve the JSTL core tag library that ships with JBoss AS in the server/xxx/deploy/jbossweb.sar/jstl.jar library. (Eclipse does not contain references to this library when you create a web project; you have to add jstl.jar to your build path, otherwise Eclipse will mark it as an error. However, that's only a visual annoyance, because the JBoss Web container has got everything it needs to run JSTL.)

The complete web application is available on the Packt Publishing website (http://www.packtpub.com) and includes two additional JSPs for entering the employee (newEmployee.jsp) and department (newDepartment.jsp) data, plus one placeholder index.jsp that merely forwards to the Hibernate servlet.
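As noted in the discussion of fetch modes above, navigating a lazy association after the session has been closed is the classic pitfall here. The following is a small, hypothetical sketch (not part of the article's sample code) that assumes a SessionFactory obtained as in the servlet above; it shows how fetching the association in the query keeps the navigation safe once the session is closed.

import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class LazyFetchDemo {

    // Assumed to be obtained as in the servlet above (for example,
    // looked up from JNDI in init()).
    private final SessionFactory sessionFactory;

    public LazyFetchDemo(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public String departmentNameOf(int employeeId) {
        Session session = sessionFactory.openSession();
        Employee emp;
        try {
            // Eagerly fetch the association in the query itself, so the
            // Department is loaded while the session is still open.
            emp = (Employee) session.createQuery(
                    "select e from Employee e join fetch e.department d "
                    + "where e.employeeId = :id")
                    .setInteger("id", employeeId)
                    .uniqueResult();
        } finally {
            session.close();
        }
        // With the default lazy fetch mode and a plain "from Employee"
        // query, this navigation would fail with a
        // LazyInitializationException once the session is closed; thanks
        // to the join fetch above, the Department is already initialized.
        return emp.getDepartment().getDepartmentName();
    }
}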

Setting Up Tools to Build Applications Using jBPM: Part 1

Packt
18 Jan 2010
15 min read
Background about the jBPM project

In this section, we will talk about where the jBPM framework is located inside the JBoss projects. As we know, JBoss jBPM was created and is maintained by JBoss. JBoss is in charge of developing middleware "enterprise" software in Java. It is middleware because it is a type of software used to make or run other software, and "enterprise" because it is focused on big scenarios. Enterprise here does not necessarily mean Java EE. It is also interesting to know that JBoss was bought by a company called Red Hat (famous for the Linux distribution with the same name, and also in charge of the Fedora community distribution).

In order to get the right first impression about the framework, you will need to know a little about the other products that JBoss has developed, and where this framework is located and focused inside the company's projects. At this moment, the only entry point that we have is the JBoss community page, http://www.jboss.org/. This page contains information about all the middleware projects that JBoss is developing (all open source). If we click on the Projects link in the top menu, we are redirected to a page that shows us the following image:

This image shows us one important major central block for the JBoss Application Server, which contains a lot of projects intended to run inside this application server. The most representative modules are:

JBoss Web: the web container based on the Tomcat web server
JBoss EJB3: the EJB3 container that is standard EJB3 compliant for Java EE 5
Hibernate: the world-renowned Object Relational Mapping (ORM) framework
Seam: the new web framework to build rich Internet applications
JBoss Messaging: the default JMS provider that enables high-performance, scalable, clustered messaging for Java

On top of that, we can see two frameworks for web interface design (RichFaces/Ajax4jsf and Gravel) based on components, which can be used in any web application that you code. And then, on top of it all, we can see three important blocks: Portal, Integration, and Telecom. As you can imagine, we are focused on the Integration block, which contains three projects. As you can see, this Integration block is also outside the JBoss Application Server boundaries. Therefore, we might suppose that these three products will run without any dependency on JBoss or any other application server. Now we are going to talk about these three frameworks, which have different focuses inside the integration field.

JBoss Drools

Drools is, of late, focused on business knowledge, and because it was born as an inference engine, it is in charge of using all that business knowledge in order to take business actions based on this knowledge for a specific situation. You can find more information about this framework (now redefined as the Business Logic integration Platform) at http://www.drools.org.

JBoss ESB

This is a product focused on supplying an Enterprise Service Bus (ESB), which allows us to use different connectors to communicate with heterogeneous systems that were created in different languages and that use different protocols for communication. You can find more information about this project at http://www.jboss.org/jbossesb/.

JBoss jBPM

jBPM has a process-centric philosophy. This involves all the APIs and tools that are related to the processes and how to manage them. The framework perspective is always centered on the business process that we describe.
Also, the services available inside the framework are only for manipulating the processes. All the other things that we want or need for integration with our processes will be delegated to third-party frameworks or tools.

Now, if we enter the official page of jBPM (http://www.jbpm.org), we are going to see all the official information and updates about the framework. It is important to notice the home page, which shows us the following image:

This is the first image that developers see when they get interested in jBPM. This image shows us the component distribution inside the jBPM framework project. Understanding these building blocks (components) will help us to understand the code of the framework and each part's functionality. Most of the time, this image is not clearly understood, so let's analyze it!

Supported languages

One of the important things that the image shows is the multi-language support for modeling processes in different scenarios. We can see that three languages are currently supported/proposed by the framework, with the possibility to plug in new languages that we need in order to represent our business processes with extra technology requirements. These supported languages are selected according to our business scenario and the technology that this scenario requires.

The most general and commonly used language is the jBPM Process Definition Language (jPDL). This language can be used in situations where we are defining the project architecture and the technology that the project will use. In most cases, jPDL will be the correct choice, because it brings the flexibility to model any kind of situation, the extensibility to expand our process language with new words to add extra functionality to the base implementation, and no technology pluggability limitation, thereby allowing us to interact with any kind of external services and systems. That is why jPDL can be used in almost all situations. If you don't have any technology restriction in your requirements, this language is recommended.

jBPM also implements the Business Process Execution Language (BPEL), which is broadly used to orchestrate web services between different systems. For business scenarios where all the interactions are between web services, I recommend that you make use of this language, but only if you are restricted to using a standard like BPEL in order to model your business process.

PageFlow is the last one shown in the image. This language will be used when you use the JBoss Seam framework and want to describe how your web pages are synchronized to fulfill some requirements. These kinds of flows are commonly used to describe the navigation flow possibilities that a user will have in a website. Web applications will benefit enormously from this, because the flow of the web application will be decoupled from the web application code, letting us introduce changes without modifying the web pages themselves.

At last, the language pluggability feature is represented by the ellipsis (...). This will be required in situations where the available languages are not enough to represent our business scenarios. This could happen when a new standard like BPEL or BPMN arises, or if our company has its own language to represent business processes. In these kinds of situations, we will need to implement our custom language on top of the process virtual machine. This is not an easy task, and it is important for you to know that it is not trivial to implement an entire language.
So, here we will be focused on learning jPDL in depth, to understand all of its features and how to extend it in order to fulfill our requirements. Remember that jPDL is a generic language that allows us to express almost every situation. In other words, the only situation where jPDL doesn't fit is where the process definition syntax doesn't allow us to represent our business process, or where the syntax needs to follow a standard format like BPMN or BPEL.

Also, it is important to notice that all these languages are separate from the Process Virtual Machine (PVM), the block at the bottom-left of the image, which will execute our defined processes. The PVM is the core of the framework and understands all the languages that are defined. This virtual machine knows how to execute them and how to behave for each activity in different business scenarios. When we begin to understand the jPDL language in depth, we will see how the PVM behaves for each activity described in our process definitions.

Other modules

Besides the PVM and all the languages, we can also see some other modules that implement extra functionality, which will help us with different requirements. The following list contains a brief description of each module:

Graphical Process Designer (GPD) module: the graphical process designer module, implemented as an Eclipse plugin.
Identity module: this module is a proof-of-concept, out-of-the-box working module used to integrate business roles into our processes. It is focused on letting us represent people/users inside the process definition and execution, and gives us a simple structure for users and groups that can be used inside our processes. For real scenarios, this module will help us to understand how we will map our users' structures onto the jBPM framework.
Task ManaGeMenT (TaskMGMT) module: this module's functionality involves dealing with all the interaction that people/employees/business roles have with the processes. It will help us to manage all the necessary data to create client applications, which the business roles will use in their everyday work.
Enterprise module: this module brings us extra functionality for enterprise environments.

Now that we know how the components are distributed inside the framework, we can jump to the jPDL section of jBPM's official web page. Here we will find the third image that all developers will see when they get started with jBPM. Let's analyze this image to understand why and how the framework can be used on different platforms.

This image tries to give us an example of how jBPM could be deployed on a web server or an application server. Please keep in mind that this is not the only way that jBPM could be deployed on, or embedded in, an application, because jBPM can also be used in a standalone application. In addition, this image shows us some of the BPM stages that are implemented. For example, we can see how the designed processes will be formalized in the jPDL XML syntax in the Graphical Process Designer (GPD), here called the Eclipse jPDL Editor. On the other side of the image, we can see the execution stage implemented inside a container that could be an enterprise container (such as JBoss Application Server) or just a web server (such as Tomcat or Jetty). This distinction is made with the extensions of the deployed files (war for Web Archives, and ear for Enterprise Archives).
In this container, it is important to note the jpdl-jbpm.jar archive, which contains the PVM and the language definition that lets the container understand processes defined in jPDL. We also have jbpm-identity.jar, the outcome of the Identity module that we saw in the previous image. Besides that, we have the hibernate.jar dependency. This fact is very important to note, because our processes will be persisted with Hibernate, and we need to know how to adapt this to our needs.

The last thing that we need to see is the Firefox/Internet Explorer logo on top of the image, which tries to show us how our clients (users), all the people who interact with and perform activities in our processes, will talk (communicate) with the framework. Once again, HTTP interaction is not the only way to interact with the processes; we can implement any kind of interaction (such as JMS for enterprise messaging, web services to communicate with heterogeneous systems, mail for some kind of flexibility, SMS, and so on).

Here we get a first impression of the framework; now we are ready to go ahead and install all the tools that we need in order to start building applications.

Tools and software

For common tools such as the Java Development Kit, IDE installation, database installation, and so on, only the key points will be discussed. For jBPM tooling, a detailed explanation will follow the download and installation process. We will go into the structure detail and specification in depth, covering how and why we are doing this installation. If you are an experienced developer, you can skip this section and go directly to the jBPM installation section.

In order to go to the jBPM installation section straightaway, you will need to have the following software installed correctly:

Java Development Kit 1.5 or higher (this is the first thing that Java developers learn; if you don't know how to install it, please take a look at the following link: http://java.sun.com/javase/6/webnotes/install/index.html)
Maven 2.0.9 or higher
A Hibernate-supported database; here we will use MySQL
The Java connector for your selected database, already downloaded
JBoss 5.0.1.GA installed (if you are thinking about creating enterprise applications, you will need JBoss AS installed; if you only want to create web applications, having Tomcat or Jetty installed will be fine)
Eclipse IDE 3.4 Ganymede (Eclipse IDE 3.4 Ganymede is the suggested version; you can try other versions, but this is the one tested in this article)
An SVN client; here we will use Tortoise SVN (available for Windows only; you can also use a Subversion plugin for Eclipse or for your favorite IDE)

If you have all this software up and running, you can jump to the next section. If not, here we will see a brief introduction to each of them, with some reasons that explain why we need each of these tools.

Maven: why do I need it?

Maven is an Apache project that helps us to build, maintain, and manage our Java application projects. One of the main ideas behind Maven is to solve all the dependency problems between our applications and all the framework libraries that we use. If you read the What is Maven? page (http://maven.apache.org/what-is-maven.html), you will find the key points behind this project.
The important things that we will use here, and in your daily work, will be:

A standard structure for all your projects
Centralized project and dependency descriptions

Standard structure for all your projects

Maven proposes a set of standard structures to build our Java projects. The project descriptor that we need to write/create depends on the type of Java project that we want to build. The main idea behind it is to minimize the configuration files needed to build our applications. A standard structure is proposed for each type of application. You can see all the suggested standard structures on the official Maven page: http://maven.apache.org/guides/introduction/introduction-to-the-standard-directory-layout.html.

Centralized project and dependencies description

When we are using Maven, our way of building applications and managing the dependencies needed by these applications changes a lot. In Maven, the concept of the Project Object Model (POM) is introduced. This POM defines our project structure, dependencies, and outcome(s) in XML syntax. This means that we will have just one file where we define the type of project we are building, the first-order dependencies that the project will have, and the kind of outcome(s) that we are expecting after we build the project. Take a look at the following pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.jbpm.examples</groupId>
  <artifactId>chapter02.homeworkSolution</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>chapter02.homeworkSolution</name>
  <url>http://maven.apache.org</url>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

We are basically defining all the mentioned characteristics of our project. Much of this information is deduced from the packaging attribute, which in this case is:

<packaging>jar</packaging>

The standard structure of directories will be used in order to know where the source code is located and where the compiled outcome will be placed.

Maven installation

Getting Maven installed is a very simple task. You should download the Maven binaries from the official page (http://maven.apache.org). This will be a .zip file, or a .tar.gz file, which you only need to uncompress into the programs directory. You will also need to add its bin directory to the system Path variable. With that, you will be able to call the mvn command from the console.

To test whether Maven is working properly, you can open the Windows console and type mvn. You should get something like this:

This output shows us that Maven is correctly installed. However, as it was run in C:\Documents and Settings\salaboy21 (the installation directory, where there is no project descriptor), the build failed. I strongly recommend that you read and understand the Getting Started section in the official Maven documentation at http://maven.apache.org/guides/getting-started/index.html.

An Overview of Tomcat 6 Servlet Container: Part 1

Packt
18 Jan 2010
11 min read
In practice, it is highly unlikely that you will interface an EJB container from WebSphere and a JMS implementation from WebLogic with the Tomcat servlet container from the Apache foundation, but it is at least theoretically possible. Note that the term 'interface', as it is used here, also encompasses abstract classes. The specification's API might provide a template implementation whose operations are defined in terms of some basic set of primitives that are kept abstract for the service provider to implement. A service provider is required to make available concrete implementations of these interfaces and abstract classes. For example, the HttpSession interface is implemented by Tomcat in the form of org.apache.catalina.session.StandardSession.

Let's examine the image of the Tomcat container:

The objective of this article is to cover the primary request processing components that are present in this image. Advanced topics, such as clustering and security, are shown shaded in this image and are not covered. In this image, the '+' symbol after the Service, Host, Context, and Wrapper instances indicates that there can be one or more of these elements. For instance, a Service may have a single Engine, but an Engine can contain one or more Hosts. In addition, the whirling circle represents a pool of request processor threads. Here, we will fly over the architecture of Tomcat from a 10,000-foot perspective, taking in the sights as we go.

Component taxonomy

Tomcat's architecture follows the construction of a Matryoshka doll from Russia. In other words, it is all about containment, where one entity contains another, and that entity in turn contains yet another.

In Tomcat, a 'container' is a generic term that refers to any component that can contain another, such as a Server, Service, Engine, Host, or Context. Of these, the Server and Service components are special containers, designated as Top Level Elements, as they represent aspects of the running Tomcat instance. All the other Tomcat components are subordinate to these top level elements.

The Engine, Host, and Context components are officially termed Containers, and refer to components that process incoming requests and generate an appropriate outgoing response.

Nested Components can be thought of as sub-elements that can be nested inside either Top Level Elements or other Containers to configure how they function. Examples of nested components include the Valve, which represents a reusable unit of work; the Pipeline, which represents a chain of Valves strung together; and a Realm, which helps set up container-managed security for a particular container. Other nested components include the Loader, which is used to enforce the specification's guidelines for servlet class loading; the Manager, which supports session management for each web application; the Resources component, which represents the web application's static resources and a mechanism to access these resources; and the Listener, which allows you to insert custom processing at important points in a container's life cycle, such as when a component is being started or stopped. Not all nested components can be nested within every container.

A final major component, which falls into its own category, is the Connector. It represents the connection end point that an external client (such as a web browser) can use to connect to the Tomcat container.

Before we go on to examine these components, let's take a quick look at how they are organized structurally.
Note that this diagram only shows the key properties of each container.

When Tomcat is started, the Java Virtual Machine (JVM) instance in which it runs will contain a singleton Server top level element, which represents the entire Tomcat server. A Server will usually contain just one Service object, which is a structural element that combines one or more Connectors (for example, an HTTP and an HTTPS connector) that funnel incoming requests through to a single Catalina servlet Engine.

The Engine represents the core request processing code within Tomcat and supports the definition of multiple Virtual Hosts within it. A virtual host allows a single running Tomcat engine to make it seem to the outside world that there are multiple separate domains (for example, www.my-site.com and www.your-site.com) being hosted on a single machine.

Each virtual host can, in turn, support multiple web applications known as Contexts that are deployed to it. A context is represented using the web application format specified by the servlet specification, either as a single compressed WAR (Web Application Archive) file or as an uncompressed directory. In addition, a context is configured using a web.xml file, as defined by the servlet specification. A context can, in turn, contain multiple servlets that are deployed into it, each of which is wrapped in a Wrapper component.

The Server, Service, Connector, Engine, Host, and Context elements that will be present in a particular running Tomcat instance are configured using the server.xml configuration file.

Architectural benefits

This architecture has a couple of useful features. It not only makes it easy to manage component life cycles (each component manages the life cycle notifications for its children), but also to dynamically assemble a running Tomcat server instance that is based on the information that has been read from configuration files at startup. In particular, the server.xml file is parsed at startup, and its contents are used to instantiate and configure the defined elements, which are then assembled into a running Tomcat instance. The server.xml file is read only once, and edits to it will not be picked up until Tomcat is restarted.

This architecture also eases the configuration burden by allowing child containers to inherit the configuration of their parent containers. For instance, a Realm defines a data store that can be used for authentication and authorization of users who are attempting to access protected resources within a web application. For ease of configuration, a realm that is defined for an engine applies to all its children hosts and contexts. At the same time, a particular child, such as a given context, may override its inherited realm by specifying its own realm to be used in place of its parent's realm.

Top Level Components

The Server and Service container components exist largely as structural conveniences. A Server represents the running instance of Tomcat and contains one or more Service children, each of which represents a collection of request processing components.

Server

A Server represents the entire Tomcat instance and is a singleton within a Java Virtual Machine, and is responsible for managing the life cycle of its contained services. The following image depicts the key aspects of the Server component. As shown, a Server instance is configured using the server.xml configuration file. The root element of this file is <Server> and represents the Tomcat instance.
Its default implementation is provided using org.apache.catalina.core.StandardServer, but you can specify your own custom implementation through the className attribute of the <Server> element.

A key aspect of the Server is that it opens a server socket on port 8005 (the default) to listen for a shutdown command (by default, this command is the text string SHUTDOWN). When this shutdown command is received, the server gracefully shuts itself down. For security reasons, the connection requesting the shutdown must be initiated from the same machine that is running this instance of Tomcat.

A Server also provides an implementation of the Java Naming and Directory Interface (JNDI) service, allowing you to register arbitrary objects (such as data sources) or environment variables, by name. At runtime, individual components (such as servlets) can retrieve this information by looking up the desired object name in the server's JNDI bindings. While a JNDI implementation is not integral to the functioning of a servlet container, it is part of the Java EE specification and is a service that servlets have a right to expect from their application servers or servlet containers. Implementing this service makes for easy portability of web applications across containers.

While there is always just one server instance within a JVM, it is entirely possible to have multiple server instances running on a single physical machine, each encased in its own JVM. Doing so insulates web applications that are running on one VM from errors in applications that are running on others, and simplifies maintenance by allowing a JVM to be restarted independently of the others. This is one of the mechanisms used in a shared hosting environment (the other is virtual hosting, which we will see shortly) where you need isolation from other web applications that are running on the same physical server.

Service

While the Server represents the Tomcat instance itself, a Service represents the set of request processing components within Tomcat. A Server can contain more than one Service, where each service associates a group of Connector components with a single Engine. Requests from clients are received on a connector, which in turn funnels them through into the engine, which is the key request processing component within Tomcat. The image shows connectors for HTTP, HTTPS, and the Apache JServ Protocol (AJP).

There is very little reason to modify this element, and the default Service instance is usually sufficient. A hint as to when you might need more than one Service instance can be found in the above image. As shown, a service aggregates connectors, each of which monitors a given IP address and port, and responds in a given protocol. An example use case for having multiple services, therefore, is when you want to partition your services (and their contained engines, hosts, and web applications) by IP address and/or port number. For instance, you might configure your firewall to expose the connectors for one service to an external audience, while restricting your other service to hosting intranet applications that are visible only to internal users. This would ensure that an external user could never access your intranet application, as that access would be blocked by the firewall.

The Service, therefore, is nothing more than a grouping construct. It does not currently add any other value to the proceedings.

Connectors

A Connector is a service endpoint on which a client connects to the Tomcat container.
It serves to insulate the engine from the various communication protocols that are used by clients, such as HTTP, HTTPS, or the Apache JServ Protocol (AJP).

Tomcat can be configured to work in two modes: standalone, or in conjunction with a separate web server. In standalone mode, Tomcat is configured with HTTP and HTTPS connectors, which make it act like a full-fledged web server by serving up static content when requested, as well as by delegating to the Catalina engine for dynamic content. Out of the box, Tomcat provides three possible implementations of the HTTP/1.1 and HTTPS connectors for this mode of operation. The most common are the standard connectors, known as Coyote, which are implemented using standard Java I/O mechanisms. You may also make use of a couple of newer implementations, one which uses the non-blocking NIO features of Java 1.4, and the other which takes advantage of native code that is optimized for a particular operating system through the Apache Portable Runtime (APR). Note that both the Connector and the Engine run in the same JVM. In fact, they run within the same Server instance.

In conjunction mode, Tomcat plays a supporting role to a web server, such as Apache httpd or Microsoft's IIS. The client here is the web server, communicating with Tomcat either through an Apache module or an ISAPI DLL. When this module determines that a request must be routed to Tomcat for processing, it will communicate this request to Tomcat using AJP, a binary protocol that is designed to be more efficient than the text-based HTTP when communicating between a web server and Tomcat. On the Tomcat side, an AJP connector accepts this communication and translates it into a form that the Catalina engine can process. In this mode, Tomcat is running in its own JVM as a separate process from the web server.

In either mode, the primary attributes of a Connector are the IP address and port on which it will listen for incoming requests, and the protocol that it supports. Another key attribute is the maximum number of request processing threads that can be created to concurrently handle incoming requests. Once all these threads are busy, any incoming request will be ignored until a thread becomes available. By default, a connector listens on all the IP addresses for the given physical machine (its address attribute defaults to 0.0.0.0). However, a connector can be configured to listen on just one of the IP addresses for a machine. This will constrain it to accept connections from only that specified IP address.

Any request that is received by any one of a service's connectors is passed on to the service's single engine. This engine, known as Catalina, is responsible for the processing of the request and the generation of the response. The engine returns the response to the connector, which then transmits it back to the client using the appropriate communication protocol.
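The JNDI service mentioned in the Server section above is easiest to appreciate with a small example. The following is a hedged sketch of how a servlet or other web component might look up a container-managed DataSource by name; the resource name jdbc/EmployeeDB is purely illustrative and would correspond to whatever resource you have declared in your Tomcat configuration.

import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class DataSourceLookup {

    // Resolves a DataSource that the container has registered in its
    // JNDI bindings. "jdbc/EmployeeDB" is a hypothetical name; use the
    // name you configured in Tomcat.
    public static DataSource lookup() throws NamingException {
        Context initial = new InitialContext();
        Context env = (Context) initial.lookup("java:comp/env");
        return (DataSource) env.lookup("jdbc/EmployeeDB");
    }
}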

An Overview of Tomcat 6 Servlet Container: Part 2

Packt
18 Jan 2010
8 min read
Nested components

These components are specific to the Tomcat implementation, and their primary purpose is to enable the various Tomcat containers to perform their tasks.

Valve

A valve is a processing element that can be placed within the processing path of each of Tomcat's containers: engine, host, context, or a servlet wrapper. A Valve is added to a container using the <Valve> element in server.xml. Valves are executed in the order in which they are encountered within the server.xml file. The Tomcat distribution comes with a number of pre-rolled valves. These include:

A valve that logs specific elements of a request (such as the remote client's IP address) to a log file or database
A valve that lets you control access to a particular web application based on the remote client's IP address or host name
A valve that lets you log every request and response header
A valve that lets you configure single sign-on access across multiple web applications on a specific virtual host

If these don't meet your needs, you can write your own implementations of org.apache.catalina.Valve and place them into service.

A container does not hold references to individual valves. Instead, it holds a reference to a single entity known as the Pipeline, which represents a chain of valves associated with that container. When a container is invoked to process a request, it delegates the processing to its associated pipeline.

The valves in a pipeline are arranged as a sequence, based on how they are defined within the server.xml file. The final valve in this sequence is known as the pipeline's basic valve. This valve performs the task that embodies the core purpose of a given container. Unlike individual valves, the pipeline is not an explicit element in server.xml, but instead is implicitly defined in terms of the sequence of valves that are associated with a given container.

Each Valve is aware of the next valve in the pipeline. After it performs its pre-processing, it invokes the next Valve in the chain, and when the call returns, it performs its own post-processing before returning. This is very similar to what happens in filter chains within the servlet specification.

In this image, the engine's configured valve(s) fire when an incoming request is received. An engine's basic valve determines the destination host and delegates processing to that host. The destination host's (www.host1.com) valves now fire in sequence. The host's basic valve then determines the destination context (here, Context1) and delegates processing to it. The valves configured for Context1 now fire, and processing is then delegated by the context's basic valve to the appropriate wrapper, whose basic valve hands off processing to its wrapped servlet. The response then returns over the same path in reverse.

A Valve becomes part of the Tomcat server's implementation and provides a way for developers to inject custom code into the servlet container's processing of a request. As a result, the class files for custom valves must be deployed to CATALINA_HOME/lib, rather than to the WEB-INF/classes of a deployed application. As they are not part of the servlet specification, valves are non-portable elements of your enterprise application. Therefore, if you rely on a particular valve, you will need to find equivalent alternatives in a different application server. It is important to note that valves are required to be very efficient in order not to introduce inordinate delays into the processing of a request.
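To make the idea of injecting custom processing concrete, here is a rough sketch of what a custom valve might look like. It follows the pattern described above (pre-processing, delegating to the next valve via getNext().invoke(), then post-processing), but treat the exact class and method signatures as indicative rather than authoritative, since they vary between Tomcat versions; the TimingValve name and its logging are purely illustrative.

import java.io.IOException;
import java.util.logging.Logger;

import javax.servlet.ServletException;

import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.ValveBase;

// Hypothetical valve that records how long each request takes to process.
public class TimingValve extends ValveBase {

    private static final Logger log = Logger.getLogger(TimingValve.class.getName());

    public void invoke(Request request, Response response)
            throws IOException, ServletException {
        long start = System.currentTimeMillis();

        // Pre-processing done; hand the request to the next valve in the pipeline.
        getNext().invoke(request, response);

        // Post-processing runs after the rest of the pipeline has completed.
        long elapsed = System.currentTimeMillis() - start;
        log.info("Request for " + request.getRequestURI() + " took " + elapsed + " ms");
    }
}

Remember that, as noted above, such a class would be packaged and deployed to CATALINA_HOME/lib and wired in with a <Valve> element in server.xml, not bundled inside a web application.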
Realm

Container managed security works by having the container handle the authentication and authorization aspects of an application. Authentication is defined as the task of ensuring that the user is who she says she is, and authorization is the task of determining whether the user may perform some specific action within an application.

The advantage of container managed security is that security can be configured declaratively by the application's deployer. That is, the assignment of passwords to users and the mapping of users to roles can all be done through configuration, which can then be applied across multiple web applications without any coding changes being required to those web applications.

Application Managed Security

The alternative is having the application manage security. In this case, your web application code is the sole arbiter of whether a user may access some specific functionality or resource within your application.

For container managed security to work, you need to assemble the following components:

Security constraints: within your web application's deployment descriptor, web.xml, you must identify the URL patterns for restricted resources, as well as the user roles that would be permitted to access these resources.
Credential input mechanism: in the web.xml deployment descriptor, you specify how the container should prompt the user for authentication credentials. This is usually accomplished by showing the user a dialog that prompts for a user name and password, but it can also be configured to use other mechanisms, such as a custom login form.
Realm: this is a data store that holds user names, passwords, and roles, against which the user-supplied credentials are checked. It can be a simple XML file, a table in a relational database that is accessed using the JDBC API, or a Lightweight Directory Access Protocol (LDAP) server that can be accessed through the JNDI API. A realm provides Tomcat with a consistent mechanism of accessing these disparate data sources.

All three of the above components are technically independent of each other. The power of container based security is that you can assemble your own security solution by mixing and matching selections from each of these groups.

Now, when a user requests a resource, Tomcat will check to see whether a security constraint exists for this resource. For a restricted resource, Tomcat will then automatically request the user for her credentials and will then check these credentials against the configured realm. Access to the resource will be allowed only if the user's credentials are valid and if the user is a member of the role that is configured to access that resource.

Executor

This is a new element, available only since 6.0.11. It allows you to configure a shared thread pool that is available to all your connectors. This places an upper limit on the number of concurrent threads that may be started by your connectors. Note that this limit applies even if a particular connector has not used up all the threads configured for it.

Listener

Every major Tomcat component implements the org.apache.catalina.Lifecycle interface. This interface lets interested listeners register with a component, to be notified of lifecycle events, such as the starting or stopping of that component. A listener implements the org.apache.catalina.LifecycleListener interface and implements its lifecycleEvent() method, which takes a LifecycleEvent that represents the event that has occurred.
This gives you an opportunity to inject your own custom processing into Tomcat's lifecycle.

Manager

Sessions are what make 'applications' possible over the stateless HTTP protocol. A session represents a conversation between a client and a server, and is implemented by a javax.servlet.http.HttpSession instance that is stored on the server and is associated with a unique identifier that is passed back by the client on each interaction. A new session is created on request and remains alive on the server either until it times out after a period of inactivity by its associated client, or until it is explicitly invalidated, for instance, by the client choosing to log out.

The above image shows a very simplistic view of the session mechanism within Tomcat. An org.apache.catalina.Manager component is used by the Catalina engine to create, find, or invalidate sessions. This component is responsible for the sessions that are created for a context and for their life cycles. The default Manager implementation simply retains sessions in memory, but supports session survival across server restarts. It writes out all active sessions to disk when the server is stopped and will reload them into memory when the server is started up again.

A <Manager> must be a child of a <Context> element and is responsible for managing the sessions associated with that web application context. The default Manager takes attributes such as the algorithm that is used to generate its session identifiers, the frequency in seconds with which the manager should check for expired sessions, the maximum number of active sessions supported, and the file in which the sessions should be stored. Other implementations of Manager are provided that let you persist sessions to a durable data store such as a file or a JDBC database.
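To give a concrete feel for the session mechanism that the Manager maintains, here is a hedged sketch of typical HttpSession usage from a servlet. The servlet class and the "visits" attribute name are made up for the example, but the javax.servlet.http calls are the standard servlet API; the Manager is what actually stores the HttpSession objects created this way and ties them to the session identifier sent back by the client.

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// Hypothetical servlet that counts how many times a client has visited.
public class VisitCounterServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Creates a new session if one does not already exist for this client.
        HttpSession session = request.getSession(true);

        Integer visits = (Integer) session.getAttribute("visits");
        visits = (visits == null) ? Integer.valueOf(1) : Integer.valueOf(visits.intValue() + 1);
        session.setAttribute("visits", visits);

        response.getWriter().println("You have visited " + visits + " times");

        // session.invalidate() would end the conversation explicitly,
        // for example when the user logs out.
    }
}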

Setting Up Tools to Build Applications Using jBPM: Part 2

Packt
18 Jan 2010
13 min read
jBPM structure

It is an important task to understand the jBPM framework structure and how the framework sources are divided. This section is also very important for those programmers who want to become active community developers, fixing issues and adding new functionality.

As we have already discussed, jBPM is built and managed with Maven. For this reason, we will find a file called pom.xml inside our working copy of the official JBoss SVN repository that represents the project as a whole. If we run Maven goals against this project, the whole framework will be built. As we saw in the previous section, all the project modules were built. Look at the previous screenshot, which informs us that, by default, the modules Core, Identity, Enterprise, Examples, and Simulation are built when we run the clean install goals against the main project. With the install goal, the generated JAR files are also copied to our local Maven repository, so we can use them in our applications by simply referencing the local Maven repository.

So, the idea here is to see in detail what exactly these modules include. If you open the modules directory that is located inside your working copy, you will see the following sub-directories:

In the next few sections, we will talk about the most important modules that developers need to know in order to feel comfortable with the framework. Take this as a quick, deep introduction to becoming a JBoss jBPM community member.

Core module

The most important module of the jBPM framework is the core module. This module contains all the framework functionality. Here we will find the base classes that we will use in our applications. If you open this directory, you will find the pom.xml file that describes this project. The important thing to notice from this file is that it gives us the Maven artifactId and the groupId. We will use this information to build our applications, because in our applications we will need to specify the jBPM dependency in order to use the framework classes.

The following image shows us only the first section of the pom.xml file located inside the modules/core/ directory. This file describes the project name, the group id that it belongs to, and also the relationship with its parent (the main project). If you open this file, you will notice that all the dependencies that this project (JAR archive) needs in order to be built are described in the next section. This is also interesting when you want to know exactly which libraries the framework needs to have in the classpath in order to run. You need to remember that Maven will take care of all the transitive dependencies, meaning that in this project file only the first-order dependencies will be described. So, for example, in the dependencies section of the pom.xml file we will see the Hibernate dependency, but you won't see all the artifacts needed to build and run Hibernate; Maven will take care of all these second-order dependencies.

If we build only the Core module project by running the clean install goal (mvn clean install -Dmaven.test.skip), we will get three new JAR archives in the target directory. These archives will be:

jbpm-jpdl-3.2.6.SP1.jar: the core functionality of the framework. You will need this JAR in all your applications that use jBPM directly. Remember, if you are using Maven, you will need to add this artifact dependency to your project and not this archive.
jbpm-jpdl-3.2.6.SP1-config.jar: some XML configurations that the framework needs.
DB module
This module is in charge of building the different database schemas needed to run jBPM against the different database vendors that Hibernate supports. If you build this module, the generated schemas end up in the target directory of the project (generated with the clean install Maven goals).

Distribution module
This is only a module created with specific goals to build and create the binary installer, which can be downloaded from jBPM's official page. If you want to get a modified installer of the framework, you will need to build this module, but it is rarely used by development teams.

Enterprise module
This module contains extra features for high-availability environments, including a command service to interact with the framework's APIs, an enterprise messaging solution for asynchronous execution, and enterprise-ready timers. If we build this module, we will get three JAR files: the main one, jbpm-enterprise-3.2.6.SP1.jar, plus the source code and the configuration files that these classes will need.

Example module
This is a very interesting module, because it contains some basic examples of how the framework can be used. If you open this module, you will find different packages with JUnit tests that show us how to use the framework APIs to handle some common situations. These tests are used only for learning purposes and try to introduce the most common classes that all developers will use. Feel free to play with these tests, modify them, and try to understand what is going on there.

Identity module
This module contains a proof-of-concept model to use out of the box when we start creating applications that handle human interactions. The basic idea here is to have a simple model to represent how the process users are structured in our company. As you can imagine, depending on the company structure, we need to be able to adapt this model to our business needs. This is just a basic implementation that you will probably replace with your own customized implementation.

Simulation module
This module includes some use cases for simulating our process executions. The idea here is to know how to obtain reports that help us improve our process executions, measuring times and costs for each execution.

User Guide module
This module lets you build the official documentation from scratch. It is not built when the main project is built, just to save time. You can build the documentation in three formats—separate HTML files, one single long HTML file, or PDF.

Knowing this structure will help us decide where to make changes and where to look for a specific functionality inside the framework sources. Try to go deep inside the src directory of each project to see how the sources are distributed in more detail.

Building real world applications
In this section, we are going to build two example applications—both similar in content and functionality, but built with different methodologies. The first one will be created using the Eclipse plugin provided by the jBPM framework.
This approach will give us a quick structure that lets us create our first application using jBPM. On the other hand, in the second application that we will build, we will use Maven to describe and manage our project, simulating a more realistic situation where complex applications are built by mixing different frameworks.

Eclipse Plugin Project/GPD Introduction
In this section, we will build an example application that uses jBPM, created with the Eclipse plugin provided by the jBPM framework. The idea here is to look at how these kinds of projects are created and what structure the plugin proposes. The outcome of this section will be a Process Archive (PAR) file generated by the GPD plugin, which contains a process definition and all the technical details needed to run in an execution environment. To achieve this, I have set up my workspace in the directory projects inside the software directory. With the jBPM plugin installed, you will be able to create new jBPM projects. You can do this by going to File | New | Other and choosing the new type of project called Process Project. Then you must click on the Next button to assign a name to this project. I chose FirstProcess for the project name (I know, a very original one!) and clicked on Next again. The first time that you create one of these projects, Eclipse will ask you to choose the jBPM runtime that you want. This means that you can have different runtimes (different versions of jBPM to use with this plugin) installed. To configure the correct runtime, you will need to locate the directory that the installer creates—it's called jbpm-3.2.6.SP1—and then assign a name to this runtime. A common practice here is to include the version in the runtime name; this will help us identify the runtime with which we are configuring our process projects. Then you should click on the Finish button at the bottom of the window. This will generate our first process project, called FirstProcess.
If you have problems creating a new jBPM project, you will notice a red cross placed next to your project name in the Project Explorer window. You can see the current problems in the Problems window (Window | Show View | Problems). If the problem is that a JAR file called activation.jar is missing, you will need a workaround to fix this situation: go to your jBPM installation directory—in this case, software/programs/jbpm-3.2.6.SP1 on my desktop—then go to src/resources/gpd, open the file called version.info.xml, and remove the line that references the file called activation.jar. Then restart the IDE and the problem will disappear. If you create the process project and the sample process definition is not created (under src/main/jpdl), you can use the project called FirstProcess provided inside this article's code directory.

GPD Project structure
Once you have created the project, we can take a look at the structure proposed by the plugin. This image shows us the structure proposed by the GPD plugin. Four source directories will be used to contain the different types of resources that our project will use. The first one, src/main/java, will contain all the Java sources that our process uses in order to execute custom Java logic. Here we will put all the classes that will be used to achieve custom behaviors at runtime. When you create a process project, a sample process and some classes are generated.
If you take a look inside this directory, you will find a class called MessageActionHandler.java. This class represents a technical detail that the process definition will use in order to execute custom code when the process is being executed. The src/main/config directory will contain all the resources needed to configure the framework. In the src/main/jpdl directory, you will find all the defined processes. When you create a sample process with your project, a process called simple is created. And in src/test/java, you will find all the tests created to ensure that our processes behave in the right way when they get executed. When the sample process is created, a test for this process is also created. It will give us a quick preview of the APIs that we will use to run our processes. For the sample process, a test called SimpleProcessTest is created. This test creates a process execution and runs it to check whether the process behaves in the way it is supposed to work. Be careful if you modify the process diagram, because this test will fail. Feel free to play with the diagram and with this test to see what happens. Here we will see a quick introduction to what this test does.

SimpleProcessTest
This test is automatically created when you create a jBPM process project with a sample process. If you open this class, located in the src/test/java directory of the project, you will notice that the behavior of the test is described with comments in the code. Here we will try to see, step by step, what the test performs and how it uses the jBPM APIs to interact with the process defined using the Graphical Process Editor. This test class, like every JUnit test class, extends the class TestCase (for JUnit 3.x). It then defines each test inside methods that start with the prefix test*. In this case, the test is called testSimpleProcess(). Feel free to add your own tests in other methods that use the prefix test* in the method name. If we look at the testSimpleProcess() method, we will see that the first line of code creates an object called processDefinition of the ProcessDefinition type using the processdefinition.xml file:

    ProcessDefinition processDefinition =
        ProcessDefinition.parseXmlResource("simple/processdefinition.xml");

At this point, we will have our process definition represented as an object. In other words, the same structure that was represented in the XML file is now represented in a Java object. Using the APIs provided by JUnit, we check that the ProcessDefinition object is correctly created:

    assertNotNull("Definition should not be null", processDefinition);

Then we need to create a process execution that will run based on the process definition object. In the jBPM language, this concept of execution is represented with the word instance. So, we must create a new ProcessInstance object that will represent one execution of our defined process:

    ProcessInstance instance = new ProcessInstance(processDefinition);

Then the only thing we need to do is interact with the process and tell the process instance to jump from one node to the next using the concept of a signal, which represents an external event. It tells the process that it needs to continue the execution to the next node:

    instance.signal();

If you take a look at all the assert methods used in the code, they only confirm that the process is in the node in which it is supposed to be.
Another thing that this test checks is that the Actions attached to the process change the value of a process variable. Try to figure out what is happening with that variable and where the process definition changes this variable's value. The following assert can give you a small clue about it:

    assertEquals("Message variable contains message",
        instance.getContextInstance().getVariable("message"),
        "Going to the first state!");

To run this test, you just need to right-click on the source of this class, go to Run As, and then choose JUnit Test. You should check whether the test succeeded in the JUnit panel (a green light will be shown if all goes well).

Graphical Process Editor
In this section, we will analyze the most used GPD windows, giving a brief introduction to all the functionality that this plugin provides us. We have already seen the project structure and the wizard to create new jBPM projects. The most frequently used window that GPD proposes to us is the Graphical Process Editor, which lets us draw our processes in a very intuitive way. This editor contains four tabs that give us different functionalities for different users/roles.
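Putting those snippets together, the generated test is roughly equivalent to the following sketch. The package names in the imports are the standard jBPM 3.x ones and are stated here as an assumption—check them against the class actually generated in your project:

    import junit.framework.TestCase;
    import org.jbpm.graph.def.ProcessDefinition;
    import org.jbpm.graph.exe.ProcessInstance;

    public class SimpleProcessTest extends TestCase {

        public void testSimpleProcess() {
            // Parse the process definition from the classpath resource
            ProcessDefinition processDefinition =
                ProcessDefinition.parseXmlResource("simple/processdefinition.xml");
            assertNotNull("Definition should not be null", processDefinition);

            // Create one execution (instance) of the defined process
            ProcessInstance instance = new ProcessInstance(processDefinition);

            // Fire the external event that moves the execution to the next node
            instance.signal();

            // The action attached to the process has set a process variable
            assertEquals("Message variable contains message",
                instance.getContextInstance().getVariable("message"),
                "Going to the first state!");
        }
    }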
jBPM for Developers: Part 2
Packt
07 Jan 2010
11 min read
Process execution
At this point, where our definitions are ready, we can create an execution of our defined processes. This can be achieved by creating a class where each instance represents one execution of our process definition—bringing our processes to life and guiding the company through its daily activities, letting us see how our processes move from one node to the next. With this concept of execution, we gain the power to interact with and influence the process execution by using the methods proposed by this class. We are going to add all of the methods that we need to represent the executional stage of the process, adding all the data and behavior needed to execute our process definitions. This process execution will only have a pointer to the current node in the process execution. This will let us query the process status whenever we want. An important question comes to mind here: why do we need to interact with our processes? Why doesn't the process flow until the end when we start it? And the answer to these important questions is: it depends. The important thing here is to notice that there will be two main types of nodes:
The first type runs without external interaction (we can say that it is an automatic node). These types of nodes represent automatic procedures that run without external interactions.
The second type of node is commonly named wait state or event wait. The activity that it represents needs to wait for a human or a system interaction to complete it. This means that the system or the human needs to create/fire an event when the activity is finished, in order to inform the process that it can continue to the next node.

Wait states versus automatic nodes
The difference between them is basically the nature of the activity. We need to recognize this nature in order to model our processes in the right way. As we have seen before, a "wait state" or "event wait" situation occurs when we need to wait for some event to take place from the point of view of the process. These events are classified into two wide groups—Asynchronous System Interactions and Human tasks.

Asynchronous System Interactions
This is the situation when the process needs to interact with some other system, but the operation is executed in an asynchronous way. For non-advanced developers, the word "asynchronous" could sound ambiguous or meaningless. In this context, we can say that an asynchronous execution takes place when two systems communicate with each other without blocking calls. This is not the common way of execution in our Java applications. When we call a method in Java, the current thread of execution is blocked while the method code is executed inside the same thread. See the following example: the doBackup() method will block until the backup is finished. When this happens, the call stack will continue with the next line in the main class. This blocking call is commonly known as a synchronous call. On the other hand, we have the non-blocking calls, where the method is called but we (the application) are not going to wait for the execution to finish; the execution will continue to the next line in the main class without waiting. In order to achieve this behavior, we need to use another mechanism. One of the most common mechanisms used for this is messages.
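The contrast between the two styles can be sketched in plain Java as follows. This is only an illustration, assuming a hypothetical BackupService class and using a bare Thread to stand in for the messaging mechanism discussed next:

    public class Main {
        public static void main(String[] args) {
            final BackupService backup = new BackupService();

            // Synchronous (blocking) call: the main thread waits here until
            // doBackup() returns before moving on to the next line.
            backup.doBackup();
            System.out.println("Runs only after the backup has finished");

            // Asynchronous (non-blocking) style: the work is handed over to another
            // thread (standing in for a message taken by an external system) and we
            // continue immediately; something else must later tell us whether the
            // backup finished successfully.
            new Thread(new Runnable() {
                public void run() {
                    backup.doBackup();
                }
            }).start();
            System.out.println("Runs without waiting for the backup");
        }
    }

    class BackupService {
        void doBackup() {
            // ... long-running backup logic would go here ...
        }
    }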
Let's see this concept in the following image: in this case, by using messages for asynchronous executions, the doBackup() method is transformed into a message that will be taken by another thread (probably an external system) in charge of the real execution of the doBackup() code. The main class here will continue with the next line in the code. It's important for you to notice that the main thread can end before the external system finishes doing the backup. That's the expected behavior, because we are delegating the responsibility of executing the backup code to the external system. But wait a minute, how do we know if the doBackup() method execution finished successfully? In such cases, the main thread or any other thread should query the status of the backup to know whether it is ready or not.

Human tasks
Human tasks are also asynchronous; we can see exactly the same behavior that we saw before. However, in this case, the executing thread will be a human being and the message will be represented as a task in the person's task list. As we can see in this image, a task is created when the Main thread's execution reaches the doBackup() method. This task goes directly to the corresponding user's task list. When the user has time or is able to do that task, he/she completes it. In this case, the "Do Backup" activity is a manual task that needs to be performed by a human being. In both situations, we have the same asynchronous behavior, but the parties that interact change, and this calls for different solutions. For system-to-system interaction, we probably need to focus on the protocols that the systems use for communication. In human tasks, on the other hand, the main concern will probably be the user interface that handles the human interaction.
How do we know if a node is a wait state node or an automatic node? First of all, by the name. If the node represents an activity that is done by humans, it will always wait. In system interactions, it is a little more difficult to deduce this by the name (but if we see an automatic activity that we know takes a lot of time, that will probably be an asynchronous activity that behaves as a wait state). A common example could be a backup to tape, where the backup action is scheduled in an external system. If we are not sure about the nature of an activity, we need to ask our stakeholder about it. We need to understand these two behaviors in order to know how to implement each node's executional behavior, which will be related to the specific node functionality.

Creating the execution concept in Java
With this class, we will represent each execution of our process, which means that we could have a lot of instances running at the same time with the same definition. Inside the package called org.jbpm.examples.chapter02.simpleGOP.execution (provided at www.packtpub.com/files/code/5685_Code.zip), we will find the following class:

    public class Execution {
        private Definition definition;
        private Node currentNode;

        public Execution(Definition definition) {
            this.definition = definition;
            // Setting the first Node as the current Node
            this.currentNode = definition.getNodes().get(0);
        }

        public void start() {
            // Here we start the flow by leaving the currentNode.
            currentNode.leave(this);
        }
        ... (getters and setters methods)
    }

As we can see, this class contains a Definition and a Node; the idea here is to have a currentNode that represents the node inside the definition to which this execution is currently "pointing".
We can say that the currentNode is a pointer to the current node inside a specific definition. The real magic occurs inside each node. Now each node has the responsibility of deciding whether it must continue the execution to the next node or not. In order to achieve this, we need to add some methods (enter(), execute(), and leave()) that define the internal executional behavior of each node. We do this in the Node class to be sure that all the subclasses of the Node class will inherit the generic way of execution. Of course, we can change this behavior by overriding the enter(), execute(), and leave() methods. We can define the Node.java class (which is also found in the chapter02.simpleGOPExecution project in the code bundle) as follows:

    ...
    public void enter(Execution execution) {
        execution.setCurrentNode(this);
        System.out.println("Entering " + this.getName());
        execute(execution);
    }

    public void execute(Execution execution) {
        System.out.println("Executing " + this.getName());
        if (actions.size() > 0) {
            Collection<Action> actionsToExecute = actions.values();
            Iterator<Action> it = actionsToExecute.iterator();
            while (it.hasNext()) {
                it.next().execute();
            }
            leave(execution);
        } else {
            leave(execution);
        }
    }

    public void leave(Execution execution) {
        System.out.println("Leaving " + this.getName());
        Collection<Transition> transitions = getLeavingTransitions().values();
        Iterator<Transition> it = transitions.iterator();
        if (it.hasNext()) {
            it.next().take(execution);
        }
    }
    ...

As you can see in the Node class, which is the most basic and generic implementation, three methods are defined to specify the executional behavior of each of these nodes in our processes. If you look carefully at these three methods, you will notice that they are chained, meaning that the enter() method is the first to be called; it then calls the execute() method, which in turn calls the leave() method, depending on the situation. The idea behind these chained methods is to demarcate different phases inside the execution of the node. All of the subclasses of the Node class will inherit these methods, and with them the executional behavior. Also, all the subclasses can add other phases to demarcate a more complex lifecycle inside each node's execution. The next image shows how these phases are executed inside each node. As you can see in the image, the three methods are executed when the execution points to a specific node. It is also important to note that transitions have the Take phase, which is executed to jump from one node to the next. All these phases inside the nodes and in the transitions let us hook custom blocks of code to be executed. One example of what we could use these hooks for is auditing processes: we could add, in the enter() method (the first method called in each node), a call to an audit system that takes the current timestamp and measures the time that the node uses until it finishes its execution when the leave() method is called. Another important thing to notice in the Node class is the code inside the execute() method. A new concept appears here: the Action interface that we see in that loop represents a pluggable way to include custom specific logic inside a node without changing the node class. This allows us to extend the node functionality without modifying the business process graph. This means that we can add a huge amount of technical detail without increasing the complexity of the graph.
For example, imagine that in our business process, each time we change node, we need to store the data collected from each node in a database. In most cases, this requirement is purely technical, and the business users don't need to know about it. With these actions, we achieve exactly that. We only need to create a class with the custom logic that implements the Action interface and then add it to the node in which we want to execute the custom logic. The best way to understand how the execution works is by playing with the code. In the chapter02.simpleGOPExecution Maven project, we have another test that shows us the behavior of the execution class. This test is called TestExecution and contains two basic tests to show how the execution works. If you don't know how to use Maven, there is a quick start guide at the end of this article. You will need to read it in order to compile and run these tests.

    public void testSimpleProcessExecution() {
        Definition definition = new Definition("myFirstProcess");
        System.out.println("########################################");
        System.out.println(" Executing PROCESS: " + definition.getName() + " ");
        System.out.println("########################################");
        Node firstNode = new Node("First Node");
        Node secondNode = new Node("Second Node");
        Node thirdNode = new Node("Third Node");
        firstNode.addTransition("to second node", secondNode);
        secondNode.addTransition("to third node", thirdNode);
        // Add an action in the second node. CustomAction implements Action
        secondNode.addAction(new CustomAction("First"));
        definition.addNode(firstNode);
        definition.addNode(secondNode);
        definition.addNode(thirdNode);
        // We can graph it if we want.
        // definition.graph();
        Execution execution = new Execution(definition);
        execution.start();
        // The execution leaves the third node
        assertEquals("Third Node", execution.getCurrentNode().getName());
    }

If you run this first test, it creates a process definition as in the definition tests, and then, using the definition, it creates a new execution. This execution lets us interact with the process. As this is a simple implementation, we only have the start() method, which starts the execution of our process, executing the logic inside each node. In this case, each node is responsible for continuing the execution to the next node. This means that there are no wait state nodes inside the example process. In case we have a wait state, our process will stop the execution in the first wait state, so we need to interact with the process again in order to continue the execution. Feel free to debug this test to see how this works. Analyze the code and follow the execution step by step. Try to add new actions to the nodes and analyze how all of the classes in the project behave. When you get the idea, the framework internals will be easy to digest.
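For reference, the Action hook that the Node class iterates over can be as simple as the following sketch. The real interface and CustomAction class live in the book's code bundle; this version is an assumption reconstructed from how the Node and test code call them (a no-argument execute() method and a single String constructor argument):

    // Hypothetical reconstruction of the Action hook called from Node.execute()
    public interface Action {
        void execute();
    }

    // A trivial action that only prints a message; the String constructor argument
    // mirrors the new CustomAction("First") call used in the test above
    public class CustomAction implements Action {
        private final String name;

        public CustomAction(String name) {
            this.name = name;
        }

        public void execute() {
            System.out.println("Executing custom logic: " + name);
        }
    }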
jBPM for Developers: Part 1
Packt
07 Jan 2010
10 min read
In this article, the following key points will be covered:
Common development process
Decoupling processes from our applications
Graph Oriented Programming
Modeling nodes
Modeling transitions
Expanding our language
Implementing our graph-oriented solution in Java
Wait states versus automatic nodes
Executing our processes

Let's get started with the main cornerstone behind the framework. This article will give us a way to represent our business processes using the Java language, and all the points that you need to cover in order to represent real situations.

Graph Oriented Programming
We will start by talking about the two main concepts behind the framework's internal implementation. The Graph Oriented Programming (GOP) approach is used to gain some features that we will want when we need to represent business processes inside our applications. Basically, Graph Oriented Programming gives us the following features:
Easy and intuitive graphical representation
Multi-language support

These are the concepts that the jBPM implementers had in mind when they started with the first version of the framework. We are going to take a quick look at them and formulate some code in order to implement a minimal solution with these features in mind. Starting with GOP as the bigger concept, you will see that the official documentation of jBPM mentions this topic as one of the most important concepts behind the framework. Here, we will reveal all of the advantages that this approach gives us. Basically, by knowing GOP, we will gain complete knowledge about how processes are represented and how they are executed. Therefore, a common question here is: why do we need a new approach (GOP) for programming our processes when we have already learnt the object-oriented programming paradigm?

Common development process
In order to answer the previous question, we will quickly analyze the situation here. To achieve this, we need to understand the nature of our processes. We will also analyze what kind of advantages developers gain when the business process information is decoupled from the rest of the application code. Let's clarify this point with a simple example. Imagine that we have to build an entire application that represents the stages in the "Recycling Things Co." example previously presented. The most common approach for a three-tier application and development process will be the following: this is a traditional approach where all the stages are iteratively repeated for each stage/module of the application. One thing that we can notice here, and which happens in real software projects, is that the business analyst's description gets lost in the design phase, because the business analyst doesn't fully understand the design class diagrams, as these diagrams are focused on implementation patterns and details. If we are lucky and have a very good team of business analysts, they will understand the diagrams. However, there is no way that they could understand the code. So, in the best case, the business analyst's description is lost in the code—this means that we cannot show our clients how the stages of their processes are implemented in real working code. That is why business analysts and clients (stakeholders) are blind. They need to trust that we (the developers) know what we are doing and that we understand 100 percent of the requirements that the business analysts collect.
It is also important to notice that, in most cases, the client validates the business analyst's work—if changes emerge in the implementation phase, sometimes these changes are not reflected in the business analyst's text, and the client/stakeholders never realize that some implementation aspect of their software has changed. Maybe they are not functional changes, but there are sometimes changes that affect the behavior of the software or the way users interact with it. This uncertainty generated in the stakeholder causes some dependency and odd situations, where the stakeholder thinks that if he/she cannot count on us (the developers and architects team) any longer, nobody will be able to understand our code (the code that we write). With this new approach, the client/stakeholders will be able to perceive, in a transparent way, the code that we write to represent each business situation. This allows them (the stakeholders) to ask for changes that can be easily introduced to reflect everyday business requirements. Let's be practical and recognize that, in most situations, if we have the application implemented in a three-tier architecture, we will have the following artifacts developed:

Database model
This includes logic tables to do calculations, UI tables to store UI customizations or users' data about their custom interfaces, domain entities (tables that represent the business entities, for example, Invoice, Customer, and so on), and history logs, all together.

Business logic
If we are careful developers, here we are going to have all of the code related to the business process logic. In the case of the example, here we will have all the stages represented in some kind of state machine, in the best of cases. If we don't have a kind of state machine, we will have a lot of if and switch statements distributed in our code to represent each stage in the process. For example, if we have the same application for all the branches of a company, this application will need to behave differently for the main office's employee than for the "just finished" warehouse employee, because the tasks that they are doing are very different in nature. Imagine what would happen if we wanted to add some activity in the middle—probably the world would collapse! Developers would need to know in some depth how all the if and switch statements are distributed in the code in order to add/insert the new activity. I don't want to be one of these developers.

User interfaces
Once again, if we are lucky developers, the process stages will not be represented here, but probably many if and switch statements will be dispersed in our UI code, deciding which screen is shown to each user in each activity inside the process. So, for each button and each form, we need to ask whether we are in a specific stage of the process with a specific user.

Decoupling processes from our applications
By decoupling the business process from these models, we introduce an extra layer (tier) with some extra artifacts, but this helps us to keep the application simple. The new paradigm proposed here includes the two Business Process Management (BPM) roles (business analysts and developers) in all the development and execution stages of our processes. This is mainly achieved through the use of a common language that both sides understand.
It lets us represent the current process that the business analysts see in the everyday work inside the company, and all of the technical details that these processes need in order to run in a production environment. As we can see in the next image, both roles interact in the creation of these new artifacts. We must not forget about the clients/managers/stakeholders, who can validate the processes every day as they are running them. They can also ask for changes to improve the performance and the current way used to achieve the business goal of the process. Compared with the OOP paradigm, where class diagrams are commonly used to represent static data but no executional behavior, these newly created artifacts (process definitions) can be easily represented in order to be validated by our clients/stakeholders. One of the main advantages of this approach is that we gain visibility into how the processes are executed and which activity they are in at any given moment. This requirement forces us to have a simple way to represent our business processes—in a way that can be graphed. We need to be able to see, all the time, how our production processes are running.

Graph Oriented Programming on top of OOP
Here, we will discuss the main points of the Graph Oriented Programming paradigm. With this analysis, we will implement a basic approach to understand how we use this paradigm on top of the Java language in the next section. In order to do that, we need to know the most important requisites that we have to fulfill in order to integrate, maintain, and execute our business processes in a real-world implementation:
Easy and intuitive graphical representation: to let the business analysts and developers communicate smoothly and fully understand what is happening in the real process.
Must give us the possibility of seeing the processes' executions in real time: in order to know how our processes are going, so we can make more accurate business decisions.
Could be easily extended to provide extra functionality to fulfill all of the business situations.
Could be easily modified and adapted to everyday business (reality) changes. No more huge development projects for small changes and no more migrations.

Implementing Graph Oriented Programming on top of the Java language (finally Java code!)
With these requisites, presented in the previous section, in mind, we are able to implement a simple solution on top of the Java language that implements this new approach (called the Graph Oriented Programming paradigm). As the name of the paradigm says, we are going to work with graphs—directed graphs to be more precise. A graph can be defined as a set of nodes linked to each other, as the following image shows us. If we are talking about directed graphs, we need to know that our nodes will be linked using directed transitions. These transitions are directed because they define a source node and a destination node. This means that if we have a transition that has node A as the source node and node B as the destination node, that transition will not be the same as the one that has node B as the source node and node A as the destination node. Take a look at the following image: like in the object-oriented programming paradigm, we need to have a language with a specific set of words (for example, object). Here, we will need words to represent our graphs, just as we represent objects in the object-oriented paradigm.
Here we will try to expand on the official documentation proposed by the jBPM team and guide the learning process for this important topic. We will see code in this section, and I will ask you to try it at home, debug it, and play with it until you feel confident about the internal behavior of this example. Let's get started with the graph definition and with some of the rules that the graph needs to implement in order to represent our business processes correctly. Up until now, we have two concepts that will appear in our graph-oriented programming language—Node and Transition. These two concepts need to be implemented in two separate classes, but with a close relationship. Let's see a class diagram of these two classes and make a short analysis of the attributes and methods proposed in this example.
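As a preview of where that class diagram leads, a minimal skeleton of the two classes might look like the following sketch. The field and method names are assumptions, chosen to match the way addTransition() and getLeavingTransitions() are used in the execution code shown in Part 2 of this article:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // A node knows its name and the directed transitions that leave it
    public class Node {
        private String name;
        private Map<String, Transition> leavingTransitions =
            new LinkedHashMap<String, Transition>();

        public Node(String name) {
            this.name = name;
        }

        public void addTransition(String transitionName, Node destination) {
            leavingTransitions.put(transitionName,
                new Transition(transitionName, this, destination));
        }

        public Map<String, Transition> getLeavingTransitions() {
            return leavingTransitions;
        }

        public String getName() {
            return name;
        }
    }

    // A directed transition always defines a source node and a destination node
    class Transition {
        private String name;
        private Node source;
        private Node destination;

        public Transition(String name, Node source, Node destination) {
            this.name = name;
            this.source = source;
            this.destination = destination;
        }

        public Node getDestination() {
            return destination;
        }
    }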