How-To Tutorials - Programming


Exploring Compilers

Packt
23 Jun 2017
17 min read
In this article by Gabriele Lanaro, author of the book Python High Performance - Second Edition, we will see that Python is a mature and widely used language, and there is a large interest in improving its performance by compiling functions and methods directly to machine code rather than executing instructions in the interpreter. In this article, we will explore two projects--Numba and PyPy--that approach compilation in slightly different ways. Numba is a library designed to compile small functions on the fly. Instead of transforming Python code to C, Numba analyzes and compiles Python functions directly to machine code. PyPy is a replacement interpreter that works by analyzing the code at runtime and optimizing the slow loops automatically.

(For more resources related to this topic, see here.)

Numba

Numba was started in 2012 by Travis Oliphant, the original author of NumPy, as a library for compiling individual Python functions at runtime using the Low Level Virtual Machine (LLVM) toolchain. LLVM is a set of tools designed for writing compilers. It is language agnostic and is used to write compilers for a wide range of languages (an important example is the clang compiler). One of the core aspects of LLVM is its intermediate representation (the LLVM IR), a very low-level, platform-agnostic language similar to assembly, which can be compiled to machine code for a specific target platform. Numba works by inspecting Python functions and compiling them, using LLVM, to the IR. As we have already seen in the last article, speed gains can be obtained when we introduce types for variables and functions. Numba implements clever algorithms to guess the types (this is called type inference) and compiles type-aware versions of the functions for fast execution.

Note that Numba was developed to improve the performance of numerical code, and the development efforts often prioritize the optimization of applications that intensively use NumPy arrays. Numba is evolving really fast and can have substantial improvements between releases and, sometimes, backward-incompatible changes. To keep up, ensure that you refer to the release notes for each version. In the rest of this article, we will use Numba version 0.30.1; ensure that you install the correct version to avoid any errors. The complete code examples in this article can be found in the Numba.ipynb notebook.

First steps with Numba

Getting started with Numba is fairly straightforward. As a first example, we will implement a function that calculates the sum of squares of an array. The function definition is as follows:

    def sum_sq(a):
        result = 0
        N = len(a)
        for i in range(N):
            result += a[i]**2
        return result

To set up this function with Numba, it is sufficient to apply the nb.jit decorator:

    import numba as nb

    @nb.jit
    def sum_sq(a):
        ...

The nb.jit decorator won't do much when applied. However, when the function is invoked for the first time, Numba will detect the type of the input argument, a, and compile a specialized, performant version of the original function. To measure the performance gain obtained by the Numba compiler, we can compare the timings of the original and the specialized functions. The original, undecorated function can be easily accessed through the py_func attribute.
The timings for the two functions are as follows:

    import numpy as np
    x = np.random.rand(10000)

    # Original
    %timeit sum_sq.py_func(x)
    100 loops, best of 3: 6.11 ms per loop

    # Numba
    %timeit sum_sq(x)
    100000 loops, best of 3: 11.7 µs per loop

You can see how the Numba version is orders of magnitude faster than the Python version. We can also compare how this implementation stacks up against the standard NumPy operators:

    %timeit (x**2).sum()
    10000 loops, best of 3: 14.8 µs per loop

In this case, the Numba-compiled function is marginally faster than the NumPy vectorized operations. The reason for the extra speed of the Numba version is likely that the NumPy version allocates an extra array before performing the sum, in comparison with the in-place operations performed by our sum_sq function.

As we didn't use array-specific methods in sum_sq, we can also try to apply the same function to a regular Python list of floating point numbers. Interestingly, Numba is able to obtain a substantial speedup even in this case, as compared to a list comprehension:

    x_list = x.tolist()
    %timeit sum_sq(x_list)
    1000 loops, best of 3: 199 µs per loop

    %timeit sum([x**2 for x in x_list])
    1000 loops, best of 3: 1.28 ms per loop

Considering that all we needed to do was apply a simple decorator to obtain an incredible speedup over different data types, it's no wonder that what Numba does looks like magic. In the following sections, we will dig deeper into how Numba works and evaluate the benefits and limitations of the Numba compiler.

Type specializations

As shown earlier, the nb.jit decorator works by compiling a specialized version of the function once it encounters a new argument type. To better understand how this works, we can inspect the decorated function in the sum_sq example. Numba exposes the specialized types through the signatures attribute. Right after the sum_sq definition, we can inspect the available specializations by accessing sum_sq.signatures, as follows:

    sum_sq.signatures
    # Output:
    # []

If we call this function with a specific argument, for instance, an array of float64 numbers, we can see how Numba compiles a specialized version on the fly. If we also apply the function to an array of float32, we can see how a new entry is added to the sum_sq.signatures list:

    x = np.random.rand(1000).astype('float64')
    sum_sq(x)
    sum_sq.signatures
    # Result:
    # [(array(float64, 1d, C),)]

    x = np.random.rand(1000).astype('float32')
    sum_sq(x)
    sum_sq.signatures
    # Result:
    # [(array(float64, 1d, C),), (array(float32, 1d, C),)]

It is possible to explicitly compile the function for certain types by passing a signature to the nb.jit function. An individual signature can be passed as a tuple that contains the types we would like to accept. Numba provides a great variety of types that can be found in the nb.types module, and they are also available in the top-level nb namespace. If we want to specify an array of a specific type, we can use the slicing operator, [:], on the type itself. In the following example, we demonstrate how to declare a function that takes an array of float64 as its only argument:

    @nb.jit((nb.float64[:],))
    def sum_sq(a):
        ...

Note that when we explicitly declare a signature, we are prevented from using other types, as demonstrated in the following example. If we try to pass an array of float32, Numba will raise a TypeError:

    sum_sq(x.astype('float32'))
    # TypeError: No matching definition for argument type(s)
    # array(float32, 1d, C)

Another way to declare signatures is through type strings.
For example, a function that takes a float64 as input and returns a float64 as output can be declared with the "float64(float64)" string. Array types can be declared using the [:] suffix. Putting this together, we can declare a signature for our sum_sq function as follows:

    @nb.jit("float64(float64[:])")
    def sum_sq(a):
        ...

You can also pass multiple signatures by passing a list:

    @nb.jit(["float64(float64[:])", "float64(float32[:])"])
    def sum_sq(a):
        ...

Object mode versus native mode

So far, we have shown how Numba behaves when handling a fairly simple function. In this case, Numba worked exceptionally well, and we obtained great performance on arrays and lists. The degree of optimization obtainable from Numba depends on how well Numba is able to infer the variable types and how well it can translate those standard Python operations to fast type-specific versions. If this happens, the interpreter is side-stepped and we can get performance gains similar to those of Cython. When Numba cannot infer variable types, it will still try to compile the code, reverting to the interpreter when the types can't be determined or when certain operations are unsupported. In Numba, this is called object mode, in contrast to the interpreter-free scenario, called native mode.

Numba provides a function, called inspect_types, that helps us understand how effective the type inference was and which operations were optimized. As an example, we can take a look at the types inferred for our sum_sq function:

    sum_sq.inspect_types()

When this function is called, Numba will print the type inferred for each specialized version of the function. The output consists of blocks that contain information about variables and the types associated with them. For example, we can examine the N = len(a) line:

    # --- LINE 4 ---
    # a = arg(0, name=a)  :: array(float64, 1d, A)
    # $0.1 = global(len: <built-in function len>)  :: Function(<built-in function len>)
    # $0.3 = call $0.1(a)  :: (array(float64, 1d, A),) -> int64
    # N = $0.3  :: int64

    N = len(a)

For each line, Numba prints a thorough description of the variables, functions, and intermediate results. In the preceding example, you can see (second line) that the argument a is correctly identified as an array of float64 numbers. At LINE 4, the input and return types of the len function are also correctly identified (and likely optimized) as taking an array of float64 numbers and returning an int64. If you scroll through the output, you can see how all the variables have a well-defined type. Therefore, we can be certain that Numba is able to compile the code quite efficiently. This form of compilation is called native mode.

As a counter-example, we can see what happens if we write a function with unsupported operations. For example, as of version 0.30.1, Numba has limited support for string operations. We can implement a function that concatenates a series of strings and compile it as follows:

    @nb.jit
    def concatenate(strings):
        result = ''
        for s in strings:
            result += s
        return result

Now, we can invoke this function with a list of strings and inspect the types:

    concatenate(['hello', 'world'])
    concatenate.signatures
    # Output: [(reflected list(str),)]

    concatenate.inspect_types()

Numba will return the inspect_types output for the reflected list(str) type. We can, for instance, examine how line 3 gets inferred.
The output of concatenate.inspect_types() is reproduced here:

    # --- LINE 3 ---
    # strings = arg(0, name=strings)  :: pyobject
    # $const0.1 = const(str, )  :: pyobject
    # result = $const0.1  :: pyobject
    # jump 6
    # label 6

    result = ''

You can see how, this time, each variable or function is of the generic pyobject type rather than a specific one. This means that, in this case, Numba is unable to compile this operation without the help of the Python interpreter. Most importantly, if we time the original and compiled functions, we note that the compiled function is about three times slower than the pure Python counterpart:

    x = ['hello'] * 1000

    %timeit concatenate.py_func(x)
    10000 loops, best of 3: 111 µs per loop

    %timeit concatenate(x)
    1000 loops, best of 3: 317 µs per loop

This is because the Numba compiler is not able to optimize the code and adds some extra overhead to the function call. As you may have noted, Numba compiled the code without complaint even though it is inefficient. The main reason for this is that Numba can still compile some sections of the code efficiently while falling back to the Python interpreter for the other parts. This compilation strategy is called object mode.

It is possible to force the use of native mode by passing the nopython=True option to the nb.jit decorator. If, for example, we apply this decorator to our concatenate function, we observe that Numba throws an error on first invocation:

    @nb.jit(nopython=True)
    def concatenate(strings):
        result = ''
        for s in strings:
            result += s
        return result

    concatenate(x)
    # Exception:
    # TypingError: Failed at nopython (nopython frontend)

This feature is quite useful for debugging and for ensuring that all the code is fast and correctly typed.

Numba and NumPy

Numba was originally developed to easily increase the performance of code that uses NumPy arrays. Currently, many NumPy features are implemented efficiently by the compiler.

Universal functions with Numba

Universal functions are special functions defined in NumPy that are able to operate on arrays of different sizes and shapes according to the broadcasting rules. One of the best features of Numba is its implementation of fast ufuncs. We have already seen some ufunc examples in article 3, Fast Array Operations with NumPy and Pandas. For instance, the np.log function is a ufunc because it can accept scalars and arrays of different sizes and shapes. Universal functions that take multiple arguments also work according to the broadcasting rules; examples of such functions are np.add and np.subtract.

Universal functions can be defined in standard NumPy by implementing the scalar version and using the np.vectorize function to enhance it with the broadcasting feature. As an example, we will see how to write the Cantor pairing function. A pairing function is a function that encodes two natural numbers into a single natural number so that you can easily interconvert between the two representations. The Cantor pairing function can be written as follows:

    import numpy as np

    def cantor(a, b):
        return int(0.5 * (a + b)*(a + b + 1) + b)

As already mentioned, it is possible to create a ufunc in pure Python using the np.vectorize decorator:

    @np.vectorize
    def cantor(a, b):
        return int(0.5 * (a + b)*(a + b + 1) + b)

    cantor(np.array([1, 2]), 2)
    # Result:
    # array([ 8, 12])

Except for the convenience, defining universal functions in pure Python is not very useful, as it requires a lot of function calls that are affected by interpreter overhead. For this reason, ufunc implementations are usually written in C or Cython, but Numba beats all these methods in convenience: all that is needed to perform the conversion is the equivalent decorator, nb.vectorize.
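As a quick illustration of that conversion, a minimal sketch might look like the following. This snippet is not reproduced from the book excerpt: the explicit int64 signature is an assumption chosen for integer inputs, and recent Numba versions also accept a bare @nb.vectorize for lazy, type-inferred compilation.

    import numba as nb

    # Hedged sketch: compiles a ufunc that applies the scalar Cantor formula
    # element-wise; the signature list is an assumption, not the book's exact code.
    @nb.vectorize(["int64(int64, int64)"])
    def cantor(a, b):
        return int(0.5 * (a + b) * (a + b + 1) + b)

The resulting cantor behaves like any other ufunc and follows the usual broadcasting rules.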
We can compare the speed of the Numba version against the standard np.vectorize version (which, in the following code, is called cantor_py) and against the same function implemented using standard NumPy operations:

    # Pure Python
    %timeit cantor_py(x1, x2)
    100 loops, best of 3: 6.06 ms per loop

    # Numba
    %timeit cantor(x1, x2)
    100000 loops, best of 3: 15 µs per loop

    # NumPy
    %timeit (0.5 * (x1 + x2)*(x1 + x2 + 1) + x2).astype(int)
    10000 loops, best of 3: 57.1 µs per loop

You can see how the Numba version beats all the other options by a large margin! Numba works extremely well here because the function is simple and type inference is possible. An additional advantage of universal functions is that, since they operate on individual values, their evaluation can also be executed in parallel. Numba provides an easy way to parallelize such functions by passing the target="cpu" or target="gpu" keyword argument to the nb.vectorize decorator.

Generalized universal functions

One of the main limitations of universal functions is that they must be defined on scalar values. A generalized universal function, abbreviated gufunc, is an extension of universal functions to procedures that take arrays. A classic example is matrix multiplication. In NumPy, matrix multiplication can be applied using the np.matmul function, which takes two 2D arrays and returns another 2D array. An example usage of np.matmul is as follows:

    a = np.random.rand(3, 3)
    b = np.random.rand(3, 3)

    c = np.matmul(a, b)
    c.shape
    # Result:
    # (3, 3)

As we saw in the previous subsection, a ufunc broadcasts the operation over arrays of scalars, so its natural generalization is to broadcast over an array of arrays. If, for instance, we take two arrays of 3 by 3 matrices, we expect np.matmul to match the matrices pairwise and take their products. In the following example, we take two arrays containing 10 matrices of shape (3, 3). If we apply np.matmul, the product will be applied matrix-wise to obtain a new array containing the 10 results (which are, again, (3, 3) matrices):

    a = np.random.rand(10, 3, 3)
    b = np.random.rand(10, 3, 3)

    c = np.matmul(a, b)
    c.shape
    # Output:
    # (10, 3, 3)

The usual rules for broadcasting work in a similar way. For example, if we have an array of (3, 3) matrices with a shape of (10, 3, 3), we can use np.matmul to calculate the matrix multiplication of each element with a single (3, 3) matrix. According to the broadcasting rules, the single matrix is repeated to obtain a size of (10, 3, 3):

    a = np.random.rand(10, 3, 3)
    b = np.random.rand(3, 3)   # Broadcasted to shape (10, 3, 3)

    c = np.matmul(a, b)
    c.shape
    # Result:
    # (10, 3, 3)

Numba supports the implementation of efficient generalized universal functions through the nb.guvectorize decorator. As an example, we will implement a function that computes the Euclidean distance between two arrays as a gufunc. To create a gufunc, we have to define a function that takes the input arrays, plus an output array where we will store the result of our calculation.
The nb.guvectorize decorator requires two arguments:

- The types of the input and output: in our case, two 1D arrays as input and a scalar as output
- The so-called layout string, which is a representation of the input and output sizes; in our case, we take two arrays of the same size (denoted arbitrarily by n), and we output a scalar

In the following example, we show the implementation of the euclidean function using the nb.guvectorize decorator:

    @nb.guvectorize(['float64[:], float64[:], float64[:]'], '(n),(n)->()')
    def euclidean(a, b, out):
        N = a.shape[0]
        out[0] = 0.0
        for i in range(N):
            out[0] += (a[i] - b[i])**2

There are a few very important points to be made. Predictably, we declared the types of the inputs a and b as float64[:], because they are 1D arrays. However, what about the output argument? Wasn't it supposed to be a scalar? Yes; however, Numba treats scalar arguments as arrays of size 1. That's why it was declared as float64[:]. Similarly, the layout string indicates that we have two arrays of size (n) and that the output is a scalar, denoted by empty brackets--(). However, the array out will be passed as an array of size 1. Also, note that we don't return anything from the function; all the output has to be written to the out array.

The letter n in the layout string is completely arbitrary; you may choose to use k or other letters of your liking. Also, if you want to combine arrays of uneven sizes, you can use layout strings such as (n, m).

Our brand new euclidean function can be conveniently used on arrays of different shapes, as shown in the following example:

    a = np.random.rand(2)
    b = np.random.rand(2)
    c = euclidean(a, b)   # Shape: (1,)

    a = np.random.rand(10, 2)
    b = np.random.rand(10, 2)
    c = euclidean(a, b)   # Shape: (10,)

    a = np.random.rand(10, 2)
    b = np.random.rand(2)
    c = euclidean(a, b)   # Shape: (10,)

How does the speed of euclidean compare to standard NumPy? In the following code, we benchmark a NumPy vectorized version against our previously defined euclidean function:

    a = np.random.rand(10000, 2)
    b = np.random.rand(10000, 2)

    %timeit ((a - b)**2).sum(axis=1)
    1000 loops, best of 3: 288 µs per loop

    %timeit euclidean(a, b)
    10000 loops, best of 3: 35.6 µs per loop

The Numba version, again, beats the NumPy version by a large margin!

Summary

Numba is a tool that compiles fast, specialized versions of Python functions at runtime. In this article, we learned how to compile, inspect, and analyze functions compiled by Numba. We also learned how to implement fast NumPy universal functions that are useful in a wide array of numerical applications. Tools such as PyPy allow us to run Python programs unchanged to obtain significant speed improvements. We demonstrated how to set up PyPy, and we assessed the performance improvements on our particle simulator application.

Resources for Article:

Further resources on this subject:

- Getting Started with Python Packages [article]
- Python for Driving Hardware [article]
- Python Data Science Up and Running [article]


Spatial Data

Packt
23 Jun 2017
12 min read
In this article by Dominik Mikiewicz, the author of the book Mastering PostGIS, we will look at exporting data from PostgreSQL/PostGIS to files or other data sources. Sharing data via the Web is no less important, but it has its own specific process. There may be different reasons for having to export data from a database, but sharing it with others is certainly among the most popular ones. Backing the data up or transferring it to other software packages for further processing are other common reasons for learning the export techniques.

(For more resources related to this topic, see here.)

In this article we'll have a closer look at the following:

- Exporting data using COPY (and \copy)
- Exporting vector data using pgsql2shp
- Exporting vector data using ogr2ogr
- Exporting data using GIS clients
- Outputting rasters using GDAL
- Outputting rasters using psql
- Using the PostgreSQL backup functionality

We just do the steps the other way round compared to importing. In other words, this article may give you a bit of a déjà vu feeling.

Exporting data using COPY in psql

When we were importing data, we used the psql COPY FROM command to copy data from a file to a table. This time, we'll do it the other way round--from a table to a file--using the COPY TO command. COPY TO can not only copy a full table, but also the result of a SELECT query, which means we can output filtered subsets of the source tables. Similarly to the method we used for importing, we can execute COPY or \copy in different scenarios: we'll use psql in interactive and non-interactive mode, and we'll also do the very same thing in pgAdmin.

It is worth remembering that COPY can only read/write files that can be accessed by the server instance, so usually files that reside on the same machine as the database server. For detailed information on the COPY syntax and parameters, type:

    \h copy

Exporting data in psql interactively

In order to export the data in interactive mode, we first need to connect to the database using psql:

    psql -h localhost -p 5434 -U postgres

Then type the following:

    \c mastering_postgis

Once connected, we can execute a simple command:

    \copy data_import.earthquakes_csv TO earthquakes.csv WITH DELIMITER ';' CSV HEADER

The preceding command exported the data_import.earthquakes_csv table to a file named earthquakes.csv, with ';' as the column separator. A header in the form of column names has also been added to the beginning of the file. The output should be similar to the following:

    COPY 50

Basically, the database told us how many records were exported. The content of the exported file should exactly resemble the content of the table we exported from:

    time;latitude;longitude;depth;mag;magtype;nst;gap;dmin;rms;net;id;updated;place;type;horizontalerror;deptherror;magerror;magnst;status;locationsource;magsource
    2016-10-08 14:08:08.71+02;36.3902;-96.9601;5;2.9;mb_lg;;60;0.029;0.52;us;us20007csd;2016-10-08 14:27:58.372+02;15km WNW of Pawnee, Oklahoma;earthquake;1.3;1.9;0.1;26;reviewed;us;us

As mentioned, COPY can also output the results of a SELECT query. This means we can tailor the output to very specific needs, as required. In the next example, we'll export data from a 'spatialized' earthquakes table, but the geometry will be converted to a WKT (well-known text) representation.
We'll also export only a subset of columns:

    \copy (select id, ST_AsText(geom) FROM data_import.earthquakes_subset_with_geom) TO earthquakes_subset.csv WITH CSV DELIMITER '|' FORCE QUOTE * HEADER

Once again, the output just specifies the number of records exported:

    COPY 50

The executed command exported only the id column and a WKT-encoded geometry column. The export force-wrapped the data in quote symbols, with a pipe (|) symbol used as the delimiter. The file has a header:

    id|st_astext
    "us20007csd"|"POINT(-96.9601 36.3902)"
    "us20007csa"|"POINT(-98.7058 36.4314)"

Exporting data in psql non-interactively

If you're still in psql, you can execute a script by simply typing the following:

    \i path/to/the/script.sql

For example:

    \i code/psql_export.sql

The output will not surprise us, as it will simply state the number of records that were output:

    COPY 50

If you happen to have already quit psql, the command-line equivalent of \i is -f, so the command should look like this:

    psql -h localhost -p 5434 -U postgres -d mastering_postgis -f code/psql_export.sql

Not surprisingly, the output is once again the following:

    COPY 50

Exporting data in pgAdmin

In pgAdmin, the command is COPY rather than \copy. The rest of the code remains the same. Another difference is that we need to use an absolute path, while in psql we can use paths relative to the directory we started psql in. So the first psql query 'translated' to the pgAdmin SQL version looks like this:

    copy data_import.earthquakes_csv TO 'F:\mastering_postgis\chapter06\earthquakes.csv' WITH DELIMITER ';' CSV HEADER

The second query looks like this:

    copy (select id, ST_AsText(geom) FROM data_import.earthquakes_subset_with_geom) TO 'F:\mastering_postgis\chapter06\earthquakes_subset.csv' WITH CSV DELIMITER '|' FORCE QUOTE * HEADER

Both produce a similar output, but this time it is logged in pgAdmin's query output pane, in the 'Messages' tab:

    Query returned successfully: 50 rows affected, 55 msec execution time.

It is worth remembering that COPY is executed as part of an SQL command, so it is effectively the DB server that tries to write to the file. Therefore, it may be the case that the server is not able to access the specified directory. If your DB server is on the same machine as the directory that you are trying to write to, relaxing the directory access permissions should help.

Exporting vector data using pgsql2shp

pgsql2shp is a command-line tool that can be used to output PostGIS data to shapefiles. Similarly to the outgoing COPY, it can export either a full table or the result of a query, which gives us flexibility when we only need a subset of data to be output and do not want to modify the source tables or create temporary, intermediate ones.

pgsql2shp command line

In order to get some help with the tool, just type the following in the console:

    pgsql2shp

The general syntax for the tool is as follows:

    pgsql2shp [<options>] <database> [<schema>.]<table>
    pgsql2shp [<options>] <database> <query>

A shapefile is a format that is made up of a few files. The minimum set is SHP, SHX, and DBF. If PostGIS is able to determine the projection of the data, it will also export a PRJ file containing the SRS information, which should be understandable by software able to consume a shapefile. If a table does not have a geometry column, then only a DBF file, the equivalent of the table data, will be exported.
Let's export a full table first:

    pgsql2shp -h localhost -p 5434 -u postgres -f full_earthquakes_dataset mastering_postgis data_import.earthquakes_subset_with_geom

The following output should be expected:

    Initializing...
    Done (postgis major version: 2).
    Output shape: Point
    Dumping: X [50 rows]

Now let's do the same, but this time with the result of a query:

    pgsql2shp -h localhost -p 5434 -u postgres -f full_earthquakes_dataset mastering_postgis "select * from data_import.earthquakes_subset_with_geom limit 1"

To avoid being prompted for a password, try providing it within the command via the -P switch. The output will be very similar to what we have already seen:

    Initializing...
    Done (postgis major version: 2).
    Output shape: Point
    Dumping: X [1 rows]

In the data we previously imported, we do not have examples that would manifest the shapefile limitations. It is worth knowing about them, though. You will find a decent description at https://en.wikipedia.org/wiki/Shapefile#Limitations. The most important ones are as follows:

- Column name length limit: a shapefile can only handle column names with a maximum length of 10 characters. pgsql2shp will not produce duplicate columns, though--if there are column names that would result in duplicates when truncated, the tool will add a sequence number.
- Maximum field length: the maximum field length is 255 characters; pgsql2shp will simply truncate the data upon exporting.

In order to demonstrate the preceding limitations, let's quickly create a test PostGIS dataset. First, create the export schema if it does not already exist, then a table with deliberately long column names, and populate it:

    CREATE SCHEMA IF NOT EXISTS data_export;

    CREATE TABLE IF NOT EXISTS data_export.bad_bad_shp (
        id character varying,
        "time" timestamp with time zone,
        depth numeric,
        mag numeric,
        very_very_very_long_column_that_holds_magtype character varying,
        very_very_very_long_column_that_holds_place character varying,
        geom geometry
    );

    INSERT INTO data_export.bad_bad_shp
    select * from data_import.earthquakes_subset_with_geom limit 1;

    UPDATE data_export.bad_bad_shp
    SET very_very_very_long_column_that_holds_magtype = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce id mauris eget arcu imperdiet tristique eu sed est. Quisque suscipit risus eu ante vestibulum hendrerit ut sed nulla. Nulla sit amet turpis ipsum. Curabitur nisi ante, luctus nec dignissim ut, imperdiet id tortor. In egestas, tortor ac condimentum sollicitudin, nisi lacus porttitor nibh, a tempus ex tellus in ligula. Donec pharetra laoreet finibus. Donec semper aliquet fringilla. Etiam faucibus felis ac neque facilisis vestibulum. Vivamus scelerisque at neque vel tincidunt. Phasellus gravida, ipsum vulputate dignissim laoreet, augue lacus congue diam, at tempus augue dolor vitae elit.';

Having prepared a dataset that violates these limits, let's now export it to SHP to see whether our warnings were right:

    pgsql2shp -h localhost -p 5434 -u postgres -f bad_bad_shp mastering_postgis data_export.bad_bad_shp

When you now open the exported shapefile in a GIS client of your choice, you will see our very, very long column names renamed to VERY_VERY_ and VERY_VE_01. The content of the very_very_very_long_column_that_holds_magtype field has also been truncated to 255 characters, and is now 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce id mauris eget arcu imperdiet tristique eu sed est. Quisque suscipit risus eu ante vestibulum hendrerit ut sed nulla. Nulla sit amet turpis ipsum. Curabitur nisi ante, luctus nec dignissim ut'.
For the sake of completeness, we'll also export a table without geometry, so we are certain that pgsql2shp exports only a DBF file:

    pgsql2shp -h localhost -p 5434 -u postgres -f a_lonely_dbf mastering_postgis "select id, place from data_import.earthquakes_subset_with_geom limit 1"

pgsql2shp GUI

We have already seen pgAdmin's GUI for importing shapefiles. As you surely remember, that GUI also has an Export tab. If you happen to encounter difficulties locating the pgsql2shp GUI in pgAdmin 4, try calling it from the shell/command line by executing shp2pgsql-gui. If for some reason it is not recognized, try to locate the utility in your DB directory under bin/postgisgui/shp2pgsql-gui.exe.

In order to export a shapefile from PostGIS, go to Plugins | PostGIS Shapefile and DBF Loader 2.2 (the version may vary); then you have to switch to the Export tab. It is worth mentioning that you have some options to choose from when exporting; they are rather self-explanatory. When you press the Export button, you can choose the output destination. The log is displayed in the 'Log Window' area of the exporter GUI.

Exporting vector data using ogr2ogr

We have already seen a little preview of ogr2ogr exporting data when we made sure that our KML import had actually brought in the proper data. This time we'll expand on the subject a bit and also export a few more formats to give you an idea of how sound a tool ogr2ogr is. In order to get some information on the tool, simply type the following in the console:

    ogr2ogr

Alternatively, if you would like to get some more descriptive info, visit http://www.gdal.org/ogr2ogr.html. You could also type the following:

    ogr2ogr --long-usage

The nice thing about ogr2ogr is that the tool is very flexible and offers options that allow us to export exactly what we are after:

- You can specify which columns you would like to export with the -select parameter.
- The -where parameter lets you filter your dataset in case you want to output only a subset of the data.
- Should you require more sophisticated output preparation logic, you can use the -sql parameter.

This is obviously not all there is. The usual GDAL/ogr2ogr parameters are available too. You can reproject the data on the fly using the -t_srs parameter, and if, for some reason, the SRS of your data has not been clearly defined, you can use -s_srs to tell ogr2ogr what the source coordinate system of the processed dataset is. There are advanced options too: should you wish to clip your dataset to a specified bounding box, polygon, or coordinate system, have a look at the -clipsrc and -clipdst parameters and their variations. The last important parameter to know is -dsco (dataset creation options). It accepts values in the form of NAME=VALUE; when you want to pass more than one option this way, simply repeat the parameter. The actual dataset creation options depend on the format used, so it is advised that you consult the appropriate format information pages available via the ogr2ogr website.
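To make these options concrete, here is a hedged example (not taken from the book) of exporting a subset of the earthquakes table to GeoJSON. The connection parameters mirror the ones used earlier in this article, while the output format, file name, and the -t_srs reprojection are illustrative choices only:

    # Illustrative sketch: export two columns of the earthquakes table to GeoJSON,
    # reprojecting to EPSG:4326 just to show -t_srs in action.
    ogr2ogr -f "GeoJSON" earthquakes_subset.geojson \
      PG:"host=localhost port=5434 user=postgres dbname=mastering_postgis" \
      -sql "SELECT id, geom FROM data_import.earthquakes_subset_with_geom" \
      -t_srs EPSG:4326

Swapping "GeoJSON" for another OGR driver name (for example, "ESRI Shapefile" or "GPKG") is all it takes to target a different output format.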
Summary

There are many ways of getting data out of a database. Some are PostgreSQL specific, some are PostGIS specific. The point is that you can use and mix whatever tools you prefer. There will be scenarios where simple data extraction procedures will do just fine; other cases will require a more specialized setup in SQL or psql, or even writing custom code in external languages. I do hope this article gives you a toolset you can use with confidence in your daily activities.

Resources for Article:

Further resources on this subject:

- Spatial Data Services [article]
- Data Around Us [article]
- R ─ Classification and Regression Trees [article]


Understanding Microservices

Packt
22 Jun 2017
19 min read
This article by Tarek Ziadé, author of the book Python Microservices Development, explains the benefits and implementation of microservices with Python. While the microservices architecture looks more complicated than its monolithic counterpart, its advantages are multiple. It offers the following benefits.

(For more resources related to this topic, see here.)

Separation of concerns

First of all, each microservice can be developed independently by a separate team. For instance, building a reservation service can be a full project on its own. The team in charge can build it in whatever programming language and with whatever database they like, as long as it has a well-documented HTTP API. That also means the evolution of the app is more under control than with monoliths. For example, if the payment system changes its underlying interactions with the bank, the impact is localized inside that service and the rest of the application stays stable and under control. This loose coupling greatly improves the overall project velocity, as we're applying at the service level a philosophy similar to the single responsibility principle.

The single responsibility principle was defined by Robert Martin to explain that a class should have only one reason to change--in other words, each class should provide a single, well-defined feature. Applied to microservices, it means that we want to make sure that each microservice focuses on a single role.

Smaller projects

The second benefit is breaking down the complexity of the project. When you are adding a feature to an application, such as PDF reporting, even if you are doing it cleanly, you are making the code base bigger, more complicated, and sometimes slower. Building that feature in a separate application avoids this problem and makes it easier to write it with whatever tools you want. You can refactor it often, shorten your release cycles, and stay on top of things. The growth of the application remains under your control.

Dealing with a smaller project also reduces risks when improving the application: if a team wants to try out the latest programming language or framework, they can iterate quickly on a prototype that implements the same microservice API, try it out, and decide whether or not to stick with it. One real-life example is the Firefox Sync storage microservice. There are currently some experiments to switch from the current Python+MySQL implementation to a Go-based one that stores user data in standalone SQLite databases. That prototype is highly experimental, but since we have isolated the storage feature in a microservice with a well-defined HTTP API, it's easy enough to give it a try with a small subset of the user base.

Scaling and deployment

Last, having your application split into components makes it easier to scale depending on your constraints. Let's say you are starting to get a lot of customers who are booking hotels daily, and the PDF generation is starting to heat up the CPUs. You can deploy that specific microservice on servers with bigger CPUs. Another typical example is RAM-consuming microservices, like the ones that interact with in-memory databases such as Redis or Memcache. You could tweak your deployments accordingly by deploying them on servers with less CPU and a lot more RAM.

To summarize the benefits of microservices:

- A team can develop each microservice independently and use whatever technological stack makes sense. They can define a custom release cycle. The tip of the iceberg is the service's language-agnostic HTTP API.
- Developers break the application complexity into logical components. Each microservice focuses on doing one thing well.
- Since microservices are standalone applications, there's finer control over deployments, which makes scaling easier.

Microservices architectures are good at solving a lot of the problems that may arise once your application starts to grow. We need to be aware, though, of some of the new issues they also bring in practice.

Implementing microservices with Python

Python is an amazingly versatile language. As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server to large object-oriented applications that run services for millions of users. According to a study conducted by Philip Guo in 2014, published on the Association for Computing Machinery (ACM) website, Python has surpassed Java in top U.S. universities and is the most popular language for learning computer science. This trend is also true in the software industry. Python now sits in the top 5 languages of the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even bigger in web development, since languages like C are rarely used as main languages to build web applications.

However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it's still a language of choice for building microservices, and many major companies are happily using it. This section will give you some background on the different ways you can write microservices using Python, some insights on asynchronous versus synchronous programming, and it concludes with some details on Python performance. It is composed of the following parts:

- The WSGI standard
- Greenlet and Gevent
- Twisted and Tornado
- asyncio
- Language performances

The WSGI standard

What strikes web developers who are starting with Python the most is how easy it is to get a web application up and running. The Python web community has created a standard, inspired by the Common Gateway Interface (CGI), called the Web Server Gateway Interface (WSGI). It greatly simplifies how you can write a Python application whose goal is to serve HTTP requests. When your code uses that standard, your project can be executed by standard web servers like Apache or nginx, using WSGI extensions like uwsgi or mod_wsgi. Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library.

You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than ten lines:

    import json
    import time

    def application(environ, start_response):
        headers = [('Content-type', 'application/json')]
        start_response('200 OK', headers)
        return [bytes(json.dumps({'time': time.time()}), 'utf8')]

Since its introduction, the WSGI protocol has become an essential standard, and the Python web community has widely adopted it. Developers wrote middlewares, which are functions you can hook before or after the WSGI application function itself to do something within the environment. Some web frameworks, like Bottle (http://bottlepy.org), were created specifically around that standard, and soon enough, every framework out there could be used through WSGI in one way or another.
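To make the middleware idea concrete, here is a minimal sketch (a hypothetical example, not taken from the book) that wraps the application function shown above and appends an extra response header:

    # Hypothetical middleware sketch: wraps any WSGI application and appends
    # a custom header to every response before handing control back to the server.
    def add_service_header(app):
        def middleware(environ, start_response):
            def custom_start_response(status, headers, exc_info=None):
                return start_response(status, headers + [('X-Served-By', 'time-service')], exc_info)
            return app(environ, custom_start_response)
        return middleware

    application = add_service_header(application)

Because a middleware only relies on the WSGI calling convention, it can be stacked in front of any WSGI-compliant application or framework.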
The biggest problem with WSGI, though, is its synchronous nature. The application function you saw above is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every call to the function blocks until the response is ready. And writing microservices means your code will be waiting for responses from various network resources all the time. In other words, your application will be idle and will just block the client until everything is ready.

That's an entirely okay behavior for HTTP APIs. We're not talking about building bidirectional applications like WebSocket-based ones. But what happens when several incoming requests call your application at the same time? WSGI servers will let you run a pool of threads to serve several requests concurrently. But you can't run thousands of them, and as soon as the pool is exhausted, the next request will block even if your microservice is doing nothing but idling and waiting for backend services' responses.

That's one of the reasons why non-WSGI frameworks like Twisted and Tornado, and, in the JavaScript world, Node.js, became very successful--they are fully asynchronous. When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means you can accept new requests and start to treat them. That model dramatically reduces the idling time in your process: it can serve thousands of concurrent requests. Of course, that does not mean each individual response will be returned faster. It just means one process can accept more concurrent requests and juggle between them as the data is getting ready to be sent back.

There's no simple way to introduce something similar with the WSGI standard, and the community has debated for years to come up with a consensus--and failed. The odds are that the community will eventually drop the WSGI standard for something else. In the meantime, building microservices with synchronous frameworks is still possible and completely fine, as long as your deployments take into account the one request == one thread limitation of the WSGI standard. There is, however, one trick to boost synchronous web applications: greenlets.

Greenlet and Gevent

The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism. Asynchronous applications use an event loop that pauses and resumes execution contexts when an event is triggered--only one context is active at a time, and they take turns. Explicit instructions in the code tell the event loop where it can pause the execution. When that occurs, the process will look for some other pending work to resume. Eventually, the process will come back to your function and continue where it stopped; moving from one execution context to another is called switching.

The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on the Stackless project, a particular CPython implementation, and provides greenlets. Greenlets are pseudo-threads that are very cheap to instantiate, unlike real threads, and that can be used to call Python functions. Within those functions, you can switch and give back control to another function. The switching is done with an event loop and allows you to write an asynchronous application using a thread-like interface paradigm.
Here's an example from the Greenlet documentation:

    from greenlet import greenlet

    def test1(x, y):
        z = gr2.switch(x + y)
        print(z)

    def test2(u):
        print(u)
        gr1.switch(42)

    gr1 = greenlet(test1)
    gr2 = greenlet(test2)
    gr1.switch("hello", " world")

The two greenlets explicitly switch from one to the other. For building microservices based on the WSGI standard, if the underlying code used greenlets, we could accept several concurrent requests and just switch from one to another when we know a call is going to block the request--like performing an SQL query.

However, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent becomes very useful. The Gevent project (http://www.gevent.org/) is built on top of Greenlet and offers, among other things, an implicit and automatic way of switching between greenlets.

It provides a cooperative version of the socket module that uses greenlets to automatically pause and resume the execution when some data is made available in the socket. There's even a monkey-patch feature that automatically replaces the standard library socket with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets--with just one extra line:

    from gevent import monkey; monkey.patch_all()

    def application(environ, start_response):
        headers = [('Content-type', 'application/json')]
        start_response('200 OK', headers)
        # ...do something with sockets here...
        return result

This implicit magic comes at a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching Gevent is doing. Some packages from the community will continue to block or even have unexpected results because of this--in particular, if they use C extensions and bypass some of the features of the standard library that Gevent patched. But for most cases, it works well. Projects that play well with Gevent are dubbed "green," and when a library is not functioning well and the community asks its authors to "make it green," it usually happens. That's what was used to scale the Firefox Sync service at Mozilla, for instance.

Twisted and Tornado

If you are building microservices where increasing the number of concurrent requests you can handle is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/). Twisted has been around for ages. To implement the same microservice, you need to write slightly more verbose code:

    import time
    import json

    from twisted.web import server, resource
    from twisted.internet import reactor, endpoints

    class Simple(resource.Resource):
        isLeaf = True

        def render_GET(self, request):
            request.responseHeaders.addRawHeader(b"content-type", b"application/json")
            return bytes(json.dumps({'time': time.time()}), 'utf8')

    site = server.Site(Simple())
    endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080)
    endpoint.listen(site)
    reactor.run()

While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices:

- You need to implement each endpoint in your microservice with a class derived from a Resource class, and implement each supported method. For a few simple APIs, it adds a lot of boilerplate code.
- Twisted code can be hard to understand and debug due to its asynchronous nature.
- It's easy to fall into callback hell when you chain too many functions that get triggered successively one after the other, and the code can get messy.
- Properly testing your Twisted application is hard, and you have to use a Twisted-specific unit testing model.

Tornado is based on a similar model but does a better job in some areas. It has a lighter routing system and does everything possible to make the code closer to plain Python. Tornado also uses a callback model, so debugging can be hard. But both frameworks are working hard at bridging the gap to rely on the new async features introduced in Python 3.

asyncio

When Guido van Rossum started to work on adding async features to Python 3, part of the community pushed for a Gevent-like solution, because it made a lot of sense to write applications in a synchronous, sequential fashion rather than having to add explicit callbacks like in Tornado or Twisted. But Guido picked the explicit technique and experimented in a project called Tulip, which was inspired by Twisted. Eventually, asyncio was born out of that side project and added into Python.

In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio, and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming. By doing this, Python did a great job of avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications.

Beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications; see https://docs.python.org/3/library/asyncio.html. Python is now as expressive as languages like Lua for creating coroutine-based applications, and there are now a few emerging frameworks that have embraced those features and will only work with Python 3.5+ to benefit from this. KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice, fully asynchronous, with it would simply be these few elegant lines:

    from aiohttp import web
    import time

    async def handle(request):
        return web.json_response({'time': time.time()})

    if __name__ == '__main__':
        app = web.Application()
        app.router.add_get('/', handle)
        web.run_app(app)

In this small example, we're very close to how we would implement a synchronous app. The only hint that we're async is the async keyword, which marks the handle function as being a coroutine. And that's what's going to be used at every level of an async Python app going forward. Here's another example, using aiopg, a PostgreSQL library for asyncio, taken from the project documentation:

    import asyncio
    import aiopg

    dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1'

    async def go():
        pool = await aiopg.create_pool(dsn)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 1")
                ret = []
                async for row in cur:
                    ret.append(row)
                assert ret == [(1,)]

    loop = asyncio.get_event_loop()
    loop.run_until_complete(go())

With a few async and await prefixes, the function that performs an SQL query and sends back the result looks a lot like a synchronous function.
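For contrast with the asynchronous examples above, the same time-returning microservice written with a synchronous WSGI framework such as Flask (one of the frameworks mentioned at the end of this section) might look like the following sketch, which is not taken from the book:

    # Hedged sketch: the synchronous Flask equivalent of the aiohttp example above.
    import time
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/')
    def get_time():
        return jsonify({'time': time.time()})

    if __name__ == '__main__':
        app.run(port=8080)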
But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with a particular asynchronous implementation for each feature you need. If you need to use a library that is not asynchronous in your code, using it from your asynchronous code means you will have to go through some extra and challenging work if you want to prevent blocking the event loop.

If your microservices deal with a limited number of resources, this could be manageable. But it's probably a safer bet at this point (2017) to stick with a synchronous framework that's been around for a while rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages and wait until the asyncio ecosystem gets more sophisticated. And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice, or Flask.

Language performances

In the previous sections, we've been through the two different ways to write microservices--asynchronous versus synchronous--and whichever technique you use, the speed of Python directly impacts the performance of your microservice.

Of course, everyone knows Python is slower than Java or Go, but execution speed is not always the top priority. A microservice is often a thin layer of code that spends most of its life waiting for network responses from other services. Its core speed is usually less important than how long your SQL queries take to return from your Postgres server, because the latter will represent most of the time spent building the response. But wanting an application that's as fast as possible is legitimate.

One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can ruin performance, because multi-threaded applications cannot run bytecode on several cores in parallel. The GIL has good reasons to exist. It protects non-thread-safe parts of the CPython interpreter and exists in other languages like Ruby. And all attempts to remove it so far have failed to produce a faster CPython implementation.

Larry Hastings is working on a GIL-free CPython project called Gilectomy (https://github.com/larryhastings/gilectomy). Its minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of today (2017), this implementation is still slower than CPython. But it's interesting to follow this work and see if it reaches speed parity one day; that would make a GIL-free CPython very appealing.

For microservices, besides preventing the usage of multiple cores in the same process, the GIL will slightly degrade performance under high load, because of the system call overhead introduced by the mutex. However, all the scrutiny around the GIL has had one beneficial impact: work has been done in the past years to reduce its contention in the interpreter, and in some areas, Python's performance has improved a lot.

Bear in mind that even if the core team removes the GIL, Python is an interpreted language and the produced code will never be very efficient at execution time. Python provides the dis module if you are interested in seeing how the interpreter decomposes a function. In the example below, the interpreter will decompose a simple function that yields incremented values from a sequence in no fewer than 29 steps!

    >>> def myfunc(data):
    ...     for value in data:
    ...         yield value + 1
    ...
    >>> import dis
    >>> dis.dis(myfunc)
      2           0 SETUP_LOOP              23 (to 26)
                  3 LOAD_FAST                0 (data)
                  6 GET_ITER
            >>    7 FOR_ITER                15 (to 25)
                 10 STORE_FAST               1 (value)

      3          13 LOAD_FAST                1 (value)
                 16 LOAD_CONST               1 (1)
                 19 BINARY_ADD
                 20 YIELD_VALUE
                 21 POP_TOP
                 22 JUMP_ABSOLUTE            7
            >>   25 POP_BLOCK
            >>   26 LOAD_CONST               0 (None)
                 29 RETURN_VALUE

A similar function written in a statically compiled language would dramatically reduce the number of operations required to produce the same result. There are ways to speed up Python execution, though. One is to write part of your code as compiled code, by building C extensions or using a statically typed extension of the language like Cython (http://cython.org/), but that makes your code more complicated.

Another solution, which is the most promising one, is to simply run your application using the PyPy interpreter (http://pypy.org/). PyPy implements a Just-In-Time (JIT) compiler. This compiler directly replaces, at run time, pieces of Python code with machine code that can be used directly by the CPU. The whole trick for the JIT is to detect in real time, ahead of the execution, when and how to do it.

Even if PyPy is always a few Python versions behind CPython, it has reached a point where you can use it in production, and its performance can be quite amazing. In one of our projects at Mozilla that needed fast execution, the PyPy version was almost as fast as the Go version, and we decided to use Python there instead. The PyPy Speed Center website is a great place to look at how PyPy compares to CPython: http://speed.pypy.org/

However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem, in particular if other developers maintain some of the extensions you are using. But if you are building your microservice with a standard set of libraries, the chances are that it will work out of the box with the PyPy interpreter, so it's worth a try. In any case, for most projects, the benefits of Python and its ecosystem largely surpass the performance issues described in this section, because the overhead in a microservice is rarely a problem.

Summary

In this article we saw that Python is considered to be one of the best languages to write web applications, and therefore microservices, for the same reasons it's a language of choice in other areas, and also because it provides tons of mature frameworks and packages to do the work.

Resources for Article:

Further resources on this subject:

- Inbuilt Data Types in Python [article]
- Getting Started with Python Packages [article]
- Layout Management for Python GUI [article]

What are Microservices?

Packt
20 Jun 2017
12 min read
In this article written by Gaurav Kumar Aroraa, Lalit Kale, Kanwar Manish, authors of the book Building Microservices with .NET Core, we will start with a brief introduction. Then, we will define its predecessors: monolithic architecture and service-oriented architecture (SOA). After this, we will see how microservices fare against both SOA and the monolithic architecture. We will then compare the advantages and disadvantages of each one of these architectural styles. This will enable us to identify the right scenario for these styles. We will understand the problems that arise from having a layered monolithic architecture. We will discuss the solutions available to these problems in the monolithic world. At the end, we will be able to break down a monolithic application into a microservice architecture. We will cover the following topics in this article: Origin of microservices Discussing microservices (For more resources related to this topic, see here.) Origin of microservices The term microservices was used for the first time in mid-2011 at a workshop of software architects. In March 2012, James Lewis presented some of his ideas about microservices. By the end of 2013, various groups from the IT industry started having discussions on microservices, and by 2014, it had become popular enough to be considered a serious contender for large enterprises. There is no official introduction available for microservices. The understanding of the term is purely based on the use cases and discussions held in the past. We will discuss this in detail, but before that, let's check out the definition of microservices as per Wikipedia (https://en.wikipedia.org/wiki/Microservices), which sums it up as: Microservices is a specialization of and implementation approach for SOA used to build flexible, independently deployable software systems. In 2014, James Lewis and Martin Fowler came together and provided a few real-world examples and presented microservices (refer to http://martinfowler.com/microservices/) in their own words and further detailed it as follows: The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies. It is very important that you see all the attributes James and Martin defined here. They defined it as an architectural style that developers could utilize to develop a single application with the business logic spread across a bunch of small services, each having their own persistent storage functionality. Also, note its attributes: it can be independently deployable, can run in its own process, is a lightweight communication mechanism, and can be written in different programming languages. We want to emphasize this specific definition since it is the crux of the whole concept. And as we move along, it will come together by the time we finish this book. Discussing microservices Until now, we have gone through a few definitions of microservices; now, let's discuss microservices in detail. In short, a microservice architecture removes most of the drawbacks of SOA architectures.  
Slicing your application into a number of services is neither SOA nor microservices. However, combining service design and best practices from the SOA world along with a few emerging practices, such as isolated deployment, semantic versioning, providing lightweight services, and service discovery in polyglot programming, is microservices. We implement microservices to satisfy business features and implement them with reduced time to market and greater flexibility. Before we move on to understand the architecture, let's discuss the two important architectures that have led to its existence: The monolithic architecture style SOA Most of us would be aware of the scenario where during the life cycle of an enterprise application development, a suitable architectural style is decided. Then, at various stages, the initial pattern is further improved and adapted with changes that cater to various challenges, such as deployment complexity, large code base, and scalability issues. This is exactly how the monolithic architecture style evolved into SOA, further leading up to microservices. Monolithic architecture The monolithic architectural style is a traditional architecture type and has been widely used in the industry. The term "monolithic" is not new and is borrowed from the Unix world. In Unix, most of the commands exist as a standalone program whose functionality is not dependent on any other program. As seen in the succeeding image, we can have different components in the application such as: User interface: This handles all of the user interaction while responding with HTML or JSON or any other preferred data interchange format (in the case of web services). Business logic: All the business rules applied to the input being received in the form of user input, events, and database exist here. Database access: This houses the complete functionality for accessing the database for the purpose of querying and persisting objects. A widely accepted rule is that it is utilized through business modules and never directly through user-facing components. Software built using this architecture is self-contained. We can imagine a single .NET assembly that contains various components, as described in the following image: As the software is self-contained here, its components are interconnected and interdependent. Even a simple code change in one of the modules may break a major functionality in other modules. This would result in a scenario where we'd need to test the whole application. With the business depending critically on its enterprise application frameworks, this amount of time could prove to be very critical. Having all the components tightly coupled poses another challenge: whenever we execute or compile such software, all the components should be available or the build will fail; refer to the preceding image that represents a monolithic architecture and is a self-contained or a single .NET assembly project. However, monolithic architectures might also have multiple assemblies. This means that even though a business layer (assembly, data access layer assembly, and so on) is separated, at run time, all of them will come together and run as one process.  A user interface depends on other components' direct sale and inventory in a manner similar to all other components that depend upon each other. In this scenario, we will not be able to execute this project in the absence of any one of these components. 
The process of upgrading any one of these components will be more complex, as we may have to consider other components that require code changes too. This results in more development time than required for the actual change. Deploying such an application will become another challenge. During deployment, we will have to make sure that each and every component is deployed properly; otherwise, we may end up facing a lot of issues in our production environments. If we develop an application using the monolithic architecture style, as discussed previously, we might face the following challenges: Large code base: The code base grows very large and becomes hard to navigate, and as components are interconnected, we will have to bear with a repetitive code base. Too many business modules: This is in regard to modules within the same system. Code base complexity: This results in a higher chance of code breaking due to the fix required in other modules or services. Complex code deployment: You may come across minor changes that would require whole system deployment. One module failure affecting the whole system: This is in regard to modules that depend on each other. Scalability: This is required for the entire system and not just the modules in it. Intermodule dependency: This is due to tight coupling. Spiraling development time: This is due to code complexity and interdependency. Inability to easily adapt to a new technology: In this case, the entire system would need to be upgraded. As discussed earlier, if we want to reduce development time, ease deployment, and improve the maintainability of software for enterprise applications, we should avoid the traditional or monolithic architecture. Service-oriented architecture In the previous section, we discussed the monolithic architecture and its limitations. We also discussed why it does not fit into our enterprise application requirements. To overcome these issues, we should go with a modular approach in which the components are separated so that they no longer live in a single, self-contained .NET assembly. The main difference between SOA and the monolithic architecture is not whether there is one assembly or many; rather, because each service in SOA runs as a separate process, SOA scales better than a monolith. Let's discuss the modular architecture, that is, SOA. This is a well-known architectural style in which enterprise applications are designed as a collection of services. These services may be RESTful or ASMX Web services. To understand SOA in more detail, let's discuss "service" first. What is service? Service, in this case, is an essential concept of SOA. It can be a piece of code, program, or software that provides some functionality to other system components. This piece of code can interact directly with the database or indirectly through another service. Furthermore, it can be consumed by clients directly, where the client may be a website, desktop app, mobile app, or any other device app. Refer to the following diagram: Service refers to a type of functionality exposed for consumption by other systems (generally referred to as clients/client applications). As mentioned earlier, it can be represented by a piece of code, program, or software. Such services are exposed over the HTTP transport protocol as a general practice. However, the HTTP protocol is not a limiting factor, and a protocol can be picked as deemed fit for the scenario. 
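Although this particular book builds its services with .NET Core, the notion of a small service exposed over HTTP is language-agnostic. The sketch below is a rough, hypothetical illustration written with Python's standard library purely for brevity; the endpoint name, port, and JSON payload are invented for the example and do not come from the book.

# Toy HTTP "direct selling" service sketch; route and payload are made up for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DirectSellingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/products":
            body = json.dumps([{"id": 1, "name": "sample product"}]).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    # The service runs in its own process and can be deployed and scaled independently.
    HTTPServer(("0.0.0.0", 8000), DirectSellingHandler).serve_forever()

Any web, desktop, or mobile client can consume such an endpoint over plain HTTP, and the same service can be replicated on several servers, which is exactly the kind of deployment flexibility discussed next.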
In the following image, Service – direct selling is directly interacting with Database, and three different clients, namely Web, Desktop, and Mobile, are consuming the service. On the other hand, we have clients consuming Service – partner selling, which is interacting with Service – channel partners for database access. A product selling service is a set of services that interacts with client applications and provides database access directly or through another service, in this case, Service – Channel partner.  In the case of Service – direct selling, shown in the preceding example, it is providing some functionality to a Web Store, a desktop application, and a mobile application. This service is further interacting with the database for various tasks, namely fetching data, persisting data, and so on. Normally, services interact with other systems via some communication channel, generally the HTTP protocol. These services may or may not be deployed on the same or single servers. In the preceding image, we have projected an SOA example scenario. There are many fine points to note here, so let's get started. Firstly, our services can be spread across different physical machines. Here, Service-direct selling is hosted on two separate machines. It is a possible scenario that instead of the entire business functionality, only a part of it will reside on Server 1 and the remaining on Server 2. Similarly, Service – partner selling appears to be having the same arrangement on Server 3 and Server 4. However, it doesn't stop Service – channel partners being hosted as a complete set on both the servers: Server 5 and Server 6. A system that uses a service or multiple services in a fashion mentioned in the preceding figure is called an SOA. We will discuss SOA in detail in the following sections. Let's recall the monolithic architecture. In this case, we did not use it because it restricts code reusability; it is a self-contained assembly, and all the components are interconnected and interdependent. For deployment, in this case, we will have to deploy our complete project after we select the SOA (refer to preceding image and subsequent discussion). Now, because of the use of this architectural style, we have the benefit of code reusability and easy deployment. Let's examine this in the wake of the preceding figure: Reusability: Multiple clients can consume the service. The service can also be simultaneously consumed by other services. For example, OrderService is consumed by web and mobile clients. Now, OrderService can also be used by the Reporting Dashboard UI. Stateless: Services do not persist any state between requests from the client, that is, the service doesn't know, nor care, that the subsequent request has come from the client that has/hasn't made the previous request. Contract-based: Interfaces make it technology-agnostic on both sides of implementation and consumption. It also serves to make it immune to the code updates in the underlying functionality. Scalability: A system can be scaled up; SOA can be individually clustered with appropriate load balancing. Upgradation: It is very easy to roll out new functionalities or introduce new versions of the existing functionality. The system doesn't stop you from keeping multiple versions of the same business functionality. Summary In this article, we discussed what the microservice architectural style is in detail, its history, and how it differs from its predecessors: monolithic and SOA. 
We further defined the various challenges that monolithic faces when dealing with large systems. Scalability and reusability are some definite advantages that SOA provides over monolithic. We also discussed the limitations of the monolithic architecture, including scaling problems, by implementing a real-life monolithic application. The microservice architecture style resolves all these issues by reducing code interdependency and isolating the dataset size that any one of the microservices works upon. We utilized dependency injection and database refactoring for this. We further explored automation, CI, and deployment. These easily allow the development team to let the business sponsor choose what industry trends to respond to first. This results in cost benefits, better business response, timely technology adoption, effective scaling, and removal of human dependency. Resources for Article: Further resources on this subject: Microservices and Service Oriented Architecture [article] Breaking into Microservices Architecture [article] Microservices – Brave New World [article]

Introduction to NFRs

Packt
20 Jun 2017
14 min read
In this article by Sameer Paradkar, the author of the book Mastering Non-Functional Requirements, we will learn that non-functional requirements are those aspects of the IT system that, while not directly affecting the business functionality of the application, have a profound impact on the efficiency and effectiveness of business systems for end users as well as the people responsible for supporting the program. The definition of these requirements is an essential factor in developing a total customer solution that delivers business goals. Non-functional requirements are used primarily to drive the operational aspects of the architecture, in other words, to address major operational and technical areas of the system to ensure the robustness and ruggedness of the application. A benchmark or proof of concept can be used to verify whether the implementation meets these requirements or to indicate whether corrective action is necessary. Ideally, a series of tests should be planned that maps to the development schedule and grows in complexity. The topics that are covered in this article are as follows: Definition of NFRs NFR KPIs and metrics (For more resources related to this topic, see here.) Introducing NFR The following pointers state the definition of NFR: To define requirements and constraints on the IT system As a basis for cost estimates and early system sizing To assess the viability of the proposed IT system NFRs are an important determining factor of the architecture and design of the operational models As a guideline for the design phase to meet NFRs such as performance, scalability, and availability The NFRs for each of the domains, for example, scalability, availability, and so on, must be understood to facilitate the design and development of the target operating model. These include the servers, networks, and platforms, including the application runtime environments. These are critical for the execution of benchmark tests. They also affect the design of technical and application components. End users have expectations about the effectiveness of the application. These characteristics include ease of software use, speed, reliability, and recoverability when unexpected conditions arise. The NFRs define these aspects of the IT system. The non-functional requirements should be defined precisely, which involves quantifying them. NFRs should provide measurements the application must meet. For example, the maximum amount of time allowed to execute a process, the number of hours in a day an application must be available, the maximum size of a database on disk, and the number of concurrent users supported are typical NFRs the software must implement. Figure 1: Key Non-Functional Requirements There are many kinds of non-functional requirements, including: Performance Performance is the responsiveness of the application to perform specific actions in a given time span. Performance is measured in terms of throughput and latency. Latency is the time taken by the application to respond to an event. Throughput is the number of events processed in a given time interval. An application’s performance can directly impact its scalability. Enhancing an application’s performance often enhances scalability by reducing contention for shared resources. Performance attributes specify the timing characteristics of the application. Certain features are more time-sensitive than others; the NFRs should identify such software tasks that have constraints on their performance. 
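As a rough illustration of how such latency and throughput figures can be quantified, the sketch below times repeated calls to a placeholder operation and reports the throughput along with a 95th-percentile latency. The operation, the number of runs, and the chosen percentile are arbitrary stand-ins, not values prescribed by the book.

# Illustrative measurement of latency and throughput for an arbitrary operation.
import time

def operation():
    # Placeholder for the work being measured (a query, an API call, and so on).
    sum(range(1000))

def measure(runs=1000):
    latencies = []
    start = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p95 = latencies[int(0.95 * len(latencies)) - 1]
    print("throughput:  %.0f ops/sec" % (runs / elapsed))
    print("p95 latency: %.3f ms" % (p95 * 1000))

if __name__ == "__main__":
    measure()

Numbers collected this way can then be compared directly against the agreed response time and throughput targets described next.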
Response time relates to the time needed to complete specific business processes, batch or interactive, within the target business system. The system must be designed to fulfil the agreed-upon response time requirements, while supporting the defined workload mapped against the given static baseline, on a system platform that does not exceed the stated utilization. The key attributes are as follows: Throughput: The ability of the system to execute a given number of transactions within a given unit of time Response time: The distribution of times the system takes to respond to requests Scalability Scalability is the ability to handle an increase in the workload without impacting performance, or the ability to quickly expand the architecture. It is the ability to expand the architecture to accommodate more users, more processes, more transactions, and additional systems and services as the business requirements change and the systems evolve to meet future business demands. This permits existing systems to be extended without replacing them. This directly affects the architecture and the selection of software components and hardware. The solution must allow the hardware and the deployed software services and components to be scaled horizontally as well as vertically. Horizontal scaling involves replicating the same functionality across additional nodes; vertical scaling involves running the same functionality on bigger and more powerful nodes. Scalability definitions measure the volumes of users and data the system should support. There are two key techniques for improving both vertical and horizontal scalability. Vertical scaling is also known as scaling up and involves adding more resources, such as memory, CPU, and hard disk, to a system. Horizontal scaling is also known as scaling out and involves adding more nodes to a cluster for workload sharing. The key attributes are as follows: Throughput: The maximum number of transactions your system needs to handle, for example, a thousand a day or a million Storage: The amount of data you are going to need to store Growth requirements: Data growth in the next 3-5 years Availability Availability is the time frame in which the system functions normally and without failures. Availability is measured as the percentage of total time the application is up (its uptime) over a defined time period. Availability is affected by failures, exceptions, infrastructure issues, malicious attacks, and maintenance and upgrades. It is the uptime or the amount of time the system is operational and available for use. This is specified because some systems are architected with expected downtime for activities like database upgrades and backups. Availability also conveys the number of hours or days per week or weeks per year the application will be available to its end customers, as well as how rapidly it can recover from faults. Since the architecture establishes software, hardware, and networking entities, this requirement extends to all of them. Hardware availability, recoverability, and reliability definitions measure system up-time. For example, it is specified in terms of mean time between failures, or “MTBF”. The key attributes are as follows: Availability: Application availability considering weekends, holidays, maintenance windows, and failures Locations of operation: Geographic locations, connection requirements, and the network restrictions that prevail Offline requirement: The time available for offline operations, including batch processing and system maintenance 
Length of time between failures Recoverability: The time required for the system to resume operation in the event of a failure Resilience: The reliability characteristics of the system and sub-components Capacity This non-functional requirement defines the ways in which the system is expected to scale up by increasing capacity, adding hardware, or adding machines based on business objectives. Capacity is about delivering enough of the functionality required by the end users. A request for a web service to provide 1,000 requests per second when the server is only capable of 100 requests a second may not succeed. While this sounds like an availability issue, it occurs because the server is unable to handle the requisite capacity. A single node may not be able to provide enough capacity, and one needs to deploy multiple nodes with a similar configuration to meet organizational capacity requirements. The capacity to identify a failing node and restart it on another machine or VM is also a non-functional requirement. The key attributes are as follows: Throughput: The number of peak transactions the system needs to handle Storage: The volume of data the system can persist to disk at run time; this relates to memory and disk Year-on-year growth requirements (users, processing, and storage) E-channel growth projections Different types of things supported (for example, activities or transactions, and so on) For each type of transaction, volumes on an hourly, daily, weekly, or monthly basis Whether volumes are significantly higher during specific times of the day (for example, at lunch), week, month, or year The expected transaction volume growth and the additional volumes you will be able to handle Security Security is the ability of an application to avoid malicious incidents and events outside of the designed system usage, and to prevent disclosure or loss of information. Improving security increases the reliability of the application by reducing the likelihood of an attack succeeding and impairing operations. Adding security controls protects assets and prevents unauthorized access and manipulation of critical information. The factors that affect an application's security are confidentiality and integrity. The key security controls used to secure systems are authorization, authentication, encryption, auditing, and logging. Definition and monitoring of effectiveness in meeting the security requirements of the system, for example, to avoid financial harm in accounting systems, is critical. Integrity requirements include restricting access to functionality or data to certain users and protecting the privacy of data entered into the software. The key attributes are as follows: Authentication: Correct identification of parties attempting to access systems and protection of systems from unauthorized parties Authorization: The mechanism required to authorize users to perform different functions within the systems Encryption (data at rest or data in flight): All external communications between the data server and clients must be encrypted Data confidentiality: All data must be protectively marked, stored, and protected Compliance: The process to confirm systems' compliance with the organization's security standards and policies Maintainability Maintainability is the ability of any application to go through modifications and updates with a degree of ease. This is the degree of flexibility with which the application can be modified, whether for bug fixes or to update functionality. 
These changes may impact any of the components, services, functionality, or interfaces in the application landscape when modifying them to fix errors or to meet changing business requirements. It is also related to the time it takes to restore the system to its normal state following a failure or fault. Improving maintainability can improve availability and reduce run-time defects. An application’s maintainability is dependent on its overall quality attributes. It is critical, as a large chunk of the IT budget is spent on the maintenance of systems. The more maintainable a system is, the lower its total cost of ownership. The key attributes are as follows: Conformance to design standards, coding standards, best practices, reference architectures, and frameworks Flexibility: The degree to which the system is intended to support change Release support: The way in which the system supports the introduction of the initial release, phased rollouts, and future releases Manageability Manageability is the ease with which administrators can manage the application, through useful instrumentation exposed for monitoring. It is the ability of the system, or a group of systems, to provide key information to the operations and support team so that they can debug, analyze, and understand the root cause of failures. It also deals with compliance and governance with respect to the domain frameworks and policies. The key is to design an application that is easy to manage, by exposing useful instrumentation for monitoring systems and for understanding the cause of failures. The key attributes are as follows: The system must maintain total traceability of transactions Business objects and database fields are part of auditing User and transactional timestamps File characteristics, including size before, size after, and structure Getting events and alerts as thresholds (for example, memory, storage, processor) are breached Remotely managing applications and creating new virtual instances at the click of a button A rich graphical dashboard for all key application metrics and KPIs Reliability Reliability is the ability of the application to maintain its integrity and veracity over a time span and also in the event of faults or exceptions. It is measured as the probability that the software will not fail and that it will continue functioning for a defined time interval. It also specifies the ability of the system to maintain its performance over a time span. Unreliable software is prone to failures, and a few processes may be more sensitive to failure than others because such processes may not be able to recover from a fault or exception. The key attributes are as follows: The characteristic of a system to perform its functions under stated conditions for a specific period of time Mean Time To Recovery: The time available to get the system back online Mean Time Between Failures: The acceptable threshold for downtime Data integrity, also known as referential integrity in database tables and interfaces Application integrity and information integrity during transactions Fault trapping (I/O): Handling failures and recovery Extensibility Extensibility is the ability of a system to cater to future changes through a flexible architecture, design, or implementation. Extensible applications have excellent endurance, which prevents the expensive processes of procuring large inflexible applications and retiring them due to changes in business needs. Extensibility enables organizations to take advantage of opportunities and respond to risks. 
Although there is a significant difference between the two, extensibility is often tangled up with the modifiability quality. Modifiability means that it is possible to change the software, whereas extensibility means that change has been planned for and will be effortless. Adaptability is at times erroneously confused with extensibility. However, adaptability deals with how the user interactions with the system are managed and governed. Extensibility allows a system, people, technology, information, and processes to all work together to achieve the following objectives: Handle new information types Manage new or changed business entities Consume or provide new feeds Recovery In the event of a natural calamity, for example, a flood or hurricane, the entire facility where the application is hosted may become inoperable or inaccessible. Business-critical applications should have a strategy to recover from such disasters within a reasonable time frame. The solution implementing various processes must be integrated with the existing enterprise disaster recovery plan. The processes must be analysed to understand the criticality of each process to the business and the impact of the loss to the business if a process is unavailable. Based on this analysis, appropriate disaster procedures must be developed, and plans should be outlined. As part of disaster recovery, electronic backups of data and procedures must be maintained at the recovery location and be retrievable within the appropriate time frames for system function restoration. In the case of high criticality, real-time mirroring to a mirror site should be deployed. The key attributes are as follows: Recovery process: Recovery Time Objectives (RTO) / Recovery Point Objectives (RPO) Restore time: The time required to switch to the secondary site when the primary fails RPO/Backup time: The time it takes to back up your data Backup frequencies: The frequency of backing up the transaction data, configuration data, and code Interoperability Interoperability is the ability to exchange information and communicate with internal and external applications and systems. Interoperable systems make it easier to exchange information both internally and externally. The data formats, transport protocols, and interfaces are the key attributes for architecting interoperable systems. Standardization of data formats, transport protocols, and interfaces is the key aspect to be considered when architecting an interoperable system. Interoperability is achieved through: Publishing and describing interfaces Describing the syntax used to communicate Describing the semantics of the information it produces and consumes Leveraging open standards to communicate with external systems Staying loosely coupled with external systems The key attributes are as follows: Compatibility with shared applications: Other systems it needs to integrate with Compatibility with 3rd party applications: Other systems it has to live with amicably Compatibility with various OS: The different operating systems it must be compatible with Compatibility on different platforms: The hardware platforms it needs to work on Usability Usability measures characteristics such as consistency and aesthetics in the user interface. Consistency is the constant use of mechanisms employed in the user interface, while aesthetics refers to the artistic, visual quality of the user interface. It is the ease with which the users operate the system and make productive use of it. 
Usability is usually discussed in relation to the system's interfaces, but it can just as well be applied to any tool, device, or rich system. This addresses the factors that establish the ability of the software to be understood, used, and learned by its intended users. The application interfaces must be designed with end users in mind so that they are intuitive to use, are localized, provide access for differently abled users, and provide an excellent overall user experience. The key attributes are as follows: Look and feel standards: Layout and flow, screen element density, keyboard shortcuts, UI metaphors, colors Localization/Internationalization requirements: Keyboards, paper sizes, languages, spellings, and so on Summary This article introduced NFRs and explained why NFRs are critical for building software systems. It also covered the various KPIs for each of the key NFRs, that is, scalability, availability, reliability, and so on. Resources for Article: Further resources on this subject: Software Documentation with Trac [article] The Software Task Management Tool - Rake [article] Installing Software and Updates [article]
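As a brief numeric illustration of the availability and reliability attributes listed above, availability is commonly related to MTBF and MTTR as MTBF / (MTBF + MTTR). The sketch below applies that formula; the MTBF and MTTR figures used are made-up examples, not targets from the book.

# Illustrative availability arithmetic; the input figures are invented examples.
def availability(mtbf_hours, mttr_hours):
    # Availability = MTBF / (MTBF + MTTR)
    return mtbf_hours / (mtbf_hours + mttr_hours)

if __name__ == "__main__":
    a = availability(mtbf_hours=720.0, mttr_hours=2.0)  # ~30 days between failures, 2h to recover
    downtime_per_year = (1 - a) * 365 * 24
    print("availability: %.4f%%" % (a * 100))            # roughly 99.72%
    print("expected downtime: %.1f hours/year" % downtime_per_year)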

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what its core elements are, and how they work. Before that, let's take a step back and briefly discuss how RxJava is different from other approaches. RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives: Observables: A source of data Subscriptions: An activated handle to the Observable that receives data Schedulers: A means to define where (on which Thread) the data is processed First of all, we will cover Observables--the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable--a big brother of Observable that lets you handle big amounts of data with high rates of publishing. To summarize, we will cover these aspects: What is an Observable? What are Disposables (formerly Subscriptions)? How do items travel through the Observable? What is backpressure and how can we use it with Flowable? Let's dive into it! (For more resources related to this topic, see here.) Observables Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way. There are lots of different ways to create Observables. The simplest way is to use the .just() method like we did before: Observable.just("First item", "Second item"); It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe(): Observable.just("First item", "Second item") .subscribe(); Usually (but not always), the Observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere". Hot and Cold Observables Quite often, the terms Hot and Cold Observables can be found in the literature and documentation. A Cold Observable is the most common Observable type. For example, it can be created with the following code: Observable.just("First item", "Second item") .subscribe(); Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed. 
A Hot Observable is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly, and it doesn't matter whether there is something ready to receive them (like a Subscription). If there are no subscriptions to the Observable, the updates will be lost. Disposables A disposable (previously called Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it means that it will stay active forever. It might not be a problem for a server-side application, but it can cause some serious trouble on Android. Usually, this is a common source of memory leaks. Obtaining a reference to a disposable is pretty simple: Disposable disposable = Observable.just("First item", "Second item") .subscribe(); Disposable is a very simple interface. It has only two methods: dispose() and isDisposed(). dispose() can be used to cancel the existing Disposable (Subscription). This will stop the .subscribe() call from receiving any further items from the Observable, and the Observable itself will be cleaned up. isDisposed() has a pretty straightforward function--it checks whether the subscription is still active. However, it is not used very often in regular code, as subscriptions are usually unsubscribed and forgotten. Disposed subscriptions (Disposables) cannot be re-enabled. They can only be created anew. Finally, Disposables can be grouped using CompositeDisposable like this: Disposable disposable = new CompositeDisposable( Observable.just("First item", "Second item").subscribe(), Observable.just("1", "2").subscribe(), Observable.just("One", "Two").subscribe() ); It's useful in the cases when there are many Observables that should be canceled at the same time, for example, an Activity being destroyed. Schedulers As described in the documentation, a scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed, and usually that means selecting some kind of specific thread. Most often, Schedulers are used to execute long-running tasks on some background thread so that they don't block the main computation or UI thread. This is especially relevant on Android, where long-running tasks must not be executed on the MainThread. Schedulers can be set with a simple .subscribeOn() call: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .subscribe(); There are only a few main Schedulers that are commonly used: Schedulers.io() Schedulers.computation() Schedulers.newThread() AndroidSchedulers.mainThread() The AndroidSchedulers.mainThread() is only used on Android systems. Scheduling examples Let's explore how schedulers work by checking out a few examples. 
Let's run the following code: Observable.just("First item", "Second item") .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as follows: on-next:main:First item subscribe:main:First item on-next:main:Second item subscribe:main:Second item Now let's try changing the code to as shown: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); Now, the output should look like this: on-next:RxCachedThreadScheduler-1:First item subscribe:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:RxCachedThreadScheduler-1:Second item We can see how the code was executed on the main thread in the first case and on a new thread in the next. Android requires that all UI modifications should be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with .observeOn() method: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as illustrated: on-next:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:main:First item subscribe:main:Second item You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread. Investigating the Flow of Observable The logging inside the steps of an Observable is a very powerful tool when one wants to understand how they work. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze a full flow of an Observable. We will start off with this script: private void log(String stage, String item) { Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item); } private void log(String stage) { Log.d("APP", stage + ":" + Thread.currentThread().getName()); } Observable.just("One", "Two") .subscribeOn(Schedulers.io()) .doOnDispose(() -> log("doOnDispose")) .doOnComplete(() -> log("doOnComplete")) .doOnNext(e -> log("doOnNext", e)) .doOnEach(e -> log("doOnEach")) .doOnSubscribe((e) -> log("doOnSubscribe")) .doOnTerminate(() -> log("doOnTerminate")) .doFinally(() -> log("doFinally")) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> log("subscribe", e)); It can be seen that it has lots of additional and unfamiliar steps (more about this later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?: doOnSubscribe:main doOnNext:RxCachedThreadScheduler-1:One doOnEach:RxCachedThreadScheduler-1 doOnNext:RxCachedThreadScheduler-1:Two doOnEach:RxCachedThreadScheduler-1 doOnComplete:RxCachedThreadScheduler-1 doOnEach:RxCachedThreadScheduler-1 doOnTerminate:RxCachedThreadScheduler-1 doFinally:RxCachedThreadScheduler-1 subscribe:main:One subscribe:main:Two doOnDispose:main Let's go through some of the steps. 
First of all, by calling .subscribe() the doOnSubscribe block was executed. This started the emission of items from the Observable as we can see on the doOnNext and doOnEach lines. Finally, the stream finished at termination life cycle was activated--the doOnComplete, doOnTerminate and doOnFinally. Also, the reader will note that the doOnDispose block was called on the main thread along with the subscribe block. The flow will be a little different if .subscribeOn() and .observeOn() calls won't be there: doOnSubscribe:main doOnNext:main:One doOnEach:main subscribe:main:One doOnNext:main:Two doOnEach:main subscribe:main:Two doOnComplete:main doOnEach:main doOnTerminate:main doOnDispose:main doFinally:main You will readily note that now, the doFinally block was executed after doOnDispose while in the former setup, doOnDispose was the last. This happens due to the way Android Looper schedulers code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the thread they are running on) to see what's actually happening. Flowable Flowable can be regarded as a special type of Observable (but internally it isn't). It has almost the same method signature like the Observable as well. The difference is that Flowable allows you to process items that emitted faster from the source than some of the following steps can handle. It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second: That poses a problem. What will we do after 60 seconds? There will be 60 million items in the queue waiting to be processed. The items are accumulating at a rate of 1 million items per second between the first and the second steps because the second step processes them at a much slower rate. Clearly, the problem here is that the available memory will be exhausted and the programming will fail with an OutOfMemory (OOM) exception. For example, this script will cause an excessive memory usage because the processing step just won't be able to keep up with the pace the items are emitted at. PublishSubject<Integer> observable = PublishSubject.create(); observable .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); for (int i = 0; i < 1000000; i++) { observable.onNext(i); } private void log(Throwable throwable) { Log.e("APP", "Error", throwable); } By converting this to a Flowable, we can start controlling this behavior: observable.toFlowable(BackpressureStrategy.MISSING) .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); Since we have chosen not to specify how we want to handle items that cannot be processed (it's called Backpressuring), it will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine as it wouldn't hit the internal buffer of Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few Backpressure strategies that will define how the excessive amount of items should be handled. Drop Items Dropping means that if the downstream processing steps cannot keep up with the pace of the source Observable, just drop the data that cannot be handled. 
This can only be used in the cases when losing data is okay, and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify Backpressure strategy, like this: observable.toFlowable(BackpressureStrategy.DROP) Alternatively, it will be like this: observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureDrop() A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops it instantly unless it is free to push it down the stream). All the other values between "ticks" will be dropped: observable.toFlowable(BackpressureStrategy.MISSING) .sample(10, TimeUnit.MILLISECONDS) .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); Preserve Latest Item Preserving the last items means that if the downstream cannot cope with the items that are being sent to them, stop emitting values and wait until they become available. While waiting, keep dropping all the values except the last one that arrived and when the downstream becomes available to send the last message that's currently stored. Like with Dropping, the "Latest" strategy can be specified while creating an Observable: observable.toFlowable(BackpressureStrategy.LATEST) Alternatively, by calling .onBackpressure(): observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureLatest() Finally, a method, .debounce(), can periodically take the last value at specific intervals: observable.toFlowable(BackpressureStrategy.MISSING) .debounce(10, TimeUnit.MILLISECONDS) Buffering It's usually a poor way to handle different paces of items being emitted and consumed as it often just delays the problem. However, this can work just fine if there is just a temporal slowdown in one of the consumers. In this case, the items emitted will be stored until later processing and when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we can see a very similar behavior to the original Observable with memory running out. Enabling buffers is, again, pretty straightforward by calling the following: observable.toFlowable(BackpressureStrategy.BUFFER) or observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureBuffer() If there is a need to specify a particular value for the buffer, one can use .buffer(): observable.toFlowable(BackpressureStrategy.MISSING) .buffer(10) Completable, Single, and Maybe Types Besides the types of Observable and Flowable, there are three more types that RxJava provides: Completable: It represents an action without a result that will be completed in the future Single: It's just like Observable (or Flowable) that returns a single item instead of a stream Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item like Single However, all these are used quite rarely. Let's take a quick look at the examples. Completable Since Completable can basically process just two types of actions--onComplete and onError--we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries. 
For example, the Completable can be created by calling the following: Completable completable = Completable.fromAction(() -> { log("Let's do something"); }); Then, it is to be subscribed with the following: completable.subscribe(() -> { log("Finished"); }, throwable -> { log(throwable); }); Single Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask, why it is worth having it at all? These types are useful to tell the developers about the specific behavior that they should expect. To create a Single, one can use this example: Single.just("One item") The Single and the Subscription to it can be created with the following: Single.just("One item") .subscribe((item) -> { log(item); }, (throwable) -> { log(throwable); }); Make a note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result. Maybe Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before: Maybe.empty(); or like Maybe.just("Item"); However, the .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled): Maybe.just("Item") .subscribe( s -> log("success: " + s), throwable -> log("error"), () -> log("onComplete") ); Summary In this article, we covered the most essentials parts of RxJava. Resources for Article: Further resources on this subject: The Art of Android Development Using Android Studio [article] Drawing and Drawables in Android Canvas [article] Optimizing Games for Android [article]

Manipulating functions in functional programming

Packt
20 Jun 2017
6 min read
In this article by Wisnu Anggoro, author of the book Learning C++ Functional Programming, you will learn to apply functional programming techniques to C++ to build highly modular, testable, and reusable code. In this article, you will learn the following topics: Applying a first-class function in all functions Passing a function as other functions parameter Assigning a function to a variable Storing a function in the container (For more resources related to this topic, see here.) Applying a first-class function in all functions There's nothing special with the first-class function since it's a normal class. We can treat the first-class function like any other data type. However, in the language that supports the first-class function, we can perform the following tasks without invoking the compiler recursively: Passing a function as other function parameters Assigning functions to a variable Storing functions in collections Fortunately, C++ can be used to solve the preceding tasks. We will discuss it in depth in the following topics. Passing a function as other functions parameter Let's start passing a function as the function parameter. We will choose one of four functions and invoke the function from its main function. The code will look as follows: /* first-class-1.cpp */ #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; int addition(int x, int y) { return x + y; } int subtraction(int x, int y) { return x - y; } int multiplication(int x, int y) { return x * y; } int division(int x, int y) { return x / y; } void PassingFunc(FuncType fn, int x, int y) { cout << "Result = " << fn(x, y) << endl; } int main() { int i, a, b; FuncType func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; } return 0; } From the preceding code, we can see that we have four functions, and we want the user to choose one from them, and then run it. In the switch statement, we will invoke one of the four functions based on the choice of the user. We will pass the selected function to PassingFunc(), as we can see in the following code snippet: case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; The result we get on the screen should look like the following screenshot: The preceding screenshot shows that we selected the Subtraction mode and gave 8 to a and 3 to b. As we expected, the code gives us 5 as a result. Assigning a function to variable We can also assign a function to the variable so that we can call the function by calling the variable. We will refactor first-class-1.cpp, and it will be as follows: /* first-class-2.cpp */ #include <functional> #include <iostream> using namespace std; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. 
Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; } cout << "Result = " << func(a, b) << endl; return 0; } We will now assign the four functions based on the user choice. We will store the selected function in func variable inside the switch statement, as follows: case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; After the func variable is assigned with the user's choice, the code will just call the variable like calling the function as follows: cout << "Result = " << func(a, b) << endl; Moreover, we will obtain the same output on the console if we run the code. Storing a function in the container Now, let's save the function to the container. Here, we will use the vector as the container. The code is as follows: /* first-class-3.cpp */ #include <vector> #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { vector<FuncType> functions; functions.push_back(addition); functions.push_back(subtraction); functions.push_back(multiplication); functions.push_back(division); int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; cout << "Result = " << functions.at(i - 1)(a, b) << endl; return 0; } From the preceding code, we can see that we created a new vector named functions, then stored four different functions to it. Same with our two previous code samples, we ask the user to select the mode as well. However, now the code becomes simpler since we don't need to add the switch statement as we can select the function directly by selecting the vector index, as we can see in the following line of code: cout << "Result = " << functions.at(i - 1)(a, b) << endl; However, since the vector is a zero-based index, we have to adjust the index with the menu choice. The result will be the same with our two previous code samples. Summary In this article, we discussed that there are some techniques to manipulate a function to produce the various purpose on it. Since we can implement the first-class function in C++ language, we can pass a function as other functions parameter. We can treat a function as a data object so we can assign it to a variable and store it in the container. Resources for Article: Further resources on this subject: Introduction to the Functional Programming [article] Functional Programming in C# [article] Putting the Function in Functional Programming [article]

Thread synchronization and communication

Packt
19 Jun 2017
20 min read
In this article by Maya Posch, the author of the book Mastering C++ Multithreading, we will learn to work through and understand a basic multithreaded C++ application. While generally threads are used to work on a task more or less independently from other threads, there are many occasions where one would want to pass data between threads, or even control other threads, such as from a central task scheduler thread. This article looks at how such tasks are accomplished. Topics covered in this article include: Use of mutexes, locks and similar synchronization structures. The use of condition variables and signals to control threads. Safely passing and sharing data between threads. (For more resources related to this topic, see here.) Safety first The central problem with concurrency is that of ensuring safe access to shared resources, including when communicating between threads. There is also the issue of threads being able to communicate and synchronize themselves. What makes multithreaded programming such a challenge is to be able to keep track of each interaction between threads, and to ensure that each and every form of access is secured while not falling into the traps of deadlocks and data races. In this article we will look at a fairly complex example involving a task scheduler. This is a form of high-concurrency, high-throughput situation where many different requirements come together, with many potential traps, as we will see in a moment. The scheduler A good example of multithreading with a significant amount of synchronization and communication between threads is the scheduling of tasks. Here, the goal is to accept incoming tasks and assign them to worker threads as quickly as possible. A number of different approaches are possible. Often one has worker threads running in an active loop, constantly polling a central queue for new tasks. Disadvantages of this approach include the wasting of processor cycles on said polling and the congestion which forms at the synchronization mechanism used, generally a mutex. Furthermore, this active polling approach scales very poorly when the number of worker threads increases. Ideally, each worker thread would idly wait until it is needed again. To accomplish this, we have to approach the problem from the other side: not from the perspective of the worker threads, but from that of the queue. Much like the scheduler of an operating system, it is the scheduler which is aware of both the tasks which require processing and the available worker threads. In this approach, a central scheduler instance would accept new tasks and actively assign them to worker threads. Said scheduler instance may also manage these worker threads, such as their number and priority, depending on the number of incoming tasks and the type of task or other properties. High-level view At its core, our scheduler or dispatcher is quite simple, functioning like a queue with all of the scheduling logic built into it: As one can see from the high-level view, there really isn't much to it. As we'll see in a moment, the actual implementation does however have a number of complications. 
Implementation As is usual, we start off with the main function, contained in main.cpp: #include "dispatcher.h" #include "request.h" #include <iostream> #include <string> #include <csignal> #include <thread> #include <chrono> #include <mutex> using namespace std; sig_atomic_t signal_caught = 0; mutex logMutex; The custom headers we include are those for our dispatcher implementation, as well as the Request class we'll be using. Globally we define an atomic variable to be used with the signal handler, as well as a mutex which will synchronize the output (on the standard output) from our logging method. void sigint_handler(int sig) { signal_caught = 1; } Our signal handler function (for SIGINT signals) simply sets the global atomic variable we defined earlier. void logFnc(string text) { logMutex.lock(); cout << text << "\n"; logMutex.unlock(); } In our logging function we use the global mutex to ensure writing to the standard output is synchronized. int main() { signal(SIGINT, &sigint_handler); Dispatcher::init(10); In the main function we install the signal handler for SIGINT to allow us to interrupt the execution of the application. We also call the static init() function on the Dispatcher class to initialize it. cout << "Initialised.\n"; int cycles = 0; Request* rq = 0; while (!signal_caught && cycles < 50) { rq = new Request(); rq->setValue(cycles); rq->setOutput(&logFnc); Dispatcher::addRequest(rq); cycles++; } Next we set up the loop in which we will create new requests. In each cycle we create a new Request instance and use its setValue() function to set an integer value (the current cycle number). We also set our logging function on the request instance before adding this new request to the Dispatcher using its static addRequest() function. This loop will continue until the maximum number of cycles has been reached, or SIGINT has been signaled, using Ctrl+C or similar. this_thread::sleep_for(chrono::seconds(5)); Dispatcher::stop(); cout << "Clean-up done.\n"; return 0; } Finally we wait for five seconds, using the thread's sleep_for() function and the chrono::seconds() function from the chrono STL header. We also call the stop() function on the Dispatcher before returning. Request class A request for the Dispatcher always derives from the pure virtual AbstractRequest class: #pragma once #ifndef ABSTRACT_REQUEST_H #define ABSTRACT_REQUEST_H class AbstractRequest { public: virtual void setValue(int value) = 0; virtual void process() = 0; virtual void finish() = 0; }; #endif This class defines an API with three functions which a deriving class always has to implement, of which the process() and finish() functions are the most generic and likely to be used in any practical implementation. The setValue() function is specific to this demonstration implementation and would likely be adapted or extended to fit a real-life scenario. The advantage of using an abstract class as the basis for a request is that it allows the Dispatcher class to handle many different types of requests, as long as they all adhere to this same basic API. 
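To make that last point concrete, here is a hypothetical second request type (my own illustration, not part of the book's example) that also derives from AbstractRequest; the Dispatcher could accept and process it alongside Request instances without any changes, because it only ever works against the abstract API. For brevity it writes directly to std::cout instead of going through the logging function:

// delayed_request.h -- hypothetical illustration only
#pragma once
#ifndef DELAYED_REQUEST_H
#define DELAYED_REQUEST_H
#include "abstract_request.h"
#include <chrono>
#include <iostream>
#include <thread>

class DelayedRequest : public AbstractRequest {
    int value;
public:
    void setValue(int value) { this->value = value; }
    void process() {
        // Simulate a slow piece of business logic by sleeping for 'value' milliseconds.
        std::this_thread::sleep_for(std::chrono::milliseconds(value));
        std::cout << "Processed delayed request " << value << "\n";
    }
    void finish() {
        std::cout << "Finished delayed request " << value << "\n";
    }
};
#endif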
Using this abstract interface, we implement a basic Request class: #pragma once #ifndef REQUEST_H #define REQUEST_H #include "abstract_request.h" #include <string> using namespace std; typedef void (*logFunction)(string text); class Request : public AbstractRequest { int value; logFunction outFnc; public: void setValue(int value) { this->value = value; } void setOutput(logFunction fnc) { outFnc = fnc; } void process(); void finish(); }; #endif In its header file we first define the logging function pointer's format. After this we implement the request API, adding the setOutput() function to the base API, which accepts a function pointer for logging. Both setter functions merely assign the provided parameter to their respective private class members. Next, the class function implementations: #include "request.h" void Request::process() { outFnc("Starting processing request " + std::to_string(value) + "..."); // } void Request::finish() { outFnc("Finished request " + std::to_string(value)); }Both of these implementations are very basic, merely using the function pointer to output a string indicating the status of the worker thread. In a practical implementation, one would add the business logic to the process()function, with the finish() function containing any functionality to finish up a request, such as writing a map into a string. Worker class Next, the Worker class. This contains the logic which will be called by the dispatcher in order to process a request: #pragma once #ifndef WORKER_H #define WORKER_H #include "abstract_request.h" #include <condition_variable> #include <mutex> using namespace std; class Worker { condition_variable cv; mutex mtx; unique_lock<mutex> ulock; AbstractRequest* request; bool running; bool ready; public: Worker() { running = true; ready = false; ulock = unique_lock<mutex>(mtx); } void run(); void stop() { running = false; } void setRequest(AbstractRequest* request) { this->request = request; ready = true; } void getCondition(condition_variable* &cv); }; #endif Whereas the adding of a request to the dispatcher does not require any special logic, the Worker class does require the use of condition variables to synchronize itself with the dispatcher. For the C++11 threads API, this requires a condition variable, a mutex and a unique lock. The unique lock encapsulates the mutex and will ultimately be used with the condition variable as we will see in a moment. Beyond this we define methods to start and stop the worker, to set a new request for processing and to obtain access to its internal condition variable. Moving on, the rest of the implementation: #include "worker.h" #include "dispatcher.h" #include <chrono> using namespace std; void Worker::getCondition(condition_variable* &cv) { cv = &(this)->cv; } void Worker::run() { while (running) { if (ready) { ready = false; request->process(); request->finish(); } if (Dispatcher::addWorker(this)) { // Use the ready loop to deal with spurious wake-ups. while (!ready && running) { if (cv.wait_for(ulock, chrono::seconds(1)) == cv_status::timeout) { // We timed out, but we keep waiting unless // the worker is // stopped by the dispatcher. } } } } } Beyond the getter function for the condition variable, we define the run() function, which the dispatcher will run for each worker thread upon starting it. Its main loop merely checks that the stop() function hasn't been called yet, which would have set the running boolean value to false and ended the work thread. 
This is used by the dispatcher when shutting down, allowing it to terminate the worker threads. Since boolean values are generally atomic, setting and checking can be done simultaneously without risk or requiring a mutex. Moving on, the check of the ready variable is to ensure that a request is actually waiting when the thread is first run. On the first run of the worker thread, no request will be waiting and thus attempting to process one would result in a crash. Upon the dispatcher setting a new request, this boolean variable will be set to true. If a request is waiting, the ready variable will be set to false again, after which the request instance will have its process() and finish() functions called. This will run the business logic of the request on the worker's thread and finalize it. Finally, the worker thread adds itself to the dispatcher using its static addWorker() function. This function will return false if no new request was available, causing the worker thread to wait until a new request has become available. Otherwise the worker thread will continue with the processing of the new request that the dispatcher will have set on it. If asked to wait, we enter a new loop which will ensure that upon waking up from waiting for the condition variable to be signaled, we woke up because we got signaled by the dispatcher (ready variable set to true), and not because of a spurious wake-up. Last of all, we enter the actual wait() function of the condition variable, using the unique lock instance we created before, along with a timeout. If a timeout occurs, we can either terminate the thread, or keep waiting. Here we choose to do nothing and just re-enter the waiting loop. Dispatcher As the last item, we have the Dispatcher class itself: #pragma once #ifndef DISPATCHER_H #define DISPATCHER_H #include "abstract_request.h" #include "worker.h" #include <queue> #include <mutex> #include <thread> #include <vector> using namespace std; class Dispatcher { static queue<AbstractRequest*> requests; static queue<Worker*> workers; static mutex requestsMutex; static mutex workersMutex; static vector<Worker*> allWorkers; static vector<thread*> threads; public: static bool init(int workers); static bool stop(); static void addRequest(AbstractRequest* request); static bool addWorker(Worker* worker); }; #endif Most of this should look familiar by now. As one may have surmised, this is a fully static class. Moving on with its implementation: #include "dispatcher.h" #include <iostream> using namespace std; queue<AbstractRequest*> Dispatcher::requests; queue<Worker*> Dispatcher::workers; mutex Dispatcher::requestsMutex; mutex Dispatcher::workersMutex; vector<Worker*> Dispatcher::allWorkers; vector<thread*> Dispatcher::threads; bool Dispatcher::init(int workers) { thread* t = 0; Worker* w = 0; for (int i = 0; i < workers; ++i) { w = new Worker; allWorkers.push_back(w); t = new thread(&Worker::run, w); threads.push_back(t); } return true; } After setting up the static class members, the init() function is defined. It starts the specified number of worker threads, keeping a reference to each worker and thread instance in their respective vector data structures. bool Dispatcher::stop() { for (int i = 0; i < allWorkers.size(); ++i) { allWorkers[i]->stop(); } cout << "Stopped workers.\n"; for (int j = 0; j < threads.size(); ++j) { threads[j]->join(); cout << "Joined threads.\n"; } return true; } In the stop() function each worker instance has its stop() function called. 
This will cause each worker thread to terminate, as we saw earlier in the Worker class description. Finally, we wait for each thread to join (that is, finish), prior to returning. void Dispatcher::addRequest(AbstractRequest* request) { workersMutex.lock(); if (!workers.empty()) { Worker* worker = workers.front(); worker->setRequest(request); condition_variable* cv; worker->getCondition(cv); cv->notify_one(); workers.pop(); workersMutex.unlock(); } else { workersMutex.unlock(); requestsMutex.lock(); requests.push(request); requestsMutex.unlock(); } } The addRequest() function is where things get interesting. In this one function a new request is added. What happens next to it depends on whether a worker thread is waiting for a new request or not. If no worker thread is waiting (worker queue is empty), the request is added to the request queue. The use of mutexes ensures that the access to these queues occurs safely, as the worker threads will simultaneously try to access both queues as well. An important gotcha to note here is the possibility of a deadlock. That is, a situation where two threads each hold a lock on a resource, with each thread waiting for the other thread to release its lock before releasing its own. Every situation where more than one mutex is used in a single scope holds this potential. In this function the potential for deadlock lies in the releasing of the lock on the workers mutex and when the lock on the requests mutex is obtained. In the case that this function holds the workers mutex and tries to obtain the requests lock (when no worker thread is available), there is a chance that another thread holds the requests mutex (looking for new requests to handle), while simultaneously trying to obtain the workers mutex (finding no requests and adding itself to the workers queue). The solution here is simple: release a mutex before obtaining the next one. In the situation where one feels that more than one mutex lock has to be held it is paramount to examine and test one's code for potential deadlocks. In this particular situation the workers mutex lock is explicitly released when it is no longer needed, or before the requests mutex lock is obtained, preventing a deadlock. Another important aspect of this particular section of code is the way it signals a worker thread. As one can see in the first section of the if/else block, when the workers queue is not empty, a worker is fetched from the queue, has the request set on it and then has its condition variable referenced and signaled, or notified. Internally the condition variable uses the mutex we handed it before in the Worker class definition to guarantee only atomic access to it. When the notify_one() function (generally called signal() in other APIs) is called on the condition variable, it will notify the first thread in the queue of threads waiting for the condition variable to return and continue. In the Worker class' run() function we would be waiting for this notification event. Upon receiving it, the worker thread would continue and process the new request. The thread reference will then be removed from the queue until it adds itself again once it is done processing the request. 
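If a design ever did require holding both mutexes at the same time, C++11 also provides std::lock(), which acquires multiple mutexes using a deadlock-avoidance algorithm. The dispatcher above does not need it, since it simply releases one lock before taking the other, but a minimal sketch of that alternative (my own, using stand-in mutex names) looks like this:

#include <mutex>

std::mutex requestsMutex;
std::mutex workersMutex;

void inspectBothQueues() {
    // std::lock acquires both mutexes without risking the
    // lock-ordering deadlock described above.
    std::lock(requestsMutex, workersMutex);
    // adopt_lock tells the guards to take ownership of the
    // already-locked mutexes and release them on scope exit.
    std::lock_guard<std::mutex> requestsLock(requestsMutex, std::adopt_lock);
    std::lock_guard<std::mutex> workersLock(workersMutex, std::adopt_lock);
    // ... it is now safe to inspect or modify both queues here ...
}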
bool Dispatcher::addWorker(Worker* worker) { bool wait = true; requestsMutex.lock(); if (!requests.empty()) { AbstractRequest* request = requests.front(); worker->setRequest(request); requests.pop(); wait = false; requestsMutex.unlock(); } else { requestsMutex.unlock(); workersMutex.lock(); workers.push(worker); workersMutex.unlock(); } return wait; } With this function a worker thread will add itself to the queue once it is done processing a request. It is similar to the earlier function in that the incoming worker is first actively matched with any request which may be waiting in the request queue. If none are available, the worker is added to the worker queue. Important to note here is that we return a boolean value which indicates whether the calling thread should wait for a new request, or whether it has already received a new request while trying to add itself to the queue. While this code is less complex than that of the previous function, it still holds the same potential deadlock issue due to the handling of two mutexes within the same scope. Here, too, we first release the mutex we hold before obtaining the next one. Makefile The Makefile for this dispatcher example is very basic again, gathering all C++ source files in the current folder and compiling them into a binary using g++: GCC := g++ OUTPUT := dispatcher_demo SOURCES := $(wildcard *.cpp) CCFLAGS := -std=c++11 -g3 all: $(OUTPUT) $(OUTPUT): $(GCC) -o $(OUTPUT) $(CCFLAGS) $(SOURCES) clean: rm $(OUTPUT) .PHONY: all Output After compiling the application, running it produces the following output for the fifty total requests: $ ./dispatcher_demo.exe Initialised. Starting processing request 1... Starting processing request 2... Finished request 1 Starting processing request 3... Finished request 3 Starting processing request 6... Finished request 6 Starting processing request 8... Finished request 8 Starting processing request 9... Finished request 9 Finished request 2 Starting processing request 11... Finished request 11 Starting processing request 12... Finished request 12 Starting processing request 13... Finished request 13 Starting processing request 14... Finished request 14 Starting processing request 7... Starting processing request 10... Starting processing request 15... Finished request 7 Finished request 15 Finished request 10 Starting processing request 16... Finished request 16 Starting processing request 17... Starting processing request 18... Starting processing request 0… At this point we can already clearly see that even with each request taking almost no time to process, the requests are clearly being executed in parallel. The first request (request 0) only starts being processed after the 16th request, while the second request already finishes after the ninth request, long before this. The factors which determine which thread and thus which request is processed first depend on the OS scheduler and hardware-based scheduling. This clearly shows just how few assumptions one can make about how a multithreaded application will be executed, even on a single platform. Starting processing request 5... Finished request 5 Starting processing request 20... Finished request 18 Finished request 20 Starting processing request 21... Starting processing request 4... Finished request 21 Finished request 4 Here the fourth and fifth requests also finish in a rather delayed fashion. Starting processing request 23... Starting processing request 24... Starting processing request 22... 
Finished request 24 Finished request 23 Finished request 22 Starting processing request 26... Starting processing request 25... Starting processing request 28... Finished request 26 Starting processing request 27... Finished request 28 Finished request 27 Starting processing request 29... Starting processing request 30... Finished request 30 Finished request 29 Finished request 17 Finished request 25 Starting processing request 19... Finished request 0 At this point the first request finally finishes. This may indicate that the initialization time for the first request will always delay it relative to the successive requests. Running the application multiple times can confirm this. It's important that if the order of processing is relevant, that this randomness does not negatively affect one's application. Starting processing request 33... Starting processing request 35... Finished request 33 Finished request 35 Starting processing request 37... Starting processing request 38... Finished request 37 Finished request 38 Starting processing request 39... Starting processing request 40... Starting processing request 36... Starting processing request 31... Finished request 40 Finished request 39 Starting processing request 32... Starting processing request 41... Finished request 32 Finished request 41 Starting processing request 42... Finished request 31 Starting processing request 44... Finished request 36 Finished request 42 Starting processing request 45... Finished request 44 Starting processing request 47... Starting processing request 48... Finished request 48 Starting processing request 43... Finished request 47 Finished request 43 Finished request 19 Starting processing request 34... Finished request 34 Starting processing request 46... Starting processing request 49... Finished request 46 Finished request 49 Finished request 45 Request 19 also became fairly delayed, showing once again just how unpredictable a multithreaded application can be. If we were processing a large data set in parallel here, with chunks of data in each request, we might have to pause at some points to account for these delays as otherwise our output cache might grow too large. As doing so would negatively affect an application's performance, one might have to look at low-level optimizations, as well as the scheduling of threads on specific processor cores in order to prevent this from happening. Stopped workers. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Clean-up done. All ten worker threads which were launched in the beginning terminate here as we call the stop() function of the Dispatcher. Sharing data In this article's example we saw how to share information between threads in addition to the synchronizing of threads. This in the form of the requests we passed from the main thread into the dispatcher, from which each request gets passed on to a different thread. The essential idea behind the sharing of data between threads is that the data to be shared exists somewhere in a way which is accessible to two threads or more. After this we have to ensure that only one thread can modify the data, and that the data does not get modified while it's being read. Generally we would use mutexes or similar to ensure this. Using R/W-locks Readers-writer locks are a possible optimization here, because they allow multiple threads to read simultaneously from a single data source. 
If one has an application in which multiple worker threads read the same information repeatedly, it would be more efficient to use read-write locks than basic mutexes, because the attempts to read the data will not block the other threads. A read-write lock can thus be used as a more advanced version of a mutex, namely as one which adapts its behavior to the type of access. Internally it builds on mutexes (or semaphores) and condition variables. Using shared pointers First available via the Boost library and introduced natively with C++11, shared pointers are an abstraction of memory management using reference counting for heap-allocated instances. They are partially thread-safe, in that multiple shared pointer instances referring to the same object can safely be created and destroyed from different threads, but the referenced object itself is not thread-safe. Depending on the application this may suffice, however. To make them properly thread-safe one can use atomics. Summary In this article we looked at how to pass data between threads in a safe manner as part of a fairly complex scheduler implementation. We also looked at the resulting asynchronous processing of said scheduler and considered some potential alternatives and optimizations for passing data between threads. At this point one should be able to safely pass data between threads, as well as synchronize the access to other shared resources. In the next article we will be looking at the native C++ threading API and its primitives.  Resources for Article: Further resources on this subject: Multithreading with Qt [article] Understanding the Dependencies of a C++ Application [article] Boost.Asio C++ Network Programming [article]

Basics of Python for Absolute Beginners

Packt
19 Jun 2017
5 min read
In this article by Bhaskar Das and Mohit Raj, authors of the book, Learn Python in 7 days, we will learn basics of Python. The Python language had a humble beginning in the late 1980s when a Dutchman, Guido Von Rossum, started working on a fun project that would be a successor to the ABC language with better exception handling and capability to interface with OS Amoeba at Centrum Wiskunde and Informatica. It first appeared in 1991. Python 2.0 was released in the year 2000 and Python 3.0 was released in the year 2008. The language was named Python after the famous British television comedy show Monty Python's Flying Circus, which was one of the favorite television programmes of Guido. Here, we will see why Python has suddenly influenced our lives, various applications that use Python, and Python's implementations. In this article, you will be learning the basic installation steps required to perform on different platforms (that is Windows, Linux and Mac), about environment variables, setting up environment variables, file formats, Python interactive shell, basic syntaxes, and, finally, printing out the formatted output. (For more resources related to this topic, see here.) Why Python? Now you might be suddenly bogged with the question, why Python? According to the Institute of Electrical and Electronics Engineers (IEEE) 2016 ranking, Python ranked third after C and Java. As per Indeed.com's data of 2016, Python job market search ranked fifth. Clearly, all the data points to the ever-rising demand in the job market for Python. It's a cool language if you want to learn it just for fun. Also, you will adore the language if you want to build your career around Python. At the school level, many schools have started including Python programming for kids. With new technologies taking the market by surprise, Python has been playing a dominant role. Whether it's cloud platform, mobile app development, BigData, IoT with Raspberry Pi, or the new Blockchain technology, Python is being seen as a niche language platform to develop and deliver scalable and robust applications. Some key features of the language are: Python programs can run on any platform, you can carry code created in a Windows machine and run it on Mac or Linux Python has a large inbuilt library with prebuilt and portable functionality, known as the standard library Python is an expressive language Python is free and open source Python code is about one third of the size of equivalent C++ and Java code. Python can be both dynamically and strongly typed In dynamically typed, the type of a variable is interpreted at runtime, which means that there is no need to define the type (int, float) of a variable in Python Python applications One of the most famous platform where Python is extensively used is YouTube. Other places where you will find Python being extensively used are special effects in Hollywood movies, drug evolution and discovery, traffic control systems, ERP systems, cloud hosting, e-commerce platform, CRM systems, and whichever field you can think of. Versions At the time of writing this book, the two main versions of the Python programming language available in the market were Python 2.x and Python 3.x. The stable releases at the time of writing this book were Python 2.7.13 and Python 3.6.0. Implementations of Python Major implementations include CPython, Jython, IronPython, MicroPython and PyPy. 
Installation Here, we will walk through the installation of Python on three different OS platforms, namely Windows, Linux, and Mac OS. Let's begin with the Windows platform. Installation on Windows platform Python 2.x can be downloaded from https://www.python.org/downloads. The installer is simple and easy to use. Follow these steps to install the setup: Once you click on the setup installer, you will get a small window on your desktop screen as shown. Click on Next: Provide a suitable installation folder to install Python. If you don't provide the installation folder, then the installer will automatically create an installation folder for you, as shown in the following screenshot. Click on Next: After the completion of Step 2, you will get a window to customize Python as shown in the following screenshot. Note that the Add python.exe to Path option has been marked with an x. Select this option to add it to the system path variable. Click on Next: Finally, click Finish to complete the installation: Summary So far, we did a walkthrough of the beginnings and brief history of Python. We looked at the various implementations and flavors of Python. You also learned about installing on Windows OS. Hope this article has incited enough interest in Python and serves as your first step into the kingdom of Python, with enormous possibilities! Resources for Article: Further resources on this subject: Layout Management for Python GUI [article] Putting the Fun in Functional Python [article] Basics of Jupyter Notebook and Python [article]

Streaming and the Actor Model – Akka Streams!

Packt
16 Jun 2017
9 min read
In this article by Piyush Mishra, author of the Akka Cookbook, we will learn about the streaming and the actor model with Akka streams. (For more resources related to this topic, see here.) Akka is a popular toolkit designed to ease the pain of dealing with concurrency and distributed systems. It provides easy APIs to create reactive, fault-tolerant, scalable, and concurrent applications, thanks to the actor model. The actor model was introduced by Carl Hewitt in the 70s, and it has been successfully implemented by different programming languages, frameworks, or toolkits, such as Erlang or Akka. The concepts around the actor model are simple. All actors are created inside an actor system. Every actor has a unique address within the actor system, a mailbox, a state (in the case of being a stateful actor) and a behavior. The only way of interacting with an actor is by sending messages to it using its address. Messages will be stored in the mailbox until the actor is ready to process them. Once it is ready, the actor will pick one message at a time and will execute its behavior against the message. At this point, the actor might update its state, create new actors, or send messages to other already-created actors. Akka provides all this and many other features, thanks to the vast ecosystem around the core component, such as Akka Cluster, Akka Cluster Sharding, Akka Persistence, Akka HTTP, or Akka Streams. We will dig a bit more into the later one. Streaming framework and toolkits are gaining momentum lately. This is motivated by the massive number of connected devices that are generating new data constantly that needs to be consumed, processed, analyzed, and stored. This is basically the idea of Internet of Things (IoT) or the newer term Internet of Everything. Some time ago, the Akka team decided that they could build a Streaming library leveraging all the power of Akka and the actor model: Akka Streams. Akka Streams uses Akka actors as its foundation to provide a set of easy APIs to create back-pressured streams. Each stream consists of one or more sources, zero or more flows, and one or more sinks. All these different modules are also known as stages in the Akka Streams terminology. The best way to understand how a stream works is to think about it as a graph. Each stage (source, flow, or sink) has zero or more input ports and zero or more output ports. For instance, a source has zero input ports and one output port. A flow has one input port and one output port. And finally, a sink has one input port and zero output ports. To have a runnable stream, we need to ensure that all ports of all our stages are connected. Only then, we can run our stream to process some elements: Akka Streams provides a rich set of predefined stages to cover the most common streaming functions. However, if a use case requires a new custom stage, it is also possible to create it from scratch or extend an existing one. The full list of predefined stages can be found at http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html. Now that we know about the different components Akka Streams provides, it is a good moment to introduce the actor materializer. As we mentioned earlier, Akka is the foundation of Akka Streams. This means the code you define in the high-level API is eventually run inside an actor. The actor materializer is the entity responsible to create these low-level actors. By default, all processing stages get created within the same actor. 
This means only one element at a time can be processed by your stream. It is also possible to indicate that you want to have a different actor per stage, therefore having the possibility to process multiple messages at the same time. You can indicate this to the materializer by calling the async method in the proper stage. There are also asynchronous predefined stages. For performance reasons, Akka Streams batches messages when pushing them to the next stage to reduce overhead. After this quick introduction, let's start putting together some code to create and run a stream. We will use the Scala build tool (famously known as sbt) to retrieve the Akka dependencies and run our code. To begin with, we need a build.sbt file with the following content: name := "akka-async-streams" version := "1.0" scalaVersion := "2.11.7" libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.4.17" libraryDependencies += "com.typesafe.akka" % "akka-stream_2.11" % "2.4.17" Once we have the file ready, we need to run sbt update to let sbt fetch the required dependencies. Our first stream will push a list of words, capitalize each of them, and log the resulting values. This can easily be achieved by doing the following: implicit val actorSystem = ActorSystem() implicit val actorMaterializer = ActorMaterializer() val stream = Source(List("hello","from","akka","streams!")) .map(_.capitalize) .to(Sink.foreach(actorSystem.log.info)) stream.run() In this small code snippet, we can see how our stream has one source with a list of strings, one flow that is capitalizing each stream, and finally one sink logging the result. If we run our code, we should see the following in the output: [INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Hello [INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] From [INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Akka [INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Streams! The execution of this stream is happening synchronously and ordered. In our next example, we will do the same stream; however, we can see how all stages are modular: implicit val actorSystem = ActorSystem() implicit val actorMaterializer = ActorMaterializer() val source = Source(List("hello","from","akka","streams!")) val sink = Sink.foreach(actorSystem.log.info) val capitalizer = Flow[String].map(_.capitalize) val stream = source.via(capitalizer).to(sink) stream.run() In this code snippet, we can see how stages can be treated as immutable modules. We see that we can use the via helper method to provide a flow stage in a stream. This stream is still running synchronously. To run it asynchronously, we can take advantage of the mapAsync flow. For this, let's create a small actor that will do the capitalization for us: class Capitalizer extends Actor with ActorLogging { def receive = { case str : String => log.info(s"Capitalizing $str") sender ! str.capitalize } } Once we have our actor defined, we can set up our asynchronous stream. For this, we will create a round robin pool of capitalizer actors. Then, we will use the ask pattern to send a message to an actor and wait for a response. This happens using the operator? 
The stream definition will be something like this: implicit val actorSystem = ActorSystem() implicit val actorMaterializer = ActorMaterializer() implicit val askTimeout = Timeout(5 seconds) val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10))) val source = Source(List("hello","from","akka","streams!")) val sink = Sink.foreach(actorSystem.log.info) val flow = Flow[String].mapAsync(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String]) val stream = source.via(flow).to(sink) stream.run() If we execute this small piece of code, we can see something similar: [INFO] [default-akka.actor.default-dispatcher-16] [akka://default/user/$a/$a] Capitalizing hello [INFO] [default-akka.actor.default-dispatcher-15] [akka://default/user/$a/$b] Capitalizing from [INFO] [default-akka.actor.default-dispatcher-6] [akka://default/user/$a/$c] Capitalizing akka [INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$d] Capitalizing streams! [INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Hello [INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] From [INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Akka [INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Streams! We can see how each word is being processed by a different capitalizer actor ($a/$b/$c/$d) and by different threads (default-dispatcher 16,15,6 and 14). Even if these executions are happening asynchronously in the pool of actors, the stream is still maintaining the order of the elements. If we do not need to maintain order and we are looking for a faster approach, where an element can be pushed to the next stage in the stream as soon as it is ready, we can use mapAsyncUnordered: implicit val actorSystem = ActorSystem() implicit val actorMaterializer = ActorMaterializer() implicit val askTimeout = Timeout(5 seconds) val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10))) val source = Source(List("hello","from","akka","streams!")) val sink = Sink.foreach(actorSystem.log.info) val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String]) val stream = source.via(flow).to(sink) stream.run() When running this code, we can see that the order is not preserved and the capitalized words arrive to the sink differently every time we execute our code. Consider the following example: [INFO] [default-akka.actor.default-dispatcher-10] [akka://default/user/$a/$b] Capitalizing from [INFO] [default-akka.actor.default-dispatcher-4] [akka://default/user/$a/$d] Capitalizing streams! [INFO] [default-akka.actor.default-dispatcher-13] [akka://default/user/$a/$c] Capitalizing akka [INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$a] Capitalizing hello [INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Akka [INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] From [INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Hello [INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Streams! Akka Streams also provides a graph DSL to define your stream. 
In this DSL, it is possible to connect stages just using the ~> operator: implicit val actorSystem = ActorSystem() implicit val actorMaterializer = ActorMaterializer() implicit val askTimeout = Timeout(5 seconds) val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10))) val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit b => import GraphDSL.Implicits._ val source = Source(List("hello","from","akka","streams!")) val sink = Sink.foreach(actorSystem.log.info) val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String]) source ~> flow ~> sink ClosedShape }) graph.run() These code snippets show only a few features of the vast available options inside the Akka Streams framework. Actors can be seamlessly integrated with streams. This brings a whole new set of possibilities to process things in a stream fashion. We have seen how we can preserve or avoid order of elements, either synchronously or asynchronously. In addition, we saw how to use the graph DSL to define our stream. Summary In this article, we covered the concept of the actor model and the core components of Akka. We also described the stages in Akka Streams and created an example code for stream. If you want to learn more about Akka, Akka Streams, and all other modules around them, you can find useful and handy recipes like these ones in the Akka Cookbook at https://www.packtpub.com/application-development/akka-cookbook.  Resources for Article: Further resources on this subject: Creating First Akka Application [article] Working with Entities in Google Web Toolkit 2 [article] Skinner's Toolkit for Plone 3 Theming (Part 1) [article]

Exploring Functions

Packt
16 Jun 2017
12 min read
In this article by Marius Bancila, author of the book Modern C++ Programming Cookbook covers the following recipes: Defaulted and deleted functions Using lambdas with standard algorithms (For more resources related to this topic, see here.) Defaulted and deleted functions In C++, classes have special members (constructors, destructor and operators) that may be either implemented by default by the compiler or supplied by the developer. However, the rules for what can be default implemented are a bit complicated and can lead to problems. On the other hand, developers sometimes want to prevent objects to be copied, moved or constructed in a particular way. That is possible by implementing different tricks using these special members. The C++11 standard has simplified many of these by allowing functions to be deleted or defaulted in the manner we will see below. Getting started For this recipe, you need to know what special member functions are, and what copyable and moveable means. How to do it... Use the following syntax to specify how functions should be handled: To default a function use =default instead of the function body. Only special class member functions that have defaults can be defaulted. struct foo { foo() = default; }; To delete a function use =delete instead of the function body. Any function, including non-member functions, can be deleted. struct foo { foo(foo const &) = delete; }; void func(int) = delete; Use defaulted and deleted functions to achieve various design goals such as the following examples: To implement a class that is not copyable, and implicitly not movable, declare the copy operations as deleted. class foo_not_copiable { public: foo_not_copiable() = default; foo_not_copiable(foo_not_copiable const &) = delete; foo_not_copiable& operator=(foo_not_copiable const&) = delete; }; To implement a class that is not copyable, but it is movable, declare the copy operations as deleted and explicitly implement the move operations (and provide any additional constructors that are needed). class data_wrapper { Data* data; public: data_wrapper(Data* d = nullptr) : data(d) {} ~data_wrapper() { delete data; } data_wrapper(data_wrapper const&) = delete; data_wrapper& operator=(data_wrapper const &) = delete; data_wrapper(data_wrapper&& o) :data(std::move(o.data)) { o.data = nullptr; } data_wrapper& operator=(data_wrapper&& o) { if (this != &o) { delete data; data = std::move(o.data); o.data = nullptr; } return *this; } }; To ensure a function is called only with objects of a specific type, and perhaps prevent type promotion, provide deleted overloads for the function (the example below with free functions can also be applied to any class member functions). template <typename T> void run(T val) = delete; void run(long val) {} // can only be called with long integers How it works... A class has several special members that can be implemented by default by the compiler. These are the default constructor, copy constructor, move constructor, copy assignment, move assignment and destructor. If you don't implement them, then the compiler does it, so that instances of a class can be created, moved, copied and destructed. However, if you explicitly provide one or more, then the compiler will not generate the others according to the following rules: If a user defined constructor exists, the default constructor is not generated by default. If a user defined virtual destructor exists, the default constructor is not generated by default. 
If a user-defined move constructor or move assignment operator exist, then the copy constructor and copy assignment operator are not generated by default. If a user defined copy constructor, move constructor, copy assignment operator, move assignment operator or destructor exist, then the move constructor and move assignment operator are not generated by default. If a user defined copy constructor or destructor exists, then the copy assignment operator is generated by default. If a user-defined copy assignment operator or destructor exists, then the copy constructor is generated by default. Note that the last two are deprecated rules and may no longer be supported by your compiler. Sometimes developers need to provide empty implementations of these special members or hide them in order to prevent the instances of the class to be constructed in a specific manner. A typical example is a class that is not supposed to be copyable. The classical pattern for this is to provide a default constructor and hide the copy constructor and copy assignment operators. While this works, the explicitly defined default constructor makes the class to no longer be considered trivial and therefore a POD type (that can be constructed with reinterpret_cast). The modern alternative to this is using deleted function as shown in the previous section. When the compiler encounters the =default in the definition of a function it will provide the default implementation. The rules for special member functions mentioned earlier still apply. Functions can be declared =default outside the body of a class if and only if they are inlined. class foo     {      public:      foo() = default;      inline foo& operator=(foo const &);     };     inline foo& foo::operator=(foo const &) = default;     When the compiler encounters the =delete in the definition of a function it will prevent the calling of the function. However, the function is still considered during overload resolution and only if the deleted function is the best match the compiler generates an error. For example, giving the previously defined overloads for function run() only calls with long integers are possible. Calls with arguments of any other type, including int, for which an automatic type promotion to long exists, would determine a deleted overload to be considered the best match and therefore the compiler will generate an error: run(42); // error, matches a deleted overload     run(42L); // OK, long integer arguments are allowed     Note that previously declared functions cannot be deleted, as the =delete definition must be the first declaration in a translation unit: void forward_declared_function();     // ...     void forward_declared_function() = delete; // error     The rule of thumb (also known as The Rule of Five) for class special member functions is: if you explicitly define any of copy constructor, move constructor, copy assignment, move assignment or destructor then you must either explicitly define or default all of them. Using lambdas with standard algorithms One of the most important modern features of C++ is lambda expressions, also referred as lambda functions or simply lambdas. Lambda expressions enable us to define anonymous function objects that can capture variables in the scope and be invoked or passed as arguments to functions. Lambdas are useful for many purposes and in this recipe, we will see how to use them with standard algorithms. 
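Before moving on to lambdas, here is a compact sketch of the Rule of Five stated above, using a hypothetical buffer-owning class of my own (it is not one of the book's recipes); because the class defines a destructor, all four copy and move members are spelled out explicitly instead of being left to the compiler:

#include <algorithm>
#include <cstddef>
#include <utility>

class buffer {
    std::size_t size_;
    int* data_;
public:
    explicit buffer(std::size_t size = 0)
        : size_(size), data_(size ? new int[size]{} : nullptr) {}
    ~buffer() { delete[] data_; }
    // Copy operations perform a deep copy of the owned array.
    buffer(buffer const& other)
        : size_(other.size_), data_(other.size_ ? new int[other.size_] : nullptr) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }
    buffer& operator=(buffer const& other) {
        if (this != &other) {
            buffer tmp(other);              // copy-and-swap keeps this exception safe
            std::swap(size_, tmp.size_);
            std::swap(data_, tmp.data_);
        }
        return *this;
    }
    // Move operations steal the pointer and leave the source empty.
    buffer(buffer&& other) noexcept
        : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }
    buffer& operator=(buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            size_ = other.size_;
            data_ = other.data_;
            other.size_ = 0;
            other.data_ = nullptr;
        }
        return *this;
    }
};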
Getting ready In this recipe, we discuss standard algorithms that take an argument that is a function or predicate that is applied to the elements it iterates through. You need to know what unary and binary functions are, and what are predicates and comparison functions. You also need to be familiar with function objects because lambda expressions are syntactic sugar for function objects. How to do it... Prefer to use lambda expressions to pass callbacks to standard algorithms instead of functions or function objects: Define anonymous lambda expressions in the place of the call if you only need to use the lambda in a single place. auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 }; auto positives = std::count_if( std::begin(numbers), std::end(numbers), [](int const n) {return n > 0; }); Define a named lambda, that is, assigned to a variable (usually with the auto specifier for the type), if you need to call the lambda in multiple places. auto ispositive = [](int const n) {return n > 0; }; auto positives = std::count_if( std::begin(numbers), std::end(numbers), ispositive); Use generic lambda expressions if you need lambdas that only differ in their argument types (available since C++14). auto positives = std::count_if( std::begin(numbers), std::end(numbers), [](auto const n) {return n > 0; }); How it works... The non-generic lambda expression shown above takes a constant integer and returns true if it is greater than 0, or false otherwise. The compiler defines an unnamed function object with the call operator having the signature of the lambda expression. struct __lambda_name__     {     bool operator()(int const n) const { return n > 0; }     };     The way the unnamed function object is defined by the compiler depends on the way we define the lambda expression, that can capture variables, use the mutable specifier or exception specifications or may have a trailing return type. The __lambda_name__ function object shown earlier is actually a simplification of what the compiler generates because it also defines a default copy and move constructor, a default destructor, and a deleted assignment operator. It must be well understood that the lambda expression is actually a class. In order to call it, the compiler needs to instantiate an object of the class. The object instantiated from a lambda expression is called a lambda closure. In the next example, we want to count the number of elements in a range that are greater or equal to 5 and less or equal than 10. The lambda expression, in this case, will look like this: auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 };     auto start{ 5 };     auto end{ 10 };     auto inrange = std::count_if(      std::begin(numbers), std::end(numbers),      [start,end](int const n) {return start <= n && n <= end;});     This lambda captures two variables, start and end, by copy (that is, value). The result unnamed function object created by the compiler looks very much like the one we defined above. 
With the default and deleted special members mentioned earlier, the class looks like this: class __lambda_name_2__     {    int start_; int end_; public: explicit __lambda_name_2__(int const start, int const end) : start_(start), end_(end) {}    __lambda_name_2__(const __lambda_name_2__&) = default;    __lambda_name_2__(__lambda_name_2__&&) = default;    __lambda_name_2__& operator=(const __lambda_name_2__&)     = delete;    ~__lambda_name_2__() = default;      bool operator() (int const n) const    { return start_ <= n && n <= end_; }     };     The lambda expression can capture variables by copy (or value) or by reference, and different combinations of the two are possible. However, it is not possible to capture a variable multiple times and it is only possible to have & or = at the beginning of the capture list. A lambda can only capture variables from an enclosing function scope. It cannot capture variables with static storage duration (that means variables declared in namespace scope or with the static or external specifier). The following table shows various combinations for the lambda captures semantics. Lambda Description [](){} Does not capture anything [&](){} Captures everything by reference [=](){} Captures everything by copy [&x](){} Capture only x by reference [x](){} Capture only x by copy [&x...](){} Capture pack extension x by reference [x...](){} Capture pack extension x by copy [&, x](){} Captures everything by reference except for x that is captured by copy [=, &x](){} Captures everything by copy except for x that is captured by reference [&, this](){} Captures everything by reference except for pointer this that is captured by copy (this is always captured by copy) [x, x](){} Error, x is captured twice [&, &x](){} Error, everything is captured by reference, cannot specify again to capture x by reference [=, =x](){} Error, everything is captured by copy, cannot specify again to capture x by copy [&this](){} Error, pointer this is always captured by copy [&, =](){} Error, cannot capture everything both by copy and by reference The general form of a lambda expression, as of C++17, looks like this:  [capture-list](params) mutable constexpr exception attr -> ret { body }    All parts shown in this syntax are actually optional except for the capture list, that can, however, be empty, and the body, that can also be empty. The parameter list can actually be omitted if no parameters are needed. The return type does not need to be specified as the compiler can infer it from the type of the returned expression. The mutable specifier (that tells the compiler the lambda can actually modify variables captured by copy), the constexpr specifier (that tells the compiler to generate a constexpr call operator) and the exception specifiers and attributes are all optional. The simplest possible lambda expression is []{}, though it is often written as [](){}. There's more... There are cases when lambda expressions only differ in the type of their arguments. In this case, the lambdas can be written in a generic way, just like templates, but using the auto specifier for the type parameters (no template syntax is involved). Summary Functions are a fundamental concept in programming; regardless the topic we discussed we end up writing functions. This article contains recipes related to functions. This article, however, covers modern language features related to functions and callable objects. 
Resources for Article: Further resources on this subject: Understanding the Dependencies of a C++ Application [article] Boost.Asio C++ Network Programming [article] Application Development in Visual C++ - The Tetris Application [article]

Hands on with Service Fabric

Packt
06 Apr 2017
12 min read
In this article by Rahul Rai and Namit Tanasseri, authors of the book Microservices with Azure, explains that Service Fabric as a platform supports multiple programming models. Each of which is best suited for specific scenarios. Each programming model offers different levels of integration with the underlying management framework. Better integration leads to more automation and lesser overheads. Picking the right programming model for your application or services is the key to efficiently utilize the capabilities of Service Fabric as a hosting platform. Let's take a deeper look into these programming models. (For more resources related to this topic, see here.) To start with, let's look at the least integrated hosting option: Guest Executables. Native windows applications or application code using Node.js or Java can be hosted on Service Fabric as a guest executable. These executables can be packaged and pushed to a Service Fabric cluster like any other services. As the cluster manager has minimal knowledge about the executable, features like custom health monitoring, load reporting, state store and endpoint registration cannot be leveraged by the hosted application. However, from a deployment standpoint, a guest executable is treated like any other service. This means that for a guest executable, Service Fabric cluster manager takes care of high availability, application lifecycle management, rolling updates, automatic failover, high density deployment and load balancing. As an orchestration service, Service Fabric is responsible for deploying and activating an application or application services within a cluster. It is also capable of deploying services within a container image. This programming model is addressed as Guest Containers. The concept of containers is best explained as an implementation of operating system level virtualization. They are encapsulated deployable components running on isolated process boundaries sharing the same kernel. Deployed applications and their runtime dependencies are bundles within the container with an isolated view of all operating system constructs. This makes containers highly portable and secure. Guest container programming model is usually chosen when this level of isolation is required for the application. As containers don't have to boot an operating system, they have fast boot up time and are comparatively small in size. A prime benefit of using Service Fabric as a platform is the fact that it supports heterogeneous operating environments. Service Fabric supports two types of containers to be deployed as guest containers: Docker containers on Linux and Windows server containers. Container images for Docker containers are stored in Docker Hub and Docker APIs are used to create and manage the containers deployed on Linux kernel. Service Fabric supports two different types of containers in Windows Server 2016 with different levels of isolation. They are: Windows Server containers and Windows Hyper-V containers Windows Server containers are similar to Docker containers in terms of the isolation they provide. Windows Hyper-V containers offer higher degree of isolation and security by not sharing the operating system kernel across instances. These are ideally used when a higher level of security isolation is required such as systems requiring hostile multitenant hosts. The following figure illustrates the different isolation levels achieved by using these containers. 
Container isolation levels

The Service Fabric application model treats containers as an application host which can in turn host service replicas. There are three ways of utilizing containers within a Service Fabric application model. Existing applications, such as Node.js or JavaScript applications or other executables, can be hosted within a container and deployed on Service Fabric as a Guest Container. A Guest Container is treated similarly to a Guest Executable by the Service Fabric runtime. The second scenario supports deploying stateless services inside a container hosted on Service Fabric. Stateless services using Reliable Services and Reliable Actors can be deployed within a container. The third option is to deploy stateful services in containers hosted on Service Fabric. This model also supports Reliable Services and Reliable Actors.

Service Fabric offers several features to manage containerized Microservices. These include container deployment and activation, resource governance, repository authentication, port mapping, container discovery and communication, and the ability to set environment variables.

While containers offer a good level of isolation, they are still heavy in terms of deployment footprint. Service Fabric offers a simpler, powerful programming model to develop your services, called Reliable Services. Reliable Services let you develop stateful and stateless services which can be directly deployed on Service Fabric clusters. For stateful services, the state can be stored close to the compute by using Reliable Collections. High availability of the state store and replication of the state are taken care of by the Service Fabric cluster management services. This contributes substantially to the performance of the system by improving the latency of data access. Reliable Services come with a built-in pluggable communication model which supports HTTP with Web API, WebSockets, and custom TCP protocols out of the box.

A Reliable Service is addressed as stateless if it does not maintain any state within it, or if the scope of the state stored is limited to a service call and is entirely disposable. This means that a stateless service does not need to persist, synchronize, or replicate state. A good example of this is a weather service like the MSN weather service. A weather service can be queried to retrieve the weather conditions associated with a specific geographical location. The response is based entirely on the parameters supplied to the service; the service does not store any state. Although stateless services are simpler to implement, most of the services in real life are not stateless. They either store state in an external state store or an internal one. Web front ends hosting APIs or web applications are good use cases to be hosted as stateless services.

A stateful service persists state. The outcome of a service call made to a stateful service is usually influenced by the state persisted by the service. A service exposed by a bank to return the balance on an account is a good example of a stateful service. The state may be stored in an external data store such as Azure SQL Database, Azure Blobs, or Azure Table storage. Most services prefer to store the state externally, considering the challenges around reliability, availability, scalability, and consistency of the data store. With Service Fabric, state can be stored close to the compute by using Reliable Collections.

To make things more lightweight, Service Fabric also offers a programming model based on the Virtual Actor pattern.
This programming model is called Reliable Actors. The Reliable Actors programming model is built on top of Reliable Services, which guarantees the scalability and reliability of the services. An Actor can be defined as an isolated, independent unit of compute and state with single-threaded execution. Actors can be created, managed, and disposed of independently of each other. A large number of actors can coexist and execute at the same time. Service Fabric Reliable Actors are a good fit for systems which are highly distributed and dynamic by nature. Every actor is defined as an instance of an actor type, the same way an object is an instance of a class. Each actor is uniquely identified by an actor ID. The lifetime of Service Fabric Actors is not tied to their in-memory state. As a result, Actors are automatically created the first time a request for them is made, and the Reliable Actors garbage collector takes care of disposing of unused Actors in memory.

Now that we understand the programming models, let's take a look at how the services deployed on Service Fabric are discovered and how the communication between services takes place.

Service Fabric discovery and communication

An application built on top of Microservices is usually composed of multiple services, each of which runs multiple replicas. Each service is specialized in a specific task. To achieve an end-to-end business use case, multiple services will need to be stitched together. This requires services to communicate with each other. A simple example would be a web front end service communicating with the middle tier services, which in turn connect to the back end services to handle a single user request. Some of these middle tier services can also be invoked by external applications. Services deployed on Service Fabric are distributed across multiple nodes in a cluster of virtual machines. The services can move across nodes dynamically. This distribution of services can either be triggered by a manual action or be the result of the Service Fabric cluster manager rebalancing services to achieve optimal resource utilization. This makes communication a challenge, as services are not tied to a particular machine. Let's understand how Service Fabric solves this challenge for its consumers.

Service protocols

Service Fabric, as a hosting platform for Microservices, does not interfere in the implementation of the service. On top of this, it also lets services decide on the communication channels they want to open. These channels are addressed as service endpoints. During service initiation, Service Fabric provides the opportunity for the services to set up the endpoints for incoming requests on any protocol or communication stack. The endpoints are defined according to common industry standards, that is, IP:Port. It is possible that multiple service instances share a single host process, in which case they either have to use different ports or a port-sharing mechanism. This ensures that every service instance is uniquely addressable.

Service endpoints

Service discovery

Service Fabric can rebalance services deployed on a cluster as a part of orchestration activities. This can be caused by resource balancing activities, failovers, upgrades, scale-outs, or scale-ins. This will result in changes in service endpoint addresses as the services move across different virtual machines.

Service distribution

The Service Fabric Naming Service is responsible for abstracting this complexity from the consuming service or application.
The Naming Service takes care of service discovery and resolution. All service instances in Service Fabric are identified by a unique URL like fabric:/MyMicroServiceApp/AppService1. This name stays constant across the lifetime of the service, although the endpoint addresses which physically host the service may change. Internally, Service Fabric manages a map between the service names and the physical locations where the services are hosted. This is similar to the DNS service, which is used to resolve website URLs to IP addresses. The following figure illustrates the name resolution process for a service hosted on Service Fabric:

Name resolution

Connections from applications external to Service Fabric

Service communications to or between services hosted in Service Fabric can be categorized as internal or external. Internal communication among services hosted on Service Fabric is easily achieved using the Naming Service. External communication, originated from an application or a user outside the boundaries of Service Fabric, will need some extra work. To understand how this works, let's dive deeper into the logical network layout of a typical Service Fabric cluster.

A Service Fabric cluster is always placed behind an Azure Load Balancer. The Load Balancer acts like a gateway for all traffic which needs to pass to the Service Fabric cluster. The Load Balancer is aware of every port open on every node of a cluster. When a request hits the Load Balancer, it identifies the port the request is looking for and randomly routes the request to one of the nodes which has the requested port open. The Load Balancer is not aware of the services running on the nodes or the ports associated with the services. The following figure illustrates request routing in action.

Request routing

Configuring ports and protocols

The protocol and the ports to be opened by a Service Fabric cluster can be easily configured through the portal. Let's take an example to understand the configuration in detail. If we need a web application to be hosted on a Service Fabric cluster which should have port 80 opened on HTTP to accept incoming traffic, the following steps should be performed.

Configuring service manifest

Once a service listening to port 80 is authored, we need to configure port 80 in the service manifest to open a listener in the service. This can be done by editing the ServiceManifest.xml:

<Resources>
  <Endpoints>
    <Endpoint Name="WebEndpoint" Protocol="http" Port="80" />
  </Endpoints>
</Resources>

Configuring custom endpoint

On the Service Fabric cluster, configure port 80 as a custom endpoint. This can be easily done through the Azure Management portal.

Configuring custom port

Configure Azure Load Balancer

Once the cluster is configured and created, the Azure Load Balancer can be instructed to forward the traffic to port 80. If the Service Fabric cluster is created through the portal, this step is automatically taken care of for every port which is configured on the cluster configuration.

Configuring Azure Load Balancer

Configure health check

The Azure Load Balancer probes the ports on the nodes for their availability to ensure reliability of the service. The probes can be configured on the Azure portal. This is an optional step, as a default probe configuration is applied for each endpoint when a cluster is created.

Configuring probe

Built-in Communication API

Service Fabric offers many built-in communication options to support inter-service communication. Service Remoting is one of them.
This option allows strongly typed remote procedure calls between Reliable Services and Reliable Actors. It is very easy to set up and operate with, as Service Remoting handles the resolution of service addresses, connection, retry, and error handling. Service Fabric also supports HTTP for language-agnostic communication. The Service Fabric SDK exposes the ICommunicationClient and ServicePartitionClient classes for service resolution, HTTP connections, and retry loops. WCF is also supported by Service Fabric as a communication channel to enable legacy workloads to be hosted on it. The SDK exposes the WcfCommunicationListener class for the server side and the WcfCommunicationClient and ServicePartitionClient classes for the client side to ease programming hurdles.

Resources for Article:

Further resources on this subject:

Installing Neutron [article]
Designing and Building a vRealize Automation 6.2 Infrastructure [article]
Insight into Hyper-V Storage [article]


Layout Management for Python GUI

Packt
05 Apr 2017
13 min read
In this article written by Burkhard A. Meier, the author of the book Python GUI Programming Cookbook - Second Edition, we will lay out our GUI using Python 3.6 and above:

Arranging several labels within a label frame widget
Using padding to add space around widgets
How widgets dynamically expand the GUI
Aligning the GUI widgets by embedding frames within frames

(For more resources related to this topic, see here.)

In this article, we will explore how to arrange widgets within widgets to create our Python GUI. Learning the fundamentals of GUI layout design will enable us to create great looking GUIs. There are certain techniques that will help us to achieve this layout design. The grid layout manager is one of the most important layout tools built into tkinter that we will be using. We can very easily create menu bars, tabbed controls (aka Notebooks), and many more widgets using tkinter.

Arranging several labels within a label frame widget

The LabelFrame widget allows us to design our GUI in an organized fashion. We are still using the grid layout manager as our main layout design tool, yet by using LabelFrame widgets we get much more control over our GUI design.

Getting ready

We are starting to add more and more widgets to our GUI, and we will make the GUI fully functional in the coming recipes. Here we are starting to use the LabelFrame widget. Add the following code just above the main event loop towards the bottom of the Python module. Running the code will result in the GUI looking like this:

Uncomment line 111 and notice the different alignment of the LabelFrame. We can easily align the labels vertically by changing our code, as shown next. Note that the only change we had to make was in the column and row numbering. Now the GUI LabelFrame looks as such:

How it works...

In line 109 we create our first ttk LabelFrame widget and assign the resulting instance to the variable buttons_frame. The parent container is win, our main window. In lines 114-116, we create labels and place them in the LabelFrame. buttons_frame is the parent of the labels. We are using the important grid layout tool to arrange the labels within the LabelFrame. The column and row properties of this layout manager give us the power to control our GUI layout.

The parent of our labels is the buttons_frame instance variable of the LabelFrame, not the win instance variable of the main window. We can see the beginning of a layout hierarchy here. We can also see how easy it is to change our layout via the column and row properties. Note how we change the column to 0, and how we layer our labels vertically by numbering the row values sequentially.

The name ttk stands for themed tk. The tk-themed widget set was introduced in Tk 8.5.

There's more...

In a recipe later in this article we will embed LabelFrame widgets within LabelFrame widgets, nesting them to control our GUI layout.

Using padding to add space around widgets

Our GUI is being created nicely. Next, we will improve the visual aspects of our widgets by adding a little space around them, so they can breathe.

Getting ready

While tkinter might have had a reputation for creating ugly GUIs, this has dramatically changed since version 8.5. You just have to know how to use the tools and techniques that are available. That's what we will do next. tkinter version 8.6 ships with Python 3.6.

How to do it...

The procedural way of adding spacing around widgets is shown first, and then we will use a loop to achieve the same thing in a much better way.
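The original code listings and screenshots for these recipes are not reproduced in this extract. As a rough, minimal sketch of what the text describes (the buttons_frame LabelFrame holding three labels, the padx/pady values discussed next, and the winfo_children() padding loop), something like the following could be used. The widget names, the frame title, and the 20/40 pixel values follow the text; the window title, the frame's grid row, and the per-label padding values are assumptions made purely for illustration:

import tkinter as tk
from tkinter import ttk

win = tk.Tk()                      # main GUI window, called win in the text
win.title('Python GUI')            # title is an assumption

# LabelFrame container (buttons_frame in the text), parented to win
buttons_frame = ttk.LabelFrame(win, text=' Labels in a Frame ')
buttons_frame.grid(column=0, row=7, padx=20, pady=40)    # 20/40 pixels as described; row is assumed

# Three labels placed inside the LabelFrame via the grid layout manager
ttk.Label(buttons_frame, text='Label1').grid(column=0, row=0)
ttk.Label(buttons_frame, text='Label2').grid(column=1, row=0)   # use column=0, row=1 to stack vertically
ttk.Label(buttons_frame, text='Label3').grid(column=2, row=0)   # use column=0, row=2 to stack vertically

# Loop-based padding for every child of the LabelFrame (values assumed)
for child in buttons_frame.winfo_children():
    child.grid_configure(padx=8, pady=4)

win.mainloop()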
Our LabelFrame looks a bit tight as it blends into the main window towards the bottom. Let's fix this now. Modify line 110 by adding padx and pady. And now our LabelFrame got some breathing space:

How it works...

In tkinter, adding space horizontally and vertically is done by using the built-in properties named padx and pady. These can be used to add space around many widgets, improving horizontal and vertical alignments, respectively. We hard-coded 20 pixels of space to the left and right of the LabelFrame, and we added 40 pixels to the top and bottom of the frame. Now our LabelFrame stands out more than it did before. The screenshot above only shows the relevant change.

We can use a loop to add space around the labels contained within the LabelFrame. Now the labels within the LabelFrame widget have some space around them too.

The grid_configure() function enables us to modify the UI elements before the main loop displays them. So, instead of hard-coding values when we first create a widget, we can work on our layout and then arrange spacing towards the end of our file, just before the GUI is being created. This is a neat technique to know.

The winfo_children() function returns a list of all the children belonging to the buttons_frame variable. This enables us to loop through them and assign the padding to each label.

One thing to notice is that the spacing to the right of the labels is not really visible. This is because the title of the LabelFrame is longer than the names of the labels. We can experiment with this by making the names of the labels longer. Now our GUI looks like this. Note how there is now some space added to the right of the long label next to the dots. The last dot does not touch the LabelFrame, which it otherwise would without the added space.

We can also remove the name of the LabelFrame to see the effect padx has on positioning our labels. By setting the text property to an empty string, we remove the name that was previously displayed for the LabelFrame.

How widgets dynamically expand the GUI

You probably noticed in previous screenshots and by running the code that widgets have a capability to extend themselves to the space they need to visually display their text. Java introduced the concept of dynamic GUI layout management. In comparison, visual development IDEs like VS.NET lay out the GUI in a visual manner, and are basically hard-coding the x and y coordinates of UI elements. Using tkinter, this dynamic capability creates both an advantage and a little bit of a challenge, because sometimes our GUI dynamically expands when we would prefer it rather not to be so dynamic! Well, we are dynamic Python programmers, so we can figure out how to make the best use of this fantastic behavior!

Getting ready

At the beginning of the previous recipe we added a LabelFrame widget. This moved some of our controls to the center of column 0. We might not wish this modification to our GUI layout. Next we will explore some ways to solve this.

How to do it...

Let us first become aware of the subtle details that are going on in our GUI layout in order to understand it better. We are using the grid layout manager, and it lays out our widgets in a zero-based grid. This is very similar to an Excel spreadsheet or a database table.
Grid Layout Manager Example with 2 Rows and 3 Columns:

Row 0; Col 0    Row 0; Col 1    Row 0; Col 2
Row 1; Col 0    Row 1; Col 1    Row 1; Col 2

Using the grid layout manager, what is happening is that the width of any given column is determined by the longest name or widget in that column. This affects all rows. By adding our LabelFrame widget and giving it a title that is longer than some hard-coded size widget like the top-left label and the text entry below it, we dynamically move those widgets to the center of column 0, adding space to the left and right sides of those widgets. Incidentally, because we used the sticky property for the Checkbutton and ScrolledText widgets, those remain attached to the left side of the frame.

Let's look in more detail at the screenshot from the first recipe of this article. We added the following code to create the LabelFrame and then placed labels into this frame. Since the text property of the LabelFrame, which is displayed as the title of the LabelFrame, is longer than both our Enter a name: label and the text box entry below it, those two widgets are dynamically centered within the new width of column 0. The Checkbutton and Radiobutton widgets in column 0 did not get centered because we used the sticky=tk.W property when we created those widgets. For the ScrolledText widget we used sticky=tk.WE, which binds the widget to both the west (aka left) and east (aka right) side of the frame.

Let's remove the sticky property from the ScrolledText widget and observe the effect this change has. Now our GUI has new space around the ScrolledText widget both on the left and right sides. Because we used the columnspan=3 property, our ScrolledText widget still spans all three columns. If we remove columnspan=3 we get the following GUI, which is not what we want. Now our ScrolledText only occupies column 0 and, because of its size, it stretches the layout.

One way to get our layout back to where we were before adding the LabelFrame is to adjust the grid column position. Change the column value from 0 to 1. Now our GUI looks like this:

How it works...

Because we are still using individual widgets, our layout can get messed up. By moving the column value of the LabelFrame from 0 to 1, we were able to get the controls back to where they used to be and where we prefer them to be. At least the left-most label, text, Checkbutton, ScrolledText, and Radiobutton widgets are now located where we intended them to be. The second label and text Entry located in column 1 aligned themselves to the center of the length of the Labels in a Frame widget, so we basically moved our alignment challenge one column to the right. It is not so visible because the size of the Choose a number: label is almost the same as the size of the Labels in a Frame title, and so the column width was already close to the new width generated by the LabelFrame.

There's more...

In the next recipe we will embed frames within frames to avoid the accidental misalignment of widgets we just experienced in this recipe.

Aligning the GUI widgets by embedding frames within frames

We have much better control of our GUI layout if we embed frames within frames. This is what we will do in this recipe.

Getting ready

The dynamic behavior of Python and its GUI modules can create a little bit of a challenge to really get our GUI looking the way we want. Here we will embed frames within frames to get more control of our layout.
This will establish a stronger hierarchy among the different UI elements, making the visual appearance easier to achieve. We will continue to use the GUI we created in the previous recipe.

How to do it...

Here, we will create a top-level frame that will contain other frames and widgets. This will help us to get our GUI layout just the way we want. In order to do so, we will have to embed our current controls within a central ttk.LabelFrame. This ttk.LabelFrame is a child of the main parent window, and all controls will be children of this ttk.LabelFrame. Up to this point in our recipes we have assigned all widgets to our main GUI frame directly. Now we will only assign our LabelFrame to our main window, and after that we will make this LabelFrame the parent container for all the widgets. This creates the following hierarchy in our GUI layout:

In this diagram, win is the variable that holds a reference to our main GUI tkinter window frame; mighty is the variable that holds a reference to our LabelFrame and is a child of the main window frame (win); and Label and all other widgets are now placed into the LabelFrame container (mighty).

Add the following code towards the top of our Python module. Next we will modify all the following controls to use mighty as the parent, replacing win. Here is an example of how to do this:

Note how all the widgets are now contained in the Mighty Python LabelFrame, which surrounds all of them with a barely visible thin line. Next, we can reset the Labels in a Frame widget to the left without messing up our GUI layout:

Oops - maybe not. While our frame within another frame aligned nicely to the left, it again pushed our top widgets into the center (a default). In order to align them to the left, we have to force our GUI layout by using the sticky property. By assigning it 'W' (West) we can control the widget to be left-aligned.

How it works...

Note how we aligned the label, but not the text box below it. We have to use the sticky property for all the controls we want to left-align. We can do that in a loop, using the winfo_children() and grid_configure(sticky='W') properties, as we did before in the second recipe of this article. The winfo_children() function returns a list of all the children belonging to the parent. This enables us to loop through all of the widgets and change their properties.

To force widgets to the left, right, top, or bottom, tkinter uses naming very similar to Java: west, east, north, and south, abbreviated to 'W' and so on. We can also use the syntax tk.W instead of 'W'. This requires having imported the tkinter module aliased as tk.

In a previous recipe we combined both 'W' and 'E' to make our ScrolledText widget attach itself both to the left and right sides of its container using 'WE'. We can add more combinations: 'NSE' will stretch our widget to the top, bottom, and right side. If we have only one widget in our form, for example a button, we can make it fill the entire frame by using all options: 'NSWE'. We can also use tuple syntax: sticky=(tk.N, tk.S, tk.W, tk.E).

Let's align the entry in column 0 to the left. Now both the label and the Entry are aligned towards the West (left).

In order to separate the influence that the length of our Labels in a Frame LabelFrame has on the rest of our GUI layout, we must not place this LabelFrame into the same LabelFrame as the other widgets, but assign it directly to the main GUI form (win).
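The original listings are again not shown in this extract. As a minimal sketch of the idea described above (the mighty LabelFrame parented to win, an 'Enter a name:' label and an Entry re-parented to mighty, and the left-alignment loop), something like the following could be used; the frame title and widget names follow the text, while the Entry width, grid positions, and padding are assumptions for illustration:

import tkinter as tk
from tkinter import ttk

win = tk.Tk()
win.title('Python GUI')

# Top-level container (mighty in the text); a child of win and the parent of all other widgets
mighty = ttk.LabelFrame(win, text=' Mighty Python ')
mighty.grid(column=0, row=0, padx=8, pady=4)

# Widgets now use mighty as their parent instead of win
a_label = ttk.Label(mighty, text='Enter a name:')
a_label.grid(column=0, row=0)                  # centered within the column by default

name_entered = ttk.Entry(mighty, width=12)     # width is an assumption
name_entered.grid(column=0, row=1)

# Left-align (West) every widget contained in mighty in one loop
for child in mighty.winfo_children():
    child.grid_configure(sticky='W')           # sticky=tk.W works as well

win.mainloop()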
Summary

We have learned layout management for Python GUIs using the following recipes:

Arranging several labels within a label frame widget
Using padding to add space around widgets
How widgets dynamically expand the GUI
Aligning the GUI widgets by embedding frames within frames

Resources for Article:

Further resources on this subject:

Python Scripting Essentials [article]
An Introduction to Python Lists and Dictionaries [article]
Test all the things with Python [article]

Understanding the Dependencies of a C++ Application

Packt
05 Apr 2017
9 min read
This article by Richard Grimes, author of the book Beginning C++ Programming, explains the dependencies of a C++ application. A C++ project will produce an executable or library, and this will be built by the linker from object files. The executable or library is dependent upon these object files. An object file will be compiled from a C++ source file (and potentially one or more header files). The object file is dependent upon these C++ source and header files. Understanding dependencies is important because it helps you understand the order in which to compile the files in your project, and it allows you to make your project builds quicker by only compiling those files that have changed.

(For more resources related to this topic, see here.)

Libraries

When you include a file within your source file, the code within that header file will be accessible to your code. Your include file may contain whole function or class definitions (these will be covered in later chapters), but this will result in a problem: multiple definitions of a function or class. Instead, you can declare a class or function prototype, which indicates how calling code will call the function without actually defining it. Clearly the code will have to be defined elsewhere, and this could be a source file or a library, but the compiler will be happy because it only sees one definition.

A library is code that has already been defined; it has been fully debugged and tested, and therefore users should not need to have access to the source code. The C++ Standard Library is mostly shared through header files, which helps you when you debug your code, but you must resist any temptation to edit these files. Other libraries will be provided as compiled libraries. There are essentially two types of compiled libraries: static libraries and dynamic link libraries. If you use a static library then the compiler will copy the compiled code that you use from the static library and place it in your executable. If you use a dynamic link (or shared) library then the linker will add information used during runtime (it may be when the executable is loaded, or it may even be delayed until the function is called) to load the shared library into memory and access the function. Windows uses the extension lib for static libraries and dll for dynamic link libraries. GNU gcc uses the extension a for static libraries and so for shared libraries.

If you use library code in a static or dynamic link library, the compiler will need to know that you are calling a function correctly, to make sure your code calls the function with the correct number of parameters and the correct types. This is the purpose of a function prototype: it gives the compiler the information it needs to know about calling the function without providing the actual body of the function, the function definition. In general, the C++ Standard Library will be included into your code through the standard header files. The C Runtime Library (which provides some code for the C++ Standard Library) will be statically linked, but if the compiler provides a dynamically linked version you will have a compiler option to use this.

Pre-compiled Headers

When you include a file into your source file, the preprocessor will include the contents of that file (after taking into account any conditional compilation directives) and, recursively, any files included by that file. As illustrated earlier, this could result in thousands of lines of code.
As you develop your code you will often compile the project so that you can test the code. Every time you compile your code, the code defined in the header files will also be compiled, even though the code in library header files will not have changed. With a large project this can make the compilation take a long time. To get around this problem, compilers often offer an option to pre-compile headers that will not change. Creating and using precompiled headers is compiler specific. For example, with gcc you compile a header as if it is a C++ source file (with the -x switch) and the compiler creates a file with an extension of gch. When gcc compiles source files that use the header it will search for the gch file, and if it finds the precompiled header it will use that; otherwise it will use the header file.

In Visual C++ the process is a bit more complicated, because you have to specifically tell the compiler to look for a precompiled header when it compiles a source file. The convention in Visual C++ projects is to have a source file called stdafx.cpp which has a single line that includes the file stdafx.h. You put all your stable header file includes in stdafx.h. Next, you create a precompiled header by compiling stdafx.cpp using the /Yc compiler option to specify that stdafx.h contains the stable headers to compile. This will create a pch file (typically, Visual C++ will name it after your project) containing the code compiled up to the point of the inclusion of the stdafx.h header file. Your other source files must include the stdafx.h header file as the first header file, but they may also include other files. When you compile your source files you use the /Yu switch to specify the stable header file (stdafx.h) and the compiler will use the precompiled header pch file instead of the header. When you examine large projects you will often find that precompiled headers are used, and as you can see, this alters the file structure of the project. The example later in this chapter will show how to create and use precompiled headers.

Project Structure

It is important to organize your code into modules to enable you to maintain it effectively. Even if you are writing C-like procedural code (that is, your code involves calls to functions in a linear way) you will also benefit from organizing it into modules. For example, you may have functions that manipulate strings and other functions that access files, so you may decide to put the definition of the string functions in one source file, string.cpp, and the definition of the file functions in another file, file.cpp. So that other modules in the project can use these functions, you must declare the prototypes of the functions in a header file and include that header in the module that uses the functions.

There is no absolute rule in the language about the relationship between the header files and the source files that contain the definitions of the functions. You may have a header file called string.h for the functions in string.cpp and a header file called file.h for the functions in file.cpp. Or you may have just one file called utilities.h that contains the declarations for all the functions in both files. The only rule that you have to abide by is that at compile time the compiler must have access to a declaration of the function in the current source file, either through a header file, or the function definition itself.
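As a concrete illustration of this convention, here is a minimal sketch of a header declaring a prototype, the source file defining it, and a caller that only needs the header. The file and function names (string_utils.h, to_upper) are invented for illustration and are not taken from the original article:

// string_utils.h - declarations (prototypes) only
#ifndef STRING_UTILS_H
#define STRING_UTILS_H
#include <string>
std::string to_upper(const std::string& text);   // prototype: no function body here
#endif

// string_utils.cpp - the definitions; includes its own header
#include "string_utils.h"
#include <cctype>
std::string to_upper(const std::string& text)
{
    std::string result(text);
    for (char& c : result)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return result;
}

// main.cpp - a caller; the prototype in the header is all the compiler needs
#include <iostream>
#include "string_utils.h"
int main()
{
    std::cout << to_upper("hello") << '\n';
    return 0;
}

Only string_utils.cpp has to be recompiled if the function body changes; callers such as main.cpp are unaffected as long as the header (and therefore the prototype) stays the same.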
The compiler will not look forward in a source file, so if a function calls another function in the same source file, that called function must have already been defined before the calling function, or there must be a prototype declaration. This leads to a typical convention of having a header file associated with each source file that contains the prototypes of the functions in the source file, and the source file includes this header. This convention becomes more important when you write classes.

Managing Dependencies

When a project is built with a build tool, checks are performed to see if the outputs of the build exist and, if not, the appropriate actions are performed to build them. Common terminology is that the output of a build step is called a target and the inputs of the build step (for example, source files) are the dependencies of that target. Each target's dependencies are the files used to make it. The dependencies may themselves be targets of a build action and have their own dependencies. For example, the following picture shows the dependencies in a project:

In this project there are three source files (main.cpp, file1.cpp, file2.cpp), each of which includes the same header, utils.h, which is precompiled (and hence why there is a fourth source file, utils.cpp, that does nothing other than include utils.h). All of the source files depend on utils.pch, which in turn depends upon utils.h. The source file main.cpp has the main function and calls functions in the other two source files (file1.cpp and file2.cpp), and accesses those functions through the associated header files file1.h and file2.h.

On the first compilation the build tool will see that the executable depends on the four object files and so it will look for the rule to build each one. In the case of the three C++ source files this means compiling the cpp files, but since utils.obj is used to support the precompiled header, the build rule will be different from the other files. When the build tool has made these object files it will then link them together along with any library code (not shown here).

Subsequently, if you change file2.cpp and build the project, the build tool will see that only file2.cpp has changed, and since only file2.obj depends on file2.cpp, all the make tool needs to do is compile file2.cpp and then link the new file2.obj with the existing object files to create the executable. If you change the header file file2.h, the build tool will see that two files depend on this header file, file2.cpp and main.cpp, and so it will compile these two source files and link the two new object files, file2.obj and main.obj, with the existing object files to form the executable. If, however, the precompiled header file, utils.h, changes, it means that all of the source files will have to be compiled.

Summary

For a small project, dependencies are easy to manage, and as you have seen, for a single source file project you do not even have to worry about calling the linker because the compiler will do that automatically. As a C++ project gets bigger, managing dependencies gets more complex, and this is where development environments like Visual C++ become vital.

Resources for Article:

Further resources on this subject:

Introduction to C# and .NET [article]
Preparing to Build Your Own GIS Application [article]
Writing a Fully Native Application [article]


Getting started with C++ Features

Packt
05 Apr 2017
7 min read
In this article by Jacek Galowicz, author of the book C++ STL Cookbook, we will learn about new C++ features and how to use structured bindings to return multiple values at once. (For more resources related to this topic, see here.)

Introduction

C++ got a lot of additions in C++11, C++14, and most recently C++17. By now, it is a completely different language than it was just a decade ago. The C++ standard does not only standardize the language, as it needs to be understood by the compilers, but also the C++ standard template library (STL).

We will see how to access individual members of pairs, tuples, and structures comfortably with structured bindings, and how to limit variable scopes with the new if and switch variable initialization capabilities. The syntactical ambiguities which were introduced by C++11 with the new bracket initialization syntax, which looks the same for initializer lists, were fixed by new bracket initializer rules. The exact type of template class instances can now be deduced from the actual constructor arguments, and if different specializations of a template class shall result in completely different code, this is now easily expressible with constexpr-if. The handling of variadic parameter packs in template functions became much easier in many cases with the new fold expressions. At last, it became more comfortable to define static globally accessible objects in header-only libraries with the new ability to declare inline variables, which was only possible for functions before.

Using structured bindings to return multiple values at once

C++17 comes with a new feature which combines syntactic sugar and automatic type deduction: structured bindings. These help assign values from pairs, tuples, and structs into individual variables.

How to do it...

Applying a structured binding in order to assign multiple variables from one bundled structure is always one step.

Accessing std::pair: Imagine we have a mathematical function divide_remainder, which accepts a dividend and a divisor parameter, and returns the fraction of both as well as the remainder. It returns those values using an std::pair bundle:

std::pair<int, int> divide_remainder(int dividend, int divisor);

Instead of accessing the individual values of the resulting pair like this:

const auto result (divide_remainder(16, 3));
std::cout << "16 / 3 is "
          << result.first << " with a remainder of "
          << result.second << "\n";

We can now assign them to individual variables with expressive names, which is much better to read:

auto [fraction, remainder] = divide_remainder(16, 3);
std::cout << "16 / 3 is "
          << fraction << " with a remainder of "
          << remainder << "\n";

Structured bindings also work with std::tuple. Let's take the following example function, which gets us online stock information:

std::tuple<std::string, std::chrono::time_point, double> stock_info(const std::string &name);

Assigning its result to individual variables looks just like in the example before:

const auto [name, valid_time, price] = stock_info("INTC");

Structured bindings also work with custom structures. Let's assume a structure like the following:

struct employee {
    unsigned    id;
    std::string name;
    std::string role;
    unsigned    salary;
};

Now we can access these members using structured bindings.
We will even do that in a loop, assuming we have a whole vector of those:

int main()
{
    std::vector<employee> employees {/* Initialized from somewhere */};

    for (const auto &[id, name, role, salary] : employees) {
        std::cout << "Name: "   << name
                  << "Role: "   << role
                  << "Salary: " << salary << "\n";
    }
}

How it works...

Structured bindings are always applied with the same pattern:

auto [var1, var2, ...] = <pair, tuple, struct, or array expression>;

The list of variables var1, var2, ... must exactly match the number of variables which are contained by the expression being assigned from. The <pair, tuple, struct, or array expression> must be one of the following:

An std::pair.
An std::tuple.
A struct. All members must be non-static and be defined in the same base class.
An array of fixed size.

The type can be auto, const auto, const auto&, and even auto&&. Not only for the sake of performance, always make sure to minimize needless copies by using references when appropriate.

If we write too many or not enough variables between the square brackets, the compiler will error out, telling us about our mistake:

std::tuple<int, float, long> tup {1, 2.0, 3};
auto [a, b] = tup;

This example obviously tries to stuff a tuple variable with three members into only two variables. The compiler immediately chokes on this and tells us about our mistake:

error: type 'std::tuple<int, float, long>' decomposes into 3 elements, but only 2 names were provided
auto [a, b] = tup;

There's more...

A lot of fundamental data structures from the STL are immediately accessible using structured bindings without us having to change anything. Consider for example a loop which prints all items of an std::map:

std::map<std::string, size_t> animal_population {
    {"humans",   7000000000},
    {"chickens", 17863376000},
    {"camels",   24246291},
    {"sheep",    1086881528},
    /* … */
};

for (const auto &[species, count] : animal_population) {
    std::cout << "There are " << count << " " << species
              << " on this planet.\n";
}

This particular example works because, when we iterate over an std::map container, we get std::pair<key_type, value_type> items on every iteration step. And exactly those are unpacked using the structured bindings feature (assuming that the species string is the key, and the population count the value being associated with the key), in order to access them individually in the loop body.

Before C++17, it was possible to achieve a similar effect using std::tie:

int remainder;
std::tie(std::ignore, remainder) = divide_remainder(16, 5);
std::cout << "16 % 5 is " << remainder << "\n";

This example shows how to unpack the result pair into two variables. std::tie is less powerful than structured bindings in the sense that we have to define all the variables we want to bind to before. On the other hand, this example shows a strength of std::tie which structured bindings do not have: the value std::ignore acts as a dummy variable. The fraction part of the result is assigned to it, which leads to that value being dropped because we do not need it in that example.
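The list above also mentions arrays of fixed size, which the examples do not show. As a brief illustrative sketch (not taken from the original article), structured bindings decompose a C-style array of known size in the same way, one name per element:

#include <iostream>

int main()
{
    int coords[3] {10, 20, 30};      // fixed-size array
    auto [x, y, z] = coords;         // copies the array, then binds one name per element

    std::cout << x << ", " << y << ", " << z << "\n";
    return 0;
}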
Back in the past, the divide_remainder function would have been implemented in the following way, using output parameters:

bool divide_remainder(int dividend, int divisor, int &fraction, int &remainder);

Accessing it would have looked like the following:

int fraction, remainder;
const bool success {divide_remainder(16, 3, fraction, remainder)};
if (success) {
    std::cout << "16 / 3 is " << fraction
              << " with a remainder of " << remainder << "\n";
}

A lot of people will still prefer this over returning complex structures like pairs, tuples, and structs, arguing that this way the code would be faster, due to avoided intermediate copies of those values. This is no longer true for modern compilers, which optimize intermediate copies away.

Apart from the missing language features in C, returning complex structures via return value was considered slow for a long time, because the object had to be initialized in the returning function and then copied into the variable which shall contain the return value on the caller side. Modern compilers support return value optimization (RVO), which enables omitting intermediate copies.

Summary

Thus we successfully studied how to use structured bindings to return multiple values at once in C++17 using code examples.

Resources for Article:

Further resources on this subject:

Creating an F# Project [article]
Hello, C#! Welcome, .NET Core! [article]
Exploring Structure from Motion Using OpenCV [article]