
How-To Tutorials


Ninject Patterns and Anti-patterns

Packt
30 Sep 2013
7 min read
Dependencies can be injected into a consumer class using different patterns, and injecting them into a constructor is just one of them. While there are patterns that can be followed for injecting dependencies, there are also patterns that should be avoided, as they usually lead to undesirable results. In this article, we will examine only those patterns and anti-patterns that are relevant to Ninject features.

Constructor Injection

Constructor Injection is the most common and recommended pattern for injecting dependencies into a class. Generally, this pattern should be used as the primary injection pattern unless we have to use another one. In this pattern, all of the class's dependencies are introduced as constructor parameters.

What if the class has more than one constructor? Although Ninject's strategy for selecting a constructor is customizable, its default behavior is to select the constructor with the most parameters, provided all of them are resolvable by Ninject. So, although in the following code the second constructor introduces more parameters, Ninject will select the first one if it cannot resolve IService2, and it will even use the default constructor if IService1 is not registered either. But if both dependencies are registered and resolvable, Ninject will select the second constructor because it has more parameters:

```csharp
public class Consumer
{
    private readonly IService1 dependency1;
    private readonly IService2 dependency2;

    public Consumer(IService1 dependency1)
    {
        this.dependency1 = dependency1;
    }

    public Consumer(IService1 dependency1, IService2 dependency2)
    {
        this.dependency1 = dependency1;
        this.dependency2 = dependency2;
    }
}
```

If the preceding class had another constructor with two resolvable parameters, Ninject would throw an ActivationException, notifying us that several constructors have the same priority.

There are two ways to override this default behavior and explicitly select a constructor. The first is to indicate the desired constructor in a binding, as follows:

```csharp
Bind<Consumer>().ToConstructor(arg => new Consumer(arg.Inject<IService1>()));
```

In the preceding example, we explicitly selected the first constructor. Using the Inject<T> method provided by the arg argument, we requested that Ninject resolve IService1 and inject it into the specified constructor.

The second is to mark the desired constructor with the [Inject] attribute:

```csharp
[Inject]
public Consumer(IService1 dependency1)
{
    this.dependency1 = dependency1;
}
```

Here, we applied Ninject's [Inject] attribute to the first constructor to specify that the class should be initialized through this constructor, even though the second constructor has more parameters and Ninject's default strategy would be to select that one. Note that applying this attribute to more than one constructor results in an ActivationException.

Ninject is highly customizable, and it is even possible to substitute the default [Inject] attribute with another one, so that we don't need to add a reference to the Ninject library from our consumer classes just because of an attribute:

```csharp
kernel.Settings.Set("InjectAttribute", typeof(MyAttribute));
```
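As a hedged sketch of what that substitution can look like (MyAttribute is a name invented here for illustration, not part of Ninject), the replacement attribute is an ordinary attribute type defined in our own assembly:

```csharp
// A minimal sketch, assuming Ninject 3.x; MyAttribute lives in the consumer's
// assembly, so consumer classes no longer need to reference Ninject.
using System;
using Ninject;

[AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Property | AttributeTargets.Method)]
public sealed class MyAttribute : Attribute { }

public static class Startup
{
    public static IKernel CreateKernel()
    {
        var kernel = new StandardKernel();
        // From now on, Ninject looks for [My] instead of [Inject].
        kernel.Settings.Set("InjectAttribute", typeof(MyAttribute));
        return kernel;
    }
}
```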
Initializer methods and properties

Apart from constructor injection, Ninject supports the injection of dependencies using initializer methods and property setters. We can mark as many methods and properties as required with the [Inject] attribute to inject dependencies. Although the dependencies will be injected into them as soon as the class is constructed, it is not possible to predict the order in which they will receive their dependencies. The following code shows how to mark a property for injection:

```csharp
[Inject]
public IService Service
{
    get { return dependency; }
    set { dependency = value; }
}
```

Here is an example of injecting a dependency using an injector method:

```csharp
[Inject]
public void Setup(IService dependency)
{
    this.dependency = dependency;
}
```

Note that only public members and constructors will be injected; even internal members will be ignored unless Ninject is configured to inject non-public members.

In Constructor Injection, the constructor is a single point where we can consume all of the dependencies as soon as the class is activated. But when we use initializer methods, the dependencies are injected via multiple points in an unpredictable order, so we cannot tell in which method all of the dependencies will be ready to consume. To solve this problem, Ninject offers the IInitializable interface. This interface has an Initialize method, which is called once all of the dependencies have been injected:

```csharp
public class Consumer : IInitializable
{
    private IService1 dependency1;
    private IService2 dependency2;

    [Inject]
    public IService1 Service1
    {
        get { return dependency1; }
        set { dependency1 = value; }
    }

    [Inject]
    public IService2 Service2
    {
        get { return dependency2; }
        set { dependency2 = value; }
    }

    public void Initialize()
    {
        // Consume all dependencies here
    }
}
```

Although Ninject supports injection using properties and methods, Constructor Injection should remain the preferred approach. First of all, Constructor Injection makes the class more reusable, because the full list of its dependencies is visible in its constructor, whereas with initializer properties or methods the user of the class has to investigate all of the class members, or go through the class documentation (if any), to discover its dependencies. Initialization of the class is also easier with Constructor Injection, because all the dependencies are injected at the same time and we can consume them immediately, in the same place where the constructor initializes the class. Finally, as we have seen in the preceding examples, the only case where the backing fields could be readonly was the Constructor Injection scenario. Because readonly fields can be initialized only in the constructor, we have to make them writable to be able to use initializer methods and properties, which opens the door to unintended mutation of the backing fields.
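Before moving on, here is a minimal sketch of how these pieces fit together (Service1Impl and Service2Impl are hypothetical implementations, not from the original article); Ninject injects both marked properties and then calls Initialize:

```csharp
// A minimal sketch, assuming Ninject 3.x and the Consumer class above.
using Ninject;

public static class Program
{
    public static void Main()
    {
        var kernel = new StandardKernel();
        kernel.Bind<IService1>().To<Service1Impl>();
        kernel.Bind<IService2>().To<Service2Impl>();

        // Consumer is self-bindable: Ninject constructs it, injects the two
        // [Inject] properties, and then calls Initialize once both are set.
        var consumer = kernel.Get<Consumer>();
    }
}
```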
Service Locator

Service Locator is a design pattern described by Martin Fowler, and there have been some controversies around it. Although it can be useful in particular circumstances, it is generally considered an anti-pattern and should preferably be avoided. Ninject can easily be misused as a Service Locator if we are not familiar with this pattern. The following example demonstrates misusing the Ninject kernel as a Service Locator rather than as a DI container:

```csharp
public class Consumer
{
    public void Consume()
    {
        var kernel = new StandardKernel();
        var dependency1 = kernel.Get<IService1>();
        var dependency2 = kernel.Get<IService2>();
        ...
    }
}
```

There are two significant downsides to the preceding code. The first is that although we are using a DI container, we are not implementing DI at all. The class is tied to the Ninject kernel, even though the kernel is not really a dependency of this class; this class and all of its prospective consumers will always have to drag along their unnecessary dependency on the kernel object and the Ninject library. The second is that the real dependencies of the class (IService1 and IService2) are invisible to its consumers, which reduces its reusability. Even if we change the design of this class to the following one, the problems still exist:

```csharp
public class Consumer
{
    private readonly IKernel kernel;

    public Consumer(IKernel kernel)
    {
        this.kernel = kernel;
    }

    public void Consume()
    {
        var dependency1 = kernel.Get<IService1>();
        var dependency2 = kernel.Get<IService2>();
        ...
    }
}
```

The preceding class still depends on the Ninject library even though it doesn't have to, and its actual dependencies are still invisible to its consumers. It can easily be refactored using the Constructor Injection pattern:

```csharp
public Consumer(IService1 dependency1, IService2 dependency2)
{
    this.dependency1 = dependency1;
    this.dependency2 = dependency2;
}
```

Summary

In this article, we studied the most common DI patterns and anti-patterns related to Ninject.


Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I've spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as a view of where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/, and in this blog I'm going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application.

Graphs, not charts

Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional relational databases because of how the data is stored and accessed. Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it's easy to express complicated relations between different nodes succinctly.

It's not just the technical aspect of graphs which makes them appealing to work with. Seeing the connections between bits of data visualized explicitly, as in a graph, helps you to see the data in a different light and make connections you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database.

Make the connection

For product recommendations, I use what's known as a hybrid filter. This considers both content-based filtering (products x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal.

The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this:

```cypher
MATCH (n:Product {title:'Learning Cypher'})-[r:purchased*2]-(m:Product)
WITH m.title AS suggestion,
     count(distinct r)/(n.purchased+m.purchased) AS alsoBought
WHERE m<>n
RETURN * ORDER BY alsoBought DESC
```

and it will very efficiently return the most commonly also-purchased product. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion back. We do this so we don't just get the titles with the most units; we're effectively calculating the size of the intersection of the two titles' audiences relative to their overall audience size.

The content side of the algorithm looks very similar:

```cypher
MATCH (n:Product {title:'Learning Cypher'})-[r:is_about*2]-(m:Product)
WITH m.title AS suggestion,
     count(distinct r)/(length(n.topics)+length(m.topics)) AS alsoAbout
WHERE m<>n
RETURN * ORDER BY alsoAbout DESC
```

Implicit in this algorithm is the knowledge that a title is_about a topic of some kind. This could be done manually, but where's the fun in that? In Packt's domain there already exists a huge, well-moderated corpus of technology concepts and their usage: StackOverflow. The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, through the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, which represent topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow:

edge_weight(n,m) = (number of questions tagged with both n and m) / (number of questions tagged with n or m)
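This weight is simply the Jaccard index of the two tags' question sets. As a hedged sketch of how it could be computed offline (the questions below are invented dummy data, not the real StackOverflow dump):

```python
# A minimal sketch: compute co-occurrence edge weights from question tags.
from itertools import combinations
from collections import defaultdict

questions = [
    {"id": 1, "tags": {"python", "matplotlib"}},
    {"id": 2, "tags": {"python", "numpy", "matplotlib"}},
    {"id": 3, "tags": {"numpy"}},
]

tagged_with = defaultdict(set)  # tag -> set of question ids
for q in questions:
    for tag in q["tags"]:
        tagged_with[tag].add(q["id"])

def edge_weight(n, m):
    """Jaccard index: |questions with both tags| / |questions with either|."""
    both = tagged_with[n] & tagged_with[m]
    either = tagged_with[n] | tagged_with[m]
    return len(both) / len(either) if either else 0.0

for n, m in combinations(sorted(tagged_with), 2):
    w = edge_weight(n, m)
    if w > 0:
        print(f"{n} -[related_to {{weight: {w:.3f}}}]- {m}")
```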
So, to find topics related to a given topic, we can execute a query like this:

```cypher
MATCH (n:StackOverflowTag {name:'Matplotlib'})-[r:related_to]-(m:StackOverflowTag)
RETURN n.name, r.weight, m.name
ORDER BY r.weight DESC LIMIT 10
```

which returns the following:

|    | n.name     | r.weight | m.name             |
|----|------------|----------|--------------------|
| 1  | Matplotlib | 0.065699 | Plot               |
| 2  | Matplotlib | 0.045678 | Numpy              |
| 3  | Matplotlib | 0.029667 | Pandas             |
| 4  | Matplotlib | 0.023623 | Python             |
| 5  | Matplotlib | 0.023051 | Scipy              |
| 6  | Matplotlib | 0.017413 | Histogram          |
| 7  | Matplotlib | 0.015618 | Ipython            |
| 8  | Matplotlib | 0.013761 | Matplotlib Basemap |
| 9  | Matplotlib | 0.013207 | Python 2.7         |
| 10 | Matplotlib | 0.012982 | Legend             |

There are many more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing hypergraphs using the extensive StackExchange API.

So we have our topics, but we still need to connect our content to them. To do this, I've used a two-stage process.

Step 1 – Parsing out the topics

We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it's already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we've previously imported:

```python
# ...code for fetching all the copy for all the products
key_re = r'\W(%s)\W' % '|'.join(re.escape(i) for i in topic_keywords)
for i in documents:
    tags = re.findall(key_re, i['copy'])
    i['tags'] = map(lambda x: tag_lookup[x], tags)
```

Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic.

Step 2 – Finding the information

From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document, and divides it by the proportion of the documents that word appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor gets rid of terms which are overly common across the entire corpus (for example, the term 'programming' is common in our product copy, and whilst most of the documents ARE about programming, it doesn't provide much discriminating information about each document).
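To make the intuition concrete with a toy calculation (numbers invented): if 'numpy' appears 5 times in a 100-word document (tf = 0.05) and occurs in 10 of 1,000 documents, its score is roughly 0.05 × ln(1000/10) ≈ 0.23, whereas 'programming' appearing just as often but present in 900 of the documents scores only about 0.05 × ln(1000/900) ≈ 0.005. The rare, on-topic term wins. (scikit-learn's exact weighting formula differs slightly, but the principle is the same.)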
To do all of this, I use Python (obviously) and the excellent scikit-learn library. tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer, which has lots of options you can fiddle with to get more informative results:

```python
import sklearn.feature_extraction.text as skt

tagger = skt.TfidfVectorizer(input='content',
                             encoding='utf-8',
                             decode_error='replace',
                             strip_accents=None,
                             analyzer=lambda x: x,
                             ngram_range=(1, 1),
                             max_df=0.8,
                             min_df=0.0,
                             norm='l2',
                             sublinear_tf=False)
```

It's a good idea to use the min_df and max_df arguments of the constructor to cut out the most common/obtuse words and get a more informative weighting. The analyzer argument tells it how to get the words from each document; in our case, the documents are already lists of normalized words, so we don't need anything additional done.

```python
# create vectors of all the documents
vectors = tagger.fit_transform(map(lambda x: x['tags'], rows)).toarray()
# get back the topic names to map to the graph
t_map = tagger.get_feature_names()

jobs = []
for ind, vec in enumerate(vectors):
    features = filter(lambda x: x[1] > 0, zip(t_map, vec))
    doc = documents[ind]
    for topic, weight in features:
        job = '''MERGE (n:StackOverflowTag {name:'%s'})
                 MERGE (m:Product {id:'%s'})
                 CREATE UNIQUE (m)-[:is_about {source:'tf_idf', weight:%f}]-(n)
              ''' % (topic, doc['id'], weight)
        jobs.append(job)
```

We then execute all of the jobs using py2neo's Batch functionality. Having done all of this, we can now relate products to each other in terms of what topics they have in common:

```cypher
MATCH (n:Product {isbn10:'1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10:'1783289007'})
WITH a.name AS topic, r.weight+q.weight AS weight
RETURN topic ORDER BY weight DESC LIMIT 6
```

which returns:

|   | topic            |
|---|------------------|
| 1 | Machine Learning |
| 2 | Image            |
| 3 | Models           |
| 4 | Algorithm        |
| 5 | Data             |
| 6 | Python           |

Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we've developed.

Take a breath

So, that's how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you've made.

The graph I've spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I've taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to Python:

```python
titles = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'})
       RETURN DISTINCT n''')]
t2 = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(o:StackOverflowTag {name:'Python'})
       WHERE has(n.name) RETURN DISTINCT n''')]
titles.extend(t2)
```

then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python.

Visualising the graph

Since I started working with graphs, the two visualisation tools I've always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs.
Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share, and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product; it really shows how much value it's possible to get out of graph-based data. Imagine if your Customer Relations team were able to run a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations.

Linkurious have built their product on top of Sigma.js, and they've made much of their work available as the open source Linkurious.js. This is essentially Sigma.js with a few changes to the API and an even greater variety of plugins. On GitHub, each plugin has an API page in the wiki and a downloadable demo. It's worth cloning the repository just to see the things it's capable of!

Publish It!

So here's the workflow I used to get the Python topic graph out of Neo4j and onto the web:

1. Use py2neo to grab the subgraph of content and topics pertinent to Python, as described above.
2. Add to this some other topics linked to the same books, to give a fuller picture of the Python "world".
3. Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data.
4. Export all the nodes and edges to CSV files.
5. Import the node and edge tables into Gephi. The reason I'm using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I'm not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I've used the OpenOrd layout, which I feel really shows off the communities of a large graph. There's a really interesting and technical presentation about how this layout works.
6. Export the graph into GEXF format, having applied some partition and ranking functions to make it more clear and appealing.

Now it's all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I'll give an overview of the different plugins I've used for the different parts of the visualisation.

First, instantiate the graph object, pointing to a container (note the CSS of the container; without this, the graph won't display properly):

```html
<style type="text/css">
  #container {
    max-width: 1500px;
    height: 850px;
    margin: auto;
    background-color: #E5E5E5;
  }
</style>
...
<div id="container"></div>
...
<script>
  s = new sigma({
    container: 'container',
    renderer: {
      container: document.getElementById('container'),
      type: 'canvas'
    },
    settings: {
      ...
    }
  });
```

sigma.parsers.gexf - used for (trivially!) importing a GEXF file into a sigma instance:

```javascript
sigma.parsers.gexf(
  'static/data/Graph1.gexf',
  s,
  function(s) {
    // callback executed once the data is loaded; use this to set up any
    // aspects of the app which depend on the data
  }
);
```

sigma.plugins.filter - adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page:
```html
<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0">
```
```javascript
function applyMinDegreeFilter(e) {
  var v = e.target.value;
  $('#min-degree-val').textContent = v;
  filter
    .undo('min-degree')
    .nodesBy(
      function(n, options) {
        return this.graph.degree(n.id) >= options.minDegreeVal;
      }, {
        minDegreeVal: +v
      },
      'min-degree'
    )
    .apply();
}
$('#min-degree').change(applyMinDegreeFilter);
```

sigma.plugins.locate - adds the ability to zoom in on a single node or collection of nodes. Very useful if you're filtering a very large initial graph:

```javascript
function locateNode(nid) {
  if (nid == '') {
    locate.center(1);
  } else {
    locate.nodes(nid);
  }
}
```

sigma.renderers.glyphs - allows you to add custom glyphs to each node. Useful if you have many types of node.

Outro

This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit for rapidly generating graph-based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right hand (or left, since I'm left-handed), which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data analyst's toolkit.


Basics of Programming in Julia

Packt
03 Mar 2015
17 min read
In this article by Ivo Balbaert, author of the book Getting Started with Julia Programming, we will explore how Julia interacts with the outside world: reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia, and we will also discover its parallel processing model. In this article, the following topics are covered:

- Working with files (including CSV files)
- Using DataFrames

Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and has the following characteristics:

The fields are given by names(IOStream):

```julia
4-element Array{Symbol,1}:
 :handle
 :ios
 :name
 :mark
```

The types are given by IOStream.types:

```julia
(Ptr{None}, Array{Uint8,1}, String, Int64)
```

The file handle is a pointer of the type Ptr, which is a reference to the file object. Opening and reading a line-oriented file with the name example.dat is very easy:

```julia
# code in Chapter 8\io.jl
fname = "example.dat"
f1 = open(fname)
```

fname is a string that contains the path to the file, escaping special characters with \ when necessary; for example, on Windows, when the file is in the test folder on the D: drive, this would become "d:\\test\\example.dat". The f1 variable is now an IOStream(<file example.dat>) object.

To read all lines one after the other into an array, use data = readlines(f1), which returns:

```julia
3-element Array{Union(ASCIIString,UTF8String),1}:
 "this is line 1.\r\n"
 "this is line 2.\r\n"
 "this is line 3."
```

For processing line by line, only a simple loop is needed:

```julia
for line in data
    println(line) # or process line
end
close(f1)
```

Always close the IOStream object to clean up and save resources. If you want to read the file into one string, use readall. Use this only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function, and closing the file automatically. It goes as follows (file is the IOStream object in this code):

```julia
open(fname) do file
    process(file)
end
```

The do command creates an anonymous function and passes it to open. Thus, the previous code example is equivalent to open(process, fname). Use the same syntax for processing a file fname line by line without the memory overhead of the previous methods, for example:

```julia
open(fname) do file
    for line in eachline(file)
        print(line) # or process line
    end
end
```

Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to disk:

```julia
fname = "example2.dat"
f2 = open(fname, "w")
write(f2, "I write myself to a file\n") # returns 24 (bytes written)
println(f2, "even with println!")
close(f2)
```

Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a".

To process all the files in the current folder (or in a given folder passed as an argument to readdir()), use this for loop:

```julia
for file in readdir()
    # process file
end
```

Reading and writing CSV files

A CSV file is a comma-separated file. The data fields in each line are separated by commas "," or another delimiter such as semicolons ";". These files are the de facto standard for exchanging small and medium amounts of tabular data.
Such files are structured so that one line contains data about one data object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8\winequality.csv, which contains 1,599 sample measurements across 12 data columns, such as pH and alcohol per sample, separated by semicolons.

In general, the readdlm function is used to read in data from CSV files:

```julia
# code in Chapter 8\csv_files.jl:
fname = "winequality.csv"
data = readdlm(fname, ';')
```

The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2} array of the type Any, because no common type could be found:

```julia
"fixed acidity"  "volatile acidity"  ...  "alcohol"  "quality"
 7.4              0.7                ...   9.4        5.0
 7.8              0.88               ...   9.8        5.0
 7.8              0.76               ...   9.8        5.0
 ...
```

If the data file is comma separated, reading it is even simpler with the following command:

```julia
data2 = readcsv(fname)
```

The problem with what we have done so far is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line in a separate array. The data array then naturally gets the correct datatype, Float64. We can also specify the type explicitly, like this:

```julia
data3 = readdlm(fname, ';', Float64, '\n', header=true)
```

The third argument here is the type of the data, which is a numeric type, String, or Any. The next argument is the line separator character, and the fifth indicates whether or not there is a header line with the field (column) names. If so, then data3 is a tuple with the data as the first element and the header as the second; in our case, (1599x12 Array{Float64,2}, 1x12 Array{String,2}). (There are other optional arguments to readdlm; see the help for details.) In this case, the actual data is given by data3[1] and the header by data3[2].

Let's continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax. For example, the third row is given by row3 = data[3, :], with data 7.8 0.88 0.0 2.6 0.098 25.0 67.0 0.9968 3.2 0.68 9.8 5.0, representing the measurements for all the characteristics of a certain wine. The measurements of a certain characteristic for all wines are given by a data column; for example, col3 = data[:, 3] represents the measurements of citric acid and returns a column vector:

```julia
1600-element Array{Any,1}:
 "citric acid"
 0.0
 0.0
 0.04
 0.56
 ...
 0.47
```

If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:, 2:4]. If we need these measurements only for the wines in rows 70-75, get them with y = data[70:75, 2:4], which returns a 6x3 Array{Any,2}:

```julia
0.32   0.57  2.0
0.705  0.05  1.9
...
0.675  0.26  2.1
```

To get a matrix with the data from columns 3, 6, and 11, execute the following command:

```julia
z = [data[:,3] data[:,6] data[:,11]]
```
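As a quick, hedged sketch of this slicing in practice, here is how one might compute the average alcohol level from the data matrix read above (column 11 holds alcohol; the quoted result is approximate):

```julia
# A minimal sketch using the data matrix read with readdlm above.
# Row 1 holds the headers, so we skip it; column 11 is alcohol.
alcohol = data[2:end, 11]
avg = sum(alcohol) / length(alcohol)
println("average alcohol: ", avg)  # about 10.4 for this dataset
```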
It would be useful to create a type Wine in the code. For example, if the data is to be passed around functions, it will improve the code quality to encapsulate all the data in a single data type, like this:

```julia
type Wine
    fixed_acidity::Array{Float64}
    volatile_acidity::Array{Float64}
    citric_acid::Array{Float64}
    # other fields
    quality::Array{Float64}
end
```

Then, we can create objects of this type and work with them, as in any other object-oriented language; for example, wine1 = Wine(data[1, :]...), where the elements of the row are splatted with the ... operator into the Wine constructor.

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, execute the following command:

```julia
writedlm("partial.dat", data, ';')
```

If more control is necessary, you can easily combine the more basic functions from the previous section. For example, the following code snippet writes 10 tuples of three numbers each to a file:

```julia
# code in Chapter 8\tuple_csv.jl
fname = "savetuple.csv"
csvfile = open(fname, "w")
# writing headers:
write(csvfile, "ColName A, ColName B, ColName C\n")
for i = 1:10
    tup(i) = tuple(rand(Float64, 3)...)
    write(csvfile, join(tup(i), ","), "\n")
end
close(csvfile)
```

Using DataFrames

If you measure n variables (each of a different type) on a single object of observation, you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given student grades as data, you might want to "compute the average grade for each socioeconomic group", where grade and socioeconomic group are both columns in the table, and there is one row per student. The DataFrame is the most natural representation for working with such an (m x n) table of data. DataFrames are similar to pandas DataFrames in Python or data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing Pkg.add("DataFrames") in the REPL. Then, import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (which contains a collection of example datasets mostly used in the R literature).

A common issue with statistical data is that values can be missing (the information is not known). The DataArrays package provides the special value NA to represent a missing value; it has the type NAtype. The result of computations that contain NA values mostly cannot be determined; for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.)

A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain the missing (Not Available) value NA and can work efficiently with it. To construct one, use the @data macro:

```julia
# code in Chapter 8\dataarrays.jl
using DataArrays
using DataFrames
dv = @data([7, 3, NA, 5, 42])
```

This returns 5-element DataArray{Int64,1}: 7 3 NA 5 42. The sum of these numbers, sum(dv), returns NA. One can also assign NA values to the array with dv[5] = NA; then dv becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.
How do we get rid of these NA values, supposing we can do so safely? We can use the dropna function; for example, sum(dropna(dv)) returns 15. If you know that you can replace them with a value v, use the array function:

```julia
repl = -1
sum(array(dv, repl)) # returns 13
```

A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray that has its own type, and the data it contains can be referred to by the column name as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types. One column might, for instance, contain the names of students and should therefore be a string, while another column could contain their age and should be an integer.

We construct a DataFrame from the program data as follows:

```julia
# code in Chapter 8\dataframes.jl
using DataFrames
# constructing a DataFrame:
df = DataFrame()
df[:Col1] = 1:4
df[:Col2] = [e, pi, sqrt(2), 42]
df[:Col3] = [true, false, true, false]
show(df)
```

Notice that the column headers are used as symbols. This returns a 4x3 DataFrame object. We could also have used the full constructor:

```julia
df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42],
               Col3 = [true, false, true, false])
```

You can refer to columns either by an index (the column number) or by a name; both of the following expressions return the same output:

```julia
show(df[2])
show(df[:Col2])
```

This gives the following output: [2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

To show rows or subsets of rows and columns, use the familiar splice (:) syntax, for example:

To get the first row, execute df[1, :]. This returns:

```
1x3 DataFrame
| Row | Col1 | Col2    | Col3 |
|-----|------|---------|------|
| 1   | 1    | 2.71828 | true |
```

To get the second and third rows, execute df[2:3, :].

To get only the second column from the previous result, execute df[2:3, :Col2]. This returns [3.141592653589793, 1.4142135623730951].

To get the second and third columns from the second and third rows, execute df[2:3, [:Col2, :Col3]], which returns:

```
2x2 DataFrame
| Row | Col2    | Col3  |
|-----|---------|-------|
| 1   | 3.14159 | false |
| 2   | 1.41421 | true  |
```

The following functions are very useful when working with DataFrames:

- The head(df) and tail(df) functions show you the first six and the last six lines of data respectively.
- The names function gives the names of the columns: names(df) returns 3-element Array{Symbol,1}: :Col1 :Col2 :Col3.
- The eltypes function gives the data types of the columns: eltypes(df) gives the output 3-element Array{Type{T<:Top},1}: Int64 Float64 Bool.
- The describe function tries to give some useful summary information about the data in the columns, depending on the type; for example, describe(df) gives for column 2 (which is numeric) the min, max, median, mean, number, and percentage of NAs:

```
Col2
Min      1.4142135623730951
1st Qu.  2.392264761937558
Median   2.929937241024419
Mean     12.318522011105483
3rd Qu.  12.856194490192344
Max      42.0
NAs      0
NA%      0.0%
```

To load in data from a local CSV file, use the method readtable.
The returned object is of type DataFrame:

```julia
# code in Chapter 8\dataframes.jl
using DataFrames
fname = "winequality.csv"
data = readtable(fname, separator = ';')
typeof(data) # DataFrame
size(data) # (1599,12)
```

The readtable method also supports reading in gzipped CSV files. Writing a DataFrame to a file can be done with the writetable function, which takes the filename and the DataFrame as arguments, for example, writetable("dataframe1.csv", df). By default, writetable will use the delimiter specified by the filename extension and write the column names as headers. Both readtable and writetable support numerous options for special cases; refer to the docs for more information (http://dataframesjl.readthedocs.org/en/latest/).

To demonstrate some of the power of DataFrames, here are some queries you can do:

- Make a vector with only the quality information: data[:quality].
- Give the wines with an alcohol percentage equal to 9.5, for example, data[data[:alcohol] .== 9.5, :]. Here, we use the .== operator, which does element-wise comparison: data[:alcohol] .== 9.5 returns an array of Boolean values (true for data points where :alcohol is 9.5, and false otherwise), and data[boolean_array, :] selects those rows where boolean_array is true.
- Count the number of wines grouped by quality with by(data, :quality, data -> size(data, 1)), which returns the following:

```
6x2 DataFrame
| Row | quality | x1  |
|-----|---------|-----|
| 1   | 3       | 10  |
| 2   | 4       | 53  |
| 3   | 5       | 681 |
| 4   | 6       | 638 |
| 5   | 7       | 199 |
| 6   | 8       | 18  |
```

The DataFrames package contains the by function, which takes three arguments:

- A DataFrame; here it takes data.
- A column to split the DataFrame on; here, quality.
- A function or an expression to apply to each subset of the DataFrame; here, data -> size(data, 1), which gives us the number of wines for each quality value.

Another easy way to get the distribution over quality is to execute the hist function: hist(data[:quality]) gives the counts over the range of quality values: (2.0:1.0:8.0, [10,53,681,638,199,18]). More precisely, this is a tuple whose first element corresponds to the edges of the histogram bins, and whose second element denotes the number of items in each bin. So there are, for example, 10 wines with quality between 2 and 3, and so on. To extract the counts as a variable count of type Vector, we can execute _, count = hist(data[:quality]); the _ means that we discard the first element of the tuple.

To obtain the quality classes as a DataArray class, we execute the following:

```julia
class = sort(unique(data[:quality]))
```

We can now construct a df_quality DataFrame with the class and count columns as df_quality = DataFrame(qual=class, no=count). This gives the following output:

```
6x2 DataFrame
| Row | qual | no  |
|-----|------|-----|
| 1   | 3    | 10  |
| 2   | 4    | 53  |
| 3   | 5    | 681 |
| 4   | 6    | 638 |
| 5   | 7    | 199 |
| 6   | 8    | 18  |
```

To deepen your understanding and learn about the other features of Julia DataFrames (such as joining, reshaping, and sorting), refer to the documentation available at http://dataframesjl.readthedocs.org/en/latest/.

Other file formats

Julia can work with other human-readable file formats through specialized packages:

- For JSON, use the JSON package. The parse method converts JSON strings into Dictionaries, and the json method turns any Julia object into a JSON string (see the sketch after this list).
- For XML, use the LightXML package.
- For YAML, use the YAML package.
- For HDF5 (a common format for scientific data), use the HDF5 package.
- For working with Windows INI files, use the IniFile package.
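As a hedged sketch of that JSON round trip (the dictionary contents are invented, and the Dict syntax assumes a reasonably recent Julia):

```julia
# A minimal sketch; requires Pkg.add("JSON") first.
using JSON

wine = Dict("name" => "Example Red", "alcohol" => 9.5, "quality" => 5)
s = JSON.json(wine)     # serialize a Julia object to a JSON string
d = JSON.parse(s)       # parse the string back into a Dict
println(d["alcohol"])   # 9.5
```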
Summary

In this article, we covered the basics of working with files (including CSV files) and using DataFrames in Julia.


How to write high quality code in Python: 15+ tips for data scientists and researchers

Aarthi Kumaraswamy
21 Mar 2018
5 min read
Writing code is easy. Writing high quality code is much harder. Quality is to be understood both in terms of the actual code (variable names, comments, docstrings, and so on) and of architecture (functions, modules, and classes). In general, coming up with a well-designed code architecture is much more challenging than the implementation itself. In this post, we will give a few tips about how to write high quality code. This is a particularly important topic in academia, as more and more scientists without prior experience in software development need to code.

High quality code writing first principles

Writing readable code means that other people (or you, in a few months or years) will understand it more quickly and will be more willing to use it. It also facilitates bug tracking.

Modular code is also easier to understand and to reuse. Implementing your program's functionality in independent functions that are organized as a hierarchy of packages and modules is an excellent way of achieving high code quality. It is easier to keep your code loosely coupled when you use functions instead of classes. Spaghetti code is really hard to understand, debug, and reuse.

Iterate between bottom-up and top-down approaches while working on a new project. Starting with a bottom-up approach lets you gain experience with the code before you start thinking about the overall architecture of your program. Still, make sure you know where you're going by thinking about how your components will work together.

How do these first principles translate into Python?

- Take the time to learn the Python language seriously. Review the list of all modules in the standard library; you may discover that functions you would otherwise implement already exist. Learn to write Pythonic code, and do not translate programming idioms from other languages such as Java or C++ into Python.
- Learn common design patterns; these are general, reusable solutions to commonly occurring problems in software engineering.
- Use assertions throughout your code (the assert keyword) to prevent future bugs (defensive programming).
- Start writing your code with a bottom-up approach: write independent Python functions that implement focused tasks.
- Do not hesitate to refactor your code regularly. If your code is becoming too complicated, think about how you can simplify it.
- Avoid classes when you can. If you can use a function instead of a class, choose the function. A class is only useful when you need to store persistent state between function calls. Make your functions as pure as possible (no side effects).
- In general, prefer Python native types (lists, tuples, dictionaries, and types from Python's collections module) over custom types (classes). Native types lead to more efficient, readable, and portable code.
- Choose keyword arguments over positional arguments in your functions. Argument names are easier to remember than argument ordering, and they make your functions self-documenting.
- Name your variables carefully. Names of functions and methods should start with a verb. A variable name should describe what it is; a function name should describe what it does. The importance of naming things well cannot be overstated.
- Every function should have a docstring describing its purpose, arguments, and return values, as shown in the sketch following this list. You can also look at the conventions chosen in popular libraries such as NumPy. The exact convention does not matter; the point is to be consistent within your code. You can use a markup language such as Markdown or reST in your docstrings.
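Here is a minimal sketch of such a docstring, loosely following the NumPy convention (the function itself is invented for illustration); it also demonstrates the assert-based defensive check recommended above:

```python
import numpy as np

def moving_average(x, window=3):
    """Compute the simple moving average of a 1-D signal.

    Parameters
    ----------
    x : array-like
        Input signal.
    window : int, optional
        Width of the averaging window (default: 3).

    Returns
    -------
    numpy.ndarray
        The averaged signal, of length len(x) - window + 1.
    """
    assert window >= 1, "window must be a positive integer"
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode='valid')
```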
Beyond individual functions and names, follow broader conventions:

- Follow (at least partly) Guido van Rossum's Style Guide for Python, also known as Python Enhancement Proposal number 8 (PEP 8). It is a long read, but it will help you write well-readable Python code. It covers many little things such as spacing between operators, naming conventions, comments, and docstrings. For instance, you will learn that it is considered good practice to limit any line of your code to 79 or 99 characters; this way, your code can be correctly displayed in most situations (such as in a command-line interface or on a mobile device) or side by side with another file. Alternatively, you can decide to ignore certain rules. In general, following common guidelines is beneficial on projects involving many developers.
- You can check your code automatically against most of the style conventions in PEP 8 with the pycodestyle Python package. You can also automatically make your code PEP 8-compatible with the autopep8 package.
- Use a tool for static code analysis, such as flake8 or Pylint. It lets you find potential errors or low-quality code statically, that is, without running your code.
- Use blank lines to avoid cluttering your code (see PEP 8). You can also demarcate sections in a long Python module with salient comments.
- A Python module should not contain more than a few hundred lines of code. Having too many lines of code in a module may be a sign that you need to split it into several modules.
- Organize important projects (with tens of modules) into subpackages (subdirectories). Take a look at how major Python projects are organized; for example, the code of IPython is well organized into a hierarchy of subpackages with focused roles. Reading the code itself is also quite instructive.
- Learn best practices for creating and distributing a new Python package. Make sure that you know setuptools, pip, wheels, virtualenv, PyPI, and so on. Also, you are highly encouraged to take a serious look at conda, a powerful and generic packaging system created by Anaconda. Packaging has long been a rapidly evolving topic in Python, so read only the most recent references.

You just enjoyed an excerpt from Cyrille Rossant's book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the IPython Cookbook GitHub page. If you liked what you saw, support Cyrille's work by buying a copy of the book today!


How to run Spark in Mesos

Sunith Shetty
31 Jan 2018
6 min read
This article is an excerpt from the book Learning Apache Spark 2 by Muhammad Asif Abbasi, in which you will learn how to perform big data analytics using Spark streaming, machine learning techniques, and more. In the article below, you will learn how to operate Spark on the Mesos cluster manager.

What is Mesos?

Mesos is an open source cluster manager that started as a UC Berkeley research project in 2008 and is quite widely used by a number of organizations. Spark supports Mesos, and Matei Zaharia gave a keynote at MesosCon in June 2016; a recording is available on YouTube.

Before you start

If you haven't installed Mesos before, the getting-started page on the Apache website gives a good walkthrough of installing Mesos on Windows, macOS, and Linux; follow the URL https://mesos.apache.org/getting-started/. Once installed, you need to start up Mesos on your cluster.

Start the Mesos master:

```
./bin/mesos-master.sh --ip=[MasterIP] --work_dir=/var/lib/mesos
```

Start Mesos agents on all your worker nodes:

```
./bin/mesos-agent.sh --master=[MasterIP]:5050 --work_dir=/var/lib/mesos
```

Make sure Mesos is up and running with all your relevant worker nodes configured by visiting http://[MasterIP]:5050.

Make sure that the Spark binary packages are available and accessible by Mesos. They can be placed on a Hadoop-accessible URI, for example:

- HTTP via http://
- S3 via s3n://
- HDFS via hdfs://

You can also install Spark in the same location on all the Mesos slaves and configure spark.mesos.executor.home to point to that location.

Running in Mesos

Mesos can have single or multiple masters, which means the master URL differs when submitting applications from Spark via Mesos:

- Single master: mesos://sparkmaster:5050
- Multiple masters (using ZooKeeper): mesos://zk://master1:2181,master2:2181/mesos

Modes of operation in Mesos

Mesos supports both the client and cluster modes of operation.

Client mode

Before running in client mode, you need to perform a couple of configurations in spark-env.sh:

```
export MESOS_NATIVE_JAVA_LIBRARY=<path to libmesos.so [Linux] or libmesos.dylib [macOS]>
export SPARK_EXECUTOR_URI=<URI of the zipped Spark package uploaded to an accessible location, e.g. HTTP, HDFS, S3>
```

Also set spark.executor.uri to the URI of the zipped Spark package uploaded to an accessible location (HTTP, HDFS, or S3).

Batch applications

For batch applications, you need to pass the Mesos URL as the master when creating your Spark context in your application program. As an example:

```scala
val sparkConf = new SparkConf()
  .setMaster("mesos://mesosmaster:5050")
  .setAppName("Batch Application")
  .set("spark.executor.uri", "<location of Spark binaries (HTTP, S3, or HDFS)>")
val sc = new SparkContext(sparkConf)
```

If you are using spark-submit, you can configure the URI in the conf/spark-defaults.conf file using spark.executor.uri.

Interactive applications

When you are running one of the provided Spark shells for interactive querying, you can pass the master argument, for example:

```
./bin/spark-shell --master mesos://mesosmaster:5050
```

Cluster mode

Just as with YARN, you can run Spark on Mesos in cluster mode, which means the driver is launched inside the cluster and the client can disconnect after submitting the application, retrieving results from the Mesos WebUI.

Steps to use the cluster mode:

1. Start the MesosClusterDispatcher in your cluster: ./sbin/start-mesos-dispatcher.sh --master mesos://mesosmaster:5050. This will generally start the dispatcher at port 7077.
2. From the client, submit a job to the Mesos cluster with spark-submit, specifying the dispatcher URL, for example:

```
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://dispatcher:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 2G \
  --total-executor-cores 10 \
  s3n://path/to/examples.jar
```
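Whichever mode you choose, it helps to keep the Mesos-specific settings out of application code. Here is a minimal, hedged sketch of conf/spark-defaults.conf (the host name and package URI are placeholders, not values from the book):

```
# conf/spark-defaults.conf -- illustrative values only
spark.master            mesos://mesosmaster:5050
spark.executor.uri      hdfs://namenode/dist/spark-2.0.2-bin-hadoop2.7.tgz
spark.mesos.coarse      true
spark.executor.memory   2g
```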
Like Spark itself, Mesos has lots of properties that can be set to optimize processing. You should refer to the Spark configuration page (http://spark.apache.org/docs/latest/configuration.html) for more information.

Mesos run modes

Spark can run on Mesos in two modes:

- Coarse-grained (default mode): Spark acquires one long-running Mesos task on each machine. This offers a much lower startup cost, but the resources remain allocated to Spark for the complete duration of the application.
- Fine-grained (deprecated): In fine-grained mode, one Mesos task is created per Spark task. The benefit is that each application receives cores as per its requirements, but the initial bootstrapping can act as a deterrent for interactive applications.

Key Spark on Mesos configuration properties

While Spark has a number of properties that can be configured to optimize processing, some of these properties are specific to Mesos. We'll look at a few of the key ones here.

| Property name | Meaning / default value |
|---|---|
| spark.mesos.coarse | Setting it to true (the default) runs Mesos in coarse-grained mode; setting it to false runs it in fine-grained mode. |
| spark.mesos.extra.cores | This is more of an advertisement than an allocation, in order to improve parallelism: an executor pretends that it has extra cores, so the driver sends it more work. Default: 0. |
| spark.mesos.mesosExecutor.cores | Only works in fine-grained mode. Specifies how many cores should be given to each Mesos executor. |
| spark.mesos.executor.home | Identifies the directory of the Spark installation for the executors in Mesos. As discussed, you can specify this using spark.executor.uri as well; if you have not, you can specify it using this property. |
| spark.mesos.executor.memoryOverhead | The amount of memory (in MB) to be allocated per executor. |
| spark.mesos.uris | A comma-separated list of URIs to be downloaded when the driver or executor is launched by Mesos. |
| spark.mesos.principal | The name of the principal used by Spark to authenticate itself with Mesos. |

You can find the other configuration properties on the Spark documentation page (http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties).

To summarize, the objective of this article was to get you started with running Spark on Mesos. To learn more about Spark SQL, Spark Streaming, and machine learning with Spark, refer to the book Learning Apache Spark 2.


Working with Colors in Scribus

Packt
10 Dec 2010
10 min read
Scribus 1.3.5: Beginner's Guide

Create optimum page layouts for your documents using the productive tools of Scribus: master desktop publishing, create professional-looking documents with ease, and enhance the readability of your documents using Scribus's powerful layout tools. The book is packed with interesting examples and screenshots that show you the most important Scribus tools for creating and publishing your documents.

Applying colors in Scribus

Applying color is as basic as creating a frame or writing text. In this article, we will often give color values. Each time we need to, we will use the first letter of the color followed by its value; for example, C75 will mean 75 percent of cyan. K will be used for black and B for blue.

There are five main things you can apply colors to:

- Frame or shape fill
- Frame or shape border
- Line
- Text
- Text border

You might like to colorize pictures too; that is a very different method, using duotone or an equivalent image effect. Applying a color to a frame means that you will use the Colors tab of the PP (Properties Palette), whereas applying color to text requires the Color & Effects expander in the Text tab. In both cases you'll find what's needed to apply color to the fill and the border, but the user interfaces are a bit different.

Time for action – applying colors to a Text Frame's text

Colors on frames all use the same color list. Let's follow some steps to see how this is done:

1. Draw a Text Frame where you want it on a page.
2. Type some text inside, like "colors of the world", or use Insert | Sample Text.
3. Go to the Colors tab of the PP (F2).
4. Click on the second button placed above the color list to specify that you want to apply the changes to the fill. Then click on the color you want in the list below, for example, Magenta.
5. Click on the paintbrush button, and choose a black color to be applied to the border (we could call it the stroke, too). Don't forget that applying a stroke color requires some border refinements in the Line tab to set the width and style of the border; these options are covered elsewhere in the book.
6. Now, select the text or some part of it, and go to the Colors & Effects expander of the Text tab. Here you will again see the same icons we used previously, each with its own color list. Let's choose Yellow for the text color. Notice that the stroke color cannot be changed yet.
7. To change this, click on the Shadow button placed below, and now choose black as the stroke color. The text shadow should be black.

What just happened?

Applying color to text is quicker than applying frame colors in some ways, because fill and stroke each have their own list: there is no need to click on any button first, and you can see both at a glance. Just remember that text has no stroke color activated at first; you need to add a stroke or shadow to a selection to activate the border color for that selection.

Quick apply in Story Editor

If, like me, you like the Story Editor (Edit | Edit Text), notice that colors can be applied from there. They are not displayed in the editor, but they will be once the changes are applied to the layout. This is much faster, but you need to know exactly what you're doing and be precise in your selection. If you need to apply the same color setting to a word throughout the document, you can alternatively use the Edit | Search/Replace window.
You can set the word you're looking for in the first field and, on the right-hand side, replace it with the same word, choosing the Fill and Stroke colors that you want to apply. Of course, it would be nice if this window could let us apply character styles to make future changes easier.

A new Scribus document receives a default color list, which is the same all over your document. In this article, we will deal with many ways of adapting existing colors or creating new ones.

Applying shade or transparency

Shade and transparency are two ways of setting more precisely how a specific color will be applied to your items. Shades and transparencies are fake effects that will be interpreted by later elements of the printing workflow, such as Raster Image Processors, to determine how the chosen color can be rendered with pure colors. This is the key point of reproducing colors: if you want a gray, you'll generally have a black color for that. In offset printing, which is the reference, the size of the printed dot will vary relative to the darkness of the black you chose, and this will be optically interpreted by the reader.

Using shades

Each color property has a Shade value. The default is set to 100 percent, meaning that the color will be printed fully saturated. Reducing the shade value will produce a lighter color. At 0 percent, the color, whatever it may be, will be interpreted as white.

On a pure color item, like any primary or spot color, changing the shade won't affect the color composition. However, on process colors that are made by mixing several primary inks, modifying the shade will proportionally change the amount of each ink used in the process. Our C75 M49 Y7 K12 color at a 50 percent shade value will become C37 M25 Y4 K6 in the final PDF (each value halved and rounded). Less color means less ink on the paper and more white (or paper color), which results in a lighter color.

You should remember that Shade is a frame property and not a color property. So, if you apply a new color to the frame, the shade value will be kept and applied immediately. Changing the shade of the color applied to some characters is a bit different: there is no field to fill but a drop-down list with predefined values in 10 percent increments. If you need another value, just choose Other to display a window in which you can enter the exact amount you need. You can do the same in the Story Editor.

Using transparency

While shade is used to lighten a color, the Opacity value tells you how solid the color will be. Once again, the range goes from 0%, meaning the object is completely transparent and invisible, to 100%, which makes it opaque. The latter value is the default. When two objects overlap, the top object hides the bottom object; but when Opacity is decreased, the object at the bottom becomes more and more visible. One difference to notice is that Opacity affects not only the color rendering but the content too (if there is any).

As with Shade, Opacity is applied separately to the fill and to the stroke, so you'll need to set both if needed. One important aspect is that Shade and Opacity can both be applied to the same frame, and a value of 50% for each will give a lighter color than if only one was used. Several opacity values applied to overlapping objects show how they act and add to each other: the background for the text in the title, in the following screenshot, is done in the same color as the background at the top of the page.
Using transparency or shade can help create this background and decrease the number of used colors.

Time for action – transparency and layers

Let's now use transparency and layers to create some custom effects over a picture, as can often be done for covers.

1. Create a new document and display the Layers window from the Windows menu. This window will already contain a line called Background.
2. Add a layer by clicking on the + button at the bottom left-hand side of the window: it will be called New Layer 1. You can rename it by double-clicking on its name.
3. On the first page, add an Image Frame that covers the entire page. Then draw a rectangular shape that covers almost half of the page height.
4. Duplicate this layer by clicking on the middle button placed at the bottom of the Layers window. Select the visibility checkbox (the first column, headed with an eye icon) of this layer to hide it. We'll modify the transparency of each object.
5. Click on New Layer 1 to specify that you want to work on this layer; otherwise you won't be able to select its frames. The frames or shapes you create from now on will be added to this layer called New Layer 1.
6. Select the black shape and decrease the Opacity value in the Colors tab of the PP to 50%. Do the same for the Image Frame.
7. Now, hide this layer by clicking on its visibility icon and show the top layer.
8. In the Layers window, verify that this layer is selected and decrease its opacity.

What just happened?

If there is a need to make several objects transparent at once, one idea is to put them on a layer and set the layer Opacity. This way, the same amount of transparency will be applied to the whole. You can open the Layers window from the Windows menu.

When working with layers, it's important to have the right layer selected before working on it. Basically, any new layer will be added at the top of the stack and will be activated once created. When a layer is selected, you can change the Opacity of this layer by using the field on the top right-hand side of the Layers window. Since it is applied to the layer itself, all the objects placed on it will be affected, but their own opacity values won't be changed.

If you look at the differences between the two layers we have made, you'll see that the area in the first black rectangle explicitly becomes transparent by itself, because you can see the photo through it; this is not seen in the second. So using layers, as we have seen, can help us work faster when we need to apply the same opacity setting to several objects, but we have to take care, because the result is slightly different.

Using layers to blend colors

More rarely, layers can be used to mix colors. Blend Mode is originally set to Normal, which does no blending, but if you use any other mode on a layer, its colors will be mixed with the colors of items placed on lower layers, according to the chosen mode. This can be very creative. If you need a more precise action, Blend Mode can be set on a single object from the Colors tab of the PP. Just give it a try.

Layers are still most commonly used to organize a document: a layer for text, a layer for images, a layer for each language of a multi-lingual document, and so on. They are a smart way to work, but they are not required: you can build a layout without them.

Solving Many-to-Many Relationship in Dimensional Modeling

Packt
28 Dec 2009
3 min read
Bridge table solution

We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution. Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.

Assume the BOOK_DIM table has five rows:

BOOK_SK  TITLE                 AUTHOR
1        Programming in Java   King, Chan
3        Learning Python       Simpson
2        Introduction to BIRT  Chan, Gupta, Simpson (Editor)
4        Advanced Java         King, Chan
5        Beginning XML         King, Chan (Foreword)

The DATE_DIM table has three rows:

DATE_SK  DT
1        11-DEC-2009
2        12-DEC-2009
3        13-DEC-2009

And the SALES_FACT table has ten rows:

DATE_SK  BOOK_SK  SALES_AMT
1        1        1000
1        2        2000
1        3        3000
1        4        4000
2        2        2500
2        3        3500
2        4        4500
2        5        5500
3        3        8000
3        4        8500

Note that:

- The columns with _SK suffixes in the dimension tables are surrogate keys; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
- King and Chan have collaborated on three books; in two they are co-authors, while in "Beginning XML" Chan's contribution is writing its foreword. Chan also co-authors "Introduction to BIRT".
- Simpson singly writes "Learning Python" and is an editor for "Introduction to BIRT".

To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:

SELECT dt, title, sales_amt
FROM sales_fact s, date_dim d, book_dim b
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk

This query produces the result showing the daily sales amount of every book that has a sale:

DT         TITLE                 SALES_AMT
11-DEC-09  Advanced Java         4000
11-DEC-09  Introduction to BIRT  2000
11-DEC-09  Learning Python       3000
11-DEC-09  Programming in Java   1000
12-DEC-09  Advanced Java         4500
12-DEC-09  Beginning XML         5500
12-DEC-09  Introduction to BIRT  2500
12-DEC-09  Learning Python       3500
13-DEC-09  Advanced Java         8500
13-DEC-09  Learning Python       8000

You will notice that the model does not allow you to readily analyze the sales by individual writer: the AUTHOR column is multi-valued and not normalized, which violates the dimensional modeling rule. (We can resolve this by creating a view to "bundle" the AUTHOR_DIM with the SALES_FACT table such that the AUTHOR_DIM table connects to the view as a normal dimension. We will create the view a bit later in this section.)

We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table. The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors:

AUTHOR_SK  NAME
1          Chan
2          King
3          Gupta
4          Simpson

The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contributions to the book they write, but you might want to use different weighting factors for different roles; for example, an editor and a foreword writer have smaller weighting factors than an author. The total of the weighting factors for a book must always equal 1. The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key as there are contributors in the group.
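As an aside, here is a hedged sketch of how the finished bridge might look and be queried. The group surrogate keys, the AUTHOR_GROUP_SK column on BOOK_DIM, and the specific weighting factors (0.4 for each co-author and 0.2 for the editor of "Introduction to BIRT") are illustrative assumptions, not figures from the model above:

-- Hypothetical AUTHOR_GROUP rows for the "Introduction to BIRT" group:
-- AUTHOR_GROUP_SK  AUTHOR_SK  WEIGHTING_FACTOR
-- 2                1          0.4    (Chan, co-author)
-- 2                3          0.4    (Gupta, co-author)
-- 2                4          0.2    (Simpson, editor)

-- Weighted sales per author, joining the fact table through the bridge:
SELECT a.name, SUM(f.sales_amt * g.weighting_factor) AS weighted_sales
FROM sales_fact f, book_dim b, author_group g, author_dim a
WHERE f.book_sk = b.book_sk
AND b.author_group_sk = g.author_group_sk
AND g.author_sk = a.author_sk
GROUP BY a.name

Because the weighting factors in each group sum to 1, the weighted sales of all authors add up exactly to the total sales in the fact table.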

Building a Consumer Review Website using WordPress 3

Packt
06 Aug 2010
15 min read
(For more resources on Wordpress, see here.)

Building a consumer review website will allow you to supply consumers with the information that they seek and then, once they've decided to make a purchase, your site can direct them to a source for the product or service. This process can ultimately allow you to earn some nice commission checks, because it's only logical that you would affiliate yourself with a number of the sites to which you will be directing consumers.

The great thing about using the WP Review Site plugin to build your consumer review website is that you can provide people with an unbiased source of public opinions on any product or service that you can imagine. You will never have to resort to the hard sell in order to drive traffic to the companies that you've affiliated yourself with. Instead, consumers can research the reviews posted on your website and, ultimately, make a purchase feeling confident that they're making the right decision.

In this article, you will learn how to:

- Present reviews in the most convenient way possible for visitors browsing your site
- Specify the ratings criteria that site visitors will use when reviewing the products or services included on your website
- Display informational comparison tables on your site's index and category pages
- Provide visitors with the location of local businesses using Google Maps
- Perform the additional steps required when writing a post now that the WP Review Site plugin has been introduced into the process
- Perform either automatic or manual integration so that you can use a theme of your own rather than either of the ones provided with this plugin

Once this project is complete, you will have succeeded in creating a site that's similar to the one shown in the following screenshot:

Introducing WP Review Site

With the WP Review Site plugin you will be able to build a consumer review site where visitors can share their opinions about the products or services of your choosing. The plugin, which can be found at WP Review Site, can be used to build a dedicated review site or, if you would like consumer reviews to make up only a subsection of your website, you can specify certain categories where they should appear. This plugin gives you complete control over where ratings appear and where they don't, since you can choose to include or exclude them on any category, page, or post.

The WP Review Site plugin seamlessly integrates with WordPress by, among other things, altering the normal appearance and functionality of the comment submission form. This plugin provides visitors with a way to write a review and assign stars to the rating categories that you previously defined. They can also write a review and opt to provide no stars without harming the overall rating presented on your site, since no stars is interpreted as though no rating was given.

The WP Review Site plugin makes it easy for you to present your visitors with concise information. Using the features available with this plugin, you can build comparison tables based upon your posts and user reviews. In order to accomplish this, you will need to configure a few settings and then the plugin will take care of the rest.

Typically, WordPress displays posts in chronological order, but that doesn't make much sense on a consumer review site, where visitors want to view posts based upon other factors such as the number of positive reviews that a particular product or service has received. The developer behind WP Review Site took that into consideration and has included two alternative sorting methods for your site's posts. The developer has even included a Bayesian weighting feature so that reviews are ordered in the most logical way possible.

Right about now, you're probably wondering what Bayesian weighting is and how it works. It provides a way to mathematically calculate the rating of products and/or services based upon the credibility of the votes that have been cast. If an item receives only a few votes, then it can't be said with any certainty that that's how the general public feels. If an item receives several votes, then it can be safely assumed that many others hold the same opinion. So, with Bayesian weighting, a product that has received only one five star review won't outrank another that has received fifteen four star reviews. As the product that received one five star review garners more ratings, its reviews will grow in credibility and, if it continues to receive high ratings, it will eventually become credible enough to outrank the other reviews. A worked example of this kind of weighting is sketched below.
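The plugin's exact formula isn't documented in this article, so purely as an illustration, here is the widely used Bayesian average (the scheme popularized by IMDb's film rankings); all the numbers below are assumed for the example:

weighted_rating = (v / (v + m)) * R + (m / (v + m)) * C

where R is the item's average rating, v is its number of votes, m is a minimum-votes threshold, and C is the mean rating across all items. Assuming m = 10 and C = 3.5, a single five star review scores (1/11) * 5 + (10/11) * 3.5, which is roughly 3.64, while fifteen four star reviews score (15/25) * 4 + (10/25) * 3.5 = 3.8, so the heavily reviewed product ranks higher, exactly as described above.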
If you're planning to create a website where visitors can come and review local businesses, then you might consider this plugin's ability to automatically embed Google Maps quite handy. After configuring the settings on the plugin's Google Maps screen, you will be able to type the address for a business into a custom field when writing a post, and the plugin will take care of the rest.

The WP Review Site plugin also includes two sidebar widgets that can be used with any widget-ready theme. These widgets will allow you to display a list of top rated items and a list of recent reviews.

Lastly, the themes provided with this plugin include built-in support for the hReview microformat. This means that Google will easily be able to extract and highlight reviews from your website. That feature will prove to be very beneficial for driving search engine traffic to your site.

Installing WP Review Site

Once you've installed WordPress, you can then concentrate on the installation of the WP Review Site plugin and its accompanying themes. First, extract the wpreviewsite.zip archive. Inside you will find a plugins folder and a themes folder. Within the plugins folder is another folder named review-site. Since none of these folders are zipped, you will need to upload them using either an FTP program or the file manager provided by your web host. So, upload the review-site folder to the wp-content/plugins directory on your server. If you plan to use one of the themes provided with this plugin, you will next need to upload the contents of the themes folder to the wp-content/themes directory.

Setting up and configuring WP Review Site

With the installation process complete, you will now need to activate the WP Review Site plugin. Once that's finished, a Review Site menu will appear on the left side of your screen. This menu contains links to the settings screens for this plugin. Before you delve into the configuration process, you must first activate the theme that you plan to use on your consumer review website. Using one of the provided themes is a bit easier, because using any other theme means that you must integrate the functionality of WP Review Site into it. Now that you know the benefits offered by the themes that are bundled with this plugin, click on Appearance | Themes. Once there, activate either Award Winning Hosts, Bonus Black, or a theme of your choice.
General Settings

Navigate to Review Site | General Settings to be taken to the first of the WP Review Site settings screens. On this screen, Sort Posts By is the first setting that you will encounter. Rather than displaying reviews in the normal chronological order used by WordPress you should, instead, select either the Average User Rating (Weighted) or the Number of Reviews/Comments option. Either of these settings will provide a much more user-friendly experience for your visitors.

If you want to make it impossible for site visitors to submit a comment without also choosing a rating, tick the checkbox next to Require Ratings with All Comments. If you don't want to make this a requirement, then you can leave this setting as is. This setting will, of course, only apply to posts that you would like your visitors to rate. On normal posts that don't include rating stars in the comment form area, it will still be possible for your visitors to submit a comment.

When using one of the themes provided with the plugin, none of the other settings on this screen need to be configured. If you would like to integrate this plugin into a different theme, then, depending upon the method that you choose, you may need to revisit this screen later on. No matter how you're handling the theme issue, you can, for now, just click Save Settings before proceeding to the next screen.

Rating Categories

To access the next settings screen, click on Review Site | Rating Categories. Here you can add categories for people to rate when submitting reviews. These categories shouldn't be confused with the categories used in WordPress for organizational purposes. These WP Review Site categories are more like ratings criteria. By default, WP Review Site includes a category called Overall Rating, but you can click the remove link to delete it if you like.

To add your first rating category, simply enter its title into the Add a Category textbox and then click Save Settings. The screen will then refresh and your newly created rating category will appear under the Edit Rating Categories section of the screen. To add additional rating categories, simply repeat the process that you previously completed.

Once you've finished adding rating categories, you will next need to turn your attention to the Bulk Apply Rating Categories section of the screen. In the Edit Rating Categories area you will see all of the rating categories that you just finished adding to your site. If you want to simplify matters, and apply these rating categories to all of the posts on your site, tick the checkbox next to each of the available rating categories. Then, from the Apply to Posts in Category drop-down menu, select All Categories. This is most likely the configuration that you will use if you're building a website entirely dedicated to providing consumer reviews. Once you've finished, click Save Settings.

If you, instead, want your newly added rating categories to only appear on certain categories, then bypass the Edit Rating Categories area for now and first look to the Apply to Posts in Category settings area. Currently this will only show All Categories and Uncategorized. The lack of categories in this menu is being caused by two things. First, you haven't added any WordPress categories to your site yet. Secondly, categories won't be included in this menu until they contain at least one post. To solve part of this problem, open a new browser window and then navigate to Posts | Categories.
Then, add the categories that you would like to include on your website. Now, click on Posts | Edit to visit the Edit Posts screen. At the moment, the Hello world! post is the only one published on your site, and you can use it to force your site's categories to appear in the Apply to Posts in Category drop-down menu. So, hover over the title of this post and then, from the now visible set of links, click Quick Edit. In the Categories section of the Quick Edit configuration area, tick the checkbox next to each of the categories found on your site. Then, click Update Post. After content has been added to each of your site's categories, you can delete the Hello world! post, since you will no longer need it to force the categories to appear in the Apply to Posts in Category drop-down menu.

Now, return to the Rating Categories screen and select the first category that you want to configure from the Apply to Posts in Category drop-down menu. With that selected, in the Edit Rating Categories area, tick the checkbox next to each rating category that you want to appear within that WordPress category. Then, click Save Settings. Repeat this process for each of the WordPress categories to which you would like rating categories to be added.

Comparison Tables

If you wish, you can add a comparison table to either the home page or the category pages on your site. To do this, you need to visit the Comparison Tables screen, so click on Review Site | Comparison Tables.

If you want to display a comparison table on your home page, tick the checkbox next to Display a Comparison Table on Home Page. If you would like to include all of your site's categories in the comparison table that will be displayed on the home page, leave the Categories To Display On Home Page textbox as is. However, if you would prefer to include only certain categories, enter their category IDs, separated by commas, into the textbox instead. You can learn the ID numbers that have been assigned to each of your site's categories by opening a new browser window and then navigating to Posts | Categories. Once there, hover over the title of each of the categories found on the right-hand side of your screen. As you do, look at the URL that appears in your browser's status bar and make a note of the number that appears directly after tag_ID=. That's the number that you will need to enter on the Comparison Tables screen.

If you want to display a comparison table in one or more categories, tick the checkbox next to Display a Comparison Table on Category Page(s). Now, return to the Comparison Tables screen. If you want a comparison table to be displayed on each of your category pages, leave the Categories To Display Comparison Table On textbox at its default. Otherwise, enter a list of comma-separated category IDs into the textbox for the categories where you want to display comparison tables.

The Number of Posts in the Table setting is currently set to 5, but you can enter another value if you would like a different number of posts to be included in each comparison table. When writing posts, you might use custom fields to include additional information. If you would like that information to be displayed in your comparison tables, you will need to enter the names of those fields, separated by commas, into the Custom Fields to Display textbox. Lastly, you can change the text that appears in the Text for the Visit Site link in the Table setting if you wish, or you may leave it at its default.
With these configurations complete, click Save Settings. In this screenshot, you can see what a populated comparison table will look like on your website:

Google Maps

If you plan on featuring reviews centered around local businesses, then you might want to consider adding Google Maps to your site. This will make it easy for visitors to see exactly where each business is located. You can access this settings screen by clicking on Review Site | Google Maps.

To activate this feature, tick the checkbox next to Display a Google Map on Posts/Pages with mapaddress Custom Field. Next, you need to use the Map Position setting to specify where these Google Maps will appear in relation to the content. You can choose to use either the Top of Post or Bottom of Post position.

The Your Google Maps API Key textbox is next. Here you will need to enter a Google Maps API key. If you don't have a Google Maps API key for this domain, then you will need to visit Google to generate one. To do this, right-click on the link provided on the Google Maps screen and open that link in a new browser window. You will then be taken to the Google Maps API sign up screen. If you've ever signed up to use any of Google's services, then you can use that username and password to log in. If you don't have an account with Google, create one now.

Take a moment to read the information and terms presented on the Google Maps API sign up page. After you've finished reviewing this text, if it's acceptable to you, enter the URL for your website into the My web site URL textbox and then click Generate API Key. You will then be taken to a thank you screen where your API key will be displayed. Copy the API key and return to the Google Maps screen on your website. Once there, paste your API key into the textbox for Your Google Maps API Key.

The Map Width and Map Height settings are next. By default, these are configured to 400px and 300px. If you would prefer that the maps be displayed at a different size, enter new values into each of these textboxes. The last setting is Map Zoom Level (1-5), which is currently set to 3. This setting should be fine, but you may change it if you wish. Finally, click Save Settings. When you publish a post that includes the mapaddress custom field, this is what the Google Map will look like on your site.

SOP Module Setup in Microsoft Dynamics GP

Packt
17 May 2011
14 min read
All module settings are company specific. If you have multiple companies in Dynamics GP, you will need to set up each module in each company separately. This is often advantageous, as different companies may require different settings for various features. If you would like to copy setup from one company to another, Microsoft Dynamics GP KnowledgeBase article 872709 describes how to do this: https://mbs.microsoft.com/knowledgebase/KBDisplay.aspx?scid=kb;en-us;872709 (CustomerSource or PartnerSource login required).

The Sales Order Processing module, also commonly referred to as SOP, bridges the gap between the Inventory and Receivables modules in Dynamics GP. In SOP, you can enter quotes, orders, back orders, invoices, and returns, with detailed inventory and non-inventory items. SOP also integrates with the Purchase Order Processing module, with the ability to automatically create purchase orders for sales orders that you do not have stock to fill.

Setup for Sales Order Processing consists of the following steps:

- Sales Order Processing Setup
- Sales Document Setup
- User-Defined Fields
- SOP Document Numbers
- Sales Order Processing Setup Options
- E-mail Settings
- Customer Items

Sales Order Processing Setup

To begin setting up SOP, navigate to Microsoft Dynamics GP | Tools | Setup | Sales | Sales Order Processing. The following is a list of the fields on the Sales Order Processing Setup window:

- Display Item Unit Cost: Unmarking this will show the unit cost of items entered on sales order transactions as zero. Some companies prefer to hide item costs from users; however, this option hides them only on the Sales Transaction Entry window, not anywhere else in the system. So, unless a user has extremely limited access in Dynamics GP, this is not as useful as it sounds.
- Track Voided Transactions in History: It is recommended to leave this option marked, as there may often be a reason to look at voided sales transactions.
- Calculate Kit Price based on Component Cost: This option applies to pricing methods based on cost, which are not too common. Selecting this option will cause the cost of the kit (and thus the price) to be recalculated based on the cost of each component in the kit.
- Display Quantity Distribution Warning: Selecting this enables a warning during transaction entry that helps users avoid mistakes when entering quantities for transactions. It is recommended to leave this checked.
- Search for New Rates During Transfer Process: This setting only applies to Multicurrency transactions. If it is checked, then during the transfer of sales documents (for example, from quote to order), Dynamics GP will look for an updated exchange rate. If this setting is not selected, the system will still verify that the exchange rate is valid, but will not update exchange rates unless they are expired.
- Track Master Numbers and Next Master Number: Dynamics GP offers the ability to track related transactions with a Master Number. For example, a quote, a resulting order, and two partial invoices may all have their own individual numbers but share one master number, thus allowing for easy lookups of all related transactions. It is recommended to leave this option selected, even if you do not foresee a need to use master numbers. The Next Master Number can be anything you would like, although there is typically no reason to change it from the default value of 1.
- Prices Not Required in Price List: This option works together with the Inventory module and, if selected, allows users to enter items without a price level set up. This may sometimes be needed, but it is better to leave this unchecked and create price lists for all items to keep setup consistent. If this option is selected and no Password is supplied, any user can enter an item without a price level on a transaction. With a Password selected, a user will be prompted to enter the password before they can continue.
- Convert Functional Price: This option is only available if Prices Not Required in Price List is selected. With this checked, if a price cannot be found for the item in the currency that is being used on the transaction, the functional currency price will be converted as needed, using the current exchange rate.
- Data Entry Defaults: These are defaults to help speed up data entry:
  - Quantity Shortage: Choose what users will see as the default option when they enter a quantity that is greater than what is in stock.
  - Document Date: When entering a new SOP transaction, the Document Date will default to either the User Date (this is at the top of every window next to the User ID and Company Name) or the date of the previously saved transaction. For companies that typically enter all invoices with the same date (for example, the end of the previous month), it is best to default this to the Previous Doc. Date. For companies that want to have the current date defaulted for new transactions, choose User Date.
  - Price Warning: You can decide whether to warn users if the price being used on a transaction they are entering is a default price for an item because a price has not been set up for the customer's price level. This may be useful if there should be a price set up on the item price list for each customer price level, and it can help avoid mistakes. If you choose to give a warning, using the Message option is recommended, as it will give users an indication of what the problem is; the Beep option may not be enough to catch a user's attention when you consider the typical noise level in an office and the number of beeps various applications generate.
  - Requested Ship Date: When SOP orders are entered into Dynamics GP, a Requested Ship Date is automatically populated to help determine when orders should be shipped and when purchase orders may need to be placed. A user can change the requested ship date on each line item for an order; however, if there is a typical default, you can use this setting to help speed up data entry. Choices are either Document Date (order date) or a number of Days After Doc. Date. If the latter is chosen, a box will appear next to this option where you can fill in the number of days (up to 999).
- Document Defaults: Similar to a customer with multiple Address IDs, each transaction type in SOP can have multiple document types set up to follow different rules. During transaction entry, users select which transaction type and document type to use for the proper set of rules to be followed. This section allows you to optionally set up a default for each transaction type, as well as the Site ID and Checkbook ID, to help speed up data entry. If you decide to set these up, you will need to come back to this section after you have created the various document types for each transaction type. If there is a transaction type you will not be using, or that you want to make sure users have to proactively select during transaction entry, you can leave it blank.
- Posting Accounts From: If Item is chosen, the system will first look at the item's accounts, then at the Inventory series of the company posting accounts, to determine what GL accounts to use for SOP transactions. If Customer is chosen, the system will first look at the customer's accounts, then at the Sales series of the company posting accounts.
- Maintain History: It is recommended to keep all history and leave all of these selections checked.
- Decimal Places for Non-Inventoried Items: If non-inventory items are entered on SOP transactions, these settings will determine how many decimal places to use for Quantities and Currency. If Multicurrency is enabled, use the expansion button (blue arrow) next to the Currency field to enter these for each currency.

The following screenshot shows a typical Sales Order Processing Setup window:

Sales Document Setup

You can set up as many document types as you need for each Sales Order Processing transaction type. One reason to set up different document types is to set up different transaction rules. Another reason is to use different numbering schemes or to be able to segregate transactions for reporting purposes. Dynamics GP has two types of transactions that are basically the same: a fulfillment order and an invoice. In most places you will see these listed together as Fulfillment Order/Invoice. When using Workflow with the SOP module, a fulfillment order becomes a separate transaction type that, once completed, turns into an invoice automatically.

To start setting up document types, click the Sales Document Setup button at the bottom of the Sales Order Processing Setup window. You will see a list of the available transaction types. The sections below go over the setup for each transaction type.

Quote

Quotes are typically the start of the sales process and can be entered for customers or prospects. Prospects can be created "on the fly" as they are needed in Dynamics GP, and then transferred to customers if they accept a quote and place an order. The following list explains the fields on the Sales Quote Setup window:

- Quote ID: An ID for this quote document type, up to 15 characters. This ID is what users will need to type or select on a sales transaction when using this quote document type.
- Quote ID Next Number: If you are setting up multiple Quote IDs and would like each to have its own numbering scheme, you can enter the next quote number here. Otherwise, if all quotes share one numbering scheme, that can be entered during SOP Document Numbers setup, and you can leave this blank.
- Days to Expire: The default number of days a quote is valid for. After expiration, the user will not be able to transfer the quote to an order. Users can change this as needed on individual quotes.
- Comment ID: This is the default Comment ID for a quote. For example, if all quotes are set to expire in 30 days, you could create a comment that says "Prices are guaranteed for 30 days from quote date" and enter the corresponding Comment ID here. The comment would automatically populate on every quote created with this Quote ID.
- Format: Dynamics GP has four report formats available for each SOP transaction type: Blank Paper, Short Form, Long Form, and Other Form. If you need to set up multiple Quote IDs that show different information when printed, each Quote ID can be defaulted to use a different report format.
- Transfer Quote to Order: If this quote can be transferred to an order, check the box and enter an Order ID.
- Transfer Quote to Fulfillment Order/Invoice: If this quote can be transferred to a fulfillment order or invoice, check the box and enter a Fulfillment Order/Invoice ID.
- Default Quantities: This determines whether the item quantities for the quote default to Quantity to Invoice or Quantity to Order. This setting will depend on whether you plan to typically transfer a quote to an order or an invoice.
- Use Prospects: Leave this checked if you want to allow prospects to be used on quotes; otherwise only customers will be allowed.
- Allow Repeating Documents: A quote can be set up to repeat. This is more typical for orders, but may be needed at times for quotes. This setting determines whether this Quote ID will be allowed to repeat.
- Options and Password scrolling list: Each of the Options listed has an optional Password, saved in clear text:
  - Delete Documents: With this unchecked, once saved, a quote cannot be deleted.
  - Edit Printed Documents: With this unchecked, once a quote is printed, it cannot be changed.
  - Override Document Numbers: If users can change quote numbers when they are creating them, check this option (once created, a quote number cannot be changed).
  - Void Documents: Select this if users are allowed to void quotes. If so, make sure that you have selected Track Voided Transactions in History on the Sales Order Processing Setup window.

Order

In Dynamics GP, a sales transaction can start out as an order, or quotes can be transferred to orders. Depending on setup, orders can allocate inventory items, thus making them unavailable for other orders or invoices. The following is a list of the fields on the Sales Order Setup window:

- Order ID: An ID for the order document type; the maximum length allowed is 15 characters. This is what users will need to type or select when using this order document type.
- Order ID Next Number: If you are setting up multiple Order IDs and would like each to have its own numbering scheme, you can enter the Next Number here. Otherwise, if all orders will share one numbering scheme, that numbering scheme can be entered during SOP Document Numbers setup.
- Comment ID: The default Comment ID for an order.
- Format: Dynamics GP has four report formats available for each SOP transaction type: Blank Paper, Short Form, Long Form, and Other Form. If you need to set up multiple Order IDs that show different information when printed, each Order ID can be defaulted to use a different report format.
- Allocate by: This setting determines how (or if) inventory is allocated for the Order ID. Allocated inventory is still considered On Hand stock, but it is not available for other orders or invoices:
  - Line Item: Select this if you want to allocate inventory as each line is entered. Users will need to choose an action on each line with a quantity shortage, which could significantly slow down order entry if most of the orders being entered do not have inventory in stock.
  - Document/Batch: Inventory is not allocated as orders are entered, and a separate allocation process is run either for each order or for a batch of orders. This gives less visibility of inventory availability during order entry; however, it can greatly speed entry up, because each line item is not checked and dealt with individually.
  - None: Orders are not allocated at all, and inventory is allocated only after an order is transferred to an invoice or fulfillment order.
- Transfer Order to Back Order: If this order type can be transferred to a back order, check this selection and enter a Back Order ID.
Many companies do not use back orders, and simply use the back ordered quantity on orders to track back ordered items.

- Transfer Order to Fulfillment Order/Invoice: There is no checkbox here, as the basic functionality of an order is to transfer to an invoice. The only choice is what Fulfillment Order/Invoice ID to use.
- Options section:
  - Allow Repeating Documents: If this Order ID can be set up to repeat, mark this checkbox. Allocate by has to be set to Document/Batch or None to enable this option.
  - Use Separate Fulfillment Process: If this option is checked, a separate step to fulfill orders will be needed before they can be transferred to invoices.
  - Allow all Back Ordered Items to Print on Invoice: If this is selected, all back ordered items will be transferred to an invoice with a fulfilled quantity of zero, allowing them to print on the invoice. Most companies like to show only items that are being billed on invoices, so they would leave this option unchecked.
  - Credit Limit Hold ID: Holds can offer an additional level of control in Sales Order Processing.
  - Override Quantity to Invoice with Quantity Fulfilled: Marking this option will set the Quantity to Invoice to be the same as Quantity Fulfilled, if Quantity Fulfilled is not zero. If this option is checked, Enable Quantity Cancelled in Sales Order Fulfillment becomes available. If Transfer Order to Back Order is selected, Enable Quantity to Back Order in Sales Order Fulfillment will also be enabled.
- Options and Password scrolling list: Each of these Options has an optional Password, saved in clear text:
  - Allow Invoicing of Unfulfilled or Partially Fulfilled Orders: This option takes effect only when the Use Separate Fulfillment Process option is selected; if Use Separate Fulfillment Process is not selected, invoicing an unfulfilled or partially fulfilled order will not be allowed, even though this option can still be selected.
  - Delete Documents: With this unchecked, a saved order cannot be deleted.
  - Edit Printed Documents: With this unchecked, once an order is printed, it cannot be changed.
  - Override Document Numbers: If users can change order numbers when they are creating them, check this option (once created, an order number cannot be changed).
  - Void Documents: Select this option if users are allowed to void orders. If so, make sure that you have selected Track Voided Transactions in History in the Sales Order Processing Setup window.

The following screenshot shows a typical Sales Order Setup window:

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what the core elements are, and how they work.

Before that, let's take a step back and briefly discuss how RxJava is different from other approaches. RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives:

- Observables: A source of data
- Subscriptions: An activated handle to the Observable that receives data
- Schedulers: A means to define where (on which Thread) the data is processed

First of all, we will cover Observables--the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable--a big brother of Observable that lets you handle big amounts of data with high rates of publishing.

To summarize, we will cover these aspects:

- What is an Observable?
- What are Disposables (formerly Subscriptions)?
- How do items travel through an Observable?
- What is backpressure and how can we use it with Flowable?

Let's dive into it!

(For more resources related to this topic, see here.)

Observables

Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way.

There are lots of different ways to create Observables. The simplest way is to use the .just() method like we did before:

Observable.just("First item", "Second item");

It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe():

Observable.just("First item", "Second item")
    .subscribe();

Usually (but not always), the observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere".

Hot and Cold Observables

Quite often in the literature and documentation, the terms Hot and Cold Observables can be found. A Cold Observable is the most common Observable type. For example, it can be created with the following code:

Observable.just("First item", "Second item")
    .subscribe();

Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed.

A Hot Observable, by contrast, is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly, and it doesn't matter if there is something ready to receive them (like a Subscription). If there are no subscriptions to the Observable, these updates will be lost.
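To illustrate the difference, here is a minimal hedged sketch of a hot source built with PublishSubject (the same class used in the backpressure example later in this article); items pushed before a subscription exists are simply lost:

// A hedged sketch, assuming RxJava 2.x.
PublishSubject<String> subject = PublishSubject.create();

subject.onNext("First item"); // no subscriber yet -- this item is lost

subject.subscribe(e -> Log.d("APP", "subscribe:" + e));

subject.onNext("Second item"); // only this item reaches the subscriber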
Disposables

A Disposable (previously called Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it will stay active forever. That might not be a problem for a server-side application, but it can cause some serious trouble on Android; usually, it is a common source of memory leaks. Obtaining a reference to a disposable is pretty simple:

Disposable disposable = Observable.just("First item", "Second item")
    .subscribe();

Disposable is a very simple interface. It has only two methods: dispose() and isDisposed().

dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe() from receiving any further items from the Observable, and the Observable itself will be cleaned up.

isDisposed() has a pretty straightforward function: it checks whether the subscription is still active. However, it is not used very often in regular code, as subscriptions are usually unsubscribed and forgotten. Disposed subscriptions (Disposables) cannot be re-enabled; they can only be created anew.

Finally, Disposables can be grouped using CompositeDisposable like this:

Disposable disposable = new CompositeDisposable(
    Observable.just("First item", "Second item").subscribe(),
    Observable.just("1", "2").subscribe(),
    Observable.just("One", "Two").subscribe()
);

It's useful in cases where there are many Observables that should be canceled at the same time, for example, when an Activity is being destroyed.
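As a hedged illustration of that last point, here is one common way to wire this into an Activity's life cycle (the wiring itself is an assumption for the sake of the example, not code from the original article):

// A minimal sketch: collect subscriptions while the Activity lives,
// then dispose of all of them at once when it is destroyed.
public class MainActivity extends Activity {

    private final CompositeDisposable disposables = new CompositeDisposable();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        disposables.add(Observable.just("First item", "Second item")
            .subscribe(e -> Log.d("APP", "subscribe:" + e)));
    }

    @Override
    protected void onDestroy() {
        disposables.dispose(); // cancels every subscription added above
        super.onDestroy();
    }
}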
Schedulers

As described in the documentation, a Scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed, and usually that means selecting some kind of specific thread. Most often, Schedulers are used to execute long-running tasks on some background thread so that they don't block the main computation or UI thread. This is especially relevant on Android, where long-running tasks must not be executed on the MainThread. Schedulers can be set with a simple .subscribeOn() call:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .subscribe();

There are only a few main Schedulers that are commonly used:

- Schedulers.io()
- Schedulers.computation()
- Schedulers.newThread()
- AndroidSchedulers.mainThread()

AndroidSchedulers.mainThread() is only used on Android systems.

Scheduling examples

Let's explore how Schedulers work by checking out a few examples. Let's run the following code:

Observable.just("First item", "Second item")
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as follows:

on-next:main:First item
subscribe:main:First item
on-next:main:Second item
subscribe:main:Second item

Now let's try changing the code as shown:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

Now, the output should look like this:

on-next:RxCachedThreadScheduler-1:First item
subscribe:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:RxCachedThreadScheduler-1:Second item

We can see how the code was executed on the main thread in the first case and on a new thread in the second. Android requires that all UI modifications be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with the .observeOn() method:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as illustrated:

on-next:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:main:First item
subscribe:main:Second item

You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread.
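This subscribeOn/observeOn pairing is the everyday Android pattern: do the slow work on a background Scheduler and render the result on the main thread. Here is a hedged sketch of it; loadUserName() and textView are hypothetical stand-ins, not part of the original examples:

// Run the blocking call on the io() scheduler, consume the result on the main thread.
Observable.fromCallable(() -> loadUserName()) // hypothetical long-running call
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(name -> textView.setText(name)); // safe: runs on the main thread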
Investigating the Flow of an Observable

Logging inside the steps of an Observable is a very powerful tool when you want to understand how it works. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze the full flow of an Observable. We will start off with this script:

private void log(String stage, String item) {
    Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item);
}

private void log(String stage) {
    Log.d("APP", stage + ":" + Thread.currentThread().getName());
}

Observable.just("One", "Two")
    .subscribeOn(Schedulers.io())
    .doOnDispose(() -> log("doOnDispose"))
    .doOnComplete(() -> log("doOnComplete"))
    .doOnNext(e -> log("doOnNext", e))
    .doOnEach(e -> log("doOnEach"))
    .doOnSubscribe((e) -> log("doOnSubscribe"))
    .doOnTerminate(() -> log("doOnTerminate"))
    .doFinally(() -> log("doFinally"))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(e -> log("subscribe", e));

It can be seen that it has lots of additional and unfamiliar steps (more about these later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?

doOnSubscribe:main
doOnNext:RxCachedThreadScheduler-1:One
doOnEach:RxCachedThreadScheduler-1
doOnNext:RxCachedThreadScheduler-1:Two
doOnEach:RxCachedThreadScheduler-1
doOnComplete:RxCachedThreadScheduler-1
doOnEach:RxCachedThreadScheduler-1
doOnTerminate:RxCachedThreadScheduler-1
doFinally:RxCachedThreadScheduler-1
subscribe:main:One
subscribe:main:Two
doOnDispose:main

Let's go through some of the steps. First of all, by calling .subscribe(), the doOnSubscribe block was executed. This started the emission of items from the Observable, as we can see on the doOnNext and doOnEach lines. Finally, the stream finished and the termination life cycle was activated: doOnComplete, doOnTerminate, and doFinally. Also, note that the doOnDispose block was called on the main thread, along with the subscribe block. The flow will be a little different if the .subscribeOn() and .observeOn() calls aren't there:

doOnSubscribe:main
doOnNext:main:One
doOnEach:main
subscribe:main:One
doOnNext:main:Two
doOnEach:main
subscribe:main:Two
doOnComplete:main
doOnEach:main
doOnTerminate:main
doOnDispose:main
doFinally:main

You will readily note that now the doFinally block was executed after doOnDispose, while in the former setup doOnDispose was last. This happens because of the way the Android Looper schedules code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the threads they are running on) to see what's actually happening.

Flowable

Flowable can be regarded as a special type of Observable (but internally it isn't), and it has almost the same method signatures as Observable. The difference is that Flowable allows you to process items that are emitted faster from the source than some of the following steps can handle.

It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second. That poses a problem: what will we do after 60 seconds? There will be roughly 60 million items in the queue waiting to be processed, because items accumulate at a rate of about 1 million per second between the first and the second steps while the second step processes them at a much slower rate. Clearly, the available memory will be exhausted and the program will fail with an OutOfMemory (OOM) exception.

For example, this script will cause excessive memory usage, because the processing step just won't be able to keep up with the pace at which items are emitted:

PublishSubject<Integer> observable = PublishSubject.create();
observable
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

for (int i = 0; i < 1000000; i++) {
    observable.onNext(i);
}

private void log(Throwable throwable) {
    Log.e("APP", "Error", throwable);
}

By converting this to a Flowable, we can start controlling this behavior:

observable.toFlowable(BackpressureStrategy.MISSING)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

Since we have chosen not to specify how items that cannot be processed should be handled (this is called backpressure), the code will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine, as it wouldn't hit the internal buffer of the Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few backpressure strategies that define how an excessive number of items should be handled.
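As a quick contrast (a hedged sketch, not taken from the original article): a source that supports backpressure natively, such as Flowable.range(), avoids this problem entirely, because the downstream requests items in batches instead of having them pushed at it blindly:

// Flowable.range honors downstream requests, so a slow consumer simply
// slows the source down instead of overflowing an unbounded queue.
Flowable.range(1, 1000000)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);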
This can only be used in cases where losing data is okay, and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify the backpressure strategy, like this:

observable.toFlowable(BackpressureStrategy.DROP)

Alternatively, it will be like this:

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureDrop()

A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops an item instantly unless the stream is free to push it downstream). All the other values between "ticks" will be dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
    .sample(10, TimeUnit.MILLISECONDS)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

Preserve Latest Item

Preserving the latest item means that if the downstream cannot cope with the items being sent to it, the source stops emitting values and waits until the downstream becomes available again. While waiting, it keeps dropping all the values except the last one that arrived, and when the downstream becomes available, it sends the last value that's currently stored. As with dropping, the "latest" strategy can be specified while creating the Flowable:

observable.toFlowable(BackpressureStrategy.LATEST)

Alternatively, by calling .onBackpressureLatest():

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureLatest()

Finally, the .debounce() method emits a value only after the specified interval has passed without the source emitting anything newer; all the intermediate values are dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
    .debounce(10, TimeUnit.MILLISECONDS)

Buffering

Buffering is usually a poor way to handle different paces of items being emitted and consumed, as it often just delays the problem. However, it can work just fine if there is only a temporary slowdown in one of the consumers. In this case, the items emitted will be stored until later processing, and when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we will see very similar behavior to the original Observable, with memory running out. Enabling buffering is, again, pretty straightforward, by calling the following:

observable.toFlowable(BackpressureStrategy.BUFFER)

or

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureBuffer()

If there is a need to specify a particular capacity for the buffer, it can be passed to .onBackpressureBuffer() (note that the .buffer() operator is something different: it groups emitted items into lists):

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureBuffer(10)
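To see these strategies in action end to end, here is a minimal, self-contained sketch of my own (plain Java rather than Android code, so it can be run as-is). Flowable.interval() acts as a deliberately fast source that cannot be slowed down, and the 100-millisecond sleep simulates a slow consumer:

import java.util.concurrent.TimeUnit;

import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Without a strategy, interval() would signal MissingBackpressureException,
        // because it cannot slow down when the downstream is busy.
        Flowable.interval(1, TimeUnit.MILLISECONDS)
            .onBackpressureDrop(t -> System.out.println("dropped: " + t))
            .observeOn(Schedulers.computation())
            .subscribe(t -> {
                Thread.sleep(100); // simulate a slow processing step
                System.out.println("processed: " + t);
            });

        Thread.sleep(3000); // keep the JVM alive long enough to watch the output
    }
}

Swapping onBackpressureDrop() for onBackpressureLatest() or onBackpressureBuffer() shows how the other strategies behave with the same source and consumer.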
Completable, Single, and Maybe Types

Besides Observable and Flowable, there are three more types that RxJava provides:

Completable: It represents an action without a result that will be completed in the future
Single: It's just like an Observable (or Flowable) that returns a single item instead of a stream
Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item (like Single)

However, all these are used quite rarely. Let's take a quick look at the examples.

Completable

Since Completable can basically process just two types of actions--onComplete and onError--we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries. For example, a Completable can be created by calling the following:

Completable completable = Completable.fromAction(() -> {
    log("Let's do something");
});

Then, it is subscribed to with the following:

completable.subscribe(() -> {
    log("Finished");
}, throwable -> {
    log(throwable);
});

Single

Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask why it is worth having it at all. These types are useful to tell developers about the specific behavior that they should expect. To create a Single, one can use this example:

Single.just("One item")

The Single and the subscription to it can be created with the following:

Single.just("One item")
    .subscribe((item) -> {
        log(item);
    }, (throwable) -> {
        log(throwable);
    });

Note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result.

Maybe

Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before:

Maybe.empty();

or like:

Maybe.just("Item");

However, .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled):

Maybe.just("Item")
    .subscribe(
        s -> log("success: " + s),
        throwable -> log("error"),
        () -> log("onComplete")
    );

Summary

In this article, we covered the most essential parts of RxJava.

Resources for Article:

Further resources on this subject:

The Art of Android Development Using Android Studio [article]
Drawing and Drawables in Android Canvas [article]
Optimizing Games for Android [article]

Blocking Common Attacks using ModSecurity 2.5: Part 3

Packt
01 Dec 2009
12 min read
Source code revelation

Normally, requesting a file with a .php extension will cause mod_php to execute the PHP code contained within the file and then return the resulting web page to the user. If the web server is misconfigured (for example, if mod_php is not loaded) then the .php file will be sent by the server without interpretation, and this can be a security problem. If the source code contains credentials used to connect to an SQL database then that opens up an avenue for attack, and of course the source code being available will allow a potential attacker to scrutinize the code for vulnerabilities.

Preventing source code revelation is easy. With response body access on in ModSecurity, simply add a rule to detect the opening PHP tag:

# Prevent PHP source code from being disclosed
SecRule RESPONSE_BODY "<?" "deny,msg:'PHP source code disclosure blocked'"

Preventing Perl and JSP source code from being disclosed works in a similar manner:

# Prevent Perl source code from being disclosed
SecRule RESPONSE_BODY "#!/usr/bin/perl" "deny,msg:'Perl source code disclosure blocked'"

# Prevent JSP source code from being disclosed
SecRule RESPONSE_BODY "<%" "deny,msg:'JSP source code disclosure blocked'"

Directory traversal attacks

Normally, all web servers should be configured to reject attempts to access any document that is not under the web server's root directory. For example, if your web server root is /home/www, then attempting to retrieve /home/joan/.bashrc should not be possible, since this file is not located under the /home/www web server root. The obvious attempt to access the /home/joan directory is, of course, easy for the web server to block; however, there is a more subtle way to access this directory which still allows the path to start with /home/www, and that is to make use of the .. symbolic directory link, which links to the parent directory in any given directory.

Even though most web servers are hardened against this sort of attack, web applications that accept input from users may still not be checking it properly, potentially allowing users to get access to files they shouldn't be able to view via simple directory traversal attacks. This alone is reason to implement protection against this sort of attack using ModSecurity rules. Furthermore, keeping with the principle of Defense in Depth, having multiple protections against this vulnerability can be beneficial in case the web server should contain a flaw that allows this kind of attack in certain circumstances.

There is more than one way to validly represent the .. link to the parent directory. URL encoding of .. yields %2e%2e, and adding the final slash at the end, we end up with %2e%2e%2f. Here, then, is a list of what needs to be blocked:

../
..%2f
.%2e/
%2e%2e%2f
%2e%2e/
%2e./

Fortunately, we can use the ModSecurity transformation t:urlDecode. This function does all the URL decoding for us, and will allow us to ignore the percent-encoded values, and thus only one rule is needed to block these attacks:

SecRule REQUEST_URI "../" "t:urlDecode,deny"
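Depending on how the application accepts input, the same traversal sequences may arrive in POST parameters or query string arguments rather than in the URI itself. Here is a sketch of my own (not from the original text) extending the same check to request arguments; test it before deploying, since legitimate parameter values could conceivably contain the pattern:

# Also check request arguments for traversal sequences (illustrative variant)
SecRule REQUEST_URI|ARGS "../" "t:urlDecode,deny"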
Blog spam

The rise of weblogs, or blogs, as a new way to present information, share thoughts, and keep an online journal has made way for a new phenomenon: blog comments designed to advertise a product or drive traffic to a website. Blog spam isn't a security problem per se, but it can be annoying and cost a lot of time when you have to manually remove spam comments (or delete them from the approval queue, if comments have to be approved before being posted on the blog). Blog spam can be mitigated by collecting a list of the most common spam phrases, and using the ability of ModSecurity to scan POST data. Any attempted blog comment that contains one of the offending phrases can then be blocked.

From both a performance and maintainability perspective, using the @pmFromFile operator is the best choice when dealing with large word lists such as spam phrases. To create the list of phrases to be blocked, simply insert them into a text file, for example, /usr/local/spamlist.txt:

viagra
v1agra
auto insurance
rx medications
cheap medications
...

Then create ModSecurity rules to block those phrases when they are used in locations such as the page that creates new blog comments:

#
# Prevent blog spam by checking comment against known spam
# phrases in file /usr/local/spamlist.txt
#
<Location /blog/comment.php>
  SecRule ARGS "@pmFromFile /usr/local/spamlist.txt" "t:lowercase,deny,msg:'Blog spam blocked'"
</Location>

Keep in mind that the spam list file can contain whole sentences—not just single words—so be sure to take advantage of that fact when creating the list of known spam phrases.

SQL injection

SQL injection attacks can occur if an attacker is able to supply data to a web application that is then used in unsanitized form in an SQL query. This can cause the SQL query to do completely different things than intended by the developers of the web application. Consider an SQL query like this:

SELECT * FROM user WHERE username = '%s' AND password = '%s';

The flaw here is that if someone can provide a password that looks like ' OR '1'='1, then the query, with username and password inserted, will become:

SELECT * FROM user WHERE username = 'anyuser' AND password = '' OR '1'='1';

This query will return all users in the results table, since the OR '1'='1' part at the end of the statement will make the entire statement true no matter what username and password is provided.

Standard injection attempts

Let's take a look at some of the most common ways SQL injection attacks are performed.

Retrieving data from multiple tables with UNION

An SQL UNION statement can be used to retrieve data from two separate tables. If there is one table named cooking_recipes and another table named user_credentials, then the following SQL statement will retrieve data from both tables:

SELECT dish_name FROM cooking_recipes UNION SELECT username, password FROM user_credentials;

It's easy to see how the UNION statement can allow an attacker to retrieve data from other tables in the database if he manages to sneak it into a query. A similar SQL statement is UNION ALL, which works almost the same way as UNION—the only difference is that UNION ALL will not eliminate any duplicate rows returned in the result.

Multiple queries in one call

If the SQL engine allows multiple statements in a single SQL query, then seemingly harmless statements such as the following can present a problem:

SELECT * FROM products WHERE id = %d;

If an attacker is able to provide an ID parameter of 1; DROP TABLE products;, then the statement suddenly becomes:

SELECT * FROM products WHERE id = 1; DROP TABLE products;

When the SQL engine executes this, it will first perform the expected SELECT query, and then the DROP TABLE products statement, which will cause the products table to be deleted.
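As an aside before we look at further injection techniques: the prepared statements recommended later in this article neutralize exactly this kind of payload. Here is a minimal JDBC sketch of my own (the table and connection details are assumed for illustration, not taken from the article) showing how the malicious ID string fails to alter the query structure:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SafeLookup {
    public static void main(String[] args) throws SQLException {
        String userInput = "1; DROP TABLE products;"; // hostile input

        // Connection details are placeholders for the sketch.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/shop", "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT * FROM products WHERE id = ?")) {
            // The ? placeholder is bound as data, never parsed as SQL,
            // so the "; DROP TABLE" fragment cannot terminate the query
            // or append a second statement.
            stmt.setString(1, userInput);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

Because the whole hostile string is bound as a single value, the worst that can happen is that the query matches no rows; nothing gets dropped.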
Reading arbitrary files

MySQL can be used to read data from arbitrary files on the system. This is done by using the LOAD_FILE() function:

SELECT LOAD_FILE("/etc/passwd");

This command returns the contents of the file /etc/passwd. This works for any file to which the MySQL process has read access.

Writing data to files

MySQL also supports the command INTO OUTFILE, which can be used to write data into files. This attack illustrates how dangerous it can be to include user-supplied data in SQL commands, since with the proper syntax, an SQL command can not only affect the database, but also the underlying file system. This simple example shows how to use MySQL to write the string some data into the file test.txt:

mysql> SELECT "some data" INTO OUTFILE "test.txt";

Preventing SQL injection attacks

There are three important steps you need to take to prevent SQL injection attacks:

1. Use SQL prepared statements.
2. Sanitize user data.
3. Use ModSecurity to block SQL injection code supplied to web applications.

These are in order of importance, so the most important consideration should always be to make sure that any code querying SQL databases that relies on user input uses prepared statements. A prepared statement looks as follows:

SELECT * FROM books WHERE isbn = ? AND num_copies < ?;

This allows the SQL engine to replace the question marks with the actual data. Since the SQL engine knows exactly what is data and what is SQL syntax, this prevents SQL injection from taking place. The advantages of using prepared statements are twofold:

They effectively prevent SQL injection.
They speed up execution time, since the SQL engine can compile the statement once, and use the pre-compiled statement on all subsequent query invocations.

So not only will using prepared statements make your code more secure—it will also make it quicker.

The second step is to make sure that any user data used in SQL queries is sanitized. Any unsafe characters such as single quotes should be escaped. If you are using PHP, the function mysql_real_escape_string() will do this for you.

Finally, let's take a look at strings that ModSecurity can help block to prevent SQL injection attacks.

What to block

The following table lists common SQL commands that you should consider blocking, together with a suggested regular expression for blocking each. The regular expressions are in lowercase and therefore assume that the t:lowercase transformation function is used.

SQL code          Regular expression
UNION SELECT      union\s+select
UNION ALL SELECT  union\s+all\s+select
INTO OUTFILE      into\s+outfile
DROP TABLE        drop\s+table
ALTER TABLE       alter\s+table
LOAD_FILE         load_file
SELECT *          select\s+\*

For example, a rule to detect attempts to write data into files using INTO OUTFILE looks as follows:

SecRule ARGS "into\s+outfile" "t:lowercase,deny,msg:'SQL Injection'"

The \s+ regular expression syntax allows for detection of an arbitrary number of whitespace characters. This will detect evasion attempts such as INTO%20%20OUTFILE, where multiple spaces are used between the SQL command words.

Website defacement

We've all seen the news stories: "Large Company X was yesterday hacked and their homepage was replaced with an obscene message". This sort of thing is an everyday occurrence on the Internet. After the company SCO initiated a lawsuit against Linux vendors citing copyright violations in the Linux source code, the SCO corporate website was hacked and an image was altered to read WE OWN ALL YOUR CODE—pay us all your money.
The hack was subtle enough that the casual visitor to the SCO site would likely not be able to tell that this was not the official version of the homepage—quite subtle, don't you think?

Preventing website defacement is important for a business for several reasons:

Potential customers will turn away when they see the hacked site
There will be an obvious loss of revenue if the site is used for any sort of e-commerce sales
Bad publicity will tarnish the company's reputation

Defacement of a site will of course depend on a vulnerability being successfully exploited. The measures we will look at here are aimed at detecting that a defacement has taken place, so that the real site can be restored as quickly as possible.

Detection of website defacement is usually done by looking for a specific token in the outgoing web pages. This token has been placed within the pages in advance specifically so that it may be used to detect defacement—if the token isn't there, then the site has likely been defaced. This can be sufficient, but it can also allow the attacker to insert the same token into his defaced page, defeating the detection mechanism. Therefore, we will go one better and create a defacement detection technology that will be difficult for the hacker to get around.

To create a dynamic token, we will be using the visitor's IP address. The reason we use the IP address instead of the hostname is that a reverse lookup may not always be possible, whereas the IP address will always be available. The following example code in JSP illustrates how the token is calculated and inserted into the page:

<%@ page import="java.security.*" %>
<%
String tokenPlaintext = request.getRemoteAddr();
String tokenHashed = "";
String hexByte = "";

// Hash the IP address
MessageDigest md5 = MessageDigest.getInstance("MD5");
md5.update(tokenPlaintext.getBytes());
byte[] digest = md5.digest();
for (int i = 0; i < digest.length; i++) {
    hexByte = Integer.toHexString(0xFF & digest[i]);
    if (hexByte.length() < 2) {
        hexByte = "0" + hexByte;
    }
    tokenHashed += hexByte;
}

// Write MD5 sum token to HTML document
out.println(String.format("<span style='color: white'>%s</span>", tokenHashed));
%>

Assuming the background of the page is white, the <span style="color: white"> markup will ensure it is not visible to website viewers.

Now for the ModSecurity rules to handle the defacement detection. We need to look at outgoing pages and make sure that they include the appropriate token. Since the token will be different for different users, we need to calculate the same MD5 sum token in our ModSecurity rule and make sure that this token is included in the output. If not, we block the page from being sent and sound the alert by sending an email message to the website administrator:

#
# Detect and block outgoing pages not containing our token
#
SecRule REMOTE_ADDR ".*" "phase:4,deny,chain,t:md5,t:hexEncode,exec:/usr/bin/emailadmin.sh"
SecRule RESPONSE_BODY "!@contains %{MATCHED_VAR}"

We are placing the rule in phase 4, since this is required when we want to inspect the response body. The exec action is used to send an email to the website administrator to let him know of the website defacement.


7 things Java programmers need to watch for in 2019

Prasad Ramesh
24 Jan 2019
7 min read
Java is one of the most popular and widely used programming languages in the world. Its dominance of the TIOBE index ranking is unmatched for the most part, having held the number 1 position for almost 20 years. Although Java's dominance is unlikely to waver over the next 12 months, there are many important issues and announcements that will demand the attention of Java developers. So, get ready for 2019 with this list of key things in the Java world to watch out for.

#1 Commercial Java SE users will now need a license

Perhaps the most important change for Java in 2019 is that commercial users will have to pay a license fee to use Java SE from February. This change comes as Oracle has decided to revise the support model for the Java language. It currently affects Java SE 8, which is an LTS release with premier and extended support up to March 2022 and 2025 respectively. For individual users, however, the support and updates will continue till December 2020. The recently released Java SE 11 will also have long-term support, with premier support for five years and extended support for eight years from the release date.

#2 The Java 12 release in March 2019

Since Oracle changed their support model, non-LTS versions will be released every six months and probably won't contain many major changes. JDK 12 is non-LTS, but that is not to say that the changes in it are trivial; it comes with its own set of new features. It will be generally available in March this year and supported until September, which is when Java 13 will be released. Java 12 will have a couple of new features; some of them are approved to ship in its March release, and some are under discussion.

#3 Java 13 release slated for September 2019, with early access out now

So far, there is very little information about Java 13. All we really know at the moment is that it's due to be released in September 2019. Like Java 12, Java 13 will be a non-LTS release. However, if you want an early insight, there is an early access build available to test right now. Some of the JEPs (JDK Enhancement Proposals) in the next section may be set to be featured in Java 13, but that's just speculation.

https://twitter.com/OpenJDK/status/1082200155854639104

#4 A bunch of new features in Java in 2019

Even though the major long-term support version of Java, Java 11, was released last year, the releases this year also have some noteworthy features in store. Let's take a look at what the two releases this year might have.

Confirmed candidates for Java 12

A new low pause time garbage collector called Shenandoah is added to cause minimal interruption when a program is running. Its pause times are meant to be consistent irrespective of the heap size.
The Microbenchmark Suite feature will make it easier for developers to run existing testing benchmarks or create new ones.
Revamped switch statements should help simplify the process of writing code. It essentially means the switch statement can also be used as an expression (see the sketch after this list).
The JVM Constants API will, the OpenJDK website explains, "introduce a new API to model nominal descriptions of key class-file and run-time artifacts".
Integrated with Java 12 is one AArch64 port, instead of two.
Default CDS Archives.
G1 mixed collections.
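As a quick illustration of the revamped switch, here is a sketch based on the arrow-style syntax proposed in JEP 325. It is a preview feature, so it would need to be compiled and run with the --enable-preview flag, and the exact syntax may still change before release:

import java.time.DayOfWeek;

public class SwitchDemo {
    public static void main(String[] args) {
        DayOfWeek day = DayOfWeek.WEDNESDAY;

        // The switch yields a value directly; no break statements
        // and no fall-through between the arrow-style cases.
        int numLetters = switch (day) {
            case MONDAY, FRIDAY, SUNDAY -> 6;
            case TUESDAY -> 7;
            case THURSDAY, SATURDAY -> 8;
            case WEDNESDAY -> 9;
        };

        System.out.println(day + " has " + numLetters + " letters");
    }
}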
Other features that may not be out with Java 12:

Raw string literals will be added to Java.
A Packaging Tool, designed to make it easier to install and run a self-contained Java application on a native platform.
Limit Speculative Execution, to help both developers and operations engineers more effectively secure applications against speculative-execution vulnerabilities.

#5 More contributions and features with OpenJDK

OpenJDK is an open source implementation of Java Standard Edition (Java SE) which has contributions from both Oracle and the open source community. As of now, the binaries of OpenJDK are available for the newest LTS release, Java 11. Even the life cycles of OpenJDK 7 and 8 have been extended, to June 2020 and 2023 respectively. This suggests that Oracle does seem to be interested in the idea of open source and community participation. And why would it not be? Many valuable contributions come from the open source community; Microsoft, for example, seems to have benefitted from open sourcing through the incoming submissions. Although Oracle will not support these versions after six months from the initial release, Red Hat will be extending support. As the chief architect of the Java platform, Mark Reinhold, has said, stewards are the true leaders who can shape what Java should be as a language. These stewards can propose new JEPs, bring new OpenJDK problems to notice (leading to more JEPs), and contribute to the language overall.

#6 Mobile and machine learning job opportunities

In the mobile ecosystem, especially Android, Java is still the most widely used language. Yes, there's Kotlin, but it is still relatively new, and many developers are yet to adopt the new language. According to an estimate by Indeed, the average salary of a Java developer is about $100K in the U.S. With the Android ecosystem growing rapidly over the last decade, it's not hard to see what's driving Java's value. But Java, and the broader Java ecosystem, are about much more than mobile. Although Java's importance in enterprise application development is well known, it's also used in machine learning and artificial intelligence. Even if Python is arguably the most used language in this area, Java does have its own set of libraries and is used a lot in enterprise environments. Deeplearning4j, Neuroph, Weka, OpenNLP, RapidMiner, and RL4J are some of the popular Java libraries in artificial intelligence.

#7 Java conferences in 2019

Now that we've talked about the language, possible releases, and new features, let's take a look at the conferences that are going to take place in 2019. Conferences are a good medium to hear top professionals present and speak, and for programmers to socialize. Even if you can't attend, they are important fixtures in the calendar for anyone interested in following releases and debates in Java. Here are some of the major Java conferences in 2019 worth checking out:

JAX is a Java architecture and software innovation conference. It will be held in Mainz, Germany, May 6-10 this year, with the expo running from May 7 to 9. Other than Java, topics like agile, cloud, Kubernetes, DevOps, microservices, and machine learning are also part of this event. They're offering discounts on passes till February 14.
JBCNConf is happening in Barcelona, Spain from May 27. It will be a three-day conference with talks from notable Java champions. The focus of the conference is on Java, the JVM, and open-source technologies.
Jfokus is a developer-centric conference taking place in Stockholm, Sweden. It will be a three-day event from February 4-6. Speakers include the Java language architect, Brian Goetz from Oracle, and many other notable experts. The conference will cover Java, of course, along with frontend and web, cloud and DevOps, IoT and AI, and future trends.
One of the biggest conferences is JavaZone, which attracts thousands of visitors and hundreds of speakers and will be 18 years old this year. It is usually held in Oslo, Norway in the month of September. Their website for 2019 is not active at the time of writing; you can check out last year's website.
Javaland will feature lectures, training, and community activities. Held in Bruehl, Germany from March 19 to 21, attendees can also exhibit at this conference.

If you're working in or around Java this year, there's clearly a lot to look forward to, as well as a few unanswered questions about the evolution of the language in the future. While these changes might not impact the way you work in the immediate term, keeping on top of what's happening and what key figures are saying will set you up nicely for the future.

4 key findings from The State of JavaScript 2018 developer survey
Netflix adopts Spring Boot as its core Java framework
Java 11 is here with TLS 1.3, Unicode 11, and more updates


How to build topic models in R [Tutorial]

Natasha Mathur
22 Apr 2019
8 min read
Topic models are a powerful method to group documents by their main topics. Topic models allow probabilistic modeling of term frequency occurrence in documents. The fitted model can be used to estimate the similarity between documents, as well as between a set of specified keywords, using an additional layer of latent variables which are referred to as topics (Grun and Hornik, 2011). In essence, a document is assigned to a topic based on the distribution of the words in that document, and the other documents in that topic will have roughly the same frequency of words.

In this tutorial, we will look at a useful framework for text mining, called topic models, and apply the framework to the State of the Union addresses. In building topic models, the number of topics must be determined before running the algorithm (k-dimensions). If no prior reason for the number of topics exists, then you can build several models and apply judgment and knowledge to the final selection; a quick sketch of one way to compare candidate values of k follows the outline below. There are different methods that come under topic modeling. We'll look at LDA with Gibbs sampling. This method is quite complicated mathematically, but my intent is to provide an introduction so that you are at least able to describe how the algorithm learns to assign a document to a topic in layperson terms. If you are interested in mastering the math associated with the method, block out a couple of hours on your calendar and have a go at it. Excellent background material can be found here. This tutorial is an excerpt taken from the book 'Mastering Machine Learning with R - Third Edition' written by Cory Lesmeister. The book explores expert techniques for solving data analytics and covers machine learning challenges that can help you gain insights from complex projects and power up your applications.

LDA, or Latent Dirichlet Allocation, as used in topic modeling, is a generative process that iterates to a steady state in the following manner:

For each document (j), with 1 to j documents, we randomly assign a multinomial distribution (Dirichlet distribution) over the topics (k), with 1 to k topics; for example, document A is 25 percent topic one, 25 percent topic two, and 50 percent topic three.
Probabilistically, each word (i), with 1 to i words, is assigned to a topic (k); for example, the word mean has a probability of 0.25 for the topic statistics.
For each word (i) in document (j) and topic (k), calculate the proportion of words in that document assigned to that topic; note it as the probability of topic (k) given document (j), p(k|j), and the proportion of word (i) in topic (k) across all the documents containing the word; note it as the probability of word (i) given topic (k), p(i|k).
Resample, that is, assign the word a new topic based on the probability that the topic contains the word, which is based on p(k|j) times p(i|k).
Rinse and repeat; over numerous iterations, the algorithm finally converges, and a document is assigned a topic based on the proportion of words assigned to a topic in that document.

The LDA assumes that the order of words and documents does not matter. There has been work done to relax these assumptions in order to build models of language generation and sequence models over time (known as dynamic topic modeling or DTM).
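As promised, here is one rough heuristic for picking k, written as a sketch of my own rather than code from the book: fit a handful of models on the same document-term matrix and compare their log-likelihoods (higher is better, with diminishing returns as k grows). It assumes a DTM like the sotu_dtm object we build below, and it relies on your installed version of topicmodels providing a logLik() method for Gibbs fits:

candidate_k <- c(4, 6, 8, 10)

# Fit one model per candidate k on the same DTM (sotu_dtm, built below)
fits <- lapply(candidate_k, function(k) {
  topicmodels::LDA(sotu_dtm, k = k, method = "Gibbs",
                   control = list(seed = 1965))
})

# Compare the fits; look for the point where the gains level off
data.frame(k = candidate_k,
           logLik = sapply(fits, function(f) as.numeric(logLik(f))))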
Applying Topic models in State of the Union addresses

We will leave behind the 19th century and look at these recent times of trial and tribulation (1965 through 2016). On looking at this data, I found something interesting and troubling. Let's take a look at the 1970s:

> sotu_meta[185:191, 1:4]
# A tibble: 7 x 4
  president         year years_active party
  <chr>            <int> <chr>        <chr>
1 Richard M. Nixon  1970 1969-1973    Republican
2 Richard M. Nixon  1971 1969-1973    Republican
3 Richard M. Nixon  1972 1969-1973    Republican
4 Richard M. Nixon  1972 1969-1973    Republican
5 Richard M. Nixon  1974 1973-1974    Republican
6 Richard M. Nixon  1974 1973-1974    Republican
7 Gerald R. Ford    1975 1974-1977    Republican

We see there are two 1972 and two 1974 addresses, but none for 1973. What? I went to the Nixon Foundation website, spent about 10 minutes trying to deconflict this, and finally threw my hands in the air and decided on implementing a quick fix. Be advised that there are a number of these conflicts to put in order:

> sotu_meta[188, 2] <- "1972_2"
> sotu_meta[190, 2] <- "1974_2"
> sotu_meta[157, 2] <- "1945_2"
> sotu_meta[166, 2] <- "1953_2"
> sotu_meta[170, 2] <- "1956_2"
> sotu_meta[176, 2] <- "1961_2"
> sotu_meta[195, 2] <- "1978_2"
> sotu_meta[197, 2] <- "1979_2"
> sotu_meta[199, 2] <- "1980_2"
> sotu_meta[201, 2] <- "1981_2"

An email to the author of this package is in order. I won't bother with that, but feel free to solve the issue yourself. With this tragedy behind us, we'll go through tokenizing and removing stop words again for our relevant time frame:

> sotu_meta_recent <- sotu_meta %>%
    dplyr::filter(year > 1964)

> sotu_meta_recent %>%
    tidytext::unnest_tokens(word, text) -> sotu_unnest_recent

> sotu_recent <- sotu_unnest_recent %>%
    dplyr::anti_join(stop_words, by = "word")

As discussed previously, we need to put the data into a DTM before building a model. This is done by creating a word count grouped by year, then passing that to the cast_dtm() function:

> sotu_recent %>%
    dplyr::group_by(year) %>%
    dplyr::count(word) -> lda_words

> sotu_dtm <- tidytext::cast_dtm(lda_words, year, word, n)

Let's get our model built. I'm going to create six different topics using the Gibbs method, and I specified verbose. It should run 2,000 iterations:

> sotu_lda <- topicmodels::LDA(
    sotu_dtm,
    k = 6,
    method = "Gibbs",
    control = list(seed = 1965, verbose = 1)
  )

> sotu_lda
A LDA_Gibbs topic model with 6 topics.

The algorithm gives each topic a number. We can see what year is mapped to what topic. I abbreviate the output since 2002:

> topicmodels::topics(sotu_lda)
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
   2    2    2    2    2    2    2    4    4    4    4    4    4    4    4

We see a clear transition between Bush and Obama from topic 2 to topic 4. Here is a table of the count of topics:

> table(topicmodels::topics(sotu_lda))
 1  2  3  4  5  6
 8  7  5 18 14  5

Topic 4 is the most prevalent, which is associated with Clinton's term also. This output gives us the top five words associated with each topic:

> topicmodels::terms(sotu_lda, 5)
     Topic 1      Topic 2    Topic 3
[1,] "future"     "america"  "administration"
[2,] "tax"        "security" "congress"
[3,] "spending"   "country"  "economic"
[4,] "government" "world"    "legislation"
[5,] "economic"   "iraq"     "energy"
     Topic 4      Topic 5    Topic 6
[1,] "people"     "world"    "federal"
[2,] "american"   "people"   "programs"
[3,] "jobs"       "american" "government"
[4,] "america"    "congress" "program"
[5,] "children"   "peace"    "act"

This all makes good sense, and topic 2 is spot on for the time. If you drill down further to, say, 10, 15, or 20 words, it is even more revealing, but I won't bore you further. What about an application in the tidy ecosystem and a visualization? Certainly!
We'll turn the model object into a data frame first and in the process capture the per-topic-per-word probabilities, called beta:

> lda_topics <- tidytext::tidy(sotu_lda, matrix = "beta")

> ap_top_terms <- lda_topics %>%
    dplyr::group_by(topic) %>%
    dplyr::top_n(10, beta) %>%
    dplyr::ungroup() %>%
    dplyr::arrange(topic, -beta)

We can explore that data further or just plot it as follows:

> ap_top_terms %>%
    dplyr::mutate(term = reorder(term, beta)) %>%
    ggplot2::ggplot(ggplot2::aes(term, beta, fill = factor(topic))) +
    ggplot2::geom_col(show.legend = FALSE) +
    ggplot2::facet_wrap(~ topic, scales = "free") +
    ggplot2::coord_flip() +
    ggthemes::theme_economist_white()

The output of the preceding code is a faceted plot of the top 10 words per topic based on the beta probability. Another thing we can do is look at the probability that an address is related to a topic. This is referred to as gamma in the model, and we can pull those in just like the beta:

> ap_documents <- tidytext::tidy(sotu_lda, matrix = "gamma")

We now have the probabilities of an address per topic. Let's look at the 1981 Ronald Reagan values:

> dplyr::filter(ap_documents, document == "1981")
# A tibble: 6 x 3
  document topic  gamma
  <chr>    <int>  <dbl>
1 1981         1 0.286
2 1981         2 0.0163
3 1981         3 0.0923
4 1981         4 0.118
5 1981         5 0.0777
6 1981         6 0.411

Topic 1 is a close second in the topic race. If you think about it, this means that more than six topics would help to create better separation in the probabilities. However, I like just six topics for this tutorial for the purpose of demonstration.

In this tutorial, we looked at topic models in R and applied the framework to the State of the Union addresses. If you want to stay updated with expert techniques for solving data analytics and explore other machine learning challenges in R, be sure to check out the book 'Mastering Machine Learning with R - Third Edition'.

How to make machine learning based recommendations using Julia [Tutorial]
The rise of machine learning in the investment industry
GitHub Octoverse: top machine learning packages, languages, and projects of 2018

Working with Forms in Dynamics AX: Part 2

Packt
06 Jan 2010
13 min read
Adding form splitters

Commonly used forms like Sales orders or Projects in Dynamics AX have multiple grids. Normally, one grid is in the upper section and another one is in the bottom section of the form. Sometimes grids are placed next to each other. The size of the data in each grid may vary, and that's why most of the forms with multiple grids have splitters in the middle, so users can resize both grids at once by dragging the splitter with the mouse. It is a good practice to add splitters to newly created forms. Although Microsoft developers did a good job by adding splitters to most of the multi-grid forms, there is still at least one that has not got one. It is the Account reconciliation form in the Bank module, which is one of the most commonly used forms. It can be opened from Bank | Bank Account Details, the Functions | Account reconciliation button, and then the Transactions button. As it stands, the size of the bottom grid cannot be changed. In this recipe, we will demonstrate the usage of splitters by resolving this situation. We will add a form splitter in the middle of the two grids in the mentioned form. It will allow users to define the sizes of both grids to make sure that the data is displayed optimally.

How to do it...

Open the BankReconciliation form in the AOT, and create a new Group at the very top of the form's design with the following properties:

Name: Top
AutoDeclaration: Yes
FrameType: None
Width: Column width

Move the AllReconciled, Balances, and Tab controls into the newly created group.

Create a new Group right below the Top group with these properties:

Name: Splitter
AutoDeclaration: Yes
Width: Column width
Height: 5
FrameType: Raised 3D
BackgroundColor: Window background
HideIfEmpty: No
AlignChild: No

Add the following line of code to the bottom of the form's class declaration:

SysFormSplitter_Y fs;

Add the following line of code to the bottom of the form's init():

fs = new SysFormSplitter_Y(Splitter, Top, element);

Override three methods in the Splitter group with the following code:

public int mouseDown(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseDown(_x, _y, _button, _ctrl, _shift);
}

public int mouseMove(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseMove(_x, _y, _button, _ctrl, _shift);
}

public int mouseUp(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseUp(_x, _y, _button, _ctrl, _shift);
}

Change the following properties of the existing BankTransTypeGroup group:

Top: Auto
Width: Column width
Height: Column height

Change the following property of the existing TypeSums grid located in the BankTransTypeGroup group:

Height: Column height

Now, to test the results, open Bank | Bank Account Details, select any bank account, click Functions | Account reconciliation, choose an existing or create a new account statement, and click the Transactions button. Notice that the form now has a nice splitter in the middle, which makes the form look better and allows defining the size of each grid.

How it works...

Normally a splitter is placed between two form groups. In this recipe, to follow that rule, we need to adjust the BankReconciliation form's design. The filter AllReconciled, the group Balances, and the tab Tab are moved to a new group called Top.
We do not want this new group to be visible to users, so we set FrameType to None. Setting AutoDeclaration to Yes allows us to access this object from X++ code. And finally, we make this group automatically expand in the horizontal direction by setting its Width to Column width. At this stage, the visual form layout has not changed, but now we have the upper group ready.

The BankTransTypeGroup group can be used as the bottom group with slight changes. We change its Top behavior to Auto and make it fully expandable in the horizontal and vertical directions. The Height of the grid inside this group also has to be changed to Column height in order to fill all the vertical space.

In the middle of those two groups, we add a splitter. The splitter is nothing else but another group which looks like a splitter. In order to achieve that, we set Height to 5, FrameType to Raised 3D, and BackgroundColor to Window background. This group does not hold any other controls inside; therefore, in order to make it visible, we have to set the property HideIfEmpty to No. The value No of the property AlignChild makes the splitter begin on the very left side of the form, and the Column width value of the property Width forces the splitter to automatically fill the form's width.

Mouse events are handled by the SysFormSplitter_Y application class. After it has been declared in the form's class declaration, we create the actual object in the form's init(). We pass the name of the splitter control, the name of the top group, and the form itself as arguments when creating it.

A fully working splitter requires three mouse event handlers. This is implemented by overriding the mouseMove(), mouseDown(), and mouseUp() methods in the splitter group control. All arguments are passed to the respective member methods of the SysFormSplitter_Y class, which does all the work.

In this way, horizontal splitters can easily be added to any form. The Dynamics AX application also contains nice examples of splitters, which can be found in the AOT in the Tutorial_Form_Split form. Vertical splitters can be added to forms using a very similar approach; for this, we need to use another application class called SysFormSplitter_X.

Creating modal forms

During my trainings and work with Dynamics AX users, I have noticed that people who are not familiar with computers and software tend to get lost among open application windows. The same applies to Dynamics AX. I have seen many times how a user opens one form, clicks a button to open another one, and then goes back to the first one without closing the second. Sometimes this happens intentionally, sometimes not, but the result is that the second form is hidden behind the first one, and the user starts wondering why it is not possible to close or edit the first form. Such issues can easily be solved by making the child form a modal window. In other words, the second form always stays on top of the first one until closed. In this recipe, we will do exactly that. As an example, we will make the Create sales order form a modal window.

How to do it...

Open the SalesCreateOrder form in the AOT, and set the following property on its Design:

WindowType: Popup

To test, open Accounts receivable | Sales Order Details, and start creating a new order. Notice that the sales order creation form now always stays on top of the Sales order form.

How it works...

Dynamics AX form design has a WindowType property, which is set to Standard by default.
In order to make a form behave as a modal window, we have to change this property to Popup. Such forms will always stay on top of their parent forms.

There's more...

We already know that some Dynamics AX forms are created dynamically using the Dialog class. If we look deeper into the code, we can find that the Dialog class actually creates a runtime Dynamics AX form. That means we can apply the same principle, that is, change the relevant form's design property. The following code could be added to the Dialog object and would do the job:

dialog.dialogForm().buildDesign().windowType(FormWindowType::Popup);

We get a reference to the form's design by first using dialogForm() of the Dialog object to get a reference to the DialogForm object, and then calling buildDesign() on the latter object. Then, we set the design's property by calling its windowType() with the argument FormWindowType::Popup.

Changing common form appearance

In every single multi-company Dynamics AX project, in order to prevent user mistakes, I was asked to add functionality that sets the background color of every form per company. By doing that, users clearly see which company account they are in at the moment and can easily work within multiple companies at the same time. In this recipe, we will modify the SysSetupFormRun class to change the background color of every form in Dynamics AX.

How to do it...

Open SysSetupFormRun in the AOT, and override its run() with the following code:

public void run()
{;
    super();

    this.design().colorScheme(FormColorScheme::RGB);
    this.design().backgroundColor(WinAPI::RGB2int(255,0,0));
}

To test the results, open any Dynamics AX form, for example, General ledger | Chart of Accounts Details, and notice how the background color has changed to red.

How it works...

SysSetupFormRun is the application class that is called by the system every time a user runs a form. The best place to add our custom code is to override the run() method and place the code after the super() call. We use this.design() to get a reference to the form's design. By calling colorScheme() and backgroundColor(), we set the color scheme to red/green/blue and the color code to red. We use WinAPI::RGB2int() to transform the human-readable red/green/blue code into the numeric color code.

There's more...

This recipe showed a very basic principle of how to change the common appearance of all forms with a few lines of code. You will have noticed that the color in this recipe does not fill all areas of the form, which does not make the form look nice. An alternative could be to dynamically add a colored rectangle or something similar to the top of the form. The possibilities are endless here. New controls like input fields, buttons, and menu items could also be added to all forms dynamically using this class. But do not overdo it, as it may impact system performance.
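Staying with the per-company motivation of this recipe, here is a small sketch of my own (not from the original recipe) that applies the red background only in a specific company account, so users immediately see they are not in production; the company id 'tst' is an assumed, illustrative value:

public void run()
{;
    super();

    // A sketch: color forms red only in the 'tst' company account.
    // 'tst' is an assumed company id for illustration.
    if (curext() == 'tst')
    {
        this.design().colorScheme(FormColorScheme::RGB);
        this.design().backgroundColor(WinAPI::RGB2int(255,0,0));
    }
}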
Storing last form values

Dynamics AX has a very useful feature which allows saving the latest user choices per user per form. This feature is implemented across a number of standard reports, periodic jobs, and other objects which require user input. When developing new functionality for Dynamics AX, I always try to keep to that practice. One of the frequently used areas is custom filters for grid-based forms. Although Dynamics AX allows users to use standard filtering for any grid, in practice this is sometimes not very convenient, especially when the user requires something specific. In this recipe, we will see how to store the latest user filter selections.

To make it as simple as possible, we will use the existing filters on the General journal form, which can be opened from General ledger | Journals | General journal. This form contains two filters—Show and Show user-created only. Show allows displaying journals by their posting status, and Show user-created only toggles between all journals and the currently logged-in user's journals.

How to do it...

Find the LedgerJournalTable form in the AOT, and add the following code to the bottom of its class declaration:

AllOpenPosted showStatus;
NoYes showCurrentUser;

#define.CurrentVersion(1)
#localmacro.CurrentList
    showStatus,
    showCurrentUser
#endmacro

Create these additional form methods:

public void initParmDefault()
{;
    showStatus = AllOpenPosted::Open;
    showCurrentUser = true;
}

public container pack()
{
    return [#CurrentVersion, #CurrentList];
}

public boolean unpack(container packedClass)
{
    int version = RunBase::getVersion(packedClass);
    ;
    switch (version)
    {
        case #CurrentVersion:
            [version, #CurrentList] = packedClass;
            return true;
        default:
            return false;
    }
    return false;
}

public identifiername lastValueDesignName()
{
    return element.args().menuItemName();
}

public identifiername lastValueElementName()
{
    return this.name();
}

public UtilElementType lastValueType()
{
    return UtilElementType::Form;
}

public userId lastValueUserId()
{
    return curuserid();
}

public dataAreaId lastValueDataAreaId()
{
    return curext();
}

Add the following code to the bottom of the form's run():

xSysLastValue::getLast(this);
AllOpenPostedField.selection(showStatus);
ShowUserCreatedOnly.value(showCurrentUser);

journalFormTable.designSelectionChangeAllOpenPosted();
journalFormTable.designSelectionChangeShowUserCreateOnly();

And the following code to the bottom of the form's close():

showStatus = AllOpenPostedField.selection();
showCurrentUser = ShowUserCreatedOnly.value();
xSysLastValue::saveLast(this);

Now, to test the form, open General ledger | Journals | General journal, change the filter values, close it, and run it again. The latest filter selections should stay.

How it works...

First of all, we define some variables. We will store the journal posting status filter value in showStatus and the current user filter value in showCurrentUser. The macro #CurrentList is used to define the list of variables that we are going to store; currently, we have two variables. The macro #CurrentVersion defines a version of the saved values. In other words, it says that the variables defined by #CurrentList, which will be stored in the system cache later, can be addressed using the number 1. Normally, when implementing last value saving for the first time for a particular object, #CurrentVersion is set to 1. Later on, if we decide to add new values or change existing ones, we have to change the value of #CurrentVersion, normally increasing it by 1. This ensures that the system addresses the correct list of variables in the cache and does not break existing functionality.

The initParmDefault() method specifies default values if nothing is found in the system cache. Normally, this happens if we run the form for the first time, change #CurrentVersion, or clean the cache. Later, this method is called automatically by the xSysLastValue object.

The methods pack() and unpack() are responsible for formatting a storage container from the variables and extracting the variables from a storage container, respectively. In our case, pack() returns a container consisting of three values: the version number, the posting status, and the current user toggle. Those values will be sent to the system cache after the form is closed.
During the opening of the form, the xSysLastValue object uses unpack() to extract values from the stored container. It checks the container version from the cache first, and if it matches the current version number, then the values from the cache are considered correct and are assigned to the form variables.

The combination of the values returned by lastValueDesignName(), lastValueElementName(), lastValueType(), lastValueUserId(), and lastValueDataAreaId() forms a unique string representing the saved values. This ensures that different users can store last values for different objects without overriding each other's values in the cache. The lastValueDesignName() method is meant to return the name of the object's current design in cases where the object can have several designs. In this recipe, there is only one design, so instead of leaving it empty, I used it for a slightly different purpose. The same LedgerJournalTable AOT form can represent different user forms like Ledger journal, Periodic journal, Vendor payment journal, and so on, depending on the location from which it was opened. To ensure that the user's latest choices are saved correctly, we included the opening menu item name as part of the unique string.

The last two pieces of code need to be added to the bottom of the form's run() and close(). In the run() method, xSysLastValue::getLast(this) retrieves saved user values from the cache and assigns them to the form's variables. The next two lines assign the same values to the respective form controls. designSelectionChangeAllOpenPosted() and designSelectionChangeShowUserCreateOnly() execute a form query to apply the updated filters. Although both of those methods currently perform exactly the same action, we keep both for the future in case this functionality is updated. The code lines in close() are responsible for assigning user selections to the variables and saving them to the cache by calling xSysLastValue::saveLast(this).


The R Statistical Package Interfacing with Python

Janu Verma
17 Nov 2016
8 min read
One of my coding hobbies is to explore different Python packages and libraries. In this post, I'll talk about the package rpy2, which is used to call R inside Python. Being an avid user of R and a huge supporter of R graphical packages, I had always desired to call R inside my Python code to be able to produce beautiful visualizations. The R framework offers machinery for a variety of statistical and data mining tasks. Let's review the basics of R before we delve into R-Python interfacing.

R is a statistical language which is free, is open source, and has comprehensive support for various statistical, data mining, and visualization tasks. Quick-R describes it as:

"R is an elegant and comprehensive statistical and graphical programming language."

R is one of the fastest growing languages, mainly due to the surge in interest in statistical learning and data science. The Data Science Specialization on Coursera has all courses taught in R. There are R packages for machine learning, graphics, text mining, bioinformatics, topic modeling, interactive visualizations, markdown, and many others. In this post, I'll give a quick introduction to R. The motivation is to acquire some knowledge of R to be able to follow the discussion on R-Python interfacing.

Installing R

R can be downloaded from one of the Comprehensive R Archive Network (CRAN) mirror sites.

Running R

To run R interactively on the command line, type R. Alternatively, launch the standard GUI (which should have been included in the download) and type R code in it. RStudio is the most popular IDE for R. It is recommended, though not required, to install RStudio and run R on it. To write a file with R code, create a file with the .r extension (for example, myFirstCode.r), and run the code by typing the following on the terminal:

Rscript file.r

Basics of R

The most fundamental data structure in R is a vector; actually, everything in R is a vector (even numbers are 1-dimensional vectors). This is one of the strangest things about R. Vectors contain elements of the same type. A vector is created by using the c() function:

a = c(1,2,5,9,11)
a
[1]  1  2  5  9 11

strings = c("aa", "apple", "beta", "down")
strings
[1] "aa"    "apple" "beta"  "down"

The elements in a vector are indexed, but the indexing starts at 1 instead of 0, as in most major languages (for example, Python):

strings[1]
[1] "aa"

The fact that everything in R is a vector and that the indexing starts at 1 are the main reasons for people's initial frustration with R (I forget this all the time).

Data Frames

A lot of R packages expect data as a data frame, which is essentially a matrix whose columns can be accessed by names. The columns can be of different types. Data frames are useful outside of R also. The Python package pandas was written primarily to implement data frames and to do analysis on them. In R, data frames are created (from vectors) as follows:

students = c("Anne", "Bret", "Carl", "Daron", "Emily")
scores = c(7,3,4,9,8)
grades = c('B', 'D', 'C', 'A', 'A')
results = data.frame(students, scores, grades)
results
  students scores grades
1     Anne      7      B
2     Bret      3      D
3     Carl      4      C
4    Daron      9      A
5    Emily      8      A

The elements of a data frame can be accessed as:

results$students
[1] Anne  Bret  Carl  Daron Emily
Levels: Anne Bret Carl Daron Emily

This gives a vector, the elements of which can be called by indexing:

results$students[1]
[1] Anne
Levels: Anne Bret Carl Daron Emily

Reading Files

Most of the time, the data is given as a comma-separated values (csv) file or a tab-separated values (tsv) file.
We will see how to read a csv/tsv file in R and create a data frame from it. (Aside: the datasets in most Kaggle competitions are given as csv files, and we are required to do machine learning on them. In Python, one creates a pandas data frame or a numpy array from this csv file.)

In R, we use a read.csv or read.table command to load a csv file into memory; for example, for the Titanic competition on Kaggle:

training_data <- read.csv("train.csv", header=TRUE)
train <- data.frame(survived=training_data$Survived, age=training_data$Age,
                    fare=training_data$Fare, pclass=training_data$Pclass)

Similarly, a tsv file can be loaded as:

data <- read.csv("file.tsv", header=TRUE, sep="\t")

Thus, given a csv/tsv file with or without headers, we can read it using the read.csv function and create a data frame using data.frame(vector_1, vector_2, ... vector_n). This should be enough to start exploring R packages. Another command that is very useful in R is head(), which is similar to the less command on Unix.

rpy2

First things first, we need to have both Python and R installed. Then install rpy2 from the Python Package Index (PyPI). To do this, simply type the following on the command line:

pip install rpy2

We will use the high-level interface to R, the robjects subpackage of rpy2:

import rpy2.robjects as ro

We can pass commands to the R session by putting the R commands in the ro.r() method as strings. Recall that everything in R is a vector. Let's create a vector using robjects:

ro.r('x=c(2,4,6,8)')
print(ro.r('x'))
[1] 2 4 6 8

Keep in mind that though x is an R object (vector), ro.r('x') is a Python object (an rpy2 object). This can be checked as follows:

type(ro.r('x'))
<class 'rpy2.robjects.vectors.FloatVector'>

The most important data types in R are data frames, which are essentially matrices. We can create a data frame using rpy2:

ro.r('x=c(2,4,6,8)')
ro.r('y=c(4,8,12,16)')
ro.r('rdf=data.frame(x,y)')

This created an R data frame, rdf. If we want to manipulate this data frame using Python, we need to convert it to a Python object. We will convert the R data frame to a pandas data frame. The Python package pandas contains efficient implementations of data frame objects in Python:

import pandas.rpy.common as com
df = com.load_data('rdf')
print type(df)
<class 'pandas.core.frame.DataFrame'>
df.x = 2*df.x

Here we have doubled each of the elements of the x vector in the data frame df. But df is a Python object, which we can convert back to an R data frame using pandas as:

rdf = com.convert_to_r_dataframe(df)
print type(rdf)
<class 'rpy2.robjects.vectors.DataFrame'>

Let's use the plotting machinery of R, which is the main purpose of studying rpy2:

ro.r('plot(x,y)')

Not only R data types, but rpy2 lets us import R packages as well (given that these packages are installed on R) and use them for analysis. Here we will build a linear model on x and y using the R package stats:

from rpy2.robjects.packages import importr
stats = importr('stats')
base = importr('base')

fit = stats.lm('y ~ x', data=rdf)
print(base.summary(fit))

We get the following results:

Residuals:
1 2 3 4
0 0 0 0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        0          0      NA       NA
x                  2          0     Inf   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: Inf on 1 and 2 DF, p-value: < 2.2e-16

R programmers will immediately recognize the output as coming from applying the linear model function lm() on the data.
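A caveat worth adding in passing: the pandas.rpy.common module used above was deprecated and later removed from recent pandas releases. In newer rpy2 versions, the same round trip can be done with rpy2's own pandas converter. Here is a minimal sketch of my own (the function names below are from the rpy2 3.x API and may differ in older versions):

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

df = pd.DataFrame({'x': [2, 4, 6, 8], 'y': [4, 8, 12, 16]})

with localconverter(ro.default_converter + pandas2ri.converter):
    rdf = ro.conversion.py2rpy(df)   # pandas -> R data frame
    df2 = ro.conversion.rpy2py(rdf)  # R data frame -> pandas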
I'll end this discussion with an example using my favorite R package, ggplot2. I have written a lot of posts on data visualization using ggplot2. The following example is borrowed from the official documentation of rpy2:

import math, datetime
import rpy2.robjects.lib.ggplot2 as ggplot2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

base = importr('base')
datasets = importr('datasets')
mtcars = datasets.data.fetch('mtcars')['mtcars']

pp = (ggplot2.ggplot(mtcars) +
      ggplot2.aes_string(x='wt', y='mpg', col='factor(cyl)') +
      ggplot2.geom_point() +
      ggplot2.geom_smooth(ggplot2.aes_string(group = 'cyl'), method = 'lm'))
pp.plot()

Author: Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are in mathematics, machine learning, information visualization, computational biology, and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and the Indian Statistical Institute. He has written papers for IEEE Vis, KDD, the International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, and so on. His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and other companies on data science and machine learning in the Delhi-NCR area.