
How-To Tutorials


Ninject Patterns and Anti-patterns

Packt
30 Sep 2013
7 min read
Dependencies can be injected into a consumer class using different patterns, and injecting them into a constructor is just one of them. While there are patterns that can be followed for injecting dependencies, there are also patterns that should be avoided, as they usually lead to undesirable results. In this article, we will examine only those patterns and anti-patterns that are relevant to Ninject features.

Constructor Injection

Constructor Injection is the most common and recommended pattern for injecting dependencies into a class. Generally, this pattern should be used as the primary injection pattern unless we have to use another one. In this pattern, all of the class's dependencies are introduced as constructor parameters.

What if the class has more than one constructor? Although Ninject's strategy for selecting a constructor is customizable, its default behavior is to select the constructor with the most parameters, provided all of them are resolvable by Ninject. So, although in the following code the second constructor introduces more parameters, Ninject will select the first one if it cannot resolve IService2, and it will even use the default constructor if IService1 is not registered either. But if both dependencies are registered and resolvable, Ninject will select the second constructor because it has more parameters:

```csharp
public class Consumer
{
    private readonly IService1 dependency1;
    private readonly IService2 dependency2;

    public Consumer(IService1 dependency1)
    {
        this.dependency1 = dependency1;
    }

    public Consumer(IService1 dependency1, IService2 dependency2)
    {
        this.dependency1 = dependency1;
        this.dependency2 = dependency2;
    }
}
```

If the preceding class had another constructor with two resolvable parameters, Ninject would throw an ActivationException, notifying us that several constructors have the same priority.

There are two ways to override this default behavior and explicitly select a constructor. The first is to indicate the desired constructor in a binding, as follows:

```csharp
Bind<Consumer>().ToConstructor(arg => new Consumer(arg.Inject<IService1>()));
```

In the preceding example, we explicitly selected the first constructor. Using the Inject<T> method provided by the arg argument, we requested that Ninject resolve IService1 and inject it into the specified constructor.

The second is to mark the desired constructor with the [Inject] attribute:

```csharp
[Inject]
public Consumer(IService1 dependency1)
{
    this.dependency1 = dependency1;
}
```

Here, we applied Ninject's [Inject] attribute to the first constructor to specify that the class should be initialized through this constructor, even though the second constructor has more parameters and Ninject's default strategy would be to select that one. Note that applying this attribute to more than one constructor results in an ActivationException.

Ninject is highly customizable, and it is even possible to substitute the default [Inject] attribute with another one, so that we don't need to add a reference to the Ninject library from our consumer classes just because of an attribute:

```csharp
kernel.Settings.Set("InjectAttribute", typeof(MyAttribute));
```
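As a hedged sketch of what that substitution can look like (MyAttribute is a name invented here for illustration, not part of Ninject), the replacement attribute is an ordinary attribute type defined in our own assembly:

```csharp
// A minimal sketch, assuming Ninject 3.x; MyAttribute lives in the consumer's
// assembly, so consumer classes no longer need to reference Ninject.
using System;
using Ninject;

[AttributeUsage(AttributeTargets.Constructor | AttributeTargets.Property | AttributeTargets.Method)]
public sealed class MyAttribute : Attribute { }

public static class Startup
{
    public static IKernel CreateKernel()
    {
        var kernel = new StandardKernel();
        // From now on, Ninject looks for [My] instead of [Inject].
        kernel.Settings.Set("InjectAttribute", typeof(MyAttribute));
        return kernel;
    }
}
```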
Initializer methods and properties

Apart from constructor injection, Ninject supports the injection of dependencies using initializer methods and property setters. We can mark as many methods and properties as required with the [Inject] attribute to inject dependencies. Although the dependencies will be injected into them as soon as the class is constructed, it is not possible to predict the order in which they will receive their dependencies. The following code shows how to mark a property for injection:

```csharp
[Inject]
public IService Service
{
    get { return dependency; }
    set { dependency = value; }
}
```

Here is an example of injecting a dependency using an injector method:

```csharp
[Inject]
public void Setup(IService dependency)
{
    this.dependency = dependency;
}
```

Note that only public members and constructors will be injected; even internal members will be ignored unless Ninject is configured to inject non-public members.

In Constructor Injection, the constructor is a single point where we can consume all of the dependencies as soon as the class is activated. But when we use initializer methods, the dependencies are injected via multiple points in an unpredictable order, so we cannot tell in which method all of the dependencies will be ready to consume. To solve this problem, Ninject offers the IInitializable interface. This interface has an Initialize method, which is called once all of the dependencies have been injected:

```csharp
public class Consumer : IInitializable
{
    private IService1 dependency1;
    private IService2 dependency2;

    [Inject]
    public IService1 Service1
    {
        get { return dependency1; }
        set { dependency1 = value; }
    }

    [Inject]
    public IService2 Service2
    {
        get { return dependency2; }
        set { dependency2 = value; }
    }

    public void Initialize()
    {
        // Consume all dependencies here
    }
}
```

Although Ninject supports injection using properties and methods, Constructor Injection should remain the preferred approach. First of all, Constructor Injection makes the class more reusable, because the full list of its dependencies is visible in its constructor, whereas with initializer properties or methods the user of the class has to investigate all of the class members, or go through the class documentation (if any), to discover its dependencies. Initialization of the class is also easier with Constructor Injection, because all the dependencies are injected at the same time and we can consume them immediately, in the same place where the constructor initializes the class. Finally, as we have seen in the preceding examples, the only case where the backing fields could be readonly was the Constructor Injection scenario. Because readonly fields can be initialized only in the constructor, we have to make them writable to be able to use initializer methods and properties, which opens the door to unintended mutation of the backing fields.
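Before moving on, here is a minimal sketch of how these pieces fit together (Service1Impl and Service2Impl are hypothetical implementations, not from the original article); Ninject injects both marked properties and then calls Initialize:

```csharp
// A minimal sketch, assuming Ninject 3.x and the Consumer class above.
using Ninject;

public static class Program
{
    public static void Main()
    {
        var kernel = new StandardKernel();
        kernel.Bind<IService1>().To<Service1Impl>();
        kernel.Bind<IService2>().To<Service2Impl>();

        // Consumer is self-bindable: Ninject constructs it, injects the two
        // [Inject] properties, and then calls Initialize once both are set.
        var consumer = kernel.Get<Consumer>();
    }
}
```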
Service Locator

Service Locator is a design pattern described by Martin Fowler, and there have been some controversies around it. Although it can be useful in particular circumstances, it is generally considered an anti-pattern and should preferably be avoided. Ninject can easily be misused as a Service Locator if we are not familiar with this pattern. The following example demonstrates misusing the Ninject kernel as a Service Locator rather than as a DI container:

```csharp
public class Consumer
{
    public void Consume()
    {
        var kernel = new StandardKernel();
        var dependency1 = kernel.Get<IService1>();
        var dependency2 = kernel.Get<IService2>();
        ...
    }
}
```

There are two significant downsides to the preceding code. The first is that although we are using a DI container, we are not implementing DI at all. The class is tied to the Ninject kernel, even though the kernel is not really a dependency of this class; this class and all of its prospective consumers will always have to drag along their unnecessary dependency on the kernel object and the Ninject library. The second is that the real dependencies of the class (IService1 and IService2) are invisible to its consumers, which reduces its reusability. Even if we change the design of this class to the following one, the problems still exist:

```csharp
public class Consumer
{
    private readonly IKernel kernel;

    public Consumer(IKernel kernel)
    {
        this.kernel = kernel;
    }

    public void Consume()
    {
        var dependency1 = kernel.Get<IService1>();
        var dependency2 = kernel.Get<IService2>();
        ...
    }
}
```

The preceding class still depends on the Ninject library even though it doesn't have to, and its actual dependencies are still invisible to its consumers. It can easily be refactored using the Constructor Injection pattern:

```csharp
public Consumer(IService1 dependency1, IService2 dependency2)
{
    this.dependency1 = dependency1;
    this.dependency2 = dependency2;
}
```

Summary

In this article, we studied the most common DI patterns and anti-patterns related to Ninject.


Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I've spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as a view of where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/, and in this blog I'm going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application.

Graphs, not charts

Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional relational databases because of how the data is stored and accessed. Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it's easy to express complicated relations between different nodes succinctly.

It's not just the technical aspect of graphs which makes them appealing to work with. Seeing the connections between bits of data visualized explicitly, as in a graph, helps you to see the data in a different light and make connections you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database.

Make the connection

For product recommendations, I use what's known as a hybrid filter. This considers both content-based filtering (products x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal.

The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this:

```cypher
MATCH (n:Product {title:'Learning Cypher'})-[r:purchased*2]-(m:Product)
WITH m.title AS suggestion,
     count(distinct r)/(n.purchased+m.purchased) AS alsoBought
WHERE m<>n
RETURN * ORDER BY alsoBought DESC
```

and it will very efficiently return the most commonly also-purchased product. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion back. We do this so we don't just get the titles with the most units; we're effectively calculating the size of the intersection of the two titles' audiences relative to their overall audience size.

The content side of the algorithm looks very similar:

```cypher
MATCH (n:Product {title:'Learning Cypher'})-[r:is_about*2]-(m:Product)
WITH m.title AS suggestion,
     count(distinct r)/(length(n.topics)+length(m.topics)) AS alsoAbout
WHERE m<>n
RETURN * ORDER BY alsoAbout DESC
```

Implicit in this algorithm is the knowledge that a title is_about a topic of some kind. This could be done manually, but where's the fun in that? In Packt's domain there already exists a huge, well-moderated corpus of technology concepts and their usage: StackOverflow. The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, through the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, which represent topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow:

edge_weight(n,m) = (number of questions tagged with both n and m) / (number of questions tagged with n or m)
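This weight is simply the Jaccard index of the two tags' question sets. As a hedged sketch of how it could be computed offline (the questions below are invented dummy data, not the real StackOverflow dump):

```python
# A minimal sketch: compute co-occurrence edge weights from question tags.
from itertools import combinations
from collections import defaultdict

questions = [
    {"id": 1, "tags": {"python", "matplotlib"}},
    {"id": 2, "tags": {"python", "numpy", "matplotlib"}},
    {"id": 3, "tags": {"numpy"}},
]

tagged_with = defaultdict(set)  # tag -> set of question ids
for q in questions:
    for tag in q["tags"]:
        tagged_with[tag].add(q["id"])

def edge_weight(n, m):
    """Jaccard index: |questions with both tags| / |questions with either|."""
    both = tagged_with[n] & tagged_with[m]
    either = tagged_with[n] | tagged_with[m]
    return len(both) / len(either) if either else 0.0

for n, m in combinations(sorted(tagged_with), 2):
    w = edge_weight(n, m)
    if w > 0:
        print(f"{n} -[related_to {{weight: {w:.3f}}}]- {m}")
```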
So, to find topics related to a given topic, we can execute a query like this:

```cypher
MATCH (n:StackOverflowTag {name:'Matplotlib'})-[r:related_to]-(m:StackOverflowTag)
RETURN n.name, r.weight, m.name
ORDER BY r.weight DESC LIMIT 10
```

which returns the following:

|    | n.name     | r.weight | m.name             |
|----|------------|----------|--------------------|
| 1  | Matplotlib | 0.065699 | Plot               |
| 2  | Matplotlib | 0.045678 | Numpy              |
| 3  | Matplotlib | 0.029667 | Pandas             |
| 4  | Matplotlib | 0.023623 | Python             |
| 5  | Matplotlib | 0.023051 | Scipy              |
| 6  | Matplotlib | 0.017413 | Histogram          |
| 7  | Matplotlib | 0.015618 | Ipython            |
| 8  | Matplotlib | 0.013761 | Matplotlib Basemap |
| 9  | Matplotlib | 0.013207 | Python 2.7         |
| 10 | Matplotlib | 0.012982 | Legend             |

There are many more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing hypergraphs using the extensive StackExchange API.

So we have our topics, but we still need to connect our content to them. To do this, I've used a two-stage process.

Step 1 – Parsing out the topics

We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it's already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we've previously imported:

```python
# ...code for fetching all the copy for all the products
key_re = r'\W(%s)\W' % '|'.join(re.escape(i) for i in topic_keywords)
for i in documents:
    tags = re.findall(key_re, i['copy'])
    i['tags'] = map(lambda x: tag_lookup[x], tags)
```

Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic.

Step 2 – Finding the information

From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document, and divides it by the proportion of the documents that word appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor gets rid of terms which are overly common across the entire corpus (for example, the term 'programming' is common in our product copy, and whilst most of the documents ARE about programming, it doesn't provide much discriminating information about each document).
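To make the intuition concrete with a toy calculation (numbers invented): if 'numpy' appears 5 times in a 100-word document (tf = 0.05) and occurs in 10 of 1,000 documents, its score is roughly 0.05 × ln(1000/10) ≈ 0.23, whereas 'programming' appearing just as often but present in 900 of the documents scores only about 0.05 × ln(1000/900) ≈ 0.005. The rare, on-topic term wins. (scikit-learn's exact weighting formula differs slightly, but the principle is the same.)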
To do all of this, I use Python (obviously) and the excellent scikit-learn library. tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer, which has lots of options you can fiddle with to get more informative results:

```python
import sklearn.feature_extraction.text as skt

tagger = skt.TfidfVectorizer(input='content',
                             encoding='utf-8',
                             decode_error='replace',
                             strip_accents=None,
                             analyzer=lambda x: x,
                             ngram_range=(1, 1),
                             max_df=0.8,
                             min_df=0.0,
                             norm='l2',
                             sublinear_tf=False)
```

It's a good idea to use the min_df and max_df arguments of the constructor to cut out the most common/obtuse words and get a more informative weighting. The analyzer argument tells it how to get the words from each document; in our case, the documents are already lists of normalized words, so we don't need anything additional done.

```python
# create vectors of all the documents
vectors = tagger.fit_transform(map(lambda x: x['tags'], rows)).toarray()
# get back the topic names to map to the graph
t_map = tagger.get_feature_names()

jobs = []
for ind, vec in enumerate(vectors):
    features = filter(lambda x: x[1] > 0, zip(t_map, vec))
    doc = documents[ind]
    for topic, weight in features:
        job = '''MERGE (n:StackOverflowTag {name:'%s'})
                 MERGE (m:Product {id:'%s'})
                 CREATE UNIQUE (m)-[:is_about {source:'tf_idf', weight:%f}]-(n)
              ''' % (topic, doc['id'], weight)
        jobs.append(job)
```

We then execute all of the jobs using py2neo's Batch functionality. Having done all of this, we can now relate products to each other in terms of what topics they have in common:

```cypher
MATCH (n:Product {isbn10:'1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10:'1783289007'})
WITH a.name AS topic, r.weight+q.weight AS weight
RETURN topic ORDER BY weight DESC LIMIT 6
```

which returns:

|   | topic            |
|---|------------------|
| 1 | Machine Learning |
| 2 | Image            |
| 3 | Models           |
| 4 | Algorithm        |
| 5 | Data             |
| 6 | Python           |

Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we've developed.

Take a breath

So, that's how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you've made.

The graph I've spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I've taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to Python:

```python
titles = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'})
       RETURN DISTINCT n''')]
t2 = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(o:StackOverflowTag {name:'Python'})
       WHERE has(n.name) RETURN DISTINCT n''')]
titles.extend(t2)
```

then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python.

Visualising the graph

Since I started working with graphs, the two visualisation tools I've always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs.
Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share, and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product; it really shows how much value it's possible to get out of graph-based data. Imagine if your Customer Relations team were able to run a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations.

Linkurious have built their product on top of Sigma.js, and they've made much of their work available as the open source Linkurious.js. This is essentially Sigma.js with a few changes to the API and an even greater variety of plugins. On GitHub, each plugin has an API page in the wiki and a downloadable demo. It's worth cloning the repository just to see the things it's capable of!

Publish It!

So here's the workflow I used to get the Python topic graph out of Neo4j and onto the web:

1. Use py2neo to grab the subgraph of content and topics pertinent to Python, as described above.
2. Add to this some other topics linked to the same books, to give a fuller picture of the Python "world".
3. Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data.
4. Export all the nodes and edges to CSV files.
5. Import the node and edge tables into Gephi. The reason I'm using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I'm not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I've used the OpenOrd layout, which I feel really shows off the communities of a large graph. There's a really interesting and technical presentation about how this layout works.
6. Export the graph into GEXF format, having applied some partition and ranking functions to make it more clear and appealing.

Now it's all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I'll give an overview of the different plugins I've used for the different parts of the visualisation.

First, instantiate the graph object, pointing to a container (note the CSS of the container; without this, the graph won't display properly):

```html
<style type="text/css">
  #container {
    max-width: 1500px;
    height: 850px;
    margin: auto;
    background-color: #E5E5E5;
  }
</style>
...
<div id="container"></div>
...
<script>
  s = new sigma({
    container: 'container',
    renderer: {
      container: document.getElementById('container'),
      type: 'canvas'
    },
    settings: {
      ...
    }
  });
```

sigma.parsers.gexf - used for (trivially!) importing a GEXF file into a sigma instance:

```javascript
sigma.parsers.gexf(
  'static/data/Graph1.gexf',
  s,
  function(s) {
    // callback executed once the data is loaded; use this to set up any
    // aspects of the app which depend on the data
  }
);
```

sigma.plugins.filter - adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page:
```html
<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0">
```
```javascript
function applyMinDegreeFilter(e) {
  var v = e.target.value;
  $('#min-degree-val').textContent = v;
  filter
    .undo('min-degree')
    .nodesBy(
      function(n, options) {
        return this.graph.degree(n.id) >= options.minDegreeVal;
      }, {
        minDegreeVal: +v
      },
      'min-degree'
    )
    .apply();
}
$('#min-degree').change(applyMinDegreeFilter);
```

sigma.plugins.locate - adds the ability to zoom in on a single node or collection of nodes. Very useful if you're filtering a very large initial graph:

```javascript
function locateNode(nid) {
  if (nid == '') {
    locate.center(1);
  } else {
    locate.nodes(nid);
  }
}
```

sigma.renderers.glyphs - allows you to add custom glyphs to each node. Useful if you have many types of node.

Outro

This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit for rapidly generating graph-based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right hand (or left, since I'm left-handed), which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data analyst's toolkit.


Basics of Programming in Julia

Packt
03 Mar 2015
17 min read
In this article by Ivo Balbaert, author of the book Getting Started with Julia Programming, we will explore how Julia interacts with the outside world: reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia, and we will also discover its parallel processing model. In this article, the following topics are covered:

- Working with files (including CSV files)
- Using DataFrames

Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and has the following characteristics:

The fields are given by names(IOStream):

```julia
4-element Array{Symbol,1}:
 :handle
 :ios
 :name
 :mark
```

The types are given by IOStream.types:

```julia
(Ptr{None}, Array{Uint8,1}, String, Int64)
```

The file handle is a pointer of the type Ptr, which is a reference to the file object. Opening and reading a line-oriented file with the name example.dat is very easy:

```julia
# code in Chapter 8\io.jl
fname = "example.dat"
f1 = open(fname)
```

fname is a string that contains the path to the file, escaping special characters with \ when necessary; for example, on Windows, when the file is in the test folder on the D: drive, this would become "d:\\test\\example.dat". The f1 variable is now an IOStream(<file example.dat>) object.

To read all lines one after the other into an array, use data = readlines(f1), which returns:

```julia
3-element Array{Union(ASCIIString,UTF8String),1}:
 "this is line 1.\r\n"
 "this is line 2.\r\n"
 "this is line 3."
```

For processing line by line, only a simple loop is needed:

```julia
for line in data
    println(line) # or process line
end
close(f1)
```

Always close the IOStream object to clean up and save resources. If you want to read the file into one string, use readall. Use this only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function, and closing the file automatically. It goes as follows (file is the IOStream object in this code):

```julia
open(fname) do file
    process(file)
end
```

The do command creates an anonymous function and passes it to open. Thus, the previous code example is equivalent to open(process, fname). Use the same syntax for processing a file fname line by line without the memory overhead of the previous methods, for example:

```julia
open(fname) do file
    for line in eachline(file)
        print(line) # or process line
    end
end
```

Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to disk:

```julia
fname = "example2.dat"
f2 = open(fname, "w")
write(f2, "I write myself to a file\n") # returns 24 (bytes written)
println(f2, "even with println!")
close(f2)
```

Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a".

To process all the files in the current folder (or in a given folder passed as an argument to readdir()), use this for loop:

```julia
for file in readdir()
    # process file
end
```

Reading and writing CSV files

A CSV file is a comma-separated file. The data fields in each line are separated by commas "," or another delimiter such as semicolons ";". These files are the de facto standard for exchanging small and medium amounts of tabular data.
Such files are structured so that one line contains data about one data object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8\winequality.csv, which contains 1,599 sample measurements across 12 data columns, such as pH and alcohol per sample, separated by semicolons.

In general, the readdlm function is used to read in data from CSV files:

```julia
# code in Chapter 8\csv_files.jl:
fname = "winequality.csv"
data = readdlm(fname, ';')
```

The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2} array of the type Any, because no common type could be found:

```julia
"fixed acidity"  "volatile acidity"  ...  "alcohol"  "quality"
 7.4              0.7                ...   9.4        5.0
 7.8              0.88               ...   9.8        5.0
 7.8              0.76               ...   9.8        5.0
 ...
```

If the data file is comma separated, reading it is even simpler with the following command:

```julia
data2 = readcsv(fname)
```

The problem with what we have done so far is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line in a separate array. The data array then naturally gets the correct datatype, Float64. We can also specify the type explicitly, like this:

```julia
data3 = readdlm(fname, ';', Float64, '\n', header=true)
```

The third argument here is the type of the data, which is a numeric type, String, or Any. The next argument is the line separator character, and the fifth indicates whether or not there is a header line with the field (column) names. If so, then data3 is a tuple with the data as the first element and the header as the second; in our case, (1599x12 Array{Float64,2}, 1x12 Array{String,2}). (There are other optional arguments to readdlm; see the help for details.) In this case, the actual data is given by data3[1] and the header by data3[2].

Let's continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax. For example, the third row is given by row3 = data[3, :], with data 7.8 0.88 0.0 2.6 0.098 25.0 67.0 0.9968 3.2 0.68 9.8 5.0, representing the measurements for all the characteristics of a certain wine. The measurements of a certain characteristic for all wines are given by a data column; for example, col3 = data[:, 3] represents the measurements of citric acid and returns a column vector:

```julia
1600-element Array{Any,1}:
 "citric acid"
 0.0
 0.0
 0.04
 0.56
 ...
 0.47
```

If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:, 2:4]. If we need these measurements only for the wines in rows 70-75, get them with y = data[70:75, 2:4], which returns a 6x3 Array{Any,2}:

```julia
0.32   0.57  2.0
0.705  0.05  1.9
...
0.675  0.26  2.1
```

To get a matrix with the data from columns 3, 6, and 11, execute the following command:

```julia
z = [data[:,3] data[:,6] data[:,11]]
```
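As a quick, hedged sketch of this slicing in practice, here is how one might compute the average alcohol level from the data matrix read above (column 11 holds alcohol; the quoted result is approximate):

```julia
# A minimal sketch using the data matrix read with readdlm above.
# Row 1 holds the headers, so we skip it; column 11 is alcohol.
alcohol = data[2:end, 11]
avg = sum(alcohol) / length(alcohol)
println("average alcohol: ", avg)  # about 10.4 for this dataset
```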
It would be useful to create a type Wine in the code. For example, if the data is to be passed around functions, it will improve the code quality to encapsulate all the data in a single data type, like this:

```julia
type Wine
    fixed_acidity::Array{Float64}
    volatile_acidity::Array{Float64}
    citric_acid::Array{Float64}
    # other fields
    quality::Array{Float64}
end
```

Then, we can create objects of this type and work with them, as in any other object-oriented language; for example, wine1 = Wine(data[1, :]...), where the elements of the row are splatted with the ... operator into the Wine constructor.

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, execute the following command:

```julia
writedlm("partial.dat", data, ';')
```

If more control is necessary, you can easily combine the more basic functions from the previous section. For example, the following code snippet writes 10 tuples of three numbers each to a file:

```julia
# code in Chapter 8\tuple_csv.jl
fname = "savetuple.csv"
csvfile = open(fname, "w")
# writing headers:
write(csvfile, "ColName A, ColName B, ColName C\n")
for i = 1:10
    tup(i) = tuple(rand(Float64, 3)...)
    write(csvfile, join(tup(i), ","), "\n")
end
close(csvfile)
```

Using DataFrames

If you measure n variables (each of a different type) on a single object of observation, you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given student grades as data, you might want to "compute the average grade for each socioeconomic group", where grade and socioeconomic group are both columns in the table, and there is one row per student. The DataFrame is the most natural representation for working with such an (m x n) table of data. DataFrames are similar to pandas DataFrames in Python or data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing Pkg.add("DataFrames") in the REPL. Then, import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (which contains a collection of example datasets mostly used in the R literature).

A common issue with statistical data is that values can be missing (the information is not known). The DataArrays package provides the special value NA to represent a missing value; it has the type NAtype. The result of computations that contain NA values mostly cannot be determined; for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.)

A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain the missing (Not Available) value NA and can work efficiently with it. To construct one, use the @data macro:

```julia
# code in Chapter 8\dataarrays.jl
using DataArrays
using DataFrames
dv = @data([7, 3, NA, 5, 42])
```

This returns 5-element DataArray{Int64,1}: 7 3 NA 5 42. The sum of these numbers, sum(dv), returns NA. One can also assign NA values to the array with dv[5] = NA; then dv becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.
How do we get rid of these NA values, supposing we can do so safely? We can use the dropna function; for example, sum(dropna(dv)) returns 15. If you know that you can replace them with a value v, use the array function:

```julia
repl = -1
sum(array(dv, repl)) # returns 13
```

A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray that has its own type, and the data it contains can be referred to by the column name as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types. One column might, for instance, contain the names of students and should therefore be a string, while another column could contain their age and should be an integer.

We construct a DataFrame from the program data as follows:

```julia
# code in Chapter 8\dataframes.jl
using DataFrames
# constructing a DataFrame:
df = DataFrame()
df[:Col1] = 1:4
df[:Col2] = [e, pi, sqrt(2), 42]
df[:Col3] = [true, false, true, false]
show(df)
```

Notice that the column headers are used as symbols. This returns a 4x3 DataFrame object. We could also have used the full constructor:

```julia
df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42],
               Col3 = [true, false, true, false])
```

You can refer to columns either by an index (the column number) or by a name; both of the following expressions return the same output:

```julia
show(df[2])
show(df[:Col2])
```

This gives the following output: [2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

To show rows or subsets of rows and columns, use the familiar splice (:) syntax, for example:

To get the first row, execute df[1, :]. This returns:

```
1x3 DataFrame
| Row | Col1 | Col2    | Col3 |
|-----|------|---------|------|
| 1   | 1    | 2.71828 | true |
```

To get the second and third rows, execute df[2:3, :].

To get only the second column from the previous result, execute df[2:3, :Col2]. This returns [3.141592653589793, 1.4142135623730951].

To get the second and third columns from the second and third rows, execute df[2:3, [:Col2, :Col3]], which returns:

```
2x2 DataFrame
| Row | Col2    | Col3  |
|-----|---------|-------|
| 1   | 3.14159 | false |
| 2   | 1.41421 | true  |
```

The following functions are very useful when working with DataFrames:

- The head(df) and tail(df) functions show you the first six and the last six lines of data respectively.
- The names function gives the names of the columns: names(df) returns 3-element Array{Symbol,1}: :Col1 :Col2 :Col3.
- The eltypes function gives the data types of the columns: eltypes(df) gives the output 3-element Array{Type{T<:Top},1}: Int64 Float64 Bool.
- The describe function tries to give some useful summary information about the data in the columns, depending on the type; for example, describe(df) gives for column 2 (which is numeric) the min, max, median, mean, number, and percentage of NAs:

```
Col2
Min      1.4142135623730951
1st Qu.  2.392264761937558
Median   2.929937241024419
Mean     12.318522011105483
3rd Qu.  12.856194490192344
Max      42.0
NAs      0
NA%      0.0%
```

To load in data from a local CSV file, use the method readtable.
The returned object is of type DataFrame:

```julia
# code in Chapter 8\dataframes.jl
using DataFrames
fname = "winequality.csv"
data = readtable(fname, separator = ';')
typeof(data) # DataFrame
size(data) # (1599,12)
```

The readtable method also supports reading in gzipped CSV files. Writing a DataFrame to a file can be done with the writetable function, which takes the filename and the DataFrame as arguments, for example, writetable("dataframe1.csv", df). By default, writetable will use the delimiter specified by the filename extension and write the column names as headers. Both readtable and writetable support numerous options for special cases; refer to the docs for more information (http://dataframesjl.readthedocs.org/en/latest/).

To demonstrate some of the power of DataFrames, here are some queries you can do:

- Make a vector with only the quality information: data[:quality].
- Give the wines with an alcohol percentage equal to 9.5, for example, data[data[:alcohol] .== 9.5, :]. Here, we use the .== operator, which does element-wise comparison: data[:alcohol] .== 9.5 returns an array of Boolean values (true for data points where :alcohol is 9.5, and false otherwise), and data[boolean_array, :] selects those rows where boolean_array is true.
- Count the number of wines grouped by quality with by(data, :quality, data -> size(data, 1)), which returns the following:

```
6x2 DataFrame
| Row | quality | x1  |
|-----|---------|-----|
| 1   | 3       | 10  |
| 2   | 4       | 53  |
| 3   | 5       | 681 |
| 4   | 6       | 638 |
| 5   | 7       | 199 |
| 6   | 8       | 18  |
```

The DataFrames package contains the by function, which takes three arguments:

- A DataFrame; here it takes data.
- A column to split the DataFrame on; here, quality.
- A function or an expression to apply to each subset of the DataFrame; here, data -> size(data, 1), which gives us the number of wines for each quality value.

Another easy way to get the distribution over quality is to execute the hist function: hist(data[:quality]) gives the counts over the range of quality values: (2.0:1.0:8.0, [10,53,681,638,199,18]). More precisely, this is a tuple whose first element corresponds to the edges of the histogram bins, and whose second element denotes the number of items in each bin. So there are, for example, 10 wines with quality between 2 and 3, and so on. To extract the counts as a variable count of type Vector, we can execute _, count = hist(data[:quality]); the _ means that we discard the first element of the tuple.

To obtain the quality classes as a DataArray class, we execute the following:

```julia
class = sort(unique(data[:quality]))
```

We can now construct a df_quality DataFrame with the class and count columns as df_quality = DataFrame(qual=class, no=count). This gives the following output:

```
6x2 DataFrame
| Row | qual | no  |
|-----|------|-----|
| 1   | 3    | 10  |
| 2   | 4    | 53  |
| 3   | 5    | 681 |
| 4   | 6    | 638 |
| 5   | 7    | 199 |
| 6   | 8    | 18  |
```

To deepen your understanding and learn about the other features of Julia DataFrames (such as joining, reshaping, and sorting), refer to the documentation available at http://dataframesjl.readthedocs.org/en/latest/.

Other file formats

Julia can work with other human-readable file formats through specialized packages:

- For JSON, use the JSON package. The parse method converts JSON strings into Dictionaries, and the json method turns any Julia object into a JSON string (see the sketch after this list).
- For XML, use the LightXML package.
- For YAML, use the YAML package.
- For HDF5 (a common format for scientific data), use the HDF5 package.
- For working with Windows INI files, use the IniFile package.
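As a hedged sketch of that JSON round trip (the dictionary contents are invented, and the Dict syntax assumes a reasonably recent Julia):

```julia
# A minimal sketch; requires Pkg.add("JSON") first.
using JSON

wine = Dict("name" => "Example Red", "alcohol" => 9.5, "quality" => 5)
s = JSON.json(wine)     # serialize a Julia object to a JSON string
d = JSON.parse(s)       # parse the string back into a Dict
println(d["alcohol"])   # 9.5
```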
Summary

In this article, we covered the basics of working with files (including CSV files) and using DataFrames in Julia.


How to write high quality code in Python: 15+ tips for data scientists and researchers

Aarthi Kumaraswamy
21 Mar 2018
5 min read
Writing code is easy. Writing high quality code is much harder. Quality is to be understood both in terms of the actual code (variable names, comments, docstrings, and so on) and of architecture (functions, modules, and classes). In general, coming up with a well-designed code architecture is much more challenging than the implementation itself. In this post, we will give a few tips about how to write high quality code. This is a particularly important topic in academia, as more and more scientists without prior experience in software development need to code.

High quality code writing first principles

Writing readable code means that other people (or you, in a few months or years) will understand it more quickly and will be more willing to use it. It also facilitates bug tracking.

Modular code is also easier to understand and to reuse. Implementing your program's functionality in independent functions that are organized as a hierarchy of packages and modules is an excellent way of achieving high code quality. It is easier to keep your code loosely coupled when you use functions instead of classes. Spaghetti code is really hard to understand, debug, and reuse.

Iterate between bottom-up and top-down approaches while working on a new project. Starting with a bottom-up approach lets you gain experience with the code before you start thinking about the overall architecture of your program. Still, make sure you know where you're going by thinking about how your components will work together.

How do these first principles translate into Python?

- Take the time to learn the Python language seriously. Review the list of all modules in the standard library; you may discover that functions you would otherwise implement already exist. Learn to write Pythonic code, and do not translate programming idioms from other languages such as Java or C++ into Python.
- Learn common design patterns; these are general, reusable solutions to commonly occurring problems in software engineering.
- Use assertions throughout your code (the assert keyword) to prevent future bugs (defensive programming).
- Start writing your code with a bottom-up approach: write independent Python functions that implement focused tasks.
- Do not hesitate to refactor your code regularly. If your code is becoming too complicated, think about how you can simplify it.
- Avoid classes when you can. If you can use a function instead of a class, choose the function. A class is only useful when you need to store persistent state between function calls. Make your functions as pure as possible (no side effects).
- In general, prefer Python native types (lists, tuples, dictionaries, and types from Python's collections module) over custom types (classes). Native types lead to more efficient, readable, and portable code.
- Choose keyword arguments over positional arguments in your functions. Argument names are easier to remember than argument ordering, and they make your functions self-documenting.
- Name your variables carefully. Names of functions and methods should start with a verb. A variable name should describe what it is; a function name should describe what it does. The importance of naming things well cannot be overstated.
- Every function should have a docstring describing its purpose, arguments, and return values, as shown in the sketch following this list. You can also look at the conventions chosen in popular libraries such as NumPy. The exact convention does not matter; the point is to be consistent within your code. You can use a markup language such as Markdown or reST in your docstrings.
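Here is a minimal sketch of such a docstring, loosely following the NumPy convention (the function itself is invented for illustration); it also demonstrates the assert-based defensive check recommended above:

```python
import numpy as np

def moving_average(x, window=3):
    """Compute the simple moving average of a 1-D signal.

    Parameters
    ----------
    x : array-like
        Input signal.
    window : int, optional
        Width of the averaging window (default: 3).

    Returns
    -------
    numpy.ndarray
        The averaged signal, of length len(x) - window + 1.
    """
    assert window >= 1, "window must be a positive integer"
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode='valid')
```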
Beyond individual functions and names, follow broader conventions:

- Follow (at least partly) Guido van Rossum's Style Guide for Python, also known as Python Enhancement Proposal number 8 (PEP 8). It is a long read, but it will help you write well-readable Python code. It covers many little things such as spacing between operators, naming conventions, comments, and docstrings. For instance, you will learn that it is considered good practice to limit any line of your code to 79 or 99 characters; this way, your code can be correctly displayed in most situations (such as in a command-line interface or on a mobile device) or side by side with another file. Alternatively, you can decide to ignore certain rules. In general, following common guidelines is beneficial on projects involving many developers.
- You can check your code automatically against most of the style conventions in PEP 8 with the pycodestyle Python package. You can also automatically make your code PEP 8-compatible with the autopep8 package.
- Use a tool for static code analysis, such as flake8 or Pylint. It lets you find potential errors or low-quality code statically, that is, without running your code.
- Use blank lines to avoid cluttering your code (see PEP 8). You can also demarcate sections in a long Python module with salient comments.
- A Python module should not contain more than a few hundred lines of code. Having too many lines of code in a module may be a sign that you need to split it into several modules.
- Organize important projects (with tens of modules) into subpackages (subdirectories). Take a look at how major Python projects are organized; for example, the code of IPython is well organized into a hierarchy of subpackages with focused roles. Reading the code itself is also quite instructive.
- Learn best practices for creating and distributing a new Python package. Make sure that you know setuptools, pip, wheels, virtualenv, PyPI, and so on. Also, you are highly encouraged to take a serious look at conda, a powerful and generic packaging system created by Anaconda. Packaging has long been a rapidly evolving topic in Python, so read only the most recent references.

You just enjoyed an excerpt from Cyrille Rossant's book, IPython Cookbook, Second Edition. This book contains 100+ recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. For free recipes from the book, head over to the IPython Cookbook GitHub page. If you liked what you saw, support Cyrille's work by buying a copy of the book today!


How to run Spark in Mesos

Sunith Shetty
31 Jan 2018
6 min read
This article is an excerpt from the book Learning Apache Spark 2 by Muhammad Asif Abbasi, in which you will learn how to perform big data analytics using Spark streaming, machine learning techniques, and more. In the article below, you will learn how to operate Spark on the Mesos cluster manager.

What is Mesos?

Mesos is an open source cluster manager that started as a UC Berkeley research project in 2008 and is quite widely used by a number of organizations. Spark supports Mesos, and Matei Zaharia gave a keynote at MesosCon in June 2016; a recording is available on YouTube.

Before you start

If you haven't installed Mesos before, the getting-started page on the Apache website gives a good walkthrough of installing Mesos on Windows, macOS, and Linux; follow the URL https://mesos.apache.org/getting-started/. Once installed, you need to start up Mesos on your cluster.

Start the Mesos master:

```
./bin/mesos-master.sh --ip=[MasterIP] --work_dir=/var/lib/mesos
```

Start Mesos agents on all your worker nodes:

```
./bin/mesos-agent.sh --master=[MasterIP]:5050 --work_dir=/var/lib/mesos
```

Make sure Mesos is up and running with all your relevant worker nodes configured by visiting http://[MasterIP]:5050.

Make sure that the Spark binary packages are available and accessible by Mesos. They can be placed on a Hadoop-accessible URI, for example:

- HTTP via http://
- S3 via s3n://
- HDFS via hdfs://

You can also install Spark in the same location on all the Mesos slaves and configure spark.mesos.executor.home to point to that location.

Running in Mesos

Mesos can have single or multiple masters, which means the master URL differs when submitting applications from Spark via Mesos:

- Single master: mesos://sparkmaster:5050
- Multiple masters (using ZooKeeper): mesos://zk://master1:2181,master2:2181/mesos

Modes of operation in Mesos

Mesos supports both the client and cluster modes of operation.

Client mode

Before running in client mode, you need to perform a couple of configurations in spark-env.sh:

```
export MESOS_NATIVE_JAVA_LIBRARY=<path to libmesos.so [Linux] or libmesos.dylib [macOS]>
export SPARK_EXECUTOR_URI=<URI of the zipped Spark package uploaded to an accessible location, e.g. HTTP, HDFS, S3>
```

Also set spark.executor.uri to the URI of the zipped Spark package uploaded to an accessible location (HTTP, HDFS, or S3).

Batch applications

For batch applications, you need to pass the Mesos URL as the master when creating your Spark context in your application program. As an example:

```scala
val sparkConf = new SparkConf()
  .setMaster("mesos://mesosmaster:5050")
  .setAppName("Batch Application")
  .set("spark.executor.uri", "<location of Spark binaries (HTTP, S3, or HDFS)>")
val sc = new SparkContext(sparkConf)
```

If you are using spark-submit, you can configure the URI in the conf/spark-defaults.conf file using spark.executor.uri.

Interactive applications

When you are running one of the provided Spark shells for interactive querying, you can pass the master argument, for example:

```
./bin/spark-shell --master mesos://mesosmaster:5050
```

Cluster mode

Just as with YARN, you can run Spark on Mesos in cluster mode, which means the driver is launched inside the cluster and the client can disconnect after submitting the application, retrieving results from the Mesos WebUI.

Steps to use the cluster mode:

1. Start the MesosClusterDispatcher in your cluster: ./sbin/start-mesos-dispatcher.sh --master mesos://mesosmaster:5050. This will generally start the dispatcher at port 7077.
2. From the client, submit a job to the Mesos cluster with spark-submit, specifying the dispatcher URL, for example:

```
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://dispatcher:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 2G \
  --total-executor-cores 10 \
  s3n://path/to/examples.jar
```
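Whichever mode you choose, it helps to keep the Mesos-specific settings out of application code. Here is a minimal, hedged sketch of conf/spark-defaults.conf (the host name and package URI are placeholders, not values from the book):

```
# conf/spark-defaults.conf -- illustrative values only
spark.master            mesos://mesosmaster:5050
spark.executor.uri      hdfs://namenode/dist/spark-2.0.2-bin-hadoop2.7.tgz
spark.mesos.coarse      true
spark.executor.memory   2g
```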
Like Spark itself, Mesos has lots of properties that can be set to optimize processing. You should refer to the Spark configuration page (http://spark.apache.org/docs/latest/configuration.html) for more information.

Mesos run modes

Spark can run on Mesos in two modes:

- Coarse-grained (default mode): Spark acquires one long-running Mesos task on each machine. This offers a much lower startup cost, but the resources remain allocated to Spark for the complete duration of the application.
- Fine-grained (deprecated): In fine-grained mode, one Mesos task is created per Spark task. The benefit is that each application receives cores as per its requirements, but the initial bootstrapping can act as a deterrent for interactive applications.

Key Spark on Mesos configuration properties

While Spark has a number of properties that can be configured to optimize processing, some of these properties are specific to Mesos. We'll look at a few of the key ones here.

| Property name | Meaning / default value |
|---|---|
| spark.mesos.coarse | Setting it to true (the default) runs Mesos in coarse-grained mode; setting it to false runs it in fine-grained mode. |
| spark.mesos.extra.cores | This is more of an advertisement than an allocation, in order to improve parallelism: an executor pretends that it has extra cores, so the driver sends it more work. Default: 0. |
| spark.mesos.mesosExecutor.cores | Only works in fine-grained mode. Specifies how many cores should be given to each Mesos executor. |
| spark.mesos.executor.home | Identifies the directory of the Spark installation for the executors in Mesos. As discussed, you can specify this using spark.executor.uri as well; if you have not, you can specify it using this property. |
| spark.mesos.executor.memoryOverhead | The amount of memory (in MB) to be allocated per executor. |
| spark.mesos.uris | A comma-separated list of URIs to be downloaded when the driver or executor is launched by Mesos. |
| spark.mesos.principal | The name of the principal used by Spark to authenticate itself with Mesos. |

You can find the other configuration properties on the Spark documentation page (http://spark.apache.org/docs/latest/running-on-mesos.html#spark-properties).

To summarize, the objective of this article was to get you started with running Spark on Mesos. To learn more about Spark SQL, Spark Streaming, and machine learning with Spark, refer to the book Learning Apache Spark 2.


Working with Colors in Scribus

Packt
10 Dec 2010
10 min read
Scribus 1.3.5: Beginner's Guide

Create optimum page layouts for your documents using the productive tools of Scribus: master desktop publishing, create professional-looking documents with ease, and enhance the readability of your documents using Scribus's powerful layout tools. The book is packed with interesting examples and screenshots that show you the most important Scribus tools for creating and publishing your documents.

Applying colors in Scribus

Applying color is as basic as creating a frame or writing text. In this article, we will often give color values. Each time we need to, we will use the first letter of the color followed by its value; for example, C75 will mean 75 percent of cyan. K will be used for black and B for blue.

There are five main things you can apply colors to:

- Frame or shape fill
- Frame or shape border
- Line
- Text
- Text border

You might like to colorize pictures too; that is a very different method, using duotone or an equivalent image effect. Applying a color to a frame means that you will use the Colors tab of the PP (Properties Palette), whereas applying color to text requires the Color & Effects expander in the Text tab. In both cases you'll find what's needed to apply color to the fill and the border, but the user interfaces are a bit different.

Time for action – applying colors to a Text Frame's text

Colors on frames all use the same color list. Let's follow some steps to see how this is done:

1. Draw a Text Frame where you want it on a page.
2. Type some text inside, like "colors of the world", or use Insert | Sample Text.
3. Go to the Colors tab of the PP (F2).
4. Click on the second button placed above the color list to specify that you want to apply the changes to the fill. Then click on the color you want in the list below, for example, Magenta.
5. Click on the paintbrush button, and choose a black color to be applied to the border (we could call it the stroke, too). Don't forget that applying a stroke color requires some border refinements in the Line tab to set the width and style of the border; these options are covered elsewhere in the book.
6. Now, select the text or some part of it, and go to the Colors & Effects expander of the Text tab. Here you will again see the same icons we used previously, each with its own color list. Let's choose Yellow for the text color. Notice that the stroke color cannot be changed yet.
7. To change this, click on the Shadow button placed below, and now choose black as the stroke color. The text shadow should be black.

What just happened?

Applying color to text is quicker than applying frame colors in some ways, because fill and stroke each have their own list: there is no need to click on any button first, and you can see both at a glance. Just remember that text has no stroke color activated at first; you need to add a stroke or shadow to a selection to activate the border color for that selection.

Quick apply in Story Editor

If, like me, you like the Story Editor (Edit | Edit Text), notice that colors can be applied from there. They are not displayed in the editor, but they will be once the changes are applied to the layout. This is much faster, but you need to know exactly what you're doing and be precise in your selection. If you need to apply the same color setting to a word throughout the document, you can alternatively use the Edit | Search/Replace window.
You can set the word you're looking for in the first field and, on the right-hand side, replace it with the same word, choosing the Fill and Stroke colors that you want to apply. Of course, it would be nice if this window could let us apply character styles to make future changes easier.

A new Scribus document receives a default color list, which is the same all over your document. In this article, we will deal with many ways of adapting existing colors or creating new ones.

Applying shade or transparency

Shade and transparency are two ways of setting more precisely how a specific color will be applied to your items. Shades and transparencies are fake effects that will be interpreted by later elements of the printing workflow, such as Raster Image Processors, to determine how the chosen color can be rendered with pure colors. This is the key point of reproducing colors: if you want a gray, you'll generally have a black color for that. In offset printing, which is the reference, the size of the printed dot will vary relative to the darkness of the black you chose, and this will be optically interpreted by the reader.

Using shades

Each color property has a Shade value. The default is set to 100 percent, meaning that the color will be printed fully saturated. Reducing the shade value will produce a lighter color. At 0 percent, the color, whatever it may be, will be interpreted as white.

On a pure color item, like any primary or spot color, changing the shade won't affect the color composition. However, on process colors that are made by mixing several primary inks, modifying the shade will proportionally change the amount of each ink used in the process. Our C75 M49 Y7 K12 color at a 50 percent shade value will become C37 M25 Y4 K6 in the final PDF (each value halved and rounded). Less color means less ink on the paper and more white (or paper color), which results in a lighter color.

You should remember that Shade is a frame property and not a color property. So, if you apply a new color to the frame, the shade value will be kept and applied immediately. Changing the shade of the color applied to some characters is a bit different: there is no field to fill but a drop-down list with predefined values in 10 percent increments. If you need another value, just choose Other to display a window in which you can enter the exact amount you need. You can do the same in the Story Editor.

Using transparency

While shade is used to lighten a color, the Opacity value tells you how solid the color will be. Once again, the range goes from 0%, meaning the object is completely transparent and invisible, to 100%, which makes it opaque. The latter value is the default. When two objects overlap, the top object hides the bottom object; but when Opacity is decreased, the object at the bottom becomes more and more visible. One difference to notice is that Opacity affects not only the color rendering but the content too (if there is any).

As with Shade, Opacity is applied separately to the fill and to the stroke, so you'll need to set both if needed. One important aspect is that Shade and Opacity can both be applied to the same frame, and a value of 50% for each will give a lighter color than if only one was used. Several opacity values applied to overlapping objects show how they act and add to each other: the background for the text in the title, in the following screenshot, is done in the same color as the background at the top of the page.
Using transparency or shade can help create this background and decrease the number of used colors.

Time for action – transparency and layers

Let's now use transparency and layers to create some custom effects over a picture, as can often be done for covers.

1. Create a new document and display the Layers window from the Windows menu. This window will already contain a line called Background.
2. Add a layer by clicking on the + button at the bottom left-hand side of the window: it will be called New Layer 1. You can rename it by double-clicking on its name.
3. On the first page, add an Image Frame that covers the entire page. Then draw a rectangular shape that covers almost half of the page height.
4. Duplicate this layer by clicking on the middle button placed at the bottom of the Layers window. Select the visibility checkbox (the first column, headed with an eye icon) of this layer to hide it. We'll modify the transparency of each object.
5. Click on New Layer 1 to specify that you want to work on this layer; otherwise you won't be able to select its frames. The frames or shapes you create from now on will be added to this layer called New Layer 1.
6. Select the black shape and decrease the Opacity value in the Colors tab of the PP to 50%. Do the same for the Image Frame.
7. Now, hide this layer by clicking on its visibility icon and show the top layer.
8. In the Layers window, verify that this layer is selected and decrease its opacity.

What just happened?

If there is a need to make several objects transparent at once, one idea is to put them on a layer and set the layer Opacity. This way, the same amount of transparency will be applied to the whole. You can open the Layers window from the Windows menu.

When working with layers, it's important to have the right layer selected before working on it. Basically, any new layer will be added at the top of the stack and will be activated once created. When a layer is selected, you can change the Opacity of this layer by using the field on the top right-hand side of the Layers window. Since it is applied to the layer itself, all the objects placed on it will be affected, but their own opacity values won't be changed.

If you look at the differences between the two layers we have made, you'll see that the area in the first black rectangle explicitly becomes transparent by itself, because you can see the photo through it; this is not seen in the second. So using layers, as we have seen, can help us work faster when we need to apply the same opacity setting to several objects, but we have to take care, because the result is slightly different.

Using layers to blend colors

More rarely, layers can be used to mix colors. Blend Mode is originally set to Normal, which does no blending, but if you use any other mode on a layer, its colors will be mixed with the colors of items placed on lower layers, according to the chosen mode. This can be very creative. If you need a more precise action, Blend Mode can be set on a single object from the Colors tab of the PP. Just give it a try.

Layers are still most commonly used to organize a document: a layer for text, a layer for images, a layer for each language of a multi-lingual document, and so on. They are a smart way to work, but they are not required: you can build a layout without them.

Solving Many-to-Many Relationship in Dimensional Modeling

Packt
28 Dec 2009
3 min read
Bridge table solution

We will use a simplified book sales dimensional model as an example to demonstrate our bridge solution. Our book sales model initially has the SALES_FACT fact table and two dimension tables: BOOK_DIM and DATE_DIM. The granularity of the model is sales amount by date (daily) and by book.

Assume the BOOK_DIM table has five rows:

BOOK_SK  TITLE                 AUTHOR
1        Programming in Java   King, Chan
3        Learning Python       Simpson
2        Introduction to BIRT  Chan, Gupta, Simpson (Editor)
4        Advanced Java         King, Chan
5        Beginning XML         King, Chan (Foreword)

The DATE_DIM table has three rows:

DATE_SK  DT
1        11-DEC-2009
2        12-DEC-2009
3        13-DEC-2009

And the SALES_FACT table has ten rows:

DATE_SK  BOOK_SK  SALES_AMT
1        1        1000
1        2        2000
1        3        3000
1        4        4000
2        2        2500
2        3        3500
2        4        4500
2        5        5500
3        3        8000
3        4        8500

Note that:

- The columns with _SK suffixes in the dimension tables are surrogate keys; these surrogate keys relate the rows of the fact table to the rows in the dimension tables.
- King and Chan have collaborated on three books; in two they are co-authors, while in "Beginning XML" Chan's contribution is writing its foreword. Chan also co-authors "Introduction to BIRT".
- Simpson singly writes "Learning Python" and is an editor for "Introduction to BIRT".

To analyze daily book sales, you simply run a query, joining the dimension tables to the fact table:

SELECT dt, title, sales_amt
FROM sales_fact s, date_dim d, book_dim b
WHERE s.date_sk = d.date_sk
AND s.book_sk = b.book_sk

This query produces the result showing the daily sales amount of every book that has a sale:

DT         TITLE                 SALES_AMT
11-DEC-09  Advanced Java         4000
11-DEC-09  Introduction to BIRT  2000
11-DEC-09  Learning Python       3000
11-DEC-09  Programming in Java   1000
12-DEC-09  Advanced Java         4500
12-DEC-09  Beginning XML         5500
12-DEC-09  Introduction to BIRT  2500
12-DEC-09  Learning Python       3500
13-DEC-09  Advanced Java         8500
13-DEC-09  Learning Python       8000

You will notice that the model does not allow you to readily analyze the sales by individual writer: the AUTHOR column is multi-valued and not normalized, which violates the dimensional modeling rule. (We can resolve this by creating a view to "bundle" the AUTHOR_DIM with the SALES_FACT table such that the AUTHOR_DIM table connects to the view as a normal dimension. We will create the view a bit later in this section.)

We can solve this issue by adding an AUTHOR_DIM and its AUTHOR_GROUP bridge table. The AUTHOR_DIM must contain all individual contributors, which you will have to extract from the books and enter into the table. In our example we have four authors:

AUTHOR_SK  NAME
1          Chan
2          King
3          Gupta
4          Simpson

The weighting_factor column in the AUTHOR_GROUP bridge table contains a fractional numeric value that determines the contribution of an author to a book. Typically the authors have equal contributions to the book they write, but you might want to use different weighting factors for different roles; for example, an editor and a foreword writer have smaller weighting factors than an author. The total of the weighting factors for a book must always equal 1. The AUTHOR_GROUP bridge table has one surrogate key for every group of authors (a single author is considered a group that has one author only), and as many rows with that surrogate key as there are contributors in the group.
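As an aside, here is a hedged sketch of how the finished bridge might look and be queried. The group surrogate keys, the AUTHOR_GROUP_SK column on BOOK_DIM, and the specific weighting factors (0.4 for each co-author and 0.2 for the editor of "Introduction to BIRT") are illustrative assumptions, not figures from the model above:

-- Hypothetical AUTHOR_GROUP rows for the "Introduction to BIRT" group:
-- AUTHOR_GROUP_SK  AUTHOR_SK  WEIGHTING_FACTOR
-- 2                1          0.4    (Chan, co-author)
-- 2                3          0.4    (Gupta, co-author)
-- 2                4          0.2    (Simpson, editor)

-- Weighted sales per author, joining the fact table through the bridge:
SELECT a.name, SUM(f.sales_amt * g.weighting_factor) AS weighted_sales
FROM sales_fact f, book_dim b, author_group g, author_dim a
WHERE f.book_sk = b.book_sk
AND b.author_group_sk = g.author_group_sk
AND g.author_sk = a.author_sk
GROUP BY a.name

Because the weighting factors in each group sum to 1, the weighted sales of all authors add up exactly to the total sales in the fact table.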

Building a Consumer Review Website using WordPress 3

Packt
06 Aug 2010
15 min read
(For more resources on Wordpress, see here.)

Building a consumer review website will allow you to supply consumers with the information that they seek and then, once they've decided to make a purchase, your site can direct them to a source for the product or service. This process can ultimately allow you to earn some nice commission checks, because it's only logical that you would affiliate yourself with a number of the sites to which you will be directing consumers.

The great thing about using the WP Review Site plugin to build your consumer review website is that you can provide people with an unbiased source of public opinions on any product or service that you can imagine. You will never have to resort to the hard sell in order to drive traffic to the companies that you've affiliated yourself with. Instead, consumers can research the reviews posted on your website and, ultimately, make a purchase feeling confident that they're making the right decision.

In this article, you will learn how to:

- Present reviews in the most convenient way possible for visitors browsing your site
- Specify the ratings criteria that site visitors will use when reviewing the products or services included on your website
- Display informational comparison tables on your site's index and category pages
- Provide visitors with the location of local businesses using Google Maps
- Perform the additional steps required when writing a post now that the WP Review Site plugin has been introduced into the process
- Perform either automatic or manual integration so that you can use a theme of your own rather than either of the ones provided with this plugin

Once this project is complete, you will have succeeded in creating a site that's similar to the one shown in the following screenshot:

Introducing WP Review Site

With the WP Review Site plugin you will be able to build a consumer review site where visitors can share their opinions about the products or services of your choosing. The plugin, which can be found at WP Review Site, can be used to build a dedicated review site or, if you would like consumer reviews to make up only a subsection of your website, you can specify certain categories where they should appear. This plugin gives you complete control over where ratings appear and where they don't, since you can choose to include or exclude them on any category, page, or post.

The WP Review Site plugin seamlessly integrates with WordPress by, among other things, altering the normal appearance and functionality of the comment submission form. This plugin provides visitors with a way to write a review and assign stars to the rating categories that you previously defined. They can also write a review and opt to provide no stars without harming the overall rating presented on your site, since no stars is interpreted as though no rating was given.

The WP Review Site plugin makes it easy for you to present your visitors with concise information. Using the features available with this plugin, you can build comparison tables based upon your posts and user reviews. In order to accomplish this, you will need to configure a few settings and then the plugin will take care of the rest.

Typically, WordPress displays posts in chronological order, but that doesn't make much sense on a consumer review site, where visitors want to view posts based upon other factors such as the number of positive reviews that a particular product or service has received. The developer behind WP Review Site took that into consideration and has included two alternative sorting methods for your site's posts. The developer has even included a Bayesian weighting feature so that reviews are ordered in the most logical way possible.

Right about now, you're probably wondering what Bayesian weighting is and how it works. It provides a way to mathematically calculate the rating of products and/or services based upon the credibility of the votes that have been cast. If an item receives only a few votes, then it can't be said with any certainty that that's how the general public feels. If an item receives several votes, then it can be safely assumed that many others hold the same opinion. So, with Bayesian weighting, a product that has received only one five star review won't outrank another that has received fifteen four star reviews. As the product that received one five star review garners more ratings, its reviews will grow in credibility and, if it continues to receive high ratings, it will eventually become credible enough to outrank the other reviews. A worked example of this kind of weighting is sketched below.
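The plugin's exact formula isn't documented in this article, so purely as an illustration, here is the widely used Bayesian average (the scheme popularized by IMDb's film rankings); all the numbers below are assumed for the example:

weighted_rating = (v / (v + m)) * R + (m / (v + m)) * C

where R is the item's average rating, v is its number of votes, m is a minimum-votes threshold, and C is the mean rating across all items. Assuming m = 10 and C = 3.5, a single five star review scores (1/11) * 5 + (10/11) * 3.5, which is roughly 3.64, while fifteen four star reviews score (15/25) * 4 + (10/25) * 3.5 = 3.8, so the heavily reviewed product ranks higher, exactly as described above.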
If you're planning to create a website where visitors can come and review local businesses, then you might consider this plugin's ability to automatically embed Google Maps quite handy. After configuring the settings on the plugin's Google Maps screen, you will be able to type the address for a business into a custom field when writing a post, and the plugin will take care of the rest.

The WP Review Site plugin also includes two sidebar widgets that can be used with any widget-ready theme. These widgets will allow you to display a list of top rated items and a list of recent reviews.

Lastly, the themes provided with this plugin include built-in support for the hReview microformat. This means that Google will easily be able to extract and highlight reviews from your website. That feature will prove to be very beneficial for driving search engine traffic to your site.

Installing WP Review Site

Once you've installed WordPress, you can then concentrate on the installation of the WP Review Site plugin and its accompanying themes. First, extract the wpreviewsite.zip archive. Inside you will find a plugins folder and a themes folder. Within the plugins folder is another folder named review-site. Since none of these folders are zipped, you will need to upload them using either an FTP program or the file manager provided by your web host. So, upload the review-site folder to the wp-content/plugins directory on your server. If you plan to use one of the themes provided with this plugin, you will next need to upload the contents of the themes folder to the wp-content/themes directory.

Setting up and configuring WP Review Site

With the installation process complete, you will now need to activate the WP Review Site plugin. Once that's finished, a Review Site menu will appear on the left side of your screen. This menu contains links to the settings screens for this plugin. Before you delve into the configuration process, you must first activate the theme that you plan to use on your consumer review website. Using one of the provided themes is a bit easier, because using any other theme means that you must integrate the functionality of WP Review Site into it. Now that you know the benefits offered by the themes that are bundled with this plugin, click on Appearance | Themes. Once there, activate either Award Winning Hosts, Bonus Black, or a theme of your choice.
General Settings

Navigate to Review Site | General Settings to be taken to the first of the WP Review Site settings screens. On this screen, Sort Posts By is the first setting that you will encounter. Rather than displaying reviews in the normal chronological order used by WordPress you should, instead, select either the Average User Rating (Weighted) or the Number of Reviews/Comments option. Either of these settings will provide a much more user-friendly experience for your visitors.

If you want to make it impossible for site visitors to submit a comment without also choosing a rating, tick the checkbox next to Require Ratings with All Comments. If you don't want to make this a requirement, then you can leave this setting as is. This setting will, of course, only apply to posts that you would like your visitors to rate. On normal posts that don't include rating stars in the comment form area, it will still be possible for your visitors to submit a comment.

When using one of the themes provided with the plugin, none of the other settings on this screen need to be configured. If you would like to integrate this plugin into a different theme, then, depending upon the method that you choose, you may need to revisit this screen later on. No matter how you're handling the theme issue, you can, for now, just click Save Settings before proceeding to the next screen.

Rating Categories

To access the next settings screen, click on Review Site | Rating Categories. Here you can add categories for people to rate when submitting reviews. These categories shouldn't be confused with the categories used in WordPress for organizational purposes. These WP Review Site categories are more like ratings criteria. By default, WP Review Site includes a category called Overall Rating, but you can click the remove link to delete it if you like.

To add your first rating category, simply enter its title into the Add a Category textbox and then click Save Settings. The screen will then refresh and your newly created rating category will appear under the Edit Rating Categories section of the screen. To add additional rating categories, simply repeat the process that you previously completed.

Once you've finished adding rating categories, you will next need to turn your attention to the Bulk Apply Rating Categories section of the screen. In the Edit Rating Categories area you will see all of the rating categories that you just finished adding to your site. If you want to simplify matters, and apply these rating categories to all of the posts on your site, tick the checkbox next to each of the available rating categories. Then, from the Apply to Posts in Category drop-down menu, select All Categories. This is most likely the configuration that you will use if you're building a website entirely dedicated to providing consumer reviews. Once you've finished, click Save Settings.

If you, instead, want your newly added rating categories to only appear on certain categories, then bypass the Edit Rating Categories area for now and first look to the Apply to Posts in Category settings area. Currently this will only show All Categories and Uncategorized. The lack of categories in this menu is being caused by two things. First, you haven't added any WordPress categories to your site yet. Secondly, categories won't be included in this menu until they contain at least one post. To solve part of this problem, open a new browser window and then navigate to Posts | Categories.
Then, add the categories that you would like to include on your website. Now, click on Posts | Edit to visit the Edit Posts screen. At the moment, the Hello world! post is the only one published on your site, and you can use it to force your site's categories to appear in the Apply to Posts in Category drop-down menu. So, hover over the title of this post and then, from the now visible set of links, click Quick Edit. In the Categories section of the Quick Edit configuration area, tick the checkbox next to each of the categories found on your site. Then, click Update Post. After content has been added to each of your site's categories, you can delete the Hello world! post, since you will no longer need it to force the categories to appear in the Apply to Posts in Category drop-down menu.

Now, return to the Rating Categories screen and select the first category that you want to configure from the Apply to Posts in Category drop-down menu. With that selected, in the Edit Rating Categories area, tick the checkbox next to each rating category that you want to appear within that WordPress category. Then, click Save Settings. Repeat this process for each of the WordPress categories to which you would like rating categories to be added.

Comparison Tables

If you wish, you can add a comparison table to either the home page or the category pages on your site. To do this, you need to visit the Comparison Tables screen, so click on Review Site | Comparison Tables.

If you want to display a comparison table on your home page, tick the checkbox next to Display a Comparison Table on Home Page. If you would like to include all of your site's categories in the comparison table that will be displayed on the home page, leave the Categories To Display On Home Page textbox as is. However, if you would prefer to include only certain categories, enter their category IDs, separated by commas, into the textbox instead. You can learn the ID numbers that have been assigned to each of your site's categories by opening a new browser window and then navigating to Posts | Categories. Once there, hover over the title of each of the categories found on the right-hand side of your screen. As you do, look at the URL that appears in your browser's status bar and make a note of the number that appears directly after tag_ID=. That's the number that you will need to enter on the Comparison Tables screen.

If you want to display a comparison table in one or more categories, tick the checkbox next to Display a Comparison Table on Category Page(s). Now, return to the Comparison Tables screen. If you want a comparison table to be displayed on each of your category pages, leave the Categories To Display Comparison Table On textbox at its default. Otherwise, enter a list of comma-separated category IDs into the textbox for the categories where you want to display comparison tables.

The Number of Posts in the Table setting is currently set to 5, but you can enter another value if you would like a different number of posts to be included in each comparison table. When writing posts, you might use custom fields to include additional information. If you would like that information to be displayed in your comparison tables, you will need to enter the names of those fields, separated by commas, into the Custom Fields to Display textbox. Lastly, you can change the text that appears in the Text for the Visit Site link in the Table setting if you wish, or you may leave it at its default.
With these configurations complete, click Save Settings. In this screenshot, you can see what a populated comparison table will look like on your website:

Google Maps

If you plan on featuring reviews centered around local businesses, then you might want to consider adding Google Maps to your site. This will make it easy for visitors to see exactly where each business is located. You can access this settings screen by clicking on Review Site | Google Maps.

To activate this feature, tick the checkbox next to Display a Google Map on Posts/Pages with mapaddress Custom Field. Next, you need to use the Map Position setting to specify where these Google Maps will appear in relation to the content. You can choose to use either the Top of Post or Bottom of Post position.

The Your Google Maps API Key textbox is next. Here you will need to enter a Google Maps API key. If you don't have a Google Maps API key for this domain, then you will need to visit Google to generate one. To do this, right-click on the link provided on the Google Maps screen and open that link in a new browser window. You will then be taken to the Google Maps API sign up screen. If you've ever signed up to use any of Google's services, then you can use that username and password to log in. If you don't have an account with Google, create one now.

Take a moment to read the information and terms presented on the Google Maps API sign up page. After you've finished reviewing this text, if it's acceptable to you, enter the URL for your website into the My web site URL textbox and then click Generate API Key. You will then be taken to a thank you screen where your API key will be displayed. Copy the API key and return to the Google Maps screen on your website. Once there, paste your API key into the textbox for Your Google Maps API Key.

The Map Width and Map Height settings are next. By default, these are configured to 400px and 300px. If you would prefer that the maps be displayed at a different size, enter new values into each of these textboxes. The last setting is Map Zoom Level (1-5), which is currently set to 3. This setting should be fine, but you may change it if you wish. Finally, click Save Settings. When you publish a post that includes the mapaddress custom field, this is what the Google Map will look like on your site.

SOP Module Setup in Microsoft Dynamics GP

Packt
17 May 2011
14 min read
All module settings are company specific. If you have multiple companies in Dynamics GP, you will need to set up each module in each company separately. This is often advantageous, as different companies may require different settings for various features. If you would like to copy setup from one company to another, Microsoft Dynamics GP KnowledgeBase article 872709 describes how to do this: https://mbs.microsoft.com/knowledgebase/KBDisplay.aspx?scid=kb;en-us;872709 (CustomerSource or PartnerSource login required).

The Sales Order Processing module, also commonly referred to as SOP, bridges the gap between the Inventory and Receivables modules in Dynamics GP. In SOP, you can enter quotes, orders, back orders, invoices, and returns, with detailed inventory and non-inventory items. SOP also integrates with the Purchase Order Processing module, with the ability to automatically create purchase orders for sales orders that you do not have stock to fill.

Setup for Sales Order Processing consists of the following steps:

- Sales Order Processing Setup
- Sales Document Setup
- User-Defined Fields
- SOP Document Numbers
- Sales Order Processing Setup Options
- E-mail Settings
- Customer Items

Sales Order Processing Setup

To begin setting up SOP, navigate to Microsoft Dynamics GP | Tools | Setup | Sales | Sales Order Processing. The following is a list of the fields on the Sales Order Processing Setup window:

- Display Item Unit Cost: Unmarking this will show the unit cost of items entered on sales order transactions as zero. Some companies prefer to hide item costs from users; however, this option hides them only on the Sales Transaction Entry window, not anywhere else in the system. So, unless a user has extremely limited access in Dynamics GP, this is not as useful as it sounds.
- Track Voided Transactions in History: It is recommended to leave this option marked, as there may often be a reason to look at voided sales transactions.
- Calculate Kit Price based on Component Cost: This option applies to pricing methods based on cost, which are not too common. Selecting this option will cause the cost of the kit (and thus the price) to be recalculated based on the cost of each component in the kit.
- Display Quantity Distribution Warning: Selecting this enables a warning during transaction entry that helps users avoid mistakes when entering quantities for transactions. It is recommended to leave this checked.
- Search for New Rates During Transfer Process: This setting only applies to Multicurrency transactions. If it is checked, then during the transfer of sales documents (for example, from quote to order), Dynamics GP will look for an updated exchange rate. If this setting is not selected, the system will still verify that the exchange rate is valid, but will not update exchange rates unless they are expired.
- Track Master Numbers and Next Master Number: Dynamics GP offers the ability to track related transactions with a Master Number. For example, a quote, a resulting order, and two partial invoices may all have their own individual numbers but share one master number, thus allowing for easy lookups of all related transactions. It is recommended to leave this option selected, even if you do not foresee a need to use master numbers. The Next Master Number can be anything you would like, although there is typically no reason to change it from the default value of 1.
- Prices Not Required in Price List: This option works together with the Inventory module and, if selected, allows users to enter items without a price level set up. This may sometimes be needed, but it is better to leave this unchecked and create price lists for all items to keep setup consistent. If this option is selected and no Password is supplied, any user can enter an item without a price level on a transaction. With a Password selected, a user will be prompted to enter the password before they can continue.
- Convert Functional Price: This option is only available if Prices Not Required in Price List is selected. With this checked, if a price cannot be found for the item in the currency that is being used on the transaction, the functional currency price will be converted as needed, using the current exchange rate.
- Data Entry Defaults: These are defaults to help speed up data entry:
  - Quantity Shortage: Choose what users will see as the default option when they enter a quantity that is greater than what is in stock.
  - Document Date: When entering a new SOP transaction, the Document Date will default to either the User Date (this is at the top of every window next to the User ID and Company Name) or the date of the previously saved transaction. For companies that typically enter all invoices with the same date (for example, the end of the previous month), it is best to default this to the Previous Doc. Date. For companies that want to have the current date defaulted for new transactions, choose User Date.
  - Price Warning: You can decide whether to warn users if the price being used on a transaction they are entering is a default price for an item because a price has not been set up for the customer's price level. This may be useful if there should be a price set up on the item price list for each customer price level, and it can help avoid mistakes. If you choose to give a warning, using the Message option is recommended, as it will give users an indication of what the problem is; the Beep option may not be enough to catch a user's attention when you consider the typical noise level in an office and the number of beeps various applications generate.
  - Requested Ship Date: When SOP orders are entered into Dynamics GP, a Requested Ship Date is automatically populated to help determine when orders should be shipped and when purchase orders may need to be placed. A user can change the requested ship date on each line item for an order; however, if there is a typical default, you can use this setting to help speed up data entry. Choices are either Document Date (order date) or a number of Days After Doc. Date. If the latter is chosen, a box will appear next to this option where you can fill in the number of days (up to 999).
- Document Defaults: Similar to a customer with multiple Address IDs, each transaction type in SOP can have multiple document types set up to follow different rules. During transaction entry, users select which transaction type and document type to use for the proper set of rules to be followed. This section allows you to optionally set up a default for each transaction type, as well as the Site ID and Checkbook ID, to help speed up data entry. If you decide to set these up, you will need to come back to this section after you have created the various document types for each transaction type. If there is a transaction type you will not be using, or that you want to make sure users have to proactively select during transaction entry, you can leave it blank.
- Posting Accounts From: If Item is chosen, the system will first look at the item's accounts, then at the Inventory series of the company posting accounts, to determine what GL accounts to use for SOP transactions. If Customer is chosen, the system will first look at the customer's accounts, then at the Sales series of the company posting accounts.
- Maintain History: It is recommended to keep all history and leave all of these selections checked.
- Decimal Places for Non-Inventoried Items: If non-inventory items are entered on SOP transactions, these settings will determine how many decimal places to use for Quantities and Currency. If Multicurrency is enabled, use the expansion button (blue arrow) next to the Currency field to enter these for each currency.

The following screenshot shows a typical Sales Order Processing Setup window:

Sales Document Setup

You can set up as many document types as you need for each Sales Order Processing transaction type. One reason to set up different document types is to set up different transaction rules. Another reason is to use different numbering schemes or to be able to segregate transactions for reporting purposes. Dynamics GP has two types of transactions that are basically the same: a fulfillment order and an invoice. In most places you will see these listed together as Fulfillment Order/Invoice. When using Workflow with the SOP module, a fulfillment order becomes a separate transaction type that, once completed, turns into an invoice automatically.

To start setting up document types, click the Sales Document Setup button at the bottom of the Sales Order Processing Setup window. You will see a list of the available transaction types. The sections below go over the setup for each transaction type.

Quote

Quotes are typically the start of the sales process and can be entered for customers or prospects. Prospects can be created "on the fly" as they are needed in Dynamics GP, and then transferred to customers if they accept a quote and place an order. The following list explains the fields on the Sales Quote Setup window:

- Quote ID: An ID for this quote document type, up to 15 characters. This ID is what users will need to type or select on a sales transaction when using this quote document type.
- Quote ID Next Number: If you are setting up multiple Quote IDs and would like each to have its own numbering scheme, you can enter the next quote number here. Otherwise, if all quotes share one numbering scheme, that can be entered during SOP Document Numbers setup, and you can leave this blank.
- Days to Expire: The default number of days a quote is valid for. After expiration, the user will not be able to transfer the quote to an order. Users can change this as needed on individual quotes.
- Comment ID: This is the default Comment ID for a quote. For example, if all quotes are set to expire in 30 days, you could create a comment that says "Prices are guaranteed for 30 days from quote date" and enter the corresponding Comment ID here. The comment would automatically populate on every quote created with this Quote ID.
- Format: Dynamics GP has four report formats available for each SOP transaction type: Blank Paper, Short Form, Long Form, and Other Form. If you need to set up multiple Quote IDs that show different information when printed, each Quote ID can be defaulted to use a different report format.
- Transfer Quote to Order: If this quote can be transferred to an order, check the box and enter an Order ID.
- Transfer Quote to Fulfillment Order/Invoice: If this quote can be transferred to a fulfillment order or invoice, check the box and enter a Fulfillment Order/Invoice ID.
- Default Quantities: This determines whether the item quantities for the quote default to Quantity to Invoice or Quantity to Order. This setting will depend on whether you plan to typically transfer a quote to an order or an invoice.
- Use Prospects: Leave this checked if you want to allow prospects to be used on quotes; otherwise only customers will be allowed.
- Allow Repeating Documents: A quote can be set up to repeat. This is more typical for orders, but may be needed at times for quotes. This setting determines whether this Quote ID will be allowed to repeat.
- Options and Password scrolling list: Each of the Options listed has an optional Password, saved in clear text:
  - Delete Documents: With this unchecked, once saved, a quote cannot be deleted.
  - Edit Printed Documents: With this unchecked, once a quote is printed, it cannot be changed.
  - Override Document Numbers: If users can change quote numbers when they are creating them, check this option (once created, a quote number cannot be changed).
  - Void Documents: Select this if users are allowed to void quotes. If so, make sure that you have selected Track Voided Transactions in History on the Sales Order Processing Setup window.

Order

In Dynamics GP, a sales transaction can start out as an order, or quotes can be transferred to orders. Depending on setup, orders can allocate inventory items, thus making them unavailable for other orders or invoices. The following is a list of the fields on the Sales Order Setup window:

- Order ID: An ID for the order document type; the maximum length allowed is 15 characters. This is what users will need to type or select when using this order document type.
- Order ID Next Number: If you are setting up multiple Order IDs and would like each to have its own numbering scheme, you can enter the Next Number here. Otherwise, if all orders will share one numbering scheme, that numbering scheme can be entered during SOP Document Numbers setup.
- Comment ID: The default Comment ID for an order.
- Format: Dynamics GP has four report formats available for each SOP transaction type: Blank Paper, Short Form, Long Form, and Other Form. If you need to set up multiple Order IDs that show different information when printed, each Order ID can be defaulted to use a different report format.
- Allocate by: This setting determines how (or if) inventory is allocated for the Order ID. Allocated inventory is still considered On Hand stock, but it is not available for other orders or invoices:
  - Line Item: Select this if you want to allocate inventory as each line is entered. Users will need to choose an action on each line with a quantity shortage, which could significantly slow down order entry if most of the orders being entered do not have inventory in stock.
  - Document/Batch: Inventory is not allocated as orders are entered, and a separate allocation process is run either for each order or for a batch of orders. This gives less visibility of inventory availability during order entry; however, it can greatly speed entry up, because each line item is not checked and dealt with individually.
  - None: Orders are not allocated at all, and inventory is allocated only after an order is transferred to an invoice or fulfillment order.
- Transfer Order to Back Order: If this order type can be transferred to a back order, check this selection and enter a Back Order ID.
Many companies do not use back orders, and simply use the back ordered quantity on orders to track back ordered items.

- Transfer Order to Fulfillment Order/Invoice: There is no checkbox here, as the basic functionality of an order is to transfer to an invoice. The only choice is what Fulfillment Order/Invoice ID to use.
- Options section:
  - Allow Repeating Documents: If this Order ID can be set up to repeat, mark this checkbox. Allocate by has to be set to Document/Batch or None to enable this option.
  - Use Separate Fulfillment Process: If this option is checked, a separate step to fulfill orders will be needed before they can be transferred to invoices.
  - Allow all Back Ordered Items to Print on Invoice: If this is selected, all back ordered items will be transferred to an invoice with a fulfilled quantity of zero, allowing them to print on the invoice. Most companies like to show only items that are being billed on invoices, so they would leave this option unchecked.
  - Credit Limit Hold ID: Holds can offer an additional level of control in Sales Order Processing.
  - Override Quantity to Invoice with Quantity Fulfilled: Marking this option will set the Quantity to Invoice to be the same as Quantity Fulfilled, if Quantity Fulfilled is not zero. If this option is checked, Enable Quantity Cancelled in Sales Order Fulfillment becomes available. If Transfer Order to Back Order is selected, Enable Quantity to Back Order in Sales Order Fulfillment will also be enabled.
- Options and Password scrolling list: Each of these Options has an optional Password, saved in clear text:
  - Allow Invoicing of Unfulfilled or Partially Fulfilled Orders: This option takes effect only when the Use Separate Fulfillment Process option is selected; if Use Separate Fulfillment Process is not selected, invoicing an unfulfilled or partially fulfilled order will not be allowed, even though this option can still be selected.
  - Delete Documents: With this unchecked, a saved order cannot be deleted.
  - Edit Printed Documents: With this unchecked, once an order is printed, it cannot be changed.
  - Override Document Numbers: If users can change order numbers when they are creating them, check this option (once created, an order number cannot be changed).
  - Void Documents: Select this option if users are allowed to void orders. If so, make sure that you have selected Track Voided Transactions in History in the Sales Order Processing Setup window.

The following screenshot shows a typical Sales Order Setup window:

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what the core elements are, and how they work.

Before that, let's take a step back and briefly discuss how RxJava is different from other approaches. RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives:

- Observables: A source of data
- Subscriptions: An activated handle to the Observable that receives data
- Schedulers: A means to define where (on which Thread) the data is processed

First of all, we will cover Observables--the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable--a big brother of Observable that lets you handle big amounts of data with high rates of publishing.

To summarize, we will cover these aspects:

- What is an Observable?
- What are Disposables (formerly Subscriptions)?
- How do items travel through an Observable?
- What is backpressure and how can we use it with Flowable?

Let's dive into it!

(For more resources related to this topic, see here.)

Observables

Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way.

There are lots of different ways to create Observables. The simplest way is to use the .just() method like we did before:

Observable.just("First item", "Second item");

It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe():

Observable.just("First item", "Second item")
    .subscribe();

Usually (but not always), the observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere".

Hot and Cold Observables

Quite often in the literature and documentation, the terms Hot and Cold Observables can be found. A Cold Observable is the most common Observable type. For example, it can be created with the following code:

Observable.just("First item", "Second item")
    .subscribe();

Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed.

A Hot Observable, by contrast, is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly, and it doesn't matter if there is something ready to receive them (like a Subscription). If there are no subscriptions to the Observable, these updates will be lost.
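To illustrate the difference, here is a minimal hedged sketch of a hot source built with PublishSubject (the same class used in the backpressure example later in this article); items pushed before a subscription exists are simply lost:

// A hedged sketch, assuming RxJava 2.x.
PublishSubject<String> subject = PublishSubject.create();

subject.onNext("First item"); // no subscriber yet -- this item is lost

subject.subscribe(e -> Log.d("APP", "subscribe:" + e));

subject.onNext("Second item"); // only this item reaches the subscriber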
Disposables

A Disposable (previously called Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it will stay active forever. That might not be a problem for a server-side application, but it can cause some serious trouble on Android; usually, it is a common source of memory leaks. Obtaining a reference to a disposable is pretty simple:

Disposable disposable = Observable.just("First item", "Second item")
    .subscribe();

Disposable is a very simple interface. It has only two methods: dispose() and isDisposed().

dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe() from receiving any further items from the Observable, and the Observable itself will be cleaned up.

isDisposed() has a pretty straightforward function: it checks whether the subscription is still active. However, it is not used very often in regular code, as subscriptions are usually unsubscribed and forgotten. Disposed subscriptions (Disposables) cannot be re-enabled; they can only be created anew.

Finally, Disposables can be grouped using CompositeDisposable like this:

Disposable disposable = new CompositeDisposable(
    Observable.just("First item", "Second item").subscribe(),
    Observable.just("1", "2").subscribe(),
    Observable.just("One", "Two").subscribe()
);

It's useful in cases where there are many Observables that should be canceled at the same time, for example, when an Activity is being destroyed.
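As a hedged illustration of that last point, here is one common way to wire this into an Activity's life cycle (the wiring itself is an assumption for the sake of the example, not code from the original article):

// A minimal sketch: collect subscriptions while the Activity lives,
// then dispose of all of them at once when it is destroyed.
public class MainActivity extends Activity {

    private final CompositeDisposable disposables = new CompositeDisposable();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        disposables.add(Observable.just("First item", "Second item")
            .subscribe(e -> Log.d("APP", "subscribe:" + e)));
    }

    @Override
    protected void onDestroy() {
        disposables.dispose(); // cancels every subscription added above
        super.onDestroy();
    }
}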
Schedulers

As described in the documentation, a Scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed, and usually that means selecting some kind of specific thread. Most often, Schedulers are used to execute long-running tasks on some background thread so that they don't block the main computation or UI thread. This is especially relevant on Android, where long-running tasks must not be executed on the MainThread. Schedulers can be set with a simple .subscribeOn() call:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .subscribe();

There are only a few main Schedulers that are commonly used:

- Schedulers.io()
- Schedulers.computation()
- Schedulers.newThread()
- AndroidSchedulers.mainThread()

AndroidSchedulers.mainThread() is only used on Android systems.

Scheduling examples

Let's explore how Schedulers work by checking out a few examples. Let's run the following code:

Observable.just("First item", "Second item")
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as follows:

on-next:main:First item
subscribe:main:First item
on-next:main:Second item
subscribe:main:Second item

Now let's try changing the code as shown:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

Now, the output should look like this:

on-next:RxCachedThreadScheduler-1:First item
subscribe:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:RxCachedThreadScheduler-1:Second item

We can see how the code was executed on the main thread in the first case and on a new thread in the second. Android requires that all UI modifications be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with the .observeOn() method:

Observable.just("First item", "Second item")
    .subscribeOn(Schedulers.io())
    .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as illustrated:

on-next:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:main:First item
subscribe:main:Second item

You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread.
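This subscribeOn/observeOn pairing is the everyday Android pattern: do the slow work on a background Scheduler and render the result on the main thread. Here is a hedged sketch of it; loadUserName() and textView are hypothetical stand-ins, not part of the original examples:

// Run the blocking call on the io() scheduler, consume the result on the main thread.
Observable.fromCallable(() -> loadUserName()) // hypothetical long-running call
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(name -> textView.setText(name)); // safe: runs on the main thread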
Investigating the Flow of an Observable

Logging inside the steps of an Observable is a very powerful tool when you want to understand how it works. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze the full flow of an Observable. We will start off with this script:

private void log(String stage, String item) {
    Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item);
}

private void log(String stage) {
    Log.d("APP", stage + ":" + Thread.currentThread().getName());
}

Observable.just("One", "Two")
    .subscribeOn(Schedulers.io())
    .doOnDispose(() -> log("doOnDispose"))
    .doOnComplete(() -> log("doOnComplete"))
    .doOnNext(e -> log("doOnNext", e))
    .doOnEach(e -> log("doOnEach"))
    .doOnSubscribe((e) -> log("doOnSubscribe"))
    .doOnTerminate(() -> log("doOnTerminate"))
    .doFinally(() -> log("doFinally"))
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(e -> log("subscribe", e));

It can be seen that it has lots of additional and unfamiliar steps (more about these later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?

doOnSubscribe:main
doOnNext:RxCachedThreadScheduler-1:One
doOnEach:RxCachedThreadScheduler-1
doOnNext:RxCachedThreadScheduler-1:Two
doOnEach:RxCachedThreadScheduler-1
doOnComplete:RxCachedThreadScheduler-1
doOnEach:RxCachedThreadScheduler-1
doOnTerminate:RxCachedThreadScheduler-1
doFinally:RxCachedThreadScheduler-1
subscribe:main:One
subscribe:main:Two
doOnDispose:main

Let's go through some of the steps. First of all, by calling .subscribe(), the doOnSubscribe block was executed. This started the emission of items from the Observable, as we can see on the doOnNext and doOnEach lines. Finally, the stream finished and the termination life cycle was activated: doOnComplete, doOnTerminate, and doFinally. Also, note that the doOnDispose block was called on the main thread, along with the subscribe block. The flow will be a little different if the .subscribeOn() and .observeOn() calls aren't there:

doOnSubscribe:main
doOnNext:main:One
doOnEach:main
subscribe:main:One
doOnNext:main:Two
doOnEach:main
subscribe:main:Two
doOnComplete:main
doOnEach:main
doOnTerminate:main
doOnDispose:main
doFinally:main

You will readily note that now the doFinally block was executed after doOnDispose, while in the former setup doOnDispose was last. This happens because of the way the Android Looper schedules code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the threads they are running on) to see what's actually happening.

Flowable

Flowable can be regarded as a special type of Observable (but internally it isn't), and it has almost the same method signatures as Observable. The difference is that Flowable allows you to process items that are emitted faster from the source than some of the following steps can handle.

It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second. That poses a problem: what will we do after 60 seconds? There will be roughly 60 million items in the queue waiting to be processed, because items accumulate at a rate of about 1 million per second between the first and the second steps while the second step processes them at a much slower rate. Clearly, the available memory will be exhausted and the program will fail with an OutOfMemory (OOM) exception.

For example, this script will cause excessive memory usage, because the processing step just won't be able to keep up with the pace at which items are emitted:

PublishSubject<Integer> observable = PublishSubject.create();
observable
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

for (int i = 0; i < 1000000; i++) {
    observable.onNext(i);
}

private void log(Throwable throwable) {
    Log.e("APP", "Error", throwable);
}

By converting this to a Flowable, we can start controlling this behavior:

observable.toFlowable(BackpressureStrategy.MISSING)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

Since we have chosen not to specify how items that cannot be processed should be handled (this is called backpressure), the code will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine, as it wouldn't hit the internal buffer of the Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few backpressure strategies that define how an excessive number of items should be handled.
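As a quick contrast (a hedged sketch, not taken from the original article): a source that supports backpressure natively, such as Flowable.range(), avoids this problem entirely, because the downstream requests items in batches instead of having them pushed at it blindly:

// Flowable.range honors downstream requests, so a slow consumer simply
// slows the source down instead of overflowing an unbounded queue.
Flowable.range(1, 1000000)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);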
This can only be used in cases where losing data is okay, and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify the backpressure strategy, like this:

observable.toFlowable(BackpressureStrategy.DROP)

Alternatively, it will be like this:

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureDrop()

A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops an item instantly unless the stream is free to push it downstream). All the other values between "ticks" will be dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
    .sample(10, TimeUnit.MILLISECONDS)
    .observeOn(Schedulers.computation())
    .subscribe(v -> log("s", v.toString()), this::log);

Preserve Latest Item

Preserving the latest item means that if the downstream cannot cope with the items being sent to it, the source stops emitting values and waits until the downstream becomes available again. While waiting, it keeps dropping all the values except the last one that arrived, and when the downstream becomes available, it sends the last value that's currently stored. As with dropping, the "latest" strategy can be specified while creating the Flowable:

observable.toFlowable(BackpressureStrategy.LATEST)

Alternatively, by calling .onBackpressureLatest():

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureLatest()

Finally, the .debounce() method emits a value only after the specified interval has passed without the source emitting anything newer; all the intermediate values are dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
    .debounce(10, TimeUnit.MILLISECONDS)

Buffering

Buffering is usually a poor way to handle different paces of items being emitted and consumed, as it often just delays the problem. However, it can work just fine if there is only a temporary slowdown in one of the consumers. In this case, the items emitted will be stored until later processing, and when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we will see very similar behavior to the original Observable, with memory running out. Enabling buffering is, again, pretty straightforward, by calling the following:

observable.toFlowable(BackpressureStrategy.BUFFER)

or

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureBuffer()

If there is a need to specify a particular capacity for the buffer, it can be passed to .onBackpressureBuffer() (note that the .buffer() operator is something different: it groups emitted items into lists):

observable.toFlowable(BackpressureStrategy.MISSING)
    .onBackpressureBuffer(10)
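To see these strategies in action end to end, here is a minimal, self-contained sketch of my own (plain Java rather than Android code, so it can be run as-is). Flowable.interval() acts as a deliberately fast source that cannot be slowed down, and the 100-millisecond sleep simulates a slow consumer:

import java.util.concurrent.TimeUnit;

import io.reactivex.Flowable;
import io.reactivex.schedulers.Schedulers;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Without a strategy, interval() would signal MissingBackpressureException,
        // because it cannot slow down when the downstream is busy.
        Flowable.interval(1, TimeUnit.MILLISECONDS)
            .onBackpressureDrop(t -> System.out.println("dropped: " + t))
            .observeOn(Schedulers.computation())
            .subscribe(t -> {
                Thread.sleep(100); // simulate a slow processing step
                System.out.println("processed: " + t);
            });

        Thread.sleep(3000); // keep the JVM alive long enough to watch the output
    }
}

Swapping onBackpressureDrop() for onBackpressureLatest() or onBackpressureBuffer() shows how the other strategies behave with the same source and consumer.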
Completable, Single, and Maybe Types

Besides Observable and Flowable, there are three more types that RxJava provides:

Completable: It represents an action without a result that will be completed in the future
Single: It's just like an Observable (or Flowable) that returns a single item instead of a stream
Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item (like Single)

However, all these are used quite rarely. Let's take a quick look at the examples.

Completable

Since Completable can basically process just two types of actions--onComplete and onError--we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries. For example, a Completable can be created by calling the following:

Completable completable = Completable.fromAction(() -> {
    log("Let's do something");
});

Then, it is subscribed to with the following:

completable.subscribe(() -> {
    log("Finished");
}, throwable -> {
    log(throwable);
});

Single

Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask why it is worth having it at all. These types are useful to tell developers about the specific behavior that they should expect. To create a Single, one can use this example:

Single.just("One item")

The Single and the subscription to it can be created with the following:

Single.just("One item")
    .subscribe((item) -> {
        log(item);
    }, (throwable) -> {
        log(throwable);
    });

Note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result.

Maybe

Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before:

Maybe.empty();

or like:

Maybe.just("Item");

However, .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled):

Maybe.just("Item")
    .subscribe(
        s -> log("success: " + s),
        throwable -> log("error"),
        () -> log("onComplete")
    );

Summary

In this article, we covered the most essential parts of RxJava.

Resources for Article:

Further resources on this subject:

The Art of Android Development Using Android Studio [article]
Drawing and Drawables in Android Canvas [article]
Optimizing Games for Android [article]

Blocking Common Attacks using ModSecurity 2.5: Part 3

Packt
01 Dec 2009
12 min read
Source code revelation

Normally, requesting a file with a .php extension will cause mod_php to execute the PHP code contained within the file and then return the resulting web page to the user. If the web server is misconfigured (for example, if mod_php is not loaded) then the .php file will be sent by the server without interpretation, and this can be a security problem. If the source code contains credentials used to connect to an SQL database then that opens up an avenue for attack, and of course the source code being available will allow a potential attacker to scrutinize the code for vulnerabilities.

Preventing source code revelation is easy. With response body access on in ModSecurity, simply add a rule to detect the opening PHP tag:

# Prevent PHP source code from being disclosed
SecRule RESPONSE_BODY "<?" "deny,msg:'PHP source code disclosure blocked'"

Preventing Perl and JSP source code from being disclosed works in a similar manner:

# Prevent Perl source code from being disclosed
SecRule RESPONSE_BODY "#!/usr/bin/perl" "deny,msg:'Perl source code disclosure blocked'"

# Prevent JSP source code from being disclosed
SecRule RESPONSE_BODY "<%" "deny,msg:'JSP source code disclosure blocked'"

Directory traversal attacks

Normally, all web servers should be configured to reject attempts to access any document that is not under the web server's root directory. For example, if your web server root is /home/www, then attempting to retrieve /home/joan/.bashrc should not be possible, since this file is not located under the /home/www web server root. The obvious attempt to access the /home/joan directory is, of course, easy for the web server to block; however, there is a more subtle way to access this directory which still allows the path to start with /home/www, and that is to make use of the .. symbolic directory link, which links to the parent directory in any given directory.

Even though most web servers are hardened against this sort of attack, web applications that accept input from users may still not be checking it properly, potentially allowing users to get access to files they shouldn't be able to view via simple directory traversal attacks. This alone is reason to implement protection against this sort of attack using ModSecurity rules. Furthermore, keeping with the principle of Defense in Depth, having multiple protections against this vulnerability can be beneficial in case the web server should contain a flaw that allows this kind of attack in certain circumstances.

There is more than one way to validly represent the .. link to the parent directory. URL encoding of .. yields %2e%2e, and adding the final slash at the end, we end up with %2e%2e%2f. Here, then, is a list of what needs to be blocked:

../
..%2f
.%2e/
%2e%2e%2f
%2e%2e/
%2e./

Fortunately, we can use the ModSecurity transformation t:urlDecode. This function does all the URL decoding for us, and will allow us to ignore the percent-encoded values, and thus only one rule is needed to block these attacks:

SecRule REQUEST_URI "../" "t:urlDecode,deny"
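Depending on how the application accepts input, the same traversal sequences may arrive in POST parameters or query string arguments rather than in the URI itself. Here is a sketch of my own (not from the original text) extending the same check to request arguments; test it before deploying, since legitimate parameter values could conceivably contain the pattern:

# Also check request arguments for traversal sequences (illustrative variant)
SecRule REQUEST_URI|ARGS "../" "t:urlDecode,deny"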
Blog spam

The rise of weblogs, or blogs, as a new way to present information, share thoughts, and keep an online journal has made way for a new phenomenon: blog comments designed to advertise a product or drive traffic to a website. Blog spam isn't a security problem per se, but it can be annoying and cost a lot of time when you have to manually remove spam comments (or delete them from the approval queue, if comments have to be approved before being posted on the blog). Blog spam can be mitigated by collecting a list of the most common spam phrases, and using the ability of ModSecurity to scan POST data. Any attempted blog comment that contains one of the offending phrases can then be blocked.

From both a performance and maintainability perspective, using the @pmFromFile operator is the best choice when dealing with large word lists such as spam phrases. To create the list of phrases to be blocked, simply insert them into a text file, for example, /usr/local/spamlist.txt:

viagra
v1agra
auto insurance
rx medications
cheap medications
...

Then create ModSecurity rules to block those phrases when they are used in locations such as the page that creates new blog comments:

#
# Prevent blog spam by checking comment against known spam
# phrases in file /usr/local/spamlist.txt
#
<Location /blog/comment.php>
  SecRule ARGS "@pmFromFile /usr/local/spamlist.txt" "t:lowercase,deny,msg:'Blog spam blocked'"
</Location>

Keep in mind that the spam list file can contain whole sentences—not just single words—so be sure to take advantage of that fact when creating the list of known spam phrases.

SQL injection

SQL injection attacks can occur if an attacker is able to supply data to a web application that is then used in unsanitized form in an SQL query. This can cause the SQL query to do completely different things than intended by the developers of the web application. Consider an SQL query like this:

SELECT * FROM user WHERE username = '%s' AND password = '%s';

The flaw here is that if someone can provide a password that looks like ' OR '1'='1, then the query, with username and password inserted, will become:

SELECT * FROM user WHERE username = 'anyuser' AND password = '' OR '1'='1';

This query will return all users in the results table, since the OR '1'='1' part at the end of the statement will make the entire statement true no matter what username and password is provided.

Standard injection attempts

Let's take a look at some of the most common ways SQL injection attacks are performed.

Retrieving data from multiple tables with UNION

An SQL UNION statement can be used to retrieve data from two separate tables. If there is one table named cooking_recipes and another table named user_credentials, then the following SQL statement will retrieve data from both tables:

SELECT dish_name FROM cooking_recipes UNION SELECT username, password FROM user_credentials;

It's easy to see how the UNION statement can allow an attacker to retrieve data from other tables in the database if he manages to sneak it into a query. A similar SQL statement is UNION ALL, which works almost the same way as UNION—the only difference is that UNION ALL will not eliminate any duplicate rows returned in the result.

Multiple queries in one call

If the SQL engine allows multiple statements in a single SQL query, then seemingly harmless statements such as the following can present a problem:

SELECT * FROM products WHERE id = %d;

If an attacker is able to provide an ID parameter of 1; DROP TABLE products;, then the statement suddenly becomes:

SELECT * FROM products WHERE id = 1; DROP TABLE products;

When the SQL engine executes this, it will first perform the expected SELECT query, and then the DROP TABLE products statement, which will cause the products table to be deleted.
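As an aside before we look at further injection techniques: the prepared statements recommended later in this article neutralize exactly this kind of payload. Here is a minimal JDBC sketch of my own (the table and connection details are assumed for illustration, not taken from the article) showing how the malicious ID string fails to alter the query structure:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SafeLookup {
    public static void main(String[] args) throws SQLException {
        String userInput = "1; DROP TABLE products;"; // hostile input

        // Connection details are placeholders for the sketch.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/shop", "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT * FROM products WHERE id = ?")) {
            // The ? placeholder is bound as data, never parsed as SQL,
            // so the "; DROP TABLE" fragment cannot terminate the query
            // or append a second statement.
            stmt.setString(1, userInput);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

Because the whole hostile string is bound as a single value, the worst that can happen is that the query matches no rows; nothing gets dropped.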
Reading arbitrary files

MySQL can be used to read data from arbitrary files on the system. This is done by using the LOAD_FILE() function:

SELECT LOAD_FILE("/etc/passwd");

This command returns the contents of the file /etc/passwd. This works for any file to which the MySQL process has read access.

Writing data to files

MySQL also supports the command INTO OUTFILE, which can be used to write data into files. This attack illustrates how dangerous it can be to include user-supplied data in SQL commands, since with the proper syntax, an SQL command can not only affect the database, but also the underlying file system. This simple example shows how to use MySQL to write the string some data into the file test.txt:

mysql> SELECT "some data" INTO OUTFILE "test.txt";

Preventing SQL injection attacks

There are three important steps you need to take to prevent SQL injection attacks:

1. Use SQL prepared statements.
2. Sanitize user data.
3. Use ModSecurity to block SQL injection code supplied to web applications.

These are in order of importance, so the most important consideration should always be to make sure that any code querying SQL databases that relies on user input uses prepared statements. A prepared statement looks as follows:

SELECT * FROM books WHERE isbn = ? AND num_copies < ?;

This allows the SQL engine to replace the question marks with the actual data. Since the SQL engine knows exactly what is data and what is SQL syntax, this prevents SQL injection from taking place. The advantages of using prepared statements are twofold:

They effectively prevent SQL injection.
They speed up execution time, since the SQL engine can compile the statement once, and use the pre-compiled statement on all subsequent query invocations.

So not only will using prepared statements make your code more secure—it will also make it quicker.

The second step is to make sure that any user data used in SQL queries is sanitized. Any unsafe characters such as single quotes should be escaped. If you are using PHP, the function mysql_real_escape_string() will do this for you.

Finally, let's take a look at strings that ModSecurity can help block to prevent SQL injection attacks.

What to block

The following table lists common SQL commands that you should consider blocking, together with a suggested regular expression for blocking each. The regular expressions are in lowercase and therefore assume that the t:lowercase transformation function is used.

SQL code          Regular expression
UNION SELECT      union\s+select
UNION ALL SELECT  union\s+all\s+select
INTO OUTFILE      into\s+outfile
DROP TABLE        drop\s+table
ALTER TABLE       alter\s+table
LOAD_FILE         load_file
SELECT *          select\s+\*

For example, a rule to detect attempts to write data into files using INTO OUTFILE looks as follows:

SecRule ARGS "into\s+outfile" "t:lowercase,deny,msg:'SQL Injection'"

The \s+ regular expression syntax allows for detection of an arbitrary number of whitespace characters. This will detect evasion attempts such as INTO%20%20OUTFILE, where multiple spaces are used between the SQL command words.

Website defacement

We've all seen the news stories: "Large Company X was yesterday hacked and their homepage was replaced with an obscene message". This sort of thing is an everyday occurrence on the Internet. After the company SCO initiated a lawsuit against Linux vendors citing copyright violations in the Linux source code, the SCO corporate website was hacked and an image was altered to read WE OWN ALL YOUR CODE—pay us all your money.
The hack was subtle enough that the casual visitor to the SCO site would likely not be able to tell that this was not the official version of the homepage—quite subtle, don't you think?

Preventing website defacement is important for a business for several reasons:

Potential customers will turn away when they see the hacked site
There will be an obvious loss of revenue if the site is used for any sort of e-commerce sales
Bad publicity will tarnish the company's reputation

Defacement of a site will of course depend on a vulnerability being successfully exploited. The measures we will look at here are aimed at detecting that a defacement has taken place, so that the real site can be restored as quickly as possible.

Detection of website defacement is usually done by looking for a specific token in the outgoing web pages. This token has been placed within the pages in advance specifically so that it may be used to detect defacement—if the token isn't there, then the site has likely been defaced. This can be sufficient, but it can also allow the attacker to insert the same token into his defaced page, defeating the detection mechanism. Therefore, we will go one better and create a defacement detection technology that will be difficult for the hacker to get around.

To create a dynamic token, we will be using the visitor's IP address. The reason we use the IP address instead of the hostname is that a reverse lookup may not always be possible, whereas the IP address will always be available. The following example code in JSP illustrates how the token is calculated and inserted into the page:

<%@ page import="java.security.*" %>
<%
String tokenPlaintext = request.getRemoteAddr();
String tokenHashed = "";
String hexByte = "";

// Hash the IP address
MessageDigest md5 = MessageDigest.getInstance("MD5");
md5.update(tokenPlaintext.getBytes());
byte[] digest = md5.digest();
for (int i = 0; i < digest.length; i++) {
    hexByte = Integer.toHexString(0xFF & digest[i]);
    if (hexByte.length() < 2) {
        hexByte = "0" + hexByte;
    }
    tokenHashed += hexByte;
}

// Write MD5 sum token to HTML document
out.println(String.format("<span style='color: white'>%s</span>", tokenHashed));
%>

Assuming the background of the page is white, the <span style="color: white"> markup will ensure it is not visible to website viewers.

Now for the ModSecurity rules to handle the defacement detection. We need to look at outgoing pages and make sure that they include the appropriate token. Since the token will be different for different users, we need to calculate the same MD5 sum token in our ModSecurity rule and make sure that this token is included in the output. If not, we block the page from being sent and sound the alert by sending an email message to the website administrator:

#
# Detect and block outgoing pages not containing our token
#
SecRule REMOTE_ADDR ".*" "phase:4,deny,chain,t:md5,t:hexEncode,exec:/usr/bin/emailadmin.sh"
SecRule RESPONSE_BODY "!@contains %{MATCHED_VAR}"

We are placing the rule in phase 4, since this is required when we want to inspect the response body. The exec action is used to send an email to the website administrator to let him know of the website defacement.


7 things Java programmers need to watch for in 2019

Prasad Ramesh
24 Jan 2019
7 min read
Java is one of the most popular and widely used programming languages in the world. Its dominance of the TIOBE index ranking is unmatched for the most part, having held the number 1 position for almost 20 years. Although Java's dominance is unlikely to waver over the next 12 months, there are many important issues and announcements that will demand the attention of Java developers. So, get ready for 2019 with this list of key things in the Java world to watch out for.

#1 Commercial Java SE users will now need a license

Perhaps the most important change for Java in 2019 is that commercial users will have to pay a license fee to use Java SE from February. This change comes as Oracle has decided to revise the support model for the Java language. It currently affects Java SE 8, which is an LTS release with premier and extended support up to March 2022 and 2025 respectively. For individual users, however, the support and updates will continue till December 2020. The recently released Java SE 11 will also have long-term support, with premier support for five years and extended support for eight years from the release date.

#2 The Java 12 release in March 2019

Since Oracle changed their support model, non-LTS versions will be released every six months and probably won't contain many major changes. JDK 12 is non-LTS, but that is not to say that the changes in it are trivial; it comes with its own set of new features. It will be generally available in March this year and supported until September, which is when Java 13 will be released. Java 12 will have a couple of new features; some of them are approved to ship in its March release, and some are under discussion.

#3 Java 13 release slated for September 2019, with early access out now

So far, there is very little information about Java 13. All we really know at the moment is that it's due to be released in September 2019. Like Java 12, Java 13 will be a non-LTS release. However, if you want an early insight, there is an early access build available to test right now. Some of the JEPs (JDK Enhancement Proposals) in the next section may be set to be featured in Java 13, but that's just speculation.

https://twitter.com/OpenJDK/status/1082200155854639104

#4 A bunch of new features in Java in 2019

Even though the major long-term support version of Java, Java 11, was released last year, the releases this year also have some noteworthy features in store. Let's take a look at what the two releases this year might have.

Confirmed candidates for Java 12

A new low pause time garbage collector called Shenandoah is added to cause minimal interruption when a program is running. Its pause times are meant to be consistent irrespective of the heap size.
The Microbenchmark Suite feature will make it easier for developers to run existing testing benchmarks or create new ones.
Revamped switch statements should help simplify the process of writing code. It essentially means the switch statement can also be used as an expression (see the sketch after this list).
The JVM Constants API will, the OpenJDK website explains, "introduce a new API to model nominal descriptions of key class-file and run-time artifacts".
Integrated with Java 12 is one AArch64 port, instead of two.
Default CDS Archives.
G1 mixed collections.
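As a quick illustration of the revamped switch, here is a sketch based on the arrow-style syntax proposed in JEP 325. It is a preview feature, so it would need to be compiled and run with the --enable-preview flag, and the exact syntax may still change before release:

import java.time.DayOfWeek;

public class SwitchDemo {
    public static void main(String[] args) {
        DayOfWeek day = DayOfWeek.WEDNESDAY;

        // The switch yields a value directly; no break statements
        // and no fall-through between the arrow-style cases.
        int numLetters = switch (day) {
            case MONDAY, FRIDAY, SUNDAY -> 6;
            case TUESDAY -> 7;
            case THURSDAY, SATURDAY -> 8;
            case WEDNESDAY -> 9;
        };

        System.out.println(day + " has " + numLetters + " letters");
    }
}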
Other features that may not be out with Java 12:

Raw string literals will be added to Java.
A Packaging Tool, designed to make it easier to install and run a self-contained Java application on a native platform.
Limit Speculative Execution, to help both developers and operations engineers more effectively secure applications against speculative-execution vulnerabilities.

#5 More contributions and features with OpenJDK

OpenJDK is an open source implementation of Java Standard Edition (Java SE) which has contributions from both Oracle and the open source community. As of now, the binaries of OpenJDK are available for the newest LTS release, Java 11. Even the life cycles of OpenJDK 7 and 8 have been extended, to June 2020 and 2023 respectively. This suggests that Oracle does seem to be interested in the idea of open source and community participation. And why would it not be? Many valuable contributions come from the open source community; Microsoft, for example, seems to have benefitted from open sourcing through the incoming submissions. Although Oracle will not support these versions after six months from the initial release, Red Hat will be extending support. As the chief architect of the Java platform, Mark Reinhold, has said, stewards are the true leaders who can shape what Java should be as a language. These stewards can propose new JEPs, bring new OpenJDK problems to notice (leading to more JEPs), and contribute to the language overall.

#6 Mobile and machine learning job opportunities

In the mobile ecosystem, especially Android, Java is still the most widely used language. Yes, there's Kotlin, but it is still relatively new, and many developers are yet to adopt the new language. According to an estimate by Indeed, the average salary of a Java developer is about $100K in the U.S. With the Android ecosystem growing rapidly over the last decade, it's not hard to see what's driving Java's value. But Java, and the broader Java ecosystem, are about much more than mobile. Although Java's importance in enterprise application development is well known, it's also used in machine learning and artificial intelligence. Even if Python is arguably the most used language in this area, Java does have its own set of libraries and is used a lot in enterprise environments. Deeplearning4j, Neuroph, Weka, OpenNLP, RapidMiner, and RL4J are some of the popular Java libraries in artificial intelligence.

#7 Java conferences in 2019

Now that we've talked about the language, possible releases, and new features, let's take a look at the conferences that are going to take place in 2019. Conferences are a good medium to hear top professionals present and speak, and for programmers to socialize. Even if you can't attend, they are important fixtures in the calendar for anyone interested in following releases and debates in Java. Here are some of the major Java conferences in 2019 worth checking out:

JAX is a Java architecture and software innovation conference. It will be held in Mainz, Germany, May 6-10 this year, with the expo running from May 7 to 9. Other than Java, topics like agile, cloud, Kubernetes, DevOps, microservices, and machine learning are also part of this event. They're offering discounts on passes till February 14.
JBCNConf is happening in Barcelona, Spain from May 27. It will be a three-day conference with talks from notable Java champions. The focus of the conference is on Java, the JVM, and open-source technologies.
Jfokus is a developer-centric conference taking place in Stockholm, Sweden. It will be a three-day event from February 4-6. Speakers include the Java language architect, Brian Goetz from Oracle, and many other notable experts. The conference will cover Java, of course, along with frontend and web, cloud and DevOps, IoT and AI, and future trends.
One of the biggest conferences is JavaZone, which attracts thousands of visitors and hundreds of speakers and will be 18 years old this year. It is usually held in Oslo, Norway in the month of September. Their website for 2019 is not active at the time of writing; you can check out last year's website.
Javaland will feature lectures, training, and community activities. Held in Bruehl, Germany from March 19 to 21, attendees can also exhibit at this conference.

If you're working in or around Java this year, there's clearly a lot to look forward to, as well as a few unanswered questions about the evolution of the language in the future. While these changes might not impact the way you work in the immediate term, keeping on top of what's happening and what key figures are saying will set you up nicely for the future.

4 key findings from The State of JavaScript 2018 developer survey
Netflix adopts Spring Boot as its core Java framework
Java 11 is here with TLS 1.3, Unicode 11, and more updates


How to build topic models in R [Tutorial]

Natasha Mathur
22 Apr 2019
8 min read
Topic models are a powerful method to group documents by their main topics. Topic models allow probabilistic modeling of term frequency occurrence in documents. The fitted model can be used to estimate the similarity between documents, as well as between a set of specified keywords, using an additional layer of latent variables which are referred to as topics (Grun and Hornik, 2011). In essence, a document is assigned to a topic based on the distribution of the words in that document, and the other documents in that topic will have roughly the same frequency of words.

In this tutorial, we will look at a useful framework for text mining, called topic models, and apply the framework to the State of the Union addresses. In building topic models, the number of topics must be determined before running the algorithm (k-dimensions). If no prior reason for the number of topics exists, then you can build several models and apply judgment and knowledge to the final selection; a quick sketch of one way to compare candidate values of k follows the outline below. There are different methods that come under topic modeling. We'll look at LDA with Gibbs sampling. This method is quite complicated mathematically, but my intent is to provide an introduction so that you are at least able to describe how the algorithm learns to assign a document to a topic in layperson terms. If you are interested in mastering the math associated with the method, block out a couple of hours on your calendar and have a go at it. Excellent background material can be found here. This tutorial is an excerpt taken from the book 'Mastering Machine Learning with R - Third Edition' written by Cory Lesmeister. The book explores expert techniques for solving data analytics and covers machine learning challenges that can help you gain insights from complex projects and power up your applications.

LDA, or Latent Dirichlet Allocation, as used in topic modeling, is a generative process that iterates to a steady state in the following manner:

For each document (j), with 1 to j documents, we randomly assign a multinomial distribution (Dirichlet distribution) over the topics (k), with 1 to k topics; for example, document A is 25 percent topic one, 25 percent topic two, and 50 percent topic three.
Probabilistically, each word (i), with 1 to i words, is assigned to a topic (k); for example, the word mean has a probability of 0.25 for the topic statistics.
For each word (i) in document (j) and topic (k), calculate the proportion of words in that document assigned to that topic; note it as the probability of topic (k) given document (j), p(k|j), and the proportion of word (i) in topic (k) across all the documents containing the word; note it as the probability of word (i) given topic (k), p(i|k).
Resample, that is, assign the word a new topic based on the probability that the topic contains the word, which is based on p(k|j) times p(i|k).
Rinse and repeat; over numerous iterations, the algorithm finally converges, and a document is assigned a topic based on the proportion of words assigned to a topic in that document.

The LDA assumes that the order of words and documents does not matter. There has been work done to relax these assumptions in order to build models of language generation and sequence models over time (known as dynamic topic modeling or DTM).
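As promised, here is one rough heuristic for picking k, written as a sketch of my own rather than code from the book: fit a handful of models on the same document-term matrix and compare their log-likelihoods (higher is better, with diminishing returns as k grows). It assumes a DTM like the sotu_dtm object we build below, and it relies on your installed version of topicmodels providing a logLik() method for Gibbs fits:

candidate_k <- c(4, 6, 8, 10)

# Fit one model per candidate k on the same DTM (sotu_dtm, built below)
fits <- lapply(candidate_k, function(k) {
  topicmodels::LDA(sotu_dtm, k = k, method = "Gibbs",
                   control = list(seed = 1965))
})

# Compare the fits; look for the point where the gains level off
data.frame(k = candidate_k,
           logLik = sapply(fits, function(f) as.numeric(logLik(f))))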
Applying Topic models in State of the Union addresses

We will leave behind the 19th century and look at these recent times of trial and tribulation (1965 through 2016). On looking at this data, I found something interesting and troubling. Let's take a look at the 1970s:

> sotu_meta[185:191, 1:4]
# A tibble: 7 x 4
  president         year years_active party
  <chr>            <int> <chr>        <chr>
1 Richard M. Nixon  1970 1969-1973    Republican
2 Richard M. Nixon  1971 1969-1973    Republican
3 Richard M. Nixon  1972 1969-1973    Republican
4 Richard M. Nixon  1972 1969-1973    Republican
5 Richard M. Nixon  1974 1973-1974    Republican
6 Richard M. Nixon  1974 1973-1974    Republican
7 Gerald R. Ford    1975 1974-1977    Republican

We see there are two 1972 and two 1974 addresses, but none for 1973. What? I went to the Nixon Foundation website, spent about 10 minutes trying to deconflict this, and finally threw my hands in the air and decided on implementing a quick fix. Be advised that there are a number of these conflicts to put in order:

> sotu_meta[188, 2] <- "1972_2"
> sotu_meta[190, 2] <- "1974_2"
> sotu_meta[157, 2] <- "1945_2"
> sotu_meta[166, 2] <- "1953_2"
> sotu_meta[170, 2] <- "1956_2"
> sotu_meta[176, 2] <- "1961_2"
> sotu_meta[195, 2] <- "1978_2"
> sotu_meta[197, 2] <- "1979_2"
> sotu_meta[199, 2] <- "1980_2"
> sotu_meta[201, 2] <- "1981_2"

An email to the author of this package is in order. I won't bother with that, but feel free to solve the issue yourself. With this tragedy behind us, we'll go through tokenizing and removing stop words again for our relevant time frame:

> sotu_meta_recent <- sotu_meta %>%
    dplyr::filter(year > 1964)

> sotu_meta_recent %>%
    tidytext::unnest_tokens(word, text) -> sotu_unnest_recent

> sotu_recent <- sotu_unnest_recent %>%
    dplyr::anti_join(stop_words, by = "word")

As discussed previously, we need to put the data into a DTM before building a model. This is done by creating a word count grouped by year, then passing that to the cast_dtm() function:

> sotu_recent %>%
    dplyr::group_by(year) %>%
    dplyr::count(word) -> lda_words

> sotu_dtm <- tidytext::cast_dtm(lda_words, year, word, n)

Let's get our model built. I'm going to create six different topics using the Gibbs method, and I specified verbose. It should run 2,000 iterations:

> sotu_lda <- topicmodels::LDA(
    sotu_dtm,
    k = 6,
    method = "Gibbs",
    control = list(seed = 1965, verbose = 1)
  )

> sotu_lda
A LDA_Gibbs topic model with 6 topics.

The algorithm gives each topic a number. We can see what year is mapped to what topic. I abbreviate the output since 2002:

> topicmodels::topics(sotu_lda)
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
   2    2    2    2    2    2    2    4    4    4    4    4    4    4    4

We see a clear transition between Bush and Obama from topic 2 to topic 4. Here is a table of the count of topics:

> table(topicmodels::topics(sotu_lda))
 1  2  3  4  5  6
 8  7  5 18 14  5

Topic 4 is the most prevalent, which is associated with Clinton's term also. This output gives us the top five words associated with each topic:

> topicmodels::terms(sotu_lda, 5)
     Topic 1      Topic 2    Topic 3
[1,] "future"     "america"  "administration"
[2,] "tax"        "security" "congress"
[3,] "spending"   "country"  "economic"
[4,] "government" "world"    "legislation"
[5,] "economic"   "iraq"     "energy"
     Topic 4      Topic 5    Topic 6
[1,] "people"     "world"    "federal"
[2,] "american"   "people"   "programs"
[3,] "jobs"       "american" "government"
[4,] "america"    "congress" "program"
[5,] "children"   "peace"    "act"

This all makes good sense, and topic 2 is spot on for the time. If you drill down further to, say, 10, 15, or 20 words, it is even more revealing, but I won't bore you further. What about an application in the tidy ecosystem and a visualization? Certainly!
We'll turn the model object into a data frame first and in the process capture the per-topic-per-word probabilities, called beta:

> lda_topics <- tidytext::tidy(sotu_lda, matrix = "beta")

> ap_top_terms <- lda_topics %>%
    dplyr::group_by(topic) %>%
    dplyr::top_n(10, beta) %>%
    dplyr::ungroup() %>%
    dplyr::arrange(topic, -beta)

We can explore that data further or just plot it as follows:

> ap_top_terms %>%
    dplyr::mutate(term = reorder(term, beta)) %>%
    ggplot2::ggplot(ggplot2::aes(term, beta, fill = factor(topic))) +
    ggplot2::geom_col(show.legend = FALSE) +
    ggplot2::facet_wrap(~ topic, scales = "free") +
    ggplot2::coord_flip() +
    ggthemes::theme_economist_white()

The output of the preceding code is a faceted plot of the top 10 words per topic based on the beta probability. Another thing we can do is look at the probability that an address is related to a topic. This is referred to as gamma in the model, and we can pull those in just like the beta:

> ap_documents <- tidytext::tidy(sotu_lda, matrix = "gamma")

We now have the probabilities of an address per topic. Let's look at the 1981 Ronald Reagan values:

> dplyr::filter(ap_documents, document == "1981")
# A tibble: 6 x 3
  document topic  gamma
  <chr>    <int>  <dbl>
1 1981         1 0.286
2 1981         2 0.0163
3 1981         3 0.0923
4 1981         4 0.118
5 1981         5 0.0777
6 1981         6 0.411

Topic 1 is a close second in the topic race. If you think about it, this means that more than six topics would help to create better separation in the probabilities. However, I like just six topics for this tutorial for the purpose of demonstration.

In this tutorial, we looked at topic models in R and applied the framework to the State of the Union addresses. If you want to stay updated with expert techniques for solving data analytics and explore other machine learning challenges in R, be sure to check out the book 'Mastering Machine Learning with R - Third Edition'.

How to make machine learning based recommendations using Julia [Tutorial]
The rise of machine learning in the investment industry
GitHub Octoverse: top machine learning packages, languages, and projects of 2018

Working with Forms in Dynamics AX: Part 2

Packt
06 Jan 2010
13 min read
Adding form splitters

Commonly used forms like Sales orders or Projects in Dynamics AX have multiple grids. Normally, one grid is in the upper section and another one is in the bottom section of the form. Sometimes grids are placed next to each other. The size of the data in each grid may vary, and that's why most of the forms with multiple grids have splitters in the middle, so users can resize both grids at once by dragging the splitter with the mouse. It is a good practice to add splitters to newly created forms. Although Microsoft developers did a good job by adding splitters to most of the multi-grid forms, there is still at least one that has not got one. It is the Account reconciliation form in the Bank module, which is one of the most commonly used forms. It can be opened from Bank | Bank Account Details, the Functions | Account reconciliation button, and then the Transactions button. As it stands, the size of the bottom grid cannot be changed. In this recipe, we will demonstrate the usage of splitters by resolving this situation. We will add a form splitter in the middle of the two grids in the mentioned form. It will allow users to define the sizes of both grids to make sure that the data is displayed optimally.

How to do it...

Open the BankReconciliation form in the AOT, and create a new Group at the very top of the form's design with the following properties:

Name: Top
AutoDeclaration: Yes
FrameType: None
Width: Column width

Move the AllReconciled, Balances, and Tab controls into the newly created group.

Create a new Group right below the Top group with these properties:

Name: Splitter
AutoDeclaration: Yes
Width: Column width
Height: 5
FrameType: Raised 3D
BackgroundColor: Window background
HideIfEmpty: No
AlignChild: No

Add the following line of code to the bottom of the form's class declaration:

SysFormSplitter_Y fs;

Add the following line of code to the bottom of the form's init():

fs = new SysFormSplitter_Y(Splitter, Top, element);

Override three methods in the Splitter group with the following code:

public int mouseDown(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseDown(_x, _y, _button, _ctrl, _shift);
}

public int mouseMove(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseMove(_x, _y, _button, _ctrl, _shift);
}

public int mouseUp(int _x, int _y, int _button, boolean _ctrl, boolean _shift)
{
    return fs.mouseUp(_x, _y, _button, _ctrl, _shift);
}

Change the following properties of the existing BankTransTypeGroup group:

Top: Auto
Width: Column width
Height: Column height

Change the following property of the existing TypeSums grid located in the BankTransTypeGroup group:

Height: Column height

Now, to test the results, open Bank | Bank Account Details, select any bank account, click Functions | Account reconciliation, choose an existing or create a new account statement, and click the Transactions button. Notice that the form now has a nice splitter in the middle, which makes the form look better and allows defining the size of each grid.

How it works...

Normally a splitter is placed between two form groups. In this recipe, to follow that rule, we need to adjust the BankReconciliation form's design. The filter AllReconciled, the group Balances, and the tab Tab are moved to a new group called Top.
We do not want this new group to be visible to users, so we set FrameType to None. Setting AutoDeclaration to Yes allows us to access this object from X++ code. And finally, we make this group automatically expand in the horizontal direction by setting its Width to Column width. At this stage, the visual form layout has not changed, but now we have the upper group ready.

The BankTransTypeGroup group can be used as the bottom group with slight changes. We change its Top behavior to Auto and make it fully expandable in the horizontal and vertical directions. The Height of the grid inside this group also has to be changed to Column height in order to fill all the vertical space.

In the middle of those two groups, we add a splitter. The splitter is nothing else but another group which looks like a splitter. In order to achieve that, we set Height to 5, FrameType to Raised 3D, and BackgroundColor to Window background. This group does not hold any other controls inside; therefore, in order to make it visible, we have to set the property HideIfEmpty to No. The value No of the property AlignChild makes the splitter begin on the very left side of the form, and the Column width value of the property Width forces the splitter to automatically fill the form's width.

Mouse events are handled by the SysFormSplitter_Y application class. After it has been declared in the form's class declaration, we create the actual object in the form's init(). We pass the name of the splitter control, the name of the top group, and the form itself as arguments when creating it.

A fully working splitter requires three mouse event handlers. This is implemented by overriding the mouseMove(), mouseDown(), and mouseUp() methods in the splitter group control. All arguments are passed to the respective member methods of the SysFormSplitter_Y class, which does all the work.

In this way, horizontal splitters can easily be added to any form. The Dynamics AX application also contains nice examples of splitters, which can be found in the AOT in the Tutorial_Form_Split form. Vertical splitters can be added to forms using a very similar approach; for this, we need to use another application class called SysFormSplitter_X.

Creating modal forms

During my trainings and work with Dynamics AX users, I have noticed that people who are not familiar with computers and software tend to get lost among open application windows. The same applies to Dynamics AX. I have seen many times how a user opens one form, clicks a button to open another one, and then goes back to the first one without closing the second. Sometimes this happens intentionally, sometimes not, but the result is that the second form is hidden behind the first one, and the user starts wondering why it is not possible to close or edit the first form. Such issues can easily be solved by making the child form a modal window. In other words, the second form always stays on top of the first one until closed. In this recipe, we will do exactly that. As an example, we will make the Create sales order form a modal window.

How to do it...

Open the SalesCreateOrder form in the AOT, and set the following property on its Design:

WindowType: Popup

To test, open Accounts receivable | Sales Order Details, and start creating a new order. Notice that the sales order creation form now always stays on top of the Sales order form.

How it works...

Dynamics AX form design has a WindowType property, which is set to Standard by default.
In order to make a form behave as a modal window, we have to change this property to Popup. Such forms will always stay on top of their parent forms.

There's more...

We already know that some Dynamics AX forms are created dynamically using the Dialog class. If we look deeper into the code, we can find that the Dialog class actually creates a runtime Dynamics AX form. That means we can apply the same principle, that is, change the relevant form's design property. The following code could be added to the Dialog object and would do the job:

dialog.dialogForm().buildDesign().windowType(FormWindowType::Popup);

We get a reference to the form's design by first using dialogForm() of the Dialog object to get a reference to the DialogForm object, and then calling buildDesign() on the latter object. Then, we set the design's property by calling its windowType() with the argument FormWindowType::Popup.

Changing common form appearance

In every single multi-company Dynamics AX project, in order to prevent user mistakes, I was asked to add functionality that sets the background color of every form per company. By doing that, users clearly see which company account they are in at the moment and can easily work within multiple companies at the same time. In this recipe, we will modify the SysSetupFormRun class to change the background color of every form in Dynamics AX.

How to do it...

Open SysSetupFormRun in the AOT, and override its run() with the following code:

public void run()
{;
    super();

    this.design().colorScheme(FormColorScheme::RGB);
    this.design().backgroundColor(WinAPI::RGB2int(255,0,0));
}

To test the results, open any Dynamics AX form, for example, General ledger | Chart of Accounts Details, and notice how the background color has changed to red.

How it works...

SysSetupFormRun is the application class that is called by the system every time a user runs a form. The best place to add our custom code is to override the run() method and place the code after the super() call. We use this.design() to get a reference to the form's design. By calling colorScheme() and backgroundColor(), we set the color scheme to red/green/blue and the color code to red. We use WinAPI::RGB2int() to transform the human-readable red/green/blue code into the numeric color code.

There's more...

This recipe showed a very basic principle of how to change the common appearance of all forms with a few lines of code. You will have noticed that the color in this recipe does not fill all areas of the form, which does not make the form look nice. An alternative could be to dynamically add a colored rectangle or something similar to the top of the form. The possibilities are endless here. New controls like input fields, buttons, and menu items could also be added to all forms dynamically using this class. But do not overdo it, as it may impact system performance.
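Staying with the per-company motivation of this recipe, here is a small sketch of my own (not from the original recipe) that applies the red background only in a specific company account, so users immediately see they are not in production; the company id 'tst' is an assumed, illustrative value:

public void run()
{;
    super();

    // A sketch: color forms red only in the 'tst' company account.
    // 'tst' is an assumed company id for illustration.
    if (curext() == 'tst')
    {
        this.design().colorScheme(FormColorScheme::RGB);
        this.design().backgroundColor(WinAPI::RGB2int(255,0,0));
    }
}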
Storing last form values

Dynamics AX has a very useful feature which allows saving the latest user choices per user per form. This feature is implemented across a number of standard reports, periodic jobs, and other objects which require user input. When developing new functionality for Dynamics AX, I always try to keep to that practice. One of the frequently used areas is custom filters for grid-based forms. Although Dynamics AX allows users to use standard filtering for any grid, in practice this is sometimes not very convenient, especially when the user requires something specific. In this recipe, we will see how to store the latest user filter selections.

To make it as simple as possible, we will use the existing filters on the General journal form, which can be opened from General ledger | Journals | General journal. This form contains two filters—Show and Show user-created only. Show allows displaying journals by their posting status, and Show user-created only toggles between all journals and the currently logged-in user's journals.

How to do it...

Find the LedgerJournalTable form in the AOT, and add the following code to the bottom of its class declaration:

AllOpenPosted showStatus;
NoYes showCurrentUser;

#define.CurrentVersion(1)
#localmacro.CurrentList
    showStatus,
    showCurrentUser
#endmacro

Create these additional form methods:

public void initParmDefault()
{;
    showStatus = AllOpenPosted::Open;
    showCurrentUser = true;
}

public container pack()
{
    return [#CurrentVersion, #CurrentList];
}

public boolean unpack(container packedClass)
{
    int version = RunBase::getVersion(packedClass);
    ;
    switch (version)
    {
        case #CurrentVersion:
            [version, #CurrentList] = packedClass;
            return true;
        default:
            return false;
    }
    return false;
}

public identifiername lastValueDesignName()
{
    return element.args().menuItemName();
}

public identifiername lastValueElementName()
{
    return this.name();
}

public UtilElementType lastValueType()
{
    return UtilElementType::Form;
}

public userId lastValueUserId()
{
    return curuserid();
}

public dataAreaId lastValueDataAreaId()
{
    return curext();
}

Add the following code to the bottom of the form's run():

xSysLastValue::getLast(this);
AllOpenPostedField.selection(showStatus);
ShowUserCreatedOnly.value(showCurrentUser);

journalFormTable.designSelectionChangeAllOpenPosted();
journalFormTable.designSelectionChangeShowUserCreateOnly();

And the following code to the bottom of the form's close():

showStatus = AllOpenPostedField.selection();
showCurrentUser = ShowUserCreatedOnly.value();
xSysLastValue::saveLast(this);

Now, to test the form, open General ledger | Journals | General journal, change the filter values, close it, and run it again. The latest filter selections should stay.

How it works...

First of all, we define some variables. We will store the journal posting status filter value in showStatus and the current user filter value in showCurrentUser. The macro #CurrentList is used to define the list of variables that we are going to store; currently, we have two variables. The macro #CurrentVersion defines a version of the saved values. In other words, it says that the variables defined by #CurrentList, which will be stored in the system cache later, can be addressed using the number 1. Normally, when implementing last value saving for the first time for a particular object, #CurrentVersion is set to 1. Later on, if we decide to add new values or change existing ones, we have to change the value of #CurrentVersion, normally increasing it by 1. This ensures that the system addresses the correct list of variables in the cache and does not break existing functionality.

The initParmDefault() method specifies default values if nothing is found in the system cache. Normally, this happens if we run the form for the first time, change #CurrentVersion, or clean the cache. Later, this method is called automatically by the xSysLastValue object.

The methods pack() and unpack() are responsible for formatting a storage container from the variables and extracting the variables from a storage container, respectively. In our case, pack() returns a container consisting of three values: the version number, the posting status, and the current user toggle. Those values will be sent to the system cache after the form is closed.
During the opening of the form, the xSysLastValue object uses unpack() to extract values from the stored container. It checks the container version from the cache first, and if it matches the current version number, then the values from the cache are considered correct and are assigned to the form variables.

The combination of the values returned by lastValueDesignName(), lastValueElementName(), lastValueType(), lastValueUserId(), and lastValueDataAreaId() forms a unique string representing the saved values. This ensures that different users can store last values for different objects without overriding each other's values in the cache. The lastValueDesignName() method is meant to return the name of the object's current design in cases where the object can have several designs. In this recipe, there is only one design, so instead of leaving it empty, I used it for a slightly different purpose. The same LedgerJournalTable AOT form can represent different user forms like Ledger journal, Periodic journal, Vendor payment journal, and so on, depending on the location from which it was opened. To ensure that the user's latest choices are saved correctly, we included the opening menu item name as part of the unique string.

The last two pieces of code need to be added to the bottom of the form's run() and close(). In the run() method, xSysLastValue::getLast(this) retrieves saved user values from the cache and assigns them to the form's variables. The next two lines assign the same values to the respective form controls. designSelectionChangeAllOpenPosted() and designSelectionChangeShowUserCreateOnly() execute a form query to apply the updated filters. Although both of those methods currently perform exactly the same action, we keep both for the future in case this functionality is updated. The code lines in close() are responsible for assigning user selections to the variables and saving them to the cache by calling xSysLastValue::saveLast(this).


The R Statistical Package Interfacing with Python

Janu Verma
17 Nov 2016
8 min read
One of my coding hobbies is to explore different Python packages and libraries. In this post, I'll talk about the package rpy2, which is used to call R inside Python. Being an avid user of R and a huge supporter of R graphical packages, I had always desired to call R inside my Python code to be able to produce beautiful visualizations. The R framework offers machinery for a variety of statistical and data mining tasks. Let's review the basics of R before we delve into R-Python interfacing.

R is a statistical language which is free, is open source, and has comprehensive support for various statistical, data mining, and visualization tasks. Quick-R describes it as:

"R is an elegant and comprehensive statistical and graphical programming language."

R is one of the fastest growing languages, mainly due to the surge in interest in statistical learning and data science. The Data Science Specialization on Coursera has all courses taught in R. There are R packages for machine learning, graphics, text mining, bioinformatics, topic modeling, interactive visualizations, markdown, and many others. In this post, I'll give a quick introduction to R. The motivation is to acquire some knowledge of R to be able to follow the discussion on R-Python interfacing.

Installing R

R can be downloaded from one of the Comprehensive R Archive Network (CRAN) mirror sites.

Running R

To run R interactively on the command line, type R. Alternatively, launch the standard GUI (which should have been included in the download) and type R code in it. RStudio is the most popular IDE for R. It is recommended, though not required, to install RStudio and run R on it. To write a file with R code, create a file with the .r extension (for example, myFirstCode.r), and run the code by typing the following on the terminal:

Rscript file.r

Basics of R

The most fundamental data structure in R is a vector; actually, everything in R is a vector (even numbers are 1-dimensional vectors). This is one of the strangest things about R. Vectors contain elements of the same type. A vector is created by using the c() function:

a = c(1,2,5,9,11)
a
[1]  1  2  5  9 11

strings = c("aa", "apple", "beta", "down")
strings
[1] "aa"    "apple" "beta"  "down"

The elements in a vector are indexed, but the indexing starts at 1 instead of 0, as in most major languages (for example, Python):

strings[1]
[1] "aa"

The fact that everything in R is a vector and that the indexing starts at 1 are the main reasons for people's initial frustration with R (I forget this all the time).

Data Frames

A lot of R packages expect data as a data frame, which is essentially a matrix whose columns can be accessed by names. The columns can be of different types. Data frames are useful outside of R also. The Python package pandas was written primarily to implement data frames and to do analysis on them. In R, data frames are created (from vectors) as follows:

students = c("Anne", "Bret", "Carl", "Daron", "Emily")
scores = c(7,3,4,9,8)
grades = c('B', 'D', 'C', 'A', 'A')
results = data.frame(students, scores, grades)
results
  students scores grades
1     Anne      7      B
2     Bret      3      D
3     Carl      4      C
4    Daron      9      A
5    Emily      8      A

The elements of a data frame can be accessed as:

results$students
[1] Anne  Bret  Carl  Daron Emily
Levels: Anne Bret Carl Daron Emily

This gives a vector, the elements of which can be called by indexing:

results$students[1]
[1] Anne
Levels: Anne Bret Carl Daron Emily

Reading Files

Most of the time, the data is given as a comma-separated values (csv) file or a tab-separated values (tsv) file.
We will see how to read a csv/tsv file in R and create a data frame from it. (Aside: the datasets in most Kaggle competitions are given as csv files, and we are required to do machine learning on them. In Python, one creates a pandas data frame or a numpy array from this csv file.)

In R, we use a read.csv or read.table command to load a csv file into memory; for example, for the Titanic competition on Kaggle:

training_data <- read.csv("train.csv", header=TRUE)
train <- data.frame(survived=training_data$Survived, age=training_data$Age,
                    fare=training_data$Fare, pclass=training_data$Pclass)

Similarly, a tsv file can be loaded as:

data <- read.csv("file.tsv", header=TRUE, sep="\t")

Thus, given a csv/tsv file with or without headers, we can read it using the read.csv function and create a data frame using data.frame(vector_1, vector_2, ... vector_n). This should be enough to start exploring R packages. Another command that is very useful in R is head(), which is similar to the less command on Unix.

rpy2

First things first, we need to have both Python and R installed. Then install rpy2 from the Python Package Index (PyPI). To do this, simply type the following on the command line:

pip install rpy2

We will use the high-level interface to R, the robjects subpackage of rpy2:

import rpy2.robjects as ro

We can pass commands to the R session by putting the R commands in the ro.r() method as strings. Recall that everything in R is a vector. Let's create a vector using robjects:

ro.r('x=c(2,4,6,8)')
print(ro.r('x'))
[1] 2 4 6 8

Keep in mind that though x is an R object (vector), ro.r('x') is a Python object (an rpy2 object). This can be checked as follows:

type(ro.r('x'))
<class 'rpy2.robjects.vectors.FloatVector'>

The most important data types in R are data frames, which are essentially matrices. We can create a data frame using rpy2:

ro.r('x=c(2,4,6,8)')
ro.r('y=c(4,8,12,16)')
ro.r('rdf=data.frame(x,y)')

This created an R data frame, rdf. If we want to manipulate this data frame using Python, we need to convert it to a Python object. We will convert the R data frame to a pandas data frame. The Python package pandas contains efficient implementations of data frame objects in Python:

import pandas.rpy.common as com
df = com.load_data('rdf')
print type(df)
<class 'pandas.core.frame.DataFrame'>
df.x = 2*df.x

Here we have doubled each of the elements of the x vector in the data frame df. But df is a Python object, which we can convert back to an R data frame using pandas as:

rdf = com.convert_to_r_dataframe(df)
print type(rdf)
<class 'rpy2.robjects.vectors.DataFrame'>

Let's use the plotting machinery of R, which is the main purpose of studying rpy2:

ro.r('plot(x,y)')

Not only R data types, but rpy2 lets us import R packages as well (given that these packages are installed on R) and use them for analysis. Here we will build a linear model on x and y using the R package stats:

from rpy2.robjects.packages import importr
stats = importr('stats')
base = importr('base')

fit = stats.lm('y ~ x', data=rdf)
print(base.summary(fit))

We get the following results:

Residuals:
1 2 3 4
0 0 0 0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)        0          0      NA       NA
x                  2          0     Inf   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: Inf on 1 and 2 DF, p-value: < 2.2e-16

R programmers will immediately recognize the output as coming from applying the linear model function lm() on the data.
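A caveat worth adding in passing: the pandas.rpy.common module used above was deprecated and later removed from recent pandas releases. In newer rpy2 versions, the same round trip can be done with rpy2's own pandas converter. Here is a minimal sketch of my own (the function names below are from the rpy2 3.x API and may differ in older versions):

import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

df = pd.DataFrame({'x': [2, 4, 6, 8], 'y': [4, 8, 12, 16]})

with localconverter(ro.default_converter + pandas2ri.converter):
    rdf = ro.conversion.py2rpy(df)   # pandas -> R data frame
    df2 = ro.conversion.rpy2py(rdf)  # R data frame -> pandas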
I'll end this discussion with an example using my favorite R package, ggplot2. I have written a lot of posts on data visualization using ggplot2. The following example is borrowed from the official documentation of rpy2:

import math, datetime
import rpy2.robjects.lib.ggplot2 as ggplot2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr

base = importr('base')
datasets = importr('datasets')
mtcars = datasets.data.fetch('mtcars')['mtcars']

pp = (ggplot2.ggplot(mtcars) +
      ggplot2.aes_string(x='wt', y='mpg', col='factor(cyl)') +
      ggplot2.geom_point() +
      ggplot2.geom_smooth(ggplot2.aes_string(group = 'cyl'), method = 'lm'))
pp.plot()

Author: Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are in mathematics, machine learning, information visualization, computational biology, and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and the Indian Statistical Institute. He has written papers for IEEE Vis, KDD, the International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, and so on. His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and other companies on data science and machine learning in the Delhi-NCR area.