
How-To Tutorials


Data Modeling with ERWin

Packt
14 Oct 2009
3 min read
Depending on your data modeling needs and what you already have, there are two other ways to create a data model: derive it from an existing model, or reverse engineer an existing database. Let's start with creating a new model by clicking the Create model button. We'd like to create both logical and physical models, so select Logical/Physical. You can see in the Model Explorer that our new model gets ERWin's default name, Model_1. Let's rename our model to Packt Model, and confirm in the Model Explorer that the model is renamed correctly. Next, we need to choose the ER notation. ERWin offers two notations: IDEF1X and IE. We'll use IE for both our logical and physical models. It's good practice to save our work from time to time during model development.

Our model is still empty, so let's next create a logical model in it.

Logical Model

A logical model in ERWin is essentially an ER model. An ER model consists of entities, their attributes, and their relationships. Let's start by creating our first entity, CUSTOMER, and its attributes. To add an entity, load (click) the Entity button on the toolbar and drop it on the diagramming canvas by clicking your mouse on the canvas. Rename the entity's default name (E/1) to CUSTOMER by clicking on the name and typing the new name over it.

To add an attribute to an entity, right-click the entity and select Attributes. Click the New button, type in our first attribute name (CUSTOMER_NO), select Number as its data type, and then click OK. We want this attribute to be the entity's primary key, so check the Primary Key box. In the same way, add a CUSTOMER_NAME attribute with a String data type. Our CUSTOMER entity now has two attributes, as seen in its ER diagram. Notice that the CUSTOMER_NO attribute is in the key area, the upper part of the entity box, while CUSTOMER_NAME is in the common attribute area, the bottom part.

Similarly, add the rest of the entities and their attributes to our model. When you're done, we'll have six entities in our ER diagram. Our entities are not related yet. To rearrange the diagram, you can move the entities around by clicking and dragging them.


Data Migration Scenarios in SAP Business ONE Application - Part 1

Packt
20 Oct 2009
25 min read
Just recently, I found myself in a data migration project that served as an eye-opener. Our team had to migrate a customer system that utilized Act! and Peachtree. Neither system is famous for good accessibility to its data. In fact, Peachtree is a non-SQL database that does not enforce data consistency, and Act! also uses a proprietary table system based on a non-SQL database. The general migration logic was rather straightforward. However, our team found that the migration and consolidation of data into the new system posed multiple challenges, not only on the technical front, but also for the customer when it came to verifying the data.

We used the cutting-edge tool xFusion Studio for data migration. This tool allows migrating and synchronizing data using simple and advanced SQL data massaging techniques, and it has a graphical representation of how the data flows from the source to the target. When I looked at one section of this graphical representation, I started humming the song Welcome to the Jungle. Take a look at the following screenshot and find out why Guns N' Roses may have provided the soundtrack for this data migration project:

What we learned from the above screenshot is quite obvious, and I have dedicated this article to helping you overcome these potential issues. Keep it simple and focus on information rather than data. Just having more data does not always mean you've added more information. Sometimes, it just means a data jungle has been created. Making the right decisions at key milestones during the migration can keep the project simple and guarantee its success. Your goal should be to consolidate the islands of data into a more efficient and consistent database that provides real-time information.

What you will learn about data migration

In order to accomplish the task of migrating data from different sources into the SAP Business ONE application, a strategy must be designed that addresses the individual needs of the project at hand. The data migration strategy uses proven processes and templates, and the data migration itself can be managed as a mini project depending on its complexity. During the course of this article, the following key topics will be covered. The goal is to help you make crucial decisions that will keep a project simple and manageable:

Position the data migration tasks in the project plan – We will start by positioning the data migration tasks in the project plan. I will further define the required tasks that you need to complete as a part of the data migration.

Data types and scenarios – With the general project plan structure in place, it is time to cover the common terms related to data migration. I will introduce you to the main aspects, such as master data and transactional data, as well as the impact they have on the complexity of data migration.

SAP tools available for migration – During the course of our case study, I will introduce you to the data migration tools that come with SAP. However, there are also more advanced tools for complex migrations. You will learn about the main player in this area and how to use it.

Process of migration – To avoid problems and guarantee success, the data migration project must follow a proven procedure. We will update the project plan to include the procedure and will also use the process during our case study.

Making decisions about what data to bring – I mentioned that it is important to focus on information versus data.
With the knowledge of the right tools and procedures, it is a good time to summarize the primary known issues and explain how to tackle them.

The project plan

We are still progressing in Phase 2 – Analysis and Design. The data migration is positioned in the Solution Architecture section and is called Review Data Conversion Needs (Amount and Type of Data). A thorough evaluation of the data conversion needs will also cover the next task in the project plan, called Review Integration Points with any 3rd Party Solution. As you can see, data migration stands as a small task in the project plan. But as mentioned earlier, it can wind up being a large project depending on the number and size of the data sources that need to be migrated. To honor this, we will add some more detail to this task.

As the task name suggests, we must review data conversion needs and identify the amount and type of data. This task must be structured in phases, just like the entire project. Therefore, data migration needs to go through the following phases to be successful:

1. Design - identify all of the data sources
2. Extraction of data into Excel or SQL for review and consistency
3. Review of data and verification (via customer feedback)
4. Load into the SAP system and verification

Note that the validation process and the consequent load can be iterative. For example, if the validated data has many issues, it only makes sense to perform a load into SAP if an additional verification takes place before the load. You only want to load data into an SAP system for testing if you know the quality of the records to be loaded is good. Therefore, new phases were added in the project plan (seen below). Please do this in your project too, based on the actual complexity and the number of data sources you have. A thorough look at the tasks above will be provided when we talk about the process of migration. Before we do that, the basic terms related to data migration will be covered.

Data sources: where is my data?

There is a great variety in the potential types of data sources. We will now identify the most common sources and explain their key characteristics. However, if there is a source that is not mentioned here, you can still migrate the data easily by transitioning it into one of the following formats.

Microsoft Excel and text data migration

The most common format for data migration is Excel, or text-based files. Text-based files are formatted using commas or tabs as field separators. When a comma is used as a field separator, the file format is referred to as Comma Separated Values (CSV). Most of the migration templates and strategies are based on Excel files that have specific columns where you can manually enter data, or copy and paste larger chunks. Therefore, if there is any way for you to extract data from your current system and present it in Excel, you have already done a great deal of the data migration work.

Microsoft Access

An Access database is essentially an Excel sheet on a larger scale with added data consistency capability. It is a good idea to consider extracting Access tables to Excel in order to prepare for data migration.

SQL

If you have very large sets of data, then instead of using Excel, we usually employ an SQL database. The database then has a set of tables instead of Excel sheets. Using SQL tables, we can create SQL statements that verify data and analyze result sets.
Please note that you can use any SQL database, such as Microsoft SQL Server, Oracle, IBM DB2, and so on.

SaaS (NetSuite, Salesforce)

SaaS stands for Software as a Service. Essentially, it means you can use software functionality based on a subscription; however, you don't own the solution. All of the hardware and software is installed at the service center, so you don't need to worry about hardware and software maintenance. Keep in mind, though, that these services don't allow you to manage the service packs according to your requirements; you need to adjust your business to the schedule of the SaaS company. If you are migrating from a modern SaaS solution, such as Salesforce or NetSuite, you will probably know that the data is not at your site, but rather stored at your solution hosting provider. Getting the data out to migrate to another solution may be done by obtaining reports, which can then be saved in an Excel format.

Other legacy data

The term legacy data is often mentioned when evaluating larger old systems. Legacy data basically comprises a large set of data that a company is using on mostly obsolete systems.

AS/400 or mainframe

The IBM AS/400 is a good example of a legacy data source. Experts who are capable of extracting data from these systems are highly sought after, so the budget must be on a higher scale. AS/400 data can often be extracted into a text or an Excel format. However, the data may come without headings. The headings are usually documented in a file that describes the data. You need to make sure that you get the file definitions, without which the pure text files may be meaningless. In addition, the media format is worth considering: an older AS/400 system may use a backup tape format that is not available on your Intel server.

Peachtree, QuickBooks, and Act!

Another potential source for data migration may be a smaller PC-based system, such as Peachtree, QuickBooks, or Act!. These systems have a different data format and are based on non-SQL databases. This means the data cannot be accessed via SQL. In order to extract data from these systems, the proprietary API must be used. For example, when Peachtree displays data in the application's forms, it uses the program logic to put the pieces together from different text files. Getting data out of these types of systems is difficult and sometimes impossible. It is recommended to employ the relevant API to access the data in a structured way. You may want to run reports and export the results to text or Excel.

Data classification in SAP Business ONE

There are two main groups of data that we will migrate to the SAP Business ONE application: master data and transaction data.

Master data

Master data is the basic information that SAP Business ONE uses to record transactions (for example, business partner information). In addition, information about your products, such as items, finished goods, and raw materials, is considered master data. Master data should always be migrated if possible. It can easily be verified and structured in an Excel or SQL format. For example, the data could be displayed using Excel sheets. You can then quickly verify that the data is showing up in the correct columns. In addition, you can see if the data is broken down into its required components. For example, each Excel column should represent a target field in SAP Business ONE. You should avoid having a single column in Excel that provides data for more than one target in SAP Business ONE.
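As a rough illustration of that verification step (a generic sketch, not an SAP tool: the file name bp_export.csv and its column layout are assumptions), a few lines of Node.js can flag missing or duplicate BP codes in a CSV extract before you ever attempt an import:

// Verification sketch only - assumes a hypothetical bp_export.csv whose
// first two comma-separated columns are BP Code and BP Name, with one
// header row and no quoted commas inside fields.
const fs = require('fs');

const rows = fs.readFileSync('bp_export.csv', 'utf8')
  .split(/\r?\n/)
  .filter(line => line.trim() !== '')
  .slice(1)                          // drop the header row
  .map(line => line.split(','));

const seen = new Set();
rows.forEach((cols, i) => {
  const [code, name] = cols;
  const rowNo = i + 2;               // +2: header row plus 1-based counting
  if (!code) {
    console.log('Row ' + rowNo + ': missing BP Code (mandatory field)');
  } else if (seen.has(code)) {
    console.log('Row ' + rowNo + ': duplicate BP Code ' + code);
  } else {
    seen.add(code);
  }
  if (!name) {
    console.log('Row ' + rowNo + ': BP Code ' + code + ' has no name');
  }
});
console.log('Checked ' + rows.length + ' rows.');

Whatever language or tool you use, the point is the same: catch obvious problems in the intermediary file, not after the records have landed in SAP.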
Transaction data

Transaction data comprises proposals, orders, invoices, deliveries, and other similar information that combines master data to create a unique business document. Customers often want to migrate historical transactions from older systems. However, the consequences of doing this may have a landslide effect. For example, inventory is valuated based on specific settings in the finance section of a system. If these settings are not identical in the new system, transactions may look different in the old and the new system. This makes the migration very risky, as data verification becomes difficult on the customer side.

I recommend making historical transactions available via a reporting database. For example, sales history must often be available when migrating data. You can create a reporting database that provides sales history information, and the user can access this data via reports within the SAP Business ONE application. Therefore, closed transactions should be migrated via a reporting database. Closed transactions are all of the business-related activities that were fully completed in the old system. Open transactions, on the other hand, are all of the business-related activities that are currently not completed. It makes sense that open transactions be migrated directly to SAP, and not to a history database, because they will be completed within the new SAP system. As a result of the data migration, you would be able to access sales history information from within SAP by accessing a reporting database. Open transactions will be completed within SAP and consequently lead to new transactions in SAP. In short: create a history database for sales history and manually enter open transactions.

SAP DI-API

Now that we know the main data types for an SAP migration, and the most common sources, we can take a brief look at the way data is inserted into the SAP system. Based on the SAP guidelines, you are not allowed to insert data directly into the underlying SQL tables. The reason is that it can cause inconsistencies. When SAP works with the database, multiple tables are often updated. If you manually update a table to insert data, there is a good chance that another table has a link that also requires updating. Therefore, unless you know the exact table structure for the data you are trying to update, don't mess with the SAP SQL tables. If you read this carefully and understand the table structure, you will know that there may be situations where you decide to access the tables directly. But be warned: if you insert data directly into the SAP database tables, you run the risk of losing your warranty.

Migration scenarios and key decisions

Data migration not only takes place as part of a new SAP implementation, but also when you have a running system and want to import leads or a list of new items. Therefore, it is a good idea to learn about the scenarios that you may come across and be able to select the right migration and integration tools. As outlined before, data can be divided into two groups: master data and transaction data. It is important that you separate the two and structure each data migration accordingly. Master data is an essential component for manifesting transactions. Therefore, even if you need to bring over transactional data, the master data must already be in place. Always start with the master data alongside a verification procedure, and then continue with the relevant transaction data.
Let's now briefly look at the most common situations where you may need to evaluate potential data migration options.

New company (start-up)

In this setup, you may not have extensive amounts of existing data to migrate. However, you may want to bring over lead lists or lists of items. During the course of this article, we will import a list of leads into SAP using the Excel import functionality. Many new companies require the capability to easily import data into SAP. As you already know by now, the import of leads and item information is considered importing master data. Working with this master data by entering sales orders and so forth constitutes transaction data. Transaction data is considered closed if all of the relevant actions have been performed. For example, a sales order is considered closed if the items are delivered, invoiced, and paid for. If the chain of events is not complete, the transaction is open.

Islands of data scenario

This is the classic situation for an SAP implementation. You will first need to identify the available data sources and their formats. Then, you select the master data you want to bring over. With multiple islands of data, an SAP master record may be built from more than one source. A business partner record may come, in part, from an existing accounting system, such as QuickBooks or Peachtree, whereas other parts may come from a CRM system, such as Act!. For example, the billing information may be retrieved from the finance system, and the relevant lead and sales information, such as specific contacts and notes, may come from the CRM system. In such a case, you need to merge this information into a new, consistent master record in SAP. For this situation, first put the pieces together manually. Once the manual process works, you can attempt to automate it. Don't try to directly import all of the data. You should always establish an intermediary level that allows for data verification, and only then import the data into SAP. For example, if you have QuickBooks and Act!, first merge the information into Excel for verification, and then import it into SAP. If the amount of data is large, you can also establish an SQL database. In that case, the Excel sheets would be replaced by SQL tables.

IBM legacy data migration

The migration of IBM legacy data is potentially the most challenging because the IBM systems are not directly compatible with Windows-based systems. Therefore, almost naturally, you will establish a text-based or Excel-formatted representation of the IBM data. You can then proceed with verifying the information.

SQL migration

The easiest migration type is obviously the one where all of the data is already structured and consistent. However, you will not always have documentation of the table structure where the data resides. In this case, you need to create queries against the SQL tables to verify the data. The queries can then be saved as views. The views you create should always represent a consistent set of information that you can migrate. For example, if you have one table with address information and another table with customer ID fields, you can create a view that consolidates this information into a single consistent set.

Process of migration for your project

I briefly touched upon the most common data migration scenarios so you can get a feel for the process. As you can see, whatever the source of data is, we always attempt to create an intermediary platform that allows the data to be verified.
This intermediary platform is most commonly Excel or an SQL database. The process of data migration has the following subtasks:

1. Identify available data sources
2. Structure data into master data and transaction data
3. Establish an intermediary platform with Excel or SQL
4. Verify data
5. Match data columns with Excel templates
6. Run the migration based on templates and verify the data

Based on this procedure, I have added more detail to the project plan. As you can see in this example, we can adjust the project plan to address the required level of detail:

SAP standard import features

Let's take a look at the available data exchange features in SAP. SAP provides two main tools for data migration. The first option is to use the available menu in the SAP Business ONE client interface to exchange data. The other option is to use the Data Transfer Workbench (DTW).

Standard import/export features - walk-through

You can reach the Import from Excel form via Administration | Data Import/Export. As you can see in the following screenshot, in the top right section of the form the type of import is a drop-down selection. The options are BP and Items. In the screenshot, we have selected BP, which allows business partner information to be imported. There are drop-down fields that you can select based on the data you want to import. However, keep in mind that certain fields are mandatory, such as the BP Code field, whereas others are optional. The fields you select are associated with a column, as you can see here:

If you want to find out whether a field is mandatory or not, simply open SAP and attempt to enter the data directly in the relevant SAP form. For example, if you are trying to import business partner information, enter the fields you want to import and see if the record can be saved. If you are missing any mandatory fields, SAP will provide an error message. You can modify the data that you are planning to import based on that.

When you click on the OK button in the Import from Excel form (seen above), the Excel sheet with all of the data needs to be selected. In the following screenshot, you can see how the Excel sheet in our example looks. For example, column A has all of the BP Codes. This is in line with the mapping of columns to fields that we can see on the Import from Excel form. Please note that the file we select must be in a .txt format. For this example, I used the Save As feature in Excel (seen in the following screenshot) to save the file in the Text MS-DOS (*.txt) format. I was then able to select the BP Migration.txt file. This is actually a good thing because it means you can use any application that can save data in the .txt format as the data source. The following screenshot shows the Save As screen:

I imported the file, and a success message confirms that the records were imported into SAP: A subsequent check in SAP confirms that the BP records that I had in the text file are now available in SAP:

In the example, we only used two records. It is recommended to start out with a limited number of records to verify that the import is working. Therefore, you may start by reducing your import file to five records. This has the advantage that the import does not take long and you can immediately verify the result. See the following screenshot:

Sometimes, it is not clear what kind of information SAP expects when importing. For example, at first Lead, Customer, and Vendor were used in Column C to indicate the type of BP that was to be imported.
However, this resulted in an error message upon completion of the import. Therefore, system information was activated to check what information SAP requires for the BP type representation. As you can see in the screenshot of the Excel sheet you get when you click on the OK button in the Import from Excel form, the BP type information is indicated by only one letter: L, C, or V. In the example screenshot above, you can clearly see L in the lower left section. The same applies to Country in the Addresses section. You can check this by navigating to Administration | Sales | Countries, and then hovering over the country you will be importing. In my example, USA was internally represented by SAP as US. It is a minor issue; however, when importing data, all of these issues need to be addressed. Please note that the file you are trying to import should not be open in Excel at the same time, as this may trigger an error. Also note that the Excel or text file does not have a header with a description of the data.

Standard import/export features for your own project

SAP's standard import functionality for business partners and items is very straightforward. For your own project, you can prepare an Excel sheet for business partners and items. If you need to import BP or item information from another system, you can get this done quickly. If you get an error during the import process, try to manually enter the data in SAP. In addition, you can use the System Information feature to identify how SAP stores information in the database. I recommend you first create an Excel sheet with a maximum of two records to see if the basic information and data format are correct. Once you have this running, you can add all of the data you want to import. Overall, this functionality is a quick way to get your own data into the system.

This feature can also be used if you regularly receive address information. For example, if you have salespeople visiting trade fairs, you can provide them with the Excel sheet that you may have prepared for BP import. The salespeople can directly add their information there. Once they return from the trade fair with the Excel files, you can easily import the information into SAP and schedule follow-up activities using the Opportunity Management System. The item import is useful if you work with a vendor who updates his or her price lists and item information on a monthly basis. You can prepare an Excel template where the item information will regularly be entered, and you can easily import the updates into SAP.

Data Transfer Workbench (DTW)

The SAP standard import/export features are straightforward, but they may not address the full complexity of the data that you need to import. For this situation, you may want to evaluate the SAP Data Transfer Workbench (DTW). This tool provides a greater level of detail to address the potential data structures that you want to import. To understand the basic concept of the DTW, it is a good idea to look at the different master data sections in SAP as business objects. A business object groups related information together. For example, BP information can have much more detail than what was previously shown in the standard import.

The DTW templates and business objects

To better understand the business object metaphor, you need to navigate to the DTW directory and evaluate the Templates folder. The templates are organized by business objects.
The oBusinessPartners business object is represented by the folder with the same name (seen below). In this folder, you can find Excel template files that can be used to provide information for this type of business object. The following objects are available as Excel templates:

BPAccountReceivables
BPAddresses
BPBankAccounts
BPPaymentDates
BPPaymentMethods
BPWithholdingTax
BusinessPartners
ContactEmployees

Please note that these templates are Excel .xlt files, which is the Excel template extension. It is a good idea to browse through the list of templates and identify the ones relevant to your data. In a nutshell, you add your own data to the templates and use DTW to import it.

Connecting to DTW

In order to work with DTW, you need to connect to your SAP system using the DTW interface. The following screenshot shows the parameters I used to connect to the Lemonade Stand database: Once you are connected, a wizard-type interface walks you through the required steps to get started. Look at the next screenshot:

The DTW examples and templates

There is also an example folder in the DTW installation location on your system. This folder contains information about how to add information to your Excel templates. The following screenshot shows an example for business partner migration. You can see that the Excel template has a header line on top that explains the content of each column. The actual template files also have comments in the header line, which provide information about the expected data format, such as String, Date, and so on. See the example in this screenshot: The actual template is empty, and you need to add your information as shown here:

DTW for your own project

If you realize that the basic import features in SAP are not sufficient, and your requirements are more challenging, evaluate DTW. Think of the data you want to import as business objects where information is logically grouped. If you are able to group your data this way, you can populate the Excel templates with your own information. The DTW example folder provides working examples that you can use to get started. Please note that you should establish a test database before you start importing data this way. This is because once new data arrives in SAP, you need to verify the results based on the procedure discussed earlier. In addition, be prepared to fine-tune the import: an import and data verification process often takes four attempts of importing and verification.

Summary

In this article, we covered the tasks related to data migration. This included some practical examples of simple data imports related to business partner information and items. In addition, more advanced topics were covered by introducing the SAP Data Transfer Workbench (DTW) and the related aspects to get you started. During the course of this article, we positioned the data migration task in the project plan. The project plan was then fine-tuned with more detail to do justice to the potential complexity of a data migration project. The data migration tasks established a process, from design to data mapping and verification of the data. Notably, the establishment of an intermediary data platform was recommended for your projects; this will help you verify data at each step of the migration. The key message of keeping it simple should be the basis for every migration project. The data verification task ensures simplicity and the quality of your data.
If you have read this article, you may be interested to view:

Competitive Service and Contract Management in SAP Business ONE Implementation: Part 1
Competitive Service and Contract Management in SAP Business ONE Implementation: Part 2
Data Migration Scenarios in SAP Business ONE Application - Part 2


5 pitfalls of React Hooks you should avoid - Kent C. Dodds

Sugandha Lahoti
09 Sep 2019
7 min read
The React community first introduced Hooks back in October 2018 as a way to use React without classes. The idea was simple: with the help of Hooks, you can "hook into" React state and other React features from function components. In February 2019, React 16.8 was released with the stable implementation of Hooks.

As popular as Hooks are, there are certain pitfalls that developers should avoid when learning and adopting them. In his talk "React Hook Pitfalls" at React Rally 2019 (August 22-23, 2019), Kent C. Dodds covers five common pitfalls of React Hooks and how to avoid or fix them. Kent is a world-renowned speaker, and a maintainer and contributor of hundreds of popular npm packages. He's actively involved in the open source community of React and the general JavaScript ecosystem. He's also the creator of react-testing-library, which provides simple and complete React DOM testing utilities that encourage good testing practices.

Tl;dr

Problem: Starting without a good foundation. Solution: Read the React Hooks docs and the FAQ.
Problem: Not using (or ignoring) the ESLint plugin. Solution: Install, use, and follow the ESLint plugin.
Problem: Thinking in lifecycles. Solution: Don't think about lifecycles; think about synchronizing side effects to state.
Problem: Overthinking performance. Solution: React is fast by default, so research before applying performance optimizations prematurely.
Problem: Overthinking the testing of React Hooks. Solution: Avoid testing implementation details of the component.

Pitfall #1: Starting without a good foundation

Often React developers begin coding without reading the documentation, and that leads to a number of issues and small problems. Kent recommends that developers start by reading the React Hooks documentation and the FAQ section thoroughly. He jokingly adds, "Once you read the frequently asked questions, you can ask the infrequently asked questions. And then maybe those will get in the docs, too. In fact, you can make a pull request and put it in yourself."

Pitfall #2: Not using (or ignoring) the ESLint plugin

The ESLint plugin is the official plugin built by the React team. It has two rules: "rules of hooks" and "exhaustive deps". The default recommended configuration is to set "rules of hooks" to an error and "exhaustive deps" to a warning. The linter plugin enforces these rules automatically. The two Rules of Hooks are:

Don't call Hooks inside loops, conditions, or nested functions. Instead, always use Hooks at the top level of your React function. By following this rule, you ensure that Hooks are called in the same order each time a component renders.

Only call Hooks from React functions. Don't call Hooks from regular JavaScript functions. Instead, call Hooks from React function components or from custom Hooks.

Kent notes that sometimes the rule is incapable of performing static analysis on your code properly due to limitations of ESLint. "I believe," he says, "this is why it's recommended to set the exhaustive deps rule to 'warn' instead of 'error'." When this happens, the plugin will tell you so in the warning, and he recommends that developers restructure their code to avoid that warning. The solution Kent offers for this pitfall is to install, follow, and use the ESLint plugin. The plugin will not only catch easily missable bugs, but it will also teach you things about your code and hooks in the process.
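To make those two rules concrete, here is a minimal sketch of a function component that follows them; the component name, endpoint, and response fields are hypothetical and not taken from Kent's talk. The commented-out line shows the kind of conditional hook call the "rules of hooks" lint rule flags:

import React, { useState, useEffect } from 'react';

function SearchResults({ query }) {
  // Hooks are called unconditionally, at the top level, in the same order
  // on every render.
  const [results, setResults] = useState([]);

  useEffect(() => {
    let cancelled = false;
    fetch('/api/search?q=' + encodeURIComponent(query))   // hypothetical endpoint
      .then(response => response.json())
      .then(data => {
        if (!cancelled) setResults(data);
      });
    // Cleanup prevents a stale response from overwriting newer state.
    return () => { cancelled = true; };
  }, [query]); // "exhaustive deps": every outside value used above is listed

  // Not allowed - calling a hook inside a condition changes the call order
  // between renders, which is exactly what the lint rule catches:
  // if (query) { const [page, setPage] = useState(1); }

  return <ul>{results.map(r => <li key={r.id}>{r.title}</li>)}</ul>;
}

export default SearchResults;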
Pitfall #3: Thinking in lifecycles

With Hooks, components are declarative. Kent says that this allows you to stop thinking about "when things should happen in the lifecycle of the component" (which doesn't matter that much) and start thinking about "when things should happen in relation to state changes" (which matters much more). With React Hooks, he adds, you're not thinking about component lifecycles; you're thinking about synchronizing the state of the side effects with the state of the application. This idea is difficult for React developers to grasp initially; however, once you do, he adds, you will naturally experience fewer bugs in your apps thanks to the design of the API. https://twitter.com/ryanflorence/status/1125041041063665666

Solution: Think about synchronizing side effects to state, rather than lifecycle methods.

Pitfall #4: Overthinking performance

Kent says that even though it's really important to be considerate of performance, you should also think about your code complexity. If your code is complex, you can't give people the great features they're looking for, because you will be spending all your time dealing with the complexity of your code. He adds that "unnecessary re-renders" are not necessarily bad for performance. Just because a component re-renders doesn't mean the DOM will get updated (updating the DOM can be slow). React does a great job at optimizing itself; it's fast by default. On this, he says: "If your app's unnecessary re-renders are causing your app to be slow, first investigate why renders are slow. If rendering your app is so slow that a few extra re-renders produces a noticeable slow-down, then you'll likely still have performance problems when you hit 'necessary re-renders.' Once you fix what's making the render slow, you may find that unnecessary re-renders aren't causing problems for you anymore." If unnecessary re-renders are still causing performance problems, you can then reach for the built-in performance optimization APIs such as React.memo, React.useMemo, and React.useCallback. There is more information on this in Kent's blog post on useMemo and useCallback.

Solution: React is fast by default, so research before applying performance optimizations prematurely; profile your app and then optimize it.

Pitfall #5: Overthinking the testing of React Hooks

Kent says that people are often concerned that they need to rewrite their tests along with all of their components when they refactor from class components to hooks. He explains, "Whether your component is implemented via Hooks or as a class, it is an implementation detail of the component. Therefore, if your test is written in such a way that reveals that, then refactoring your component to hooks will naturally cause your test to break." He adds, "But the end-user doesn't care about whether your components are written with hooks or classes. They just care about being able to interact with what those components render to the screen. So if your tests interact with what's being rendered, then it doesn't matter how that stuff gets rendered to the screen, it'll all work whether you're using classes or hooks."

So, to avoid this pitfall, Kent's recommendation is to write tests that work regardless of whether you're using classes or hooks. Before you upgrade to Hooks, start writing your tests free of implementation details, and your refactored hooks can then be validated by the tests that you've written for your classes.
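As a rough sketch of that advice (not code from the talk), the test below uses React Testing Library, which Kent created, and assumes a standard Jest setup; the Counter component is hypothetical. The test only interacts with what is rendered, so the component could be rewritten as a class without breaking it:

import React, { useState } from 'react';
import { render, screen, fireEvent } from '@testing-library/react';

// A hypothetical component: it could just as well be written as a class;
// the test below cannot tell the difference.
function Counter() {
  const [count, setCount] = useState(0);
  return (
    <button onClick={() => setCount(c => c + 1)}>
      Clicked {count} times
    </button>
  );
}

test('increments the count when the user clicks', () => {
  render(<Counter />);
  const button = screen.getByRole('button');
  fireEvent.click(button);
  // Assert on what the user sees, not on state variables or hook calls.
  expect(button.textContent).toBe('Clicked 1 times');
});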
The more your tests resemble the way your software is used, the more confidence they can give you.

In review:

Read the docs and the FAQ.
Install, use, and follow the ESLint plugin.
Think about synchronizing side effects to state.
Profile your app and then optimize it.
Avoid testing implementation details.

Watch the full talk on YouTube: https://www.youtube.com/watch?v=VIRcX2X7EUk

Read more about React:

#Reactgate forces React leaders to confront community’s toxic culture head on
React.js: why you should learn the front end JavaScript library and how to get started
Ionic React RC is now out!


Visualizing my Social Graph with d3.js

Packt
24 Oct 2013
7 min read
Social Networks Analysis

Social Networks Analysis (SNA) is not new; sociologists have been using it for a long time to study human relationships (sociometry), to find communities, and to simulate how information or a disease spreads in a population. With the rise of social networking sites such as Facebook, Twitter, and LinkedIn, the acquisition of large amounts of social network data has become easier. We can use SNA to get insight into customer behavior or unknown communities. It is important to say that this is not a trivial task, and we will come across sparse data and a lot of noise (meaningless data). We need to understand how to distinguish between false correlation and causation. A good start is getting to know our graph through visualization and statistical analysis.

Social networking sites give us the opportunity to ask questions that are otherwise too hard to approach, because polling enough people is time-consuming and expensive. In this article, we will obtain our social network's graph from the Facebook (FB) website in order to visualize the relationships between our friends. Finally, we will create an interactive visualization of our graph using D3.js.

Getting ready

The easiest method to get our friends list is by using a third-party application. Netvizz is a Facebook app developed by Bernhard Rieder, which allows exporting social graph data to gdf and tab formats. Netvizz may export information about our friends such as gender, age, locale, posts, and likes. In order to get our social graph from Netvizz, we need to access the link below and give it access to our Facebook profile.

https://apps.facebook.com/netvizz/

As shown in the following screenshot, we will create a gdf file from our personal friend network by clicking on the link named "here" in Step 2, and then download the GDF (Graph Data Format) file. Netvizz will give us the number of nodes and edges (links); finally, we will click on the gdf file link, as we can see in the following screenshot:

The output file myFacebookNet.gdf will look like this:

nodedef>name VARCHAR,label VARCHAR,gender VARCHAR,locale VARCHAR,agerank INT
23917067,Jorge,male,en_US,106
23931909,Haruna,female,en_US,105
35702006,Joseph,male,en_US,104
503839109,Damian,male,en_US,103
532735006,Isaac,male,es_LA,102
. . .
edgedef>node1 VARCHAR,node2 VARCHAR
23917067,35702006
23917067,629395837
23917067,747343482
23917067,755605075
23917067,1186286815
. . .

In the following screenshot we can see the visualization of the graph (106 nodes and 279 links). The nodes represent my friends and the links represent how my friends are connected to each other.

Transforming GDF to JSON

In order to work with the graph on the web with d3.js, we need to transform our gdf file into JSON format. First, we need to import the numpy and json libraries.

import numpy as np
import json

The numpy function genfromtxt will obtain only the ID and name from the nodes.csv file, using the usecols attribute and the 'object' format.

nodes = np.genfromtxt("nodes.csv", dtype='object', delimiter=',', skip_header=1, usecols=(0,1))

Then, the numpy function genfromtxt will obtain the links, with the source node and target node, from the links.csv file, using the usecols attribute and the 'object' format.
links = np.genfromtxt("links.csv", dtype='object', delimiter=',', skip_header=1, usecols=(0,1))

The JSON format used by the D3.js force layout graph implemented in this article requires transforming each ID (for example, 100001448673085) into its numerical position in the list of nodes. We therefore look for each appearance of an ID in the links and replace it with its position in the list of nodes.

for n in range(len(nodes)):
    for ls in range(len(links)):
        if nodes[n][0] == links[ls][0]:
            links[ls][0] = n
        if nodes[n][0] == links[ls][1]:
            links[ls][1] = n

Now, we need to create a dictionary, data, to store the JSON file.

data = {}

Next, we create a list of nodes with the names of the friends in the format "nodes": [{"name": "X"}, {"name": "Y"}, . . .] and add it to the data dictionary.

lst = []
for x in nodes:
    d = {}
    d["name"] = str(x[1]).replace("b'","").replace("'","")
    lst.append(d)
data["nodes"] = lst

Now, we create a list of links with the source and target in the format "links": [{"source": 0, "target": 2}, {"source": 1, "target": 2}, . . .] and add it to the data dictionary.

lnks = []
for ls in links:
    d = {}
    d["source"] = ls[0]
    d["target"] = ls[1]
    lnks.append(d)
data["links"] = lnks

Finally, we create the file newJson.json and write the data dictionary to it with the dumps function of the json library.

with open("newJson.json","w") as f:
    f.write(json.dumps(data))

The file newJson.json will look as follows:

{"nodes": [{"name": "Jorge"}, {"name": "Haruna"}, {"name": "Joseph"}, {"name": "Damian"}, {"name": "Isaac"}, . . .],
 "links": [{"source": 0, "target": 2}, {"source": 0, "target": 12}, {"source": 0, "target": 20}, {"source": 0, "target": 23}, {"source": 0, "target": 31}, . . .]}

Graph visualization with D3.js

D3.js provides the d3.layout.force() function, which uses a force-directed layout algorithm and helps us visualize our graph. First, we need to define the CSS style for the nodes, links, and node labels.

<style>
.link {
  fill: none;
  stroke: #666;
  stroke-width: 1.5px;
}
.node circle {
  fill: steelblue;
  stroke: #fff;
  stroke-width: 1.5px;
}
.node text {
  pointer-events: none;
  font: 10px sans-serif;
}
</style>

Then, we need to reference the d3.js library.

<script src="http://d3js.org/d3.v3.min.js"></script>

Then, we define the width and height parameters for the svg container and include it in the body tag.

var width = 1100,
    height = 800;

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

Now, we define the properties of the force layout, such as gravity, distance, and size.

var force = d3.layout.force()
    .gravity(.05)
    .distance(150)
    .charge(-100)
    .size([width, height]);

Then, we acquire the data of the graph in JSON format and configure the parameters for nodes and links.

d3.json("newJson.json", function(error, json) {
  force
      .nodes(json.nodes)
      .links(json.links)
      .start();

For a complete reference on the d3.js force layout implementation, visit https://github.com/mbostock/d3/wiki/Force-Layout.

Then, we define the links as lines from the json data.

  var link = svg.selectAll(".link")
      .data(json.links)
      .enter().append("line")
      .attr("class", "link");

  var node = svg.selectAll(".node")
      .data(json.nodes)
      .enter().append("g")
      .attr("class", "node")
      .call(force.drag);

Now, we define each node as a circle of size 6 and include the label of each node.
node.append("circle").attr("r", 6);node.append("text").attr("dx", 12).attr("dy", ".35em").text(function(d) { return d.name }); Finally, with the function, tick, run step-by-step the force layout simulation. force.on("tick", function(){link.attr("x1", function(d) { return d.source.x; }).attr("y1", function(d) { return d.source.y; }).attr("x2", function(d) { return d.target.x; }).attr("y2", function(d) { return d.target.y; });node.attr("transform", function(d){return "translate(" + d.x + "," + d.y + ")";})});});</script> In the image below we can see the result of the visualization. In order to run the visualization we just need to open a Command Terminal and run the following Python command or any other web server. >>python –m http.server 8000 Then you just need to open a web browser and type the direction http://localhost:8000/ForceGraph.html. In the HTML page we can see our Facebook graph with a gravity effect and we can interactively drag-and-drop the nodes. All the codes and datasets of this article may be found in the author github repository in the link below.https://github.com/hmcuesta/PDA_Book/tree/master/Chapter10 Summary In this article we developed our own social graph visualization tool with D3js, transforming the data obtained from Netvizz with GDF format into JSON. Resources for Article: Further resources on this subject: GNU Octave: Data Analysis Examples [Article] Securing data at the cell level (Intermediate) [Article] Analyzing network forensic data (Become an expert) [Article]


Microsoft Dynamics NAV 2009: Designing Forms

Packt
25 Oct 2010
8 min read
Forms are a predominant visual element in Dynamics NAV. There are 937 tables in the base NAV software and 1,820 forms that display information from those tables. Apart from learning how to create a form using the wizard, this article will not discuss the basic elements of form design. That information can be found in the C/SIDE Reference Guide and the development coursework from Microsoft. If you have not designed a form before, it is highly recommended that you acquaint yourself with forms first.

With NAV 2009, Microsoft released the RoleTailored client, or RTC. This was a huge change from the existing NAV product. In this release, Microsoft introduced the RTC as a second client or interface in addition to what is called the Classic client, the more traditional interface. While the future of NAV is definitely with the RTC, it is still important to understand what forms are and how they work, in order to support customers who might not upgrade to the latest version of the product.

Obtaining input without a form

Sometimes you don't want to use an entire form to get user input. Dialog boxes are not a substitute for forms, but they work just fine for quick input.

How to do it...

1. Create a new codeunit from Object Designer.
2. Add the following global variables:
3. Add the following code to the OnRun trigger of the codeunit:

Window.OPEN('Customer No: #1####################');
Window.INPUT(1, CustomerNo);
Window.CLOSE;
IF Customer.GET(CustomerNo) THEN
  MESSAGE('Customer Name: %1', Customer.Name)
ELSE
  MESSAGE('No customer found!');

4. Save and close the codeunit.

How it works...

The first line of code opens an input dialog window that looks like the one shown in the following screenshot: The next line lets the user input a value and stores it in the CustomerNo variable. The dialog window then closes, and the result can be used later in code.

There's more...

As you can tell from the input window, dialogs are much weaker than forms when it comes to functionality. You can't do lookups, data validation, or anything other than basic text input. From a licensing aspect, forms are one of the cheapest objects to buy. Dialogs also don't match the look and feel of the rest of the system. For these reasons it is almost always better to use a form than an input dialog, but it is important to know what you can do using dialogs.

Using the Form Generation Wizard

You can always create a form manually, but using the Form Generation Wizard is a quick and painless way to create the skeleton.

How to do it...

1. With the form selected in Object Designer, click the New button.
2. Choose the Customer table.
3. Select Create a form using a wizard.
4. Select Tabular-Type Form.
5. Click OK.
6. Use the arrow buttons between the two lists to add the No. and Name fields.
7. Click on Finish.

How it works...

The Form Generation Wizard allows you to tell the system which fields you want on the form and the format or order in which you want them to appear. NAV will then automatically place the fields on the form for you. There is no manual positioning of labels or textboxes, and no creating tabs or list boxes; it is all done automatically.

There's more...

The wizard will only create a basic form for you. If you need to create special functions or do any specific data validation, you will have to code that manually. A wizard is only designed to get you started, not to do anything advanced.

Changing text appearance

A great way to improve the user experience is to change the way text appears on the screen. This recipe will explore several options that are available to you.
Getting ready

Design the Customer List form and save it as a new object.

How to do it...

1. Design the copy of the Customer List form.
2. Create a function named GetColor that returns an integer.
3. Add the following code to the function:

IF "Location Code" = 'BLUE' THEN
  EXIT(16711680)
ELSE IF "Location Code" = 'GREEN' THEN
  EXIT(65280)
ELSE IF "Location Code" = 'RED' THEN
  EXIT(255)
ELSE IF "Location Code" = 'YELLOW' THEN
  EXIT(65535)

4. Create a function named GetBold that returns a boolean value.
5. Add the following code to the function:

EXIT("Credit Limit (LCY)" > 1000);

6. In the OnFormat trigger for the Name column, add the following code:

CurrForm.Name.UPDATEFORECOLOR(GetColor);
CurrForm.Name.UPDATEFONTBOLD(GetBold);

7. Save and close the form.

How it works...

The trigger that controls the appearance of text is the OnFormat trigger. The first function we use is UPDATEFORECOLOR. This method is found on every text field on a form. It takes one parameter: the color we want the text to be. In our example, we pass a function as the parameter, and that function returns the color we should use. UPDATEFONTBOLD works in a similar way. It takes a boolean parameter that tells the form whether or not to emphasize the text. The resulting form will look similar to the one shown in the following screenshot:

There's more...

The look and feel of a system is important for user satisfaction. Finding ways to make the information easier to understand, such as displaying the text in the same color as the warehouse location, can improve user understanding and decrease the time it takes to look up information. That said, don't go overboard. Having a form with multiple colors that have no direct relation to the data can be confusing. You don't want the user to need a "cheat sheet" of what everything means. If it takes longer than a couple of minutes to explain what certain characteristics mean, and you can't remember them an hour later, then you have probably gone too far. It also makes your upgrade to the RoleTailored client take longer, because display colors have only limited support there.

Preventing editable lookup forms

You may want users to only add records when running a form from a setup location. This example will show you how to prevent users from adding or modifying values when they are only trying to look up a record.

Getting ready

This example will use the Salesperson/Purchasers form (14).

How to do it...

1. Design the Salesperson/Purchasers form from Object Designer.
2. In the OnOpen trigger for the form, add the following code:

IF CurrForm.LOOKUPMODE THEN
  CurrForm.EDITABLE := FALSE;

3. Save and close the form.

How it works...

The code here is pretty self-explanatory: if the form is in lookup mode, it will not be editable.

There's more...

Lookup mode is a special mode in which forms can run. Essentially, when in lookup mode, the OK and Cancel buttons are displayed; when not in lookup mode, they are hidden. Using these buttons, you can retrieve the selected value from the form. It is often a good idea to make forms uneditable in lookup mode, although you will find many forms in base NAV where this is not the case. When the purpose of running a form is only to retrieve a value, it is a good idea to make sure that the form is not editable, so that those values are not accidentally changed.

Adding an editable field to a non-editable form

Have you ever needed to make a whole form uneditable except for just one field? This recipe will show you a quick and easy way to do it.
Getting ready

Create a list form based on the Customer table that displays the number and name of the customer. The Editable property of the form should be set to No.

How to do it...

1. View the code for the Name column in the list form.
2. In the OnActivate trigger, add the following code:

CurrForm.EDITABLE := TRUE;

3. In the OnDeactivate trigger, add the following code:

CurrForm.EDITABLE := FALSE;

4. Save and close the form.

How it works...

When you click on a textbox, its OnActivate trigger is executed. In our form, we have told the system to override the default Editable property when we click on the textbox. We set it to true so that the field becomes editable. In fact, the entire form becomes editable. We must make the entire form editable because that overrides the Editable property of the controls on the form. When we click off or tab off the field, the OnDeactivate trigger fires and we reset the form back to uneditable. Whenever the field is activated you can edit it; otherwise you cannot edit anything. In the RoleTailored client there is no OnActivate or OnDeactivate trigger, so you will have to do it the hard way, by setting the Editable property on every field.

Further resources on this subject:

Microsoft Dynamics NAV 2009: Creating a Matrix Form
Microsoft Dynamics NAV 2009: Creating a Wizard-style Form


Active Directory Design Principles: Part 1

Packt
30 Sep 2009
10 min read
The one thing to keep in mind is that when designing your Active Directory, you should never approach it purely from a present-needs point of view. Technology and systems are changing so fast nowadays that you have to design with the most open and future-proof concept that you can think of. It was only a few years back that Windows 95 revolutionized the personal computing platform by pushing 32-bit addressing to the mainstream; before that, everyone had spent 14 years running 16-bit programs on 16- or 32-bit processors. In April 2003, Microsoft launched the 64-bit version of its server operating system, and in April 2005, the 64-bit version of its desktop operating system, Windows XP. These came less than a decade after the big Windows 95 push. Active Directory was introduced with Windows 2000, only five years after Windows NT 4's "enhanced domain structure". The trend is that new features and new technologies are constantly being invented and introduced.

While there are quite a few companies that have a proper open and flexible design in their Active Directory structures, there are a lot more organizations that see Active Directory as the answer to all their prayers and just keep adding things to it and to the schema. To read more about the technical aspects of the AD schema, please refer to http://msdn2.microsoft.com/en-us/library/ms675085.aspx.

Software companies nowadays are pushing "Active Directory compatible" features more and more, and problems can arise when these packages need complete domain administrator rights in order to function (or modify Active Directory's inner workings), which they usually do not advertise up front. The need for proper planning and design of the AD is extremely high in order to ensure that your DR strategies will work and are easy to implement. A properly designed AD is extremely resilient and still very flexible. Whenever you intend to add new services, make sure that you test and re-test the things that are necessary for the service to function properly. As the IT department, you are responsible for keeping the systems going and ensuring business continuity.

Active Directory elements

When designing an Active Directory, you need to be completely clear about what each element or part actually means and how it fits into the overall design. The old saying goes: you can't see the forest for the trees, and you can apply this to Active Directory as well. It is all about trees and forests and leaves and branches.

The Active Directory forest

The forest, in terms of Active Directory, basically means every domain, organizational unit, and any other object stored within its database. The forest is the absolute top level of your Active Directory infrastructure. Of course, you can have more than one forest in a company; forests represent security boundaries and can therefore improve security between different business units or companies belonging to a single organization. The point of the forest is that all of the domains and domain trees within your organization are contained within it, and it is designed so that you can have transitive trust links between all of the trees in one forest. To read more about the technical layout of AD, please read the Domains and Forests Technical Reference at http://technet2.microsoft.com/windowsserver/en/library/16a2bdb3-d0a3-4435-95fd-50116e300c571033.mspx. To visualize a forest with its parts, please see the following image.
The Active Directory tree

A tree in Active Directory refers to a domain and all of its objects that adhere to a single DNS name. For example, a tree of nailcorp.com would contain all other domains that end with "nailcorp.com". So, americas.nailcorp.com, europe.nailcorp.com, and asia.nailcorp.com all belong to the Active Directory tree of nailcorp.com. You cannot separate these unless you create a whole new forest for a sub-domain.

Organizational Units and Leaf Objects

In Active Directory, Organizational Units (OUs), which are also called Containers, and Leaf Objects, which are non-containing objects such as computer accounts and user accounts, are directly related; even though you could have objects that do not belong to an OU, it isn't recommended and isn't really feasible. Organizational Units are comparable to folders in a filing cabinet, and objects are the files. You can move files between different folders, and classifications or properties are applied to the files within a folder. For example, if you move a file into a folder classified "Top Secret", the file will automatically fall under that classification. The same applies to objects within an OU: all properties or rules that apply to the OU apply to the objects within it. OUs, however, are mostly useful from an administrative point of view, not from a user's point of view. If you think about how your files are organized on your computer, for example, they are most likely organized into different folders. You can go ahead and set different folder settings, such as permissions, and they will affect all of the files within that folder, but anything outside that folder won't have its permissions affected. It is exactly the same principle with OUs. Any OU created within another OU will inherit all of the policy settings of the parent unless you change them. An object can also only belong to a single OU, just as a single file can only be contained within a single folder.

Leaf objects in Active Directory can be users, contacts, and computers; in short, any object that cannot contain other objects. They are called leaf objects because they are like leaves on a tree, and, as you can guess, they are the "lowest" class of objects within Active Directory. If you now look at the forest-tree-branch-leaf concept, it is starting to make sense. You can access the OUs and other objects through the Microsoft Management Console (MMC) or through the Users and Computers tool in the Administrative Tools. The second option actually just invokes the MMC with the correct view and is a lot quicker, as seen in the following screenshot:

Active Directory Sites

The Sites and Services MMC snap-in is a utility that a lot of Windows administrators, particularly in smaller organizations, completely overlook. This part of Active Directory, however, is one of the most crucial parts to understand and implement correctly. If you have several locations in your organization, you need to know about Active Directory Sites. Sites give you a unique and well-designed approach to separating specific locations within your Active Directory. As the principle of an Active Directory domain is global (meaning that it is meant to be the same anywhere), it could present a problem for users who move from office to office, or for offices with network connections that are slow. Active Directory sites allow you to specify the IP address spaces or subnets used within your organization and, therefore, bring the structure of your network into Active Directory.
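Although this article manages Sites through the GUI consoles, it can help to see that the subnet-to-site mapping is just ordinary directory data. The following is a minimal sketch (not from the original article) that lists those mappings using the third-party Python ldap3 package; the hostname, credentials, and distinguished names are placeholder assumptions for an imaginary nailcorp.com forest.

# A minimal sketch, assuming the Python "ldap3" package and a reachable DC.
# The server name, credentials, and base DN below are placeholders.
from ldap3 import Server, Connection, ALL, SUBTREE

server = Server("dc01.nailcorp.com", get_info=ALL)
conn = Connection(server, user="NAILCORP\\admin", password="secret", auto_bind=True)

# Subnet-to-site mappings live in the Configuration partition of the forest.
base_dn = "CN=Subnets,CN=Sites,CN=Configuration,DC=nailcorp,DC=com"
conn.search(base_dn, "(objectClass=subnet)",
            search_scope=SUBTREE, attributes=["cn", "siteObject"])

# Each entry's cn is an IP subnet (for example 10.1.0.0/16) and siteObject is
# the distinguished name of the site that clients in that subnet will use.
for entry in conn.entries:
    print(entry.cn, "->", entry.siteObject)

Listing the mappings like this is a quick way to spot subnets that were never assigned to a site, which is exactly the maintenance gap described in the next paragraph.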
The usefulness of having properly organized and maintained Sites becomes more evident when you consider that any machine within an address space will use that Site's DC to authenticate. This is a great feature of AD and reduces unnecessary traffic; however, it requires having Sites and subnets properly updated and maintained. This is also particularly useful for defining different replication schedules for different locations within the same domain, and for supporting users who travel. Once they log on through another location, they are assigned an IP address from that network. The Windows locator service will then look up which DC is the nearest one, and the user won't log on all the way to their usual DC (to read more about how the locator service works, please refer to http://support.microsoft.com/kb/314861). This saves bandwidth and speeds up the authentication process.

Bandwidth nowadays is cheap, especially in developed countries such as the USA or most parts of Western Europe. But just because it is cheap in some parts does not mean it is cheap in other parts of your organization. If you are primarily located within developed countries, but your company then decides to open 10 or 20 small sales offices within not-so-developed countries, where bandwidth is expensive, then you really need to start using AD sites. Of course, the problem then is that because you haven't used AD sites before, you need to make the appropriate changes to your infrastructure to accommodate them, and train staff appropriately in order to be able to implement, support, and manage them. In this example, the argument might be brought up that each of these small branch offices has a local DC that also functions as a File and Print server where the local employees collaborate. This is great, but what about replication to and from your hub site, the data centre that hosts a critical part of your Active Directory backend? If changes to your AD are fairly frequent, for example, adding and removing users on a regular basis, then the Active Directory will replicate (if the Site links are properly configured) without compression every 15 minutes. Of course, depending on the size of your organization, this can be quite a strain on the link you have from that office. If the people at that office receive email and browse the Web over the same link, network performance will degrade significantly for users and cause unnecessary inconvenience. To see what Sites look like in the Active Directory Sites and Services console, see the following screenshot:

Group Policy Objects

Group Policy Objects in Active Directory are a set of defined rules for settings about the user environment or the operating environment for a particular PC. They are treated as standalone objects because they can be linked to different OUs. This gives you the flexibility of creating one set of rules and applying it to different OUs in different OU structures, making settings deployment much easier and administratively quicker. The policy settings are quite extensive, and if you want to get your hands dirty, you can create your own policy templates, giving you even more control over the machines and application settings located in your domain. There are templates available for many settings, ready to use. The templates for these settings are called ADM templates, and there are quite a few already included in the Windows 2003 installation.
Some applications, such as Microsoft Office 2007, also provide ADM templates that can be loaded and modified (see http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=92d8519a-e143-4aee-8f7a-e4bbaeba13e7 for Office 2007 ADM templates). Using ADM templates, you do not need to write anything yourself, so they are a quicker way to apply GPO settings. The following screenshot shows Office 2007 ADM templates loaded in the Group Policy Editor.

Creating a basic Julia project for loading and saving data [Tutorial]

Prasad Ramesh
09 Feb 2019
11 min read
In this article, we take a look at the common Iris dataset using simple statistical methods. Then we create a simple Julia project to load and save data from the Iris dataset. This article is an excerpt from a book written by Adrian Salceanu titled Julia Programming Projects. In this book, you will develop and run a web app using Julia and the HTTP package among other things. To start, we'll load, the Iris flowers dataset, from the RDatasets package and we'll manipulate it using standard data analysis functions. Then we'll look more closely at the data by employing common visualization techniques. And finally, we'll see how to persist and (re)load our data. But, in order to do that, first, we need to take a look at some of the language's most important building blocks. Here are the external packages used in this tutorial and their specific versions: [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] In order to install a specific version of a package you need to run: pkg> add [email protected] For example: pkg> add [email protected] Alternatively, you can install all the used packages by downloading the Project.toml file using pkg> instantiate as follows: julia> download("https://raw.githubusercontent.com/PacktPublishing/Julia-Programming-Projects/master/Chapter02/Project.toml", "Project.toml") pkg> activate . pkg> instantiate Using simple statistics to better understand our data Now that it's clear how the data is structured and what is contained in the collection, we can get a better understanding by looking at some basic stats. To get us started, let's invoke the describe function: julia> describe(iris) The output is as follows: This function summarizes the columns of the iris DataFrame. If the columns contain numerical data (such as SepalLength), it will compute the minimum, median, mean, and maximum. The number of missing and unique values is also included. The last column reports the type of data stored in the row. A few other stats are available, including the 25th and the 75th percentile, and the first and the last values. We can ask for them by passing an extra stats argument, in the form of an array of symbols: julia> describe(iris, stats=[:q25, :q75, :first, :last]) The output is as follows: Any combination of stats labels is accepted. These are all the options—:mean, :std, :min, :q25, :median, :q75, :max, :eltype, :nunique, :first, :last, and :nmissing. In order to get all the stats, the special :all value is accepted: julia> describe(iris, stats=:all) The output is as follows: We can also compute these individually by using Julia's Statistics package. For example, to calculate the mean of the SepalLength column, we'll execute the following: julia> using Statistics julia> mean(iris[:SepalLength]) 5.843333333333334 In this example, we use iris[:SepalLength] to select the whole column. The result, not at all surprisingly, is the same as that returned by the corresponding describe() value. 
In a similar way we can compute the median(): julia> median(iris[:SepalLength]) 5.8 And there's (a lot) more, such as, for instance, the standard deviation std(): julia> std(iris[:SepalLength]) 0.828066127977863 Or, we can use another function from the Statistics package, cor(), in a simple script to help us understand how the values are correlated: julia> for x in names(iris)[1:end-1] for y in names(iris)[1:end-1] println("$x \t $y \t $(cor(iris[x], iris[y]))") end println("-------------------------------------------") end Executing this snippet will produce the following output: SepalLength SepalLength 1.0 SepalLength SepalWidth -0.11756978413300191 SepalLength PetalLength 0.8717537758865831 SepalLength PetalWidth 0.8179411262715759 ------------------------------------------------------------ SepalWidth SepalLength -0.11756978413300191 SepalWidth SepalWidth 1.0 SepalWidth PetalLength -0.42844010433053953 SepalWidth PetalWidth -0.3661259325364388 ------------------------------------------------------------ PetalLength SepalLength 0.8717537758865831 PetalLength SepalWidth -0.42844010433053953 PetalLength PetalLength 1.0 PetalLength PetalWidth 0.9628654314027963 ------------------------------------------------------------ PetalWidth SepalLength 0.8179411262715759 PetalWidth SepalWidth -0.3661259325364388 PetalWidth PetalLength 0.9628654314027963 PetalWidth PetalWidth 1.0 ------------------------------------------------------------ The script iterates over each column of the dataset with the exception of Species (the last column, which is not numeric), and generates a basic correlation table. The table shows strong positive correlations between SepalLength and PetalLength (87.17%), SepalLength and PetalWidth (81.79%), and PetalLength and PetalWidth (96.28%). There is no strong correlation between SepalLength and SepalWidth. We can use the same script, but this time employ the cov() function to compute the covariance of the values in the dataset: julia> for x in names(iris)[1:end-1] for y in names(iris)[1:end-1] println("$x \t $y \t $(cov(iris[x], iris[y]))") end println("--------------------------------------------") end This code will generate the following output: SepalLength SepalLength 0.6856935123042507 SepalLength SepalWidth -0.04243400447427293 SepalLength PetalLength 1.2743154362416105 SepalLength PetalWidth 0.5162706935123043 ------------------------------------------------------- SepalWidth SepalLength -0.04243400447427293 SepalWidth SepalWidth 0.189979418344519 SepalWidth PetalLength -0.3296563758389262 SepalWidth PetalWidth -0.12163937360178968 ------------------------------------------------------- PetalLength SepalLength 1.2743154362416105 PetalLength SepalWidth -0.3296563758389262 PetalLength PetalLength 3.1162778523489933 PetalLength PetalWidth 1.2956093959731543 ------------------------------------------------------- PetalWidth SepalLength 0.5162706935123043 PetalWidth SepalWidth -0.12163937360178968 PetalWidth PetalLength 1.2956093959731543 PetalWidth PetalWidth 0.5810062639821031 ------------------------------------------------------- The output illustrates that SepalLength is positively related to PetalLength and PetalWidth, while being negatively related to SepalWidth. SepalWidth is negatively related to all the other values. 
Moving on, if we want a random data sample, we can ask for it like this: julia> rand(iris[:SepalLength]) 7.4 Optionally, we can pass in the number of values to be sampled: julia> rand(iris[:SepalLength], 5) 5-element Array{Float64,1}: 6.9 5.8 6.7 5.0 5.6 We can convert one of the columns to an array using the following: julia> sepallength = Array(iris[:SepalLength]) 150-element Array{Float64,1}: 5.1 4.9 4.7 4.6 5.0 # ... output truncated ... Or we can convert the whole DataFrame to a matrix: julia> irisarr = convert(Array, iris[:,:]) 150×5 Array{Any,2}: 5.1 3.5 1.4 0.2 CategoricalString{UInt8} "setosa" 4.9 3.0 1.4 0.2 CategoricalString{UInt8} "setosa" 4.7 3.2 1.3 0.2 CategoricalString{UInt8} "setosa" 4.6 3.1 1.5 0.2 CategoricalString{UInt8} "setosa" 5.0 3.6 1.4 0.2 CategoricalString{ UInt8} "setosa" # ... output truncated ... Loading and saving our data Julia comes with excellent facilities for reading and storing data out of the box. Given its focus on data science and scientific computing, support for tabular-file formats (CSV, TSV) is first class. Let's extract some data from our initial dataset and use it to practice persistence and retrieval from various backends. We can reference a section of a DataFrame by defining its bounds through the corresponding columns and rows. For example, we can define a new DataFrame composed only of the PetalLength and PetalWidth columns and the first three rows: julia> iris[1:3, [:PetalLength, :PetalWidth]] 3×2 DataFrames.DataFrame │ Row │ PetalLength │ PetalWidth │ ├─────┼─────────────┼────────────┤ │ 1 │ 1.4 │ 0.2 │ │ 2 │ 1.4 │ 0.2 │ │ 3 │ 1.3 │ 0.2 │ The generic indexing notation is dataframe[rows, cols], where rows can be a number, a range, or an Array of boolean values where true indicates that the row should be included: julia> iris[trues(150), [:PetalLength, :PetalWidth]] This snippet will select all the 150 rows since trues(150) constructs an array of 150 elements that are all initialized as true. The same logic applies to cols, with the added benefit that they can also be accessed by name. Armed with this knowledge, let's take a sample from our original dataset. It will include some 10% of the initial data and only the PetalLength, PetalWidth, and Species columns: julia> test_data = iris[rand(150) .<= 0.1, [:PetalLength, :PetalWidth, :Species]] 10×3 DataFrames.DataFrame │ Row │ PetalLength │ PetalWidth │ Species │ ├─────┼─────────────┼────────────┼──────────────┤ │ 1 │ 1.1 │ 0.1 │ "setosa" │ │ 2 │ 1.9 │ 0.4 │ "setosa" │ │ 3 │ 4.6 │ 1.3 │ "versicolor" │ │ 4 │ 5.0 │ 1.7 │ "versicolor" │ │ 5 │ 3.7 │ 1.0 │ "versicolor" │ │ 6 │ 4.7 │ 1.5 │ "versicolor" │ │ 7 │ 4.6 │ 1.4 │ "versicolor" │ │ 8 │ 6.1 │ 2.5 │ "virginica" │ │ 9 │ 6.9 │ 2.3 │ "virginica" │ │ 10 │ 6.7 │ 2.0 │ "virginica" │ What just happened here? The secret in this piece of code is rand(150) .<= 0.1. It does a lot—first, it generates an array of random Float values between 0 and 1; then, it compares the array, element-wise, against 0.1 (which represents 10% of 1); and finally, the resultant Boolean array is used to filter out the corresponding rows from the dataset. It's really impressive how powerful and succinct Julia can be! In my case, the result is a DataFrame with the preceding 10 rows, but your data will be different since we're picking random rows (and it's quite possible you won't have exactly 10 rows either). Saving and loading using tabular file formats We can easily save this data to a file in a tabular file format (one of CSV, TSV, and others) using the CSV package. 
We'll have to add it first and then call the write method: pkg> add CSV julia> using CSV julia> CSV.write("test_data.csv", test_data) And, just as easily, we can read back the data from tabular file formats, with the corresponding CSV.read function: julia> td = CSV.read("test_data.csv") 10×3 DataFrames.DataFrame │ Row │ PetalLength │ PetalWidth │ Species │ ├─────┼─────────────┼────────────┼──────────────┤ │ 1 │ 1.1 │ 0.1 │ "setosa" │ │ 2 │ 1.9 │ 0.4 │ "setosa" │ │ 3 │ 4.6 │ 1.3 │ "versicolor" │ │ 4 │ 5.0 │ 1.7 │ "versicolor" │ │ 5 │ 3.7 │ 1.0 │ "versicolor" │ │ 6 │ 4.7 │ 1.5 │ "versicolor" │ │ 7 │ 4.6 │ 1.4 │ "versicolor" │ │ 8 │ 6.1 │ 2.5 │ "virginica" │ │ 9 │ 6.9 │ 2.3 │ "virginica" │ │ 10 │ 6.7 │ 2.0 │ "virginica" │ Just specifying the file extension is enough for Julia to understand how to handle the document (CSV, TSV), both when writing and reading. Working with Feather files Feather is a binary file format that was specially designed for storing data frames. It is fast, lightweight, and language-agnostic. The project was initially started in order to make it possible to exchange data frames between R and Python. Soon, other languages added support for it, including Julia. Support for Feather files does not come out of the box, but is made available through the homonymous package. Let's go ahead and add it and then bring it into scope: pkg> add Feather julia> using Feather Now, saving our DataFrame is just a matter of calling Feather.write: julia> Feather.write("test_data.feather", test_data) Next, let's try the reverse operation and load back our Feather file. We'll use the counterpart read function: julia> Feather.read("test_data.feather") 10×3 DataFrames.DataFrame │ Row │ PetalLength │ PetalWidth │ Species │ ├─────┼─────────────┼────────────┼──────────────┤ │ 1 │ 1.1 │ 0.1 │ "setosa" │ │ 2 │ 1.9 │ 0.4 │ "setosa" │ │ 3 │ 4.6 │ 1.3 │ "versicolor" │ │ 4 │ 5.0 │ 1.7 │ "versicolor" │ │ 5 │ 3.7 │ 1.0 │ "versicolor" │ │ 6 │ 4.7 │ 1.5 │ "versicolor" │ │ 7 │ 4.6 │ 1.4 │ "versicolor" │ │ 8 │ 6.1 │ 2.5 │ "virginica" │ │ 9 │ 6.9 │ 2.3 │ "virginica" │ │ 10 │ 6.7 │ 2.0 │ "virginica" │ Yeah, that's our sample data all right! In order to provide compatibility with other languages, the Feather format imposes some restrictions on the data types of the columns. You can read more about Feather in the package's official documentation at https://juliadata.github.io/Feather.jl/latest/index.html. Saving and loading with MongoDB Let's also take a look at using a NoSQL backend for persisting and retrieving our data. In order to follow through this part, you'll need a working MongoDB installation. You can download and install the correct version for your operating system from the official website, at https://www.mongodb.com/download-center?jmp=nav#community. I will use a Docker image which I installed and started up through Docker's Kitematic (available for download at https://github.com/docker/kitematic/releases). Next, we need to make sure to add the Mongo package. The package also has a dependency on LibBSON, which is automatically added. LibBSON is used for handling BSON, which stands for Binary JSON, a binary-encoded serialization of JSON-like documents. While we're at it, let's add the JSON package as well; we will need it. I'm sure you know how to do that by now—if not, here is a reminder: pkg> add Mongo, JSON At the time of writing, Mongo.jl support for Julia v1 was still a work in progress. This code was tested using Julia v0.6. Easy! 
Let's let Julia know that we'll be using all these packages: julia> using Mongo, LibBSON, JSON We're now ready to connect to MongoDB: julia> client = MongoClient() Once successfully connected, we can reference a dataframes collection in the db database: julia> storage = MongoCollection(client, "db", "dataframes") Julia's MongoDB interface uses dictionaries (a data structure called Dict in Julia) to communicate with the server. For now, all we need to do is to convert our DataFrame to such a Dict. The simplest way to do it is to sequentially serialize and then deserialize the DataFrame by using the JSON package. It generates a nice structure that we can later use to rebuild our DataFrame: julia> datadict = JSON.parse(JSON.json(test_data)) Thinking ahead, to make any future data retrieval simpler, let's add an identifier to our dictionary: julia> datadict["id"] = "iris_test_data" Now we can insert it into Mongo: julia> insert(storage, datadict) In order to retrieve it, all we have to do is query the Mongo database using the "id" field we've previously configured: Julia> data_from_mongo = first(find(storage, query("id" => "iris_test_data"))) We get a BSONObject, which we need to convert back to a DataFrame. Don't worry, it's straightforward. First, we create an empty DataFrame: julia> df_from_mongo = DataFrame() 0×0 DataFrames.DataFrame Then we populate it using the data we retrieved from Mongo: for i in 1:length(data_from_mongo["columns"]) df_from_mongo[Symbol(data_from_mongo["colindex"]["names"][i])] = Array(data_from_mongo["columns"][i]) end julia> df_from_mongo 10×3 DataFrames.DataFrame │ Row │ PetalLength │ PetalWidth │ Species │ ├─────┼─────────────┼────────────┼──────────────┤ │ 1 │ 1.1 │ 0.1 │ "setosa" │ │ 2 │ 1.9 │ 0.4 │ "setosa" │ │ 3 │ 4.6 │ 1.3 │ "versicolor" │ │ 4 │ 5.0 │ 1.7 │ "versicolor" │ │ 5 │ 3.7 │ 1.0 │ "versicolor" │ │ 6 │ 4.7 │ 1.5 │ "versicolor" │ │ 7 │ 4.6 │ 1.4 │ "versicolor" │ │ 8 │ 6.1 │ 2.5 │ "virginica" │ │ 9 │ 6.9 │ 2.3 │ "virginica" │ │ 10 │ 6.7 │ 2.0 │ "virginica" │ And that's it! Our data has been loaded back into a DataFrame. In this tutorial, we looked at the Iris dataset and worked on loading and saving the data in a simple Julia project.  To learn more about machine learning recommendation in Julia and testing the model check out this book Julia Programming Projects. Julia for machine learning. Will the new language pick up pace? Announcing Julia v1.1 with better exception handling and other improvement GitHub Octoverse: top machine learning packages, languages, and projects of 2018

How to create a Box and Whisker Plot in Tableau

Sugandha Lahoti
30 Dec 2017
5 min read
This article is an excerpt from a book written by Shweta Sankhe-Savale, titled Tableau Cookbook – Recipes for Data Visualization. With the recipes in this book, learn to create beautiful data visualizations in no time on Tableau.

In today's tutorial, we will learn how to create a Box and Whisker plot in Tableau. The Box plot, or Box and Whisker plot as it is popularly known, is a convenient statistical representation of the variation in a statistical population. It is a great way of showing a number of data points as well as showing the outliers and the central tendencies of the data. This visual representation of the distribution within a dataset was first introduced by American mathematician John W. Tukey in 1969. A box plot is significantly easier to plot than, say, a histogram, and it does not require the user to make assumptions regarding the bin sizes and number of bins; yet it gives significant insight into the distribution of the dataset. The box plot primarily consists of four parts:

- The median provides the central tendency of our dataset. It is the value that divides our dataset into two parts: values that are either higher or lower than the median. The position of the median within the box indicates the skewness in the data as it shifts either towards the upper or lower quartile.
- The upper and lower quartiles, which form the box, represent the degree of dispersion or spread of the data between them. The difference between the upper and lower quartile is called the Interquartile Range (IQR), and it indicates the mid-spread within which 50 percent of the points in our dataset lie.
- The upper and lower whiskers can either be plotted at the maximum and minimum values in the dataset, or at 1.5 times the IQR on the upper and lower side. Plotting the whiskers at the maximum and minimum values includes 100 percent of all values in the dataset, including all the outliers, whereas plotting the whiskers at 1.5 times the IQR shows the outliers in the data as points beyond the whiskers. The points lying between the lower whisker and the lower quartile are the lower 25 percent of values in the dataset, whereas the points lying between the upper whisker and the upper quartile are the upper 25 percent of values in the dataset.

In a typical normal distribution, each part of the box plot will be equally spaced. However, in most cases, the box plot will quickly show the underlying variations and trends in data and allows for easy comparison between datasets.

Getting Ready

Create a Box and Whisker plot in a new sheet in a workbook. For this purpose, we will connect to an Excel file named Data for Box plot & Gantt chart, which has been uploaded at https://1drv.ms/f/s!Av5QCoyLTBpnhkGyrRrZQWPHWpcY. Let us save this Excel file in the Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder. The data contains information about customers in terms of their gender and recorded weight. The data contains 100 records, one record per customer. Using this data, let us look at how we can create a Box and Whisker plot.

How to do it

Once we have downloaded and saved the data from the link provided in the Getting ready section, we will create a new worksheet in our existing workbook and rename it to Box and Whisker plot. Since we haven't connected to the new dataset yet, establish a new data connection by pressing Ctrl + D on our keyboard.
Select the Excel option and connect to the Data for Box plot & Gantt chart file, which is saved in our Documents | My Tableau Repository | Datasources | Tableau Cookbook data folder. Next, let us select the table named Box and Whisker plot data by double-clicking on it. Let us go ahead with the Live option to connect to this data. Next, let us multi-select the Customer and Gender fields from the Dimensions pane and the Weight field from the Measures pane by doing a Ctrl + Select. Refer to the following image: Next, let us click on the Show Me! button and select the box-and-whisker plot. Refer to the highlighted section in the following image: Once we click on the box-and-whisker plot option, we will see the following view:

How it works

In the preceding chart, we get two box and whisker plots: one for each gender. The whiskers are the maximum and minimum extent of the data. Furthermore, in each category we can see some circles, each of which essentially represents a customer. Thus, within each gender category, the graph shows the distribution of customers by their respective weights. When we hover over any of these circles, we can see details of the customer in terms of name, gender, and recorded weight in the tooltip. Refer to the following image: However, when we hover over the box (gray section), we will see the details in terms of median, lower quartile, upper quartile, and so on. Refer to the following image: Thus, a summary of the box plot that we created is as follows: In simpler terms, for the female category, the majority of the population lies between the weight range of 44 to 75, whereas for the male category, the majority of the population lies between the weight range of 44 to 82. Please note that in our visualization, even though the Row shelf displays SUM(Weight), since we have Customer on the Detail shelf, there's only one entry per customer, so SUM(Weight) is actually the same as MIN(Weight), MAX(Weight), or AVG(Weight). We learnt the basics of the Box and Whisker plot and how to create it using Tableau. If you had fun with this recipe, do check out our book Tableau Cookbook – Recipes for Data Visualization to create interactive dashboards and beautiful data visualizations with Tableau.
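To make the quartile arithmetic behind the chart concrete, here is a small sketch that is not part of the original recipe: it computes the same kind of summary values Tableau shows in the tooltip, assuming Python with NumPy installed and using an invented list of weights rather than the recipe's dataset.

# A rough sketch of box-plot statistics; NumPy is assumed, the weights are invented.
import numpy as np

weights = np.array([44, 48, 52, 55, 58, 61, 64, 68, 72, 75, 82])

q1, median, q3 = np.percentile(weights, [25, 50, 75])
iqr = q3 - q1

# Two common whisker conventions: the data extremes, or 1.5 * IQR beyond the box.
whisker_low, whisker_high = weights.min(), weights.max()
fence_low, fence_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = weights[(weights < fence_low) | (weights > fence_high)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Whiskers at extremes: {whisker_low} to {whisker_high}")
print(f"1.5*IQR fences: {fence_low} to {fence_high}, outliers: {outliers}")

Running a quick check like this against your own data is a handy way to confirm that the values Tableau reports in the tooltip line up with what you expect.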

Time series modeling: What is it, Why it matters and How it's used

Sunith Shetty
10 Aug 2018
11 min read
A series can be defined as a number of events, objects, or people of a similar or related kind coming one after another; if we add the dimension of time, we get a time series. A time series can be defined as a series of data points in time order. In this article, we will understand what a time series is and why it is essential for forecasting. This article is an excerpt from a book written by Harish Gulati titled SAS for Finance.

The importance of time series

What importance, if any, does time series have and how will it be relevant in the future? These are just a couple of fundamental questions that any user should find answers to before delving further into the subject. Let's try to answer this by posing a question. Have you heard the terms big data, artificial intelligence (AI), and machine learning (ML)? These three terms make learning time series analysis relevant. Big data is primarily about a large amount of data that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interaction. AI is a kind of technology that is being developed by data scientists, computational experts, and others to enable processes to become more intelligent, while ML is an enabler that is helping to implement AI. All three of these terms are interlinked with the data they use, and a lot of this data is time series in nature. This could be financial transaction data, the behavior patterns of individuals during various parts of the day, or data related to life events that we might experience. An effective mechanism that enables us to capture the data, store it, analyze it, and then build algorithms to predict transactions, behavior (and life events, in this instance) will depend on how big data is utilized and how AI and ML are leveraged.

A common perception in the industry is that time series data is used for forecasting only. In practice, time series data is used for:

- Pattern recognition
- Forecasting
- Benchmarking
- Evaluating the influence of a single factor on the time series
- Quality control

For example, a retailer may identify a pattern in clothing sales every time it gets a celebrity endorsement, or an analyst may decide to use car sales volume data from 2012 to 2017 to set a selling benchmark in units. An analyst might also build a model to quantify the effect of Lehman's crash at the height of the 2008 financial crisis in pushing up the price of gold. Variance in the success of treatments across time periods can also be used to highlight a problem, the tracking of which may enable a hospital to take remedial measures. These are just some of the examples that showcase how time series analysis isn't limited to just forecasting. In this chapter, we will review how the financial industry and others use forecasting, discuss what a good and a bad forecast is, and try to understand the characteristics of time series data and its associated problems.

Forecasting across industries

Since one of the primary uses of time series data is forecasting, it's wise that we learn about some of its fundamental properties. To understand what the industry means by forecasting and the steps involved, let's visit a common misconception about the financial industry: only lending activities require forecasting.
We need forecasting in order to grant personal loans, mortgages, overdrafts, or simply assess someone's eligibility for a credit card, as the industry uses forecasting to assess a borrower's affordability and their willingness to repay the debt. Even deposit products such as savings accounts, fixed-term savings, and bonds are priced based on some forecasts. How we forecast and the rationale for that methodology is different in borrowing or lending cases, however. All of these areas are related to time series, as we inevitably end up using time series data as part of the overall analysis that drives financial decisions. Let's understand the forecasts involved here a bit better. When we are assessing an individual's lending needs and limits, we are forecasting for a single person yet comparing the individual to a pool of good and bad customers who have been offered similar products. We are also assessing the individual's financial circumstances and behavior through industry-available scoring models or by assessing their past behavior, with the financial provider assessing the lending criteria. In the case of deposit products, as long as the customer is eligible to transact (can open an account and has passed know your customer (KYC), anti-money laundering (AML), and other checks), financial institutions don't perform forecasting at an individual level. However, the behavior of a particular customer is primarily driven by the interest rate offered by the financial institution. The interest rate, in turn, is driven by the forecasts the financial institution has done to assess its overall treasury position. The treasury is the department that manages the central bank's money and has the responsibility of ensuring that all departments are funded, which is generated through lending and attracting deposits at a lower rate than a bank lends. The treasury forecasts its requirements for lending and deposits, while various teams within the treasury adhere to those limits. Therefore, a pricing manager for a deposit product will price the product in such a way that the product will attract enough deposits to meet the forecasted targets shared by the treasury; the pricing manager also has to ensure that those targets aren't overshot by a significant margin, as the treasury only expects to manage a forecasted target. In both lending and deposit decisions, financial institutions do tend to use forecasting. A lot of these forecasts are interlinked, as we saw in the example of the treasury's expectations and the subsequent pricing decision for a deposit product. To decide on its future lending and borrowing positions, the treasury must have used time series data to determine what the potential business appetite for lending and borrowing in the market is and would have assessed that with the current cash flow situation within the relevant teams and institutions. Characteristics of time series data Any time series analysis has to take into account the following factors: Seasonality Trend Outliers and rare events Disruptions and step changes Seasonality Seasonality is a phenomenon that occurs each calendar year. The same behavior can be observed each year. A good forecasting model will be able to incorporate the effect of seasonality in its forecasts. Christmas is a great example of seasonality, where retailers have come to expect higher sales over the festive period. Seasonality can extend into months but is usually only observed over days or weeks. 
When looking at time series where the periodicity is hours, you may find a seasonality effect for certain hours of the day. Some of the reasons for seasonality include holidays, climate, and changes in social habits. For example, travel companies usually run far fewer services on Christmas Day, citing a lack of demand. During most holidays people love to travel, but this lack of demand on Christmas Day could be attributed to social habits, where people tend to stay at home or have already traveled. Social habit becomes a driving factor in the seasonality of journeys undertaken on Christmas Day. It's easier for the forecaster when a particular seasonal event occurs on a fixed calendar date each year; the issue comes when some popular holidays depend on lunar movements, such as Easter, Diwali, and Eid. These holidays may occur in different weeks or months over the years, which will shift the seasonality effect. Also, if some holidays fall closer to other holiday periods, it may lead to individuals taking extended holidays and travel sales may increase more than expected in such years. The coffee shop near the office may also experience lower sales for a longer period. Changes in the weather can also impact seasonality; for example, a longer, warmer summer may be welcome in the UK, but this would impact retail sales in the autumn as most shoppers wouldn't need to buy a new wardrobe. In hotter countries, sales of air-conditioners would increase substantially compared to the summer months' usual seasonality. Forecasters could offset this unpredictability in seasonality by building in a weather forecast variable. We will explore similar challenges in the chapters ahead. Seasonality shouldn't be confused with a cyclic effect. A cyclic effect is observed over a longer period of generally two years or more. The property sector is often associated with having a cyclic effect, where it has long periods of growth or slowdown before the cycle continues. Trend A trend is merely a long-term direction of observed behavior that is found by plotting data against a time component. A trend may indicate an increase or decrease in behavior. Trends may not even be linear, but a broad movement can be identified by analyzing plotted data. Outliers and rare events Outliers and rare events are terminologies that are often used interchangeably by businesses. These concepts can have a big impact on data, and some sort of outlier treatment is usually applied to data before it is used for modeling. It is almost impossible to predict an outlier or rare event but they do affect a trend. An example of an outlier could be a customer walking into a branch to deposit an amount that is 100 times the daily average of that branch. In this case, the forecaster wouldn't expect that trend to continue. Disruptions Disruptions and step changes are becoming more common in time series data. One reason for this is the abundance of available data and the growing ability to store and analyze it. Disruptions could include instances when a business hasn't been able to trade as normal. Flooding at the local pub may lead to reduced sales for a few days, for example. While analyzing daily sales across a pub chain, an analyst may have to make note of a disruptive event and its impact on the chain's revenue. Step changes are also more common now due to technological shifts, mergers and acquisitions, and business process re-engineering. When two companies announce a merger, they often try to sync their data. 
They might have been selling x and y quantities individually, but after the merger will expect to sell x + y + c (where c is the positive or negative effect of the merger). Over time, when someone plots sales data in this case, they will probably spot a step change in sales that happened around the time of the merger, as shown in the following screenshot: In the trend graph, we can see that online travel bookings are increasing. In the step change and disruptions chart, we can see that Q1 of 2012 saw a substantive increase in bookings, where Q1 of 2014 saw a substantive dip. The increase was due to the merger of two companies that took place in Q1 of 2012. The decrease in Q1 of 2014 was attributed to prolonged snow storms in Europe and the ash cloud disruption from volcanic activity over Iceland. While online bookings kept increasing after the step change, the disruption caused by the snow storm and ash cloud only had an effect on sales in Q1 of 2014. In this case, the modeler will have to treat the merger and the disruption differently while using them in the forecast, as disruption could be disregarded as an outlier and treated accordingly. Also note that the seasonality chart shows that Q4 of each year sees almost a 20% increase in travel bookings, and this pattern continues each calendar year. In this article, we defined time series and learned why it is important for forecasting. We also looked at the characteristics of time series data. To know more how to leverage the analytical power of SAS to perform financial analysis efficiently, you can check out the book SAS for Finance. Read more Getting to know SQL Server options for disaster recovery Implementing a simple Time Series Data Analysis in R Training RNNs for Time Series Forecasting
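As a closing illustration of seasonality, trend, and step changes, here is a minimal Python sketch; the book itself uses SAS, so this is an outside illustration with invented numbers, and it assumes pandas and NumPy are available.

# An invented example combining trend, a Q4 seasonal lift, and a step change.
# Assumes pandas and NumPy; this is not the book's SAS code.
import numpy as np
import pandas as pd

quarters = pd.date_range("2010-03-31", periods=20, freq="Q")  # newer pandas may prefer freq="QE"
trend = np.linspace(100, 160, 20)                  # steady growth in bookings
seasonal = np.where(quarters.quarter == 4, 20, 0)  # Q4 holiday lift
step = np.where(quarters >= "2012-01-01", 30, 0)   # merger-style step change
bookings = pd.Series(trend + seasonal + step, index=quarters)

# A four-quarter rolling mean smooths out the seasonality and exposes
# the underlying trend together with the step change.
smoothed = bookings.rolling(window=4).mean()
print(pd.DataFrame({"bookings": bookings, "rolling_mean": smoothed}).round(1))

Comparing the raw series against the rolling mean makes it easy to see which movements are recurring seasonal spikes and which reflect a permanent shift in the level of the series.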

Jupyter and Python Scripting

Packt
21 Oct 2016
9 min read
In this article by Dan Toomey, author of the book Learning Jupyter, we will see data access in Jupyter with Python and the effect of pandas on Jupyter. We will also see Python graphics and lastly Python random numbers. (For more resources related to this topic, see here.) Python data access in Jupyter I started a view for pandas using Python Data Access as the name. We will read in a large dataset and compute some standard statistics on the data. We are interested in seeing how we use pandas in Jupyter, how well the script performs, and what information is stored in the metadata (especially if it is a larger dataset). Our script accesses the iris dataset built into one of the Python packages. All we are looking to do is read in a slightly large number of items and calculate some basic operations on the dataset. We are really interested in seeing how much of the data is cached in the PYNB file. The Python code is: # import the datasets package from sklearn import datasets # pull in the iris data iris_dataset = datasets.load_iris() # grab the first two columns of data X = iris_dataset.data[:, :2] # calculate some basic statistics x_count = len(X.flat) x_min = X[:, 0].min() - .5 x_max = X[:, 0].max() + .5 x_mean = X[:, 0].mean() # display our results x_count, x_min, x_max, x_mean I broke these steps into a couple of cells in Jupyter, as shown in the following screenshot: Now, run the cells (using Cell | Run All) and you get this display below. The only difference is the last Out line where our values are displayed. It seemed to take longer to load the library (the first time I ran the script) than to read the data and calculate the statistics. If we look in the PYNB file for this notebook, we see that none of the data is cached in the PYNB file. We simply have code references to the library, our code, and the output from when we last calculated the script: { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(300, 3.7999999999999998, 8.4000000000000004, 5.8433333333333337)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate some basic statisticsn", "x_count = len(X.flat)n", "x_min = X[:, 0].min() - .5n", "x_max = X[:, 0].max() + .5n", "x_mean = X[:, 0].mean()n", "n", "# display our resultsn", "x_count, x_min, x_max, x_mean" ] } Python pandas in Jupyter One of the most widely used features of Python is pandas. pandas are built-in libraries of data analysis packages that can be used freely. In this example, we will develop a Python script that uses pandas to see if there is any effect to using them in Jupyter. I am using the Titanic dataset from http://www.kaggle.com/c/titanic-gettingStarted/download/train.csv. I am sure the same data is available from a variety of sources. Here is our Python script that we want to run in Jupyter: from pandas import * training_set = read_csv('train.csv') training_set.head() male = training_set[training_set.sex == 'male'] female = training_set[training_set.sex =='female'] womens_survival_rate = float(sum(female.survived))/len(female) mens_survival_rate = float(sum(male.survived))/len(male) The result is… we calculate the survival rates of the passengers based on sex. We create a new notebook, enter the script into appropriate cells, include adding displays of calculated data at each point and produce our results. 
Here is our notebook laid out where we added displays of calculated data at each cell,as shown in the following screenshot: When I ran this script, I had two problems: On Windows, it is common to use backslash ("") to separate parts of a filename. However, this coding uses the backslash as a special character. So, I had to change over to use forward slash ("/") in my CSV file path. I originally had a full path to the CSV in the above code example. The dataset column names are taken directly from the file and are case sensitive. In this case, I was originally using the 'sex' field in my script, but in the CSV file the column is named Sex. Similarly I had to change survived to Survived. The final script and result looks like the following screenshot when we run it: I have used the head() function to display the first few lines of the dataset. It is interesting… the amount of detail that is available for all of the passengers. If you scroll down, you see the results as shown in the following screenshot: We see that 74% of the survivors were women versus just 19% men. I would like to think chivalry is not dead! Curiously the results do not total to 100%. However, like every other dataset I have seen, there is missing and/or inaccurate data present. Python graphics in Jupyter How do Python graphics work in Jupyter? I started another view for this named Python Graphics so as to distinguish the work. If we were to build a sample dataset of baby names and the number of births in a year of that name, we could then plot the data. The Python coding is simple: import pandas import matplotlib %matplotlib inline baby_name = ['Alice','Charles','Diane','Edward'] number_births = [96, 155, 66, 272] dataset = list(zip(baby_name,number_births)) df = pandas.DataFrame(data = dataset, columns=['Name', 'Number']) df['Number'].plot() The steps of the script are as follows: We import the graphics library (and data library) that we need Define our data Convert the data into a format that allows for easy graphical display Plot the data We would expect a resultant graph of the number of births by baby name. Taking the above script and placing it into cells of our Jupyter node, we get something that looks like the following screenshot: I have broken the script into different cells for easier readability. Having different cells also allows you to develop the script easily step by step, where you can display the values computed so far to validate your results. I have done this in most of the cells by displaying the dataset and DataFrame at the bottom of those cells. When we run this script (Cell | Run All), we see the results at each step displayed as the script progresses: And finally we see our plot of the births as shown in the following screenshot. I was curious what metadata was stored for this script. Looking into the IPYNB file, you can see the expected value for the formula cells. 
The tabular data display of the DataFrame is stored as HTML—convenient: { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<div>n", "<table border="1" class="dataframe">n", "<thead>n", "<tr style="text-align: right;">n", "<th></th>n", "<th>Name</th>n", "<th>Number</th>n", "</tr>n", "</thead>n", "<tbody>n", "<tr>n", "<th>0</th>n", "<td>Alice</td>n", "<td>96</td>n", "</tr>n", "<tr>n", "<th>1</th>n", "<td>Charles</td>n", "<td>155</td>n", "</tr>n", "<tr>n", "<th>2</th>n", "<td>Diane</td>n", "<td>66</td>n", "</tr>n", "<tr>n", "<th>3</th>n", "<td>Edward</td>n", "<td>272</td>n", "</tr>n", "</tbody>n", "</table>n", "</div>" ], "text/plain": [ " Name Numbern", "0 Alice 96n", "1 Charles 155n", "2 Diane 66n", "3 Edward 272" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], The graphic output cell that is stored like this: { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "<matplotlib.axes._subplots.AxesSubplot at 0x47cf8f0>" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "<a few hundred lines of hexcodes> …/wc/B0RRYEH0EQAAAABJRU5ErkJggg==n", "text/plain": [ "<matplotlib.figure.Figure at 0x47d8e30>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot the datan", "df['Number'].plot()n" ] } ], Where the image/png tag contains a large hex digit string representation of the graphical image displayed on screen (I abbreviated the display in the coding shown). So, the actual generated image is stored in the metadata for the page. Python random numbers in Jupyter For many analyses we are interested in calculating repeatable results. However, much of the analysis relies on some random numbers to be used. In Python, you can set the seed for the random number generator to achieve repeatable results with the random_seed() function. In this example, we simulate rolling a pair of dice and looking at the outcome. We would example the average total of the two dice to be 6—the halfway point between the faces. The script we are using is this: import pylab import random random.seed(113) samples = 1000 dice = [] for i in range(samples): total = random.randint(1,6) + random.randint(1,6) dice.append(total) pylab.hist(dice, bins= pylab.arange(1.5,12.6,1.0)) pylab.show() Once we have the script in Jupyter and execute it, we have this result: I had added some more statistics. Not sure if I would have counted on such a high standard deviation. If we increased the number of samples, this would decrease. The resulting graph was opened in a new window, much as it would if you ran this script in another Python development environment. The toolbar at the top of the graphic is extensive, allowing you to manipulate the graphic in many ways. Summary In this article, we walked through simple data access in Jupyter through Python. Then we saw an example of using pandas. We looked at a graphics example. Finally, we looked at an example using random numbers in a Python script. Resources for Article: Further resources on this subject: Python Data Science Up and Running [article] Mining Twitter with Python – Influence and Engagement [article] Unsupervised Learning [article]

Using Specular in Unity

Packt
16 Aug 2013
14 min read
(For more resources related to this topic, see here.)

The specularity of an object surface simply describes how shiny it is. These types of effects are often referred to as view-dependent effects in the Shader world. This is because, in order to achieve a realistic Specular effect in your Shaders, you need to include the direction from which the camera or user is facing the object's surface. Specular requires one more component to achieve its visual believability, which is the light direction. By combining these two directions, or vectors, we end up with a hotspot or highlight on the surface of the object, halfway between the view direction and the light direction. This halfway direction is called the half vector and is something new we are going to explore in this article, along with customizing our Specular effects to simulate metallic and cloth Specular surfaces.

Utilizing Unity3D's built-in Specular type

Unity has already provided us with a Specular function we can use for our Shaders. It is called the BlinnPhong Specular lighting model. It is one of the more basic and efficient forms of Specular, which you can find used in a lot of games even today. Since it is already built into the Unity Surface Shader language, we thought it best to start with that first and build on it. You can also find an example in the Unity reference manual, but we will go into a bit more depth with it and explain where the data is coming from and why it is working the way it is. This will help you to get a nice grounding in setting up Specular, so that we can build on that knowledge in the future recipes in this article.

Getting ready

Let's start by carrying out the following: Create a new Shader and give it a name. Create a new Material, give it a name, and assign the new Shader to its Shader property. Then create a sphere object and place it roughly at world center. Finally, let's create a directional light to cast some light onto our object. When your assets have been set up in Unity, you should have a scene that resembles the following screenshot:

How to do it…

Begin by adding the following properties to the Shader's Properties block: We then need to make sure we add the variables to the CGPROGRAM block, so that we can use the data in our new properties inside our Shader's CGPROGRAM block. Notice that we don't need to declare the _SpecColor property as a variable. This is because Unity has already created this variable for us in the built-in Specular model. All we need to do is declare it in our Properties block and it will pass the data along to the surf() function. Our Shader now needs to be told which lighting model we want to use to light our model with. You have seen the Lambert lighting model and how to make your own lighting model, but we haven't seen the BlinnPhong lighting model yet. So, let's add BlinnPhong to our #pragma statement like so: We then need to modify our surf() function to look like the following:

How it works…

This basic Specular is a great starting point when you are prototyping your Shaders, as you can get a lot accomplished in terms of writing the core functionality of the Shader, while not having to worry about the basic lighting functions. Unity has provided us with a lighting model that has already taken on the task of creating your Specular lighting for you. If you look into the UnityCG.cginc file found in your Unity install directory under the Data folder, you will notice that you have Lambert and BlinnPhong lighting models available for you to use.
The moment you compile your Shader with #pragma surface surf BlinnPhong, you are telling the Shader to utilize the BlinnPhong lighting function in the UnityCG.cginc file, so that we don't have to write that code over and over again. With your Shader compiled and no errors present, you should see a result similar to the following screenshot:

Creating a Phong Specular type

The most basic and performance-friendly Specular type is the Phong Specular effect. It is the calculation of the light direction reflecting off of the surface, compared to the user's view direction. It is a very common Specular model used in many applications, from games to movies. While it isn't the most realistic in terms of accurately modeling the reflected Specular, it gives a great approximation that performs well in most situations. Plus, if your object is further away from the camera and a very accurate Specular isn't needed, this is a great way to provide a Specular effect on your Shaders. In this article, we will cover how to implement the per-vertex version of this effect, and we will also see how to implement the per-pixel version using some new parameters in the Surface Shader's Input struct. We will see the difference and discuss when and why to use these two different implementations for different situations.

Getting ready

Create a new Shader, Material, and object, and give them appropriate names so that you can find them later. Finally, attach the Shader to the Material and the Material to the object. To finish off your new scene, create a new directional light so that we can see our Specular effect as we code it.

How to do it…

You might be seeing a pattern at this point, but we always like to start out with the most basic part of the Shader writing process: the creation of properties. So, let's add the following properties to the Shader: We then have to make sure to add the corresponding variables to our CGPROGRAM block inside our SubShader block. Now we have to add our custom lighting model so that we can compute our own Phong Specular. Add the following code to the Shader's SubShader() function. Don't worry if it doesn't make sense at this point; we will cover each line of code in the next section: Finally, we have to tell the CGPROGRAM block that it needs to use our custom lighting function instead of one of the built-in ones. We do this by changing the #pragma statement to the following: The following screenshot demonstrates the result of our custom Phong lighting model using our own custom reflection vector:

How it works…

Let's break down the lighting function by itself, as the rest of the Shader should be pretty familiar to you at this point. We simply start by using the lighting function that gives us the view direction. Remember that Unity has given you a set of lighting functions that you can use, but in order to use them correctly you have to have the same arguments they provide. Refer to the following table, or go to http://docs.unity3d.com/Documentation/Components/SL-SurfaceShaderLighting.html:

Not view dependent: half4 Lighting<Name you choose> (SurfaceOutput s, half3 lightDir, half atten);
View dependent: half4 Lighting<Name you choose> (SurfaceOutput s, half3 lightDir, half3 viewDir, half atten);

In our case, we are doing a Specular Shader, so we need to have the view-dependent lighting function structure. So, we have to write: This will tell the Shader that we want to create our own view-dependent Shader.
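Before breaking the lighting function down line by line, it may help to see the math it implements. This is the standard textbook formulation rather than code taken from the recipe's screenshots, so treat it as a sketch; here N is the surface normal, L the light direction, V the view direction, and p the _SpecPower property, with all vectors normalized:

\[ \mathit{diff} = N \cdot L, \qquad R = 2\,(N \cdot L)\,N - L, \qquad \mathit{spec} = \max(0,\ R \cdot V)^{p} \]

The BlinnPhong variant covered in the next recipe replaces the reflection vector R with the half vector between the light and view directions:

\[ H = \frac{L + V}{\lVert L + V \rVert}, \qquad \mathit{spec} = \max(0,\ N \cdot H)^{p} \]

In both cases, the final Specular contribution is the spec value multiplied by the Specular color property, which is exactly what the walk-through below does step by step.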
Always make sure that your lighting function name is the same in your lighting function declaration and the #pragma statement, or Unity will not be able to find your lighting model. The lighting function then begins by declaring the usual Diffuse component by dotting the vertex normal with the light direction or vector. This will give us a value of 1 when a normal on the model is facing towards the light, and a value of -1 when facing away from the light direction. We then calculate the reflection vector taking the vertex normal, scaling it by 2.0 and by the diff value, then subtracting the light direction from it. This has the effect of bending the normal towards the light; so as a vertex normal is pointing away from the light, it is forced to look at the light. Refer to the following screenshot for a more visual representation. The script that produces this debug effect is included at the book's support page at www.packtpub.com/support. Then all we have left to do is to create the final spec's value and color. To do this, we dot the reflection vector with the view direction and take it to a power of _SpecPower. Finally, we just multiply the _SpecularColor.rgb value over the spec value to get our final Specular highlight. The following screenshot displays the final result of our Phong Specular calculation isolated out in the Shader: Creating a BlinnPhong Specular type Blinn is another more efficient way of calculating and estimating specularity. It is done by getting the half vector from the view direction and the light direction. It was brought into the world of Cg by a man named Jim Blinn. He found that it was much more efficient to just get the half vector instead of calculating our own reflection vectors. It cut down on both code and processing time. If you actually look at the built-in BlinnPhong lighting model included in the UnityCG.cginc file, you will notice that it is using the half vector as well, hence the reason why it is named BlinnPhong. It is just a simpler version of the full Phong calculation. Getting ready This time, instead of creating a whole new scene, let's just use the objects and scene we have, and create a new Shader and Material and name them BlinnPhong. Once you have a new Shader, double-click on it to launch MonoDevelop, so that we can start to edit our Shader. How to do it… First, we need to add our own properties to the Properties block, so that we can control the look of the Specular highlight. Then, we need to make sure that we have created the corresponding variables inside our CGPROGRAM block, so that we can access the data from our Properties block, inside of our subshader. Now it's time to create our custom lighting model that will process our Diffuse and Specular calculations. To complete our Shader, we will need to tell our CGPROGRAM block to use our custom lighting model rather than a built-in one, by modifying the #pragma statement with the following code: The following screenshot demonstrates the results of our BlinnPhong lighting model: How it works… The BlinnPhong Specular is almost exactly like the Phong Specular, except that it is more efficient because it uses less code to achieve almost the same effect. You will find this approach nine times out of ten in today's modern Shaders, as it is easier to code and lighter on the Shader performance. Instead of calculating our own reflection vector, we are simply going to get the vector half way between the view direction and the light direction, basically simulating the reflection vector. 
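For comparison, a hedged sketch of the half-vector version might look like the following; again, the function and property names (LightingCustomBlinnPhong, _SpecularColor, _SpecPower) are assumptions rather than the book's exact listing.

```
// #pragma surface surf CustomBlinnPhong

inline half4 LightingCustomBlinnPhong (SurfaceOutput s, half3 lightDir, half3 viewDir, half atten)
{
	// The half vector sits halfway between the light direction and the view direction
	float3 halfVector = normalize(lightDir + viewDir);

	float diff = max(0.0, dot(s.Normal, lightDir));

	// Instead of a reflection vector, compare the surface normal with the half vector
	float nh = max(0.0, dot(s.Normal, halfVector));
	float spec = pow(nh, _SpecPower);

	half4 c;
	c.rgb = (s.Albedo * _LightColor0.rgb * diff + _LightColor0.rgb * _SpecularColor.rgb * spec) * atten;
	c.a = s.Alpha;
	return c;
}
```

Apart from the half vector, the structure is identical to the Phong sketch shown earlier, which is why the two models are so easy to swap.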
It has actually been found that this approach is more physically accurate than the last approach, but we thought it is necessary to show you all the possibilities. So to get the half vector, we simply need to add the view direction and the light direction together, as shown in the following code snippet: Then, we simply need to dot the vertex normal with that new half vector to get our main Specular value. After that, we just take it to a power of _SpecPower and multiply it by the Specular color variable. It's much lighter on the code and much lighter on the math, but still gives us a nice Specular highlight that will work for a lot of real-time situations. Masking Specular with textures Now that we have taken a look at how to create a Specular effect for our Shaders, let's start to take a look into the ways in which we can start to modify our Specular and give more artistic control over its final visual quality. In this next recipe, we will look at how we can use textures to drive our Specular and Specular power attributes. The technique of using Specular textures is seen in most modern game development pipelines because it allows the 3D artists to control the final visual effect on a per-pixel basis. This provides us with a way in which we can have a mat-type surface and a shiny surface all in one Shader; or, we can drive the width of the Specular or the Specular power with another texture, to have one surface with a broad Specular highlight and another surface with a very sharp, tiny highlight. There are many effects one can achieve by mixing his/her Shader calculations with textures, and giving artists the ability to control their Shader's final visual effect is key to an efficient pipeline. Let's see how we can use textures to drive our Specular lighting models. This article will introduce you to some new concepts, such as creating your own Input struct, and learning how the data is being passed around from the output struct, to the lighting function, to the Input struct, and to the surf() function. Understanding the flow of data between these core Surface Shader elements is core to a successful Shader pipeline. Getting ready We will need a new Shader, Material, and another object to apply our Shader and Material on to. With the Shader and Material connected and assigned to your object in your scene, double-click the Shader to bring it up in MonoDevelop. We will also need a Specular texture to use. Any texture will do as long as it has some nice variation in colors and patterns. The following screenshot shows the textures we are using for this recipe: How to do it… First, let's populate our Properties block with some new properties. Add the following code to your Shader's Properties block: We then need to add the corresponding variables to the subshader, so that we can access the data from the properties in our Properties block. Add the following code, just after the #pragma statement: Now we have to add our own custom Output struct. This will allow us to store more data for use between our surf function and our lighting model. Don't worry if this doesn't make sense just yet. We will cover the finer details of this Output struct in the next section of the article. Place the following code just after the variables in the SubShader block: Just after the Output struct we just entered, we need to add our custom lighting model. In this case, we have a custom lighting model called LightingCustomPhong. 
Enter the following code just after the Output struct we just created: In order for our custom lighting model to work, we have to tell the SubShader block which lighting model we want to use. Enter the following code to the #pragma statement so that it loads our custom lighting model: Since we are going to be using a texture to modify the values of our base Specular calculation, we need to store another set of UVs for that texture specifically. This is done inside the Input struct by placing the word uv in front of the variable's name that is holding the texture. Enter the following code just after your custom lighting model: To finish off the Shader, we just need to modify our surf() function with the following code. This will let us pass the texture information to our lighting model function, so that we can use the pixel values of the texture to modify our Specular values in the lighting model function: The following screenshot shows the result of masking our Specular calculations with a color texture and its channel information. We now have a nice variation in Specular over the entire surface, instead of just a global value for the Specular:
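To tie the whole recipe together, here is a heavily hedged sketch of how the custom Output struct, the Input struct, and the surf() function can cooperate to mask the Specular with a texture. The struct, field, and property names (SurfaceCustomOutput, SpecMask, _SpecularMask, and so on) are placeholders for illustration only, not the book's listing.

```
// Custom output struct: the standard fields plus a per-pixel mask filled in surf()
struct SurfaceCustomOutput {
	fixed3 Albedo;
	fixed3 Normal;
	fixed3 Emission;
	half Specular;
	fixed Gloss;
	fixed Alpha;
	fixed3 SpecMask;   // custom field carrying the sampled mask
};

inline half4 LightingCustomPhong (SurfaceCustomOutput s, half3 lightDir, half3 viewDir, half atten)
{
	float diff = max(0.0, dot(s.Normal, lightDir));
	float3 reflectionVector = normalize(2.0 * s.Normal * diff - lightDir);
	// The mask scales the Specular contribution per pixel
	float spec = pow(max(0.0, dot(reflectionVector, viewDir)), _SpecPower);
	float3 finalSpec = _SpecularColor.rgb * spec * s.SpecMask;

	half4 c;
	c.rgb = (s.Albedo * _LightColor0.rgb * diff + _LightColor0.rgb * finalSpec) * atten;
	c.a = s.Alpha;
	return c;
}

// Second UV set for the mask texture, as described in the recipe
struct Input {
	float2 uv_MainTex;
	float2 uv_SpecularMask;
};

void surf (Input IN, inout SurfaceCustomOutput o) {
	half4 c = tex2D(_MainTex, IN.uv_MainTex);
	half4 mask = tex2D(_SpecularMask, IN.uv_SpecularMask);
	o.Albedo = c.rgb;
	o.SpecMask = mask.rgb;   // hand the mask over to the lighting function
	o.Alpha = c.a;
}
```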

Microsoft mulls replacing C and C++ code with Rust calling it a "modern safer system programming language" with great memory safety features

Vincy Davis
18 Jul 2019
3 min read
Here's another reason why Rust is the present and the future of programming. A few days ago, Microsoft announced that it is going to start exploring Rust as an alternative to its C and C++ code. This announcement was made by the Principal Security Engineering Manager of the Microsoft Security Response Centre (MSRC), Gavin Thomas.

Thomas states that ~70% of the vulnerabilities which Microsoft assigns a CVE each year are caused by developers who accidentally insert memory corruption bugs into their C and C++ code. He adds, "As Microsoft increases its code base and uses more Open Source Software in its code, this problem isn’t getting better, it's getting worse. And Microsoft isn’t the only one exposed to memory corruption bugs—those are just the ones that come to MSRC."

Image Source: Microsoft blog

He highlights the fact that even with so many security mechanisms in place (such as static analysis tools, fuzzing at scale, taint analysis, many encyclopaedias of coding guidelines, threat modelling guidance, and so on) to make code secure, developers still have to invest a lot of time in training, in additional tools, and in vulnerability fixes.

Thomas states that though C++ has many qualities, such as being fast and mature with a small memory and disk footprint, it does not have the memory safety guarantees of languages like .NET C#. He believes that Rust is one language that can provide both requirements. Thomas strongly advocates that the software security industry should focus on providing a secure environment for developers to work in, rather than turning a deaf ear to the importance of security and clinging to outdated methods and approaches. He concludes by hinting that Microsoft is going to adopt the Rust programming language, asking, "Perhaps it's time to scrap unsafe legacy languages and move on to a modern safer system programming language?"

Microsoft exploring Rust is not surprising, as Rust has been popular with many developers for its simpler syntax, fewer bugs, memory safety, and thread safety. It was also voted the most loved programming language in the 2019 StackOverflow survey, the biggest developer survey on the internet. It allows developers to focus on their applications, rather than worrying about security and maintenance. Recently, many applications have been written in Rust, like Vector, the Brave ad-blocker, PyOxidizer, and more.

Developers couldn't agree more with this post, and many have expressed their love for Rust.

https://twitter.com/alilleybrinker/status/1151495738158977024
https://twitter.com/karanganesan/status/1151485485644054528
https://twitter.com/shah_sheikh/status/1151457054004875264

A Redditor says, "While this first post is very positive about memory-safe system programming languages in general and Rust in particular, I would not call this an endorsement. Still, great news!"

Visit the Microsoft blog for more details.

Introducing Ballista, a distributed compute platform based on Kubernetes and Rust
EU Commission opens an antitrust case against Amazon on grounds of violating EU competition rules
Fastly CTO Tyler McMullen on Lucet and the future of WebAssembly and Rust [Interview]


Understanding the Python regex engine

Packt
18 Feb 2014
8 min read
(For more resources related to this topic, see here.)

These are the most common characteristics of the algorithm:

It supports "lazy quantifiers" such as *?, +?, and ??.
It matches the first possible match, even though there are longer ones in the string:

    >>> re.search("engineer|engineering", "engineering").group()
    'engineer'

This also means that order is important.
The algorithm tracks only one transition at one step, which means that the engine checks one character at a time.
Backreferences and capturing parentheses are supported.
Backtracking is the ability to remember the last successful position so that it can go back and retry if needed.
In the worst case, complexity is exponential, O(C^n). We'll see this later in Backtracking.

Backtracking

Backtracking allows going back and repeating the different paths of the regular expression. It does so by remembering the last successful position; this applies to alternation and quantifiers. Let's see an example:

(Figure: Backtracking)

As we see in the image, the regex engine tries to match one character at a time until it fails, and then starts again with the following path it can retry. The regex used in the image is a perfect example of the importance of how the regex is built; in this case, the expression can be rebuilt as spa(in|niard), so that the regex engine doesn't have to go back up to the start of the string in order to retry the second alternative.

This leads us to what is called catastrophic backtracking, a well-known problem with backtracking that can give you several problems, ranging from a slow regex to a crash with a stack overflow.

In the previous example, you can see that the behavior grows not only with the input but also with the different paths in the regex; that's why the algorithm is exponential, O(C^n). With this in mind, it's easy to understand why we can end up with a stack overflow. The problem arises when the regex fails to match the string. Let's benchmark a regex with the technique we've seen previously, so we can understand the problem better. First, let's try a simple regex:

    >>> def catastrophic(n):
            print "Testing with %d characters" %n
            pat = re.compile('(a+)+c')
            text = "%s" %('a' * n)
            pat.search(text)

As you can see, the text we're trying to match is always going to fail, because there is no c at the end. Let's test it with different inputs:

    >>> for n in range(20, 30):
            test(catastrophic, n)
    Testing with 20 characters
    The function catastrophic lasted: 0.130457
    Testing with 21 characters
    The function catastrophic lasted: 0.245125
    ……
    The function catastrophic lasted: 14.828221
    Testing with 28 characters
    The function catastrophic lasted: 29.830929
    Testing with 29 characters
    The function catastrophic lasted: 61.110949

The behavior of this regex looks like quadratic. But why? What's happening here? The problem is that (a+) starts out greedy, so it tries to get as many a's as possible, and after that it fails to match the c. It then backtracks to the second a and continues consuming a's until it again fails to match c, at which point it retries (backtracks) the whole process starting with the second a.
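The benchmarks above rely on a small test() helper that the book defines earlier and that is not shown in this excerpt. A minimal sketch of such a helper, written to match the output format used in the listings, could look like this (the exact implementation in the book may differ):

```
import time

def test(func, *args):
    # Time a single run of func and print it in the format used above
    start = time.time()
    func(*args)
    elapsed = time.time() - start
    print "The function %s lasted: %f" % (func.__name__, elapsed)
```

Note that, like the rest of the article's listings, this sketch uses Python 2 syntax.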
Let's see another example, in this case with exponential behavior:

    >>> def catastrophic(n):
            print "Testing with %d characters" %n
            pat = re.compile('(x+)+(b+)+c')
            text = 'x' * n
            text += 'b' * n
            pat.search(text)
    >>> for n in range(12, 18):
            test(catastrophic, n)
    Testing with 12 characters
    The function catastrophic lasted: 1.035162
    Testing with 13 characters
    The function catastrophic lasted: 4.084714
    Testing with 14 characters
    The function catastrophic lasted: 16.319145
    Testing with 15 characters
    The function catastrophic lasted: 65.855182
    Testing with 16 characters
    The function catastrophic lasted: 276.941307

As you can see, the behavior is exponential, which can lead to catastrophic scenarios. And finally, let's see what happens when the regex has a match:

    >>> def non_catastrophic(n):
            print "Testing with %d characters" %n
            pat = re.compile('(x+)+(b+)+c')
            text = 'x' * n
            text += 'b' * n
            text += 'c'
            pat.search(text)
    >>> for n in range(12, 18):
            test(non_catastrophic, n)
    Testing with 10 characters
    The function catastrophic lasted: 0.000029
    ……
    Testing with 19 characters
    The function catastrophic lasted: 0.000012

Optimization recommendations

In the following sections, we will find a number of recommendations that can be applied to improve regular expressions. The best tool will always be common sense, and even when following these recommendations, common sense still needs to be used. It has to be understood when a recommendation is applicable and when it is not. For instance, the recommendation don't be greedy cannot be used in 100% of the cases.

Reuse compiled patterns

To use a regular expression, we have to convert it from its string representation to a compiled form, a RegexObject. This compilation takes some time. If, instead of using the compile function, we use the rest of the module-level functions to avoid creating the RegexObject explicitly, we should understand that the compilation is executed anyway, and a number of compiled RegexObject instances are cached automatically. However, when we are compiling explicitly, that cache won't help us. Every single compile execution will consume an amount of time that is perhaps negligible for a single execution, but definitely relevant if many executions are performed.

Extract common parts in alternation

Alternation is always a performance risk point in regular expressions. When using them in Python, and therefore in a sort of NFA implementation, we should extract any common part outside of the alternation. For instance, if we have /(Hello⇢World|Hello⇢Continent|Hello⇢Country)/, we can easily extract Hello⇢, giving the following expression: /Hello⇢(World|Continent|Country)/. This will make our engine check Hello⇢ just once, instead of going back and rechecking it for each alternative.

Shortcut the alternation

Ordering in alternation is relevant; each of the different options present in the alternation will be checked one by one, from left to right. This can be used in favor of performance. If we place the more likely options at the beginning of the alternation, more checks will mark the alternation as matched sooner. For instance, we know that the more common colors of cars are white and black. If we are writing a regular expression accepting some colors, we should put white and black first, as those are the more likely to appear. This is: /(white|black|red|blue|green)/. For the rest of the elements, if they have the very same odds of appearing, it could be favorable to put the shortest ones before the longer ones.
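As a quick illustration of the first two recommendations, the following hedged sketch compiles a pattern once at module level and keeps the common prefix outside of the alternation (the pattern and the function are made up for this example and are not from the book):

```
import re

# Compiled once and reused; the common "Hello " prefix is checked only once
HELLO_PATTERN = re.compile(r"Hello (World|Continent|Country)")

def count_greetings(lines):
    return sum(1 for line in lines if HELLO_PATTERN.search(line))
```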
Use non-capturing groups when appropriate

Capturing groups will consume some time for each group defined in an expression. This time is not very important, but it is still relevant if we are executing a regular expression many times. Sometimes we use groups but are not interested in the result, for instance, when using alternation. If that is the case, we can save the engine some execution time by marking the group as non-capturing, that is: (?:person|company).

Be specific

When the patterns we define are very specific, the engine can help us by performing quick integrity checks before the actual pattern matching is executed. For instance, if we pass to the engine the expression /\w{15}/ to be matched against the text hello, the engine could decide to check whether the input string is actually at least 15 characters long instead of matching the expression.

Don't be greedy

We have already covered quantifiers and learnt the difference between greedy and reluctant quantifiers. We also found that the quantifiers are greedy by default. What does this mean for performance? It means that the engine will always try to catch as many characters as possible and then reduce the scope step by step until it's done. This could potentially make the regular expression slow. Keep in mind, however, that this recommendation is only applicable if the match is usually short.

Summary

In this article, we understood how to see the engine working behind the scenes. We learned some theory of the engine design and how easy it is to fall into a common pitfall, catastrophic backtracking. Finally, we reviewed different general recommendations to improve the performance of our regular expressions.

Resources for Article:

Further resources on this subject:
Python LDAP Applications: Part 1 - Installing and Configuring the Python-LDAP Library and Binding to an LDAP Directory [Article]
Python Data Persistence using MySQL Part III: Building Python Data Structures Upon the Underlying Database Data [Article]
Python Testing: Installing the Robot Framework [Article]

Networking in Qt

Packt
21 Sep 2015
21 min read
In this article from the book Game Programming using Qt by authors Witold Wysota and Lorenz Haas, you will be taught how to communicate with the Internet servers and with sockets in general. First, we will have a look at QNetworkAccessManager, which makes sending network requests and receiving replies really easy. Building on this basic knowledge, we will then use Google's Distance API to get information about the distance between two locations and the time it would take to get from one location to the other. (For more resources related to this topic, see here.) QNetworkAccessManager The easiest way to access files on the Internet is to use Qt's Network Access API. This API is centered on QNetworkAccessManager, which handles the complete communication between your game and the Internet. When we develop and test a network-enabled application, it is recommended that you use a private, local network if feasible. This way, it is possible to debug both ends of the connection and the errors will not expose sensitive data. If you are not familiar with setting up a web server locally on your machine, there are luckily a number of all-in-one installers that are freely available. These will automatically configure Apache2, MySQL, PHP, and much more on your system. On Windows, for example, you could use XAMPP (http://www.apachefriends.org/en) or the Uniform Server (http://www.uniformserver.com); on Apple computers there is MAMP (http://www.mamp.info/en); and on Linux, you normally don't have to do anything since there is already localhost. If not, open your preferred package manager, search for a package called apache2 or similar, and install it. Alternatively, have a look at your distribution's documentation. Before you go and install Apache on your machine, think about using a virtual machine like VirtualBox (http://www.virtualbox.org) for this task. This way, you keep your machine clean and you can easily try different settings of your test server. With multiple virtual machines, you can even test the interaction between different instances of your game. If you are on UNIX, Docker (http://www.docker.com) might be worth to have a look at too. Downloading files over HTTP For downloading files over HTTP, first set up a local server and create a file called version.txt in the root directory of the installed server. The file should contain a small text like "I am a file on localhost" or something similar. To test whether the server and the file are correctly set up, start a web browser and open http://localhost/version.txt. You then should see the file's content. Of course, if you have access to a domain, you can also use that. Just alter the URL used in the example correspondingly. If you fail, it may be the case that your server does not allow to display text files. Instead of getting lost in the server's configuration, just rename the file to version .html. This should do the trick! Result of requesting http://localhost/version.txt on a browser As you might have guessed, because of the filename, the real-life scenario could be to check whether there is an updated version of your game or application on the server. To get the content of a file, only five lines of code are needed. Time for action – downloading a file First, create an instance of QNetworkAccessManager: QNetworkAccessManager *m_nam = new QNetworkAccessManager(this); Since QNetworkAccessManager inherits QObject, it takes a pointer to QObject, which is used as a parent. Thus, you do not have to take care of deleting the manager later on. 
Furthermore, one single instance of QNetworkAccessManager is enough for an entire application. So, either pass a pointer to the network access manager around in your game or, for ease of use, create a singleton pattern and access the manager through that.

A singleton pattern ensures that a class is instantiated exactly once. The pattern is useful for accessing application-wide configurations or, in our case, an instance of QNetworkAccessManager. On the wiki pages for qtcentre.org and qt-project.org, you will find examples for different singleton patterns. A simple template-based approach would look like this (as a header file):

    template <class T>
    class Singleton
    {
    public:
        static T& Instance()
        {
            static T _instance;
            return _instance;
        }

    private:
        Singleton();
        ~Singleton();
        Singleton(const Singleton &);
        Singleton& operator=(const Singleton &);
    };

In the source code, you would include this header file and acquire a singleton of a class called MyClass with:

    MyClass *singleton = &Singleton<MyClass>::Instance();

If you are using Qt Quick, you can directly use the view's instance of QNetworkAccessManager:

    QQuickView *view = new QQuickView;
    QNetworkAccessManager *m_nam = view->engine()->networkAccessManager();

Secondly, we connect the manager's finished() signal to a slot of our choice. For example, in our class, we have a slot called downloadFinished():

    connect(m_nam, SIGNAL(finished(QNetworkReply*)), this, SLOT(downloadFinished(QNetworkReply*)));

Then, it actually requests the version.txt file from localhost:

    m_nam->get(QNetworkRequest(QUrl("http://localhost/version.txt")));

With get(), a request to get the contents of the file, specified by the URL, is posted. The function expects a QNetworkRequest, which defines all the information needed to send a request over the network. The main information of such a request is naturally the URL of the file. This is the reason why QNetworkRequest takes a QUrl as an argument in its constructor. You can also set the URL of a request with setUrl(). If you would like to define some additional headers, you can either use setHeader() for the most common headers or use setRawHeader() to be fully flexible. If you want to set, for example, a custom user agent on the request, the call would look like this:

    QNetworkRequest request;
    request.setUrl(QUrl("http://localhost/version.txt"));
    request.setHeader(QNetworkRequest::UserAgentHeader, "MyGame");
    m_nam->get(request);

The setHeader() function takes two arguments: the first is a value of the enumeration QNetworkRequest::KnownHeaders, which holds the most common (self-explanatory) headers such as LastModifiedHeader or ContentTypeHeader, and the second is the actual value. You could also have written the header by using setRawHeader():

    request.setRawHeader("User-Agent", "MyGame");

When you use setRawHeader(), you have to write the header field names yourself. Besides that, it behaves like setHeader(). A list of all available headers for the HTTP protocol Version 1.1 can be found in section 14 at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.

With the get() function, we requested the version.txt file from localhost. All we have to do from now on is wait for the server to reply. As soon as the server's reply is finished, the slot downloadFinished() will be called. That was defined by the previous connection statement.
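Putting the steps seen so far together, a minimal sketch of such a downloader class could look like the following. The class and member names (FileDownload, m_nam, m_edit) mirror the ones used in this article, but the exact layout is an assumption rather than the book's complete listing:

```
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QPlainTextEdit>
#include <QUrl>
#include <QWidget>

class FileDownload : public QWidget
{
    Q_OBJECT

public:
    explicit FileDownload(QWidget *parent = 0)
        : QWidget(parent)
        , m_nam(new QNetworkAccessManager(this))
        , m_edit(new QPlainTextEdit(this))
    {
        // Step 2: route all finished replies to one slot
        connect(m_nam, SIGNAL(finished(QNetworkReply*)),
                this, SLOT(downloadFinished(QNetworkReply*)));
        // Step 3: post the request
        m_nam->get(QNetworkRequest(QUrl("http://localhost/version.txt")));
    }

private slots:
    void downloadFinished(QNetworkReply *reply)
    {
        // Reading the reply is discussed in detail right below
        m_edit->setPlainText(reply->readAll());
        reply->deleteLater();
    }

private:
    QNetworkAccessManager *m_nam;
    QPlainTextEdit *m_edit;
};
```

Error handling and layout are intentionally left out of this sketch; the following sections cover how to check the reply for errors before using its content.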
As an argument the reply of type QNetworkReply is transferred to the slot and we can read the reply's data and set it to m_edit, an instance of QPlainTextEdit, using the following code: void FileDownload::downloadFinished(QNetworkReply *reply) { const QByteArray content = reply->readAll(); m_edit->setPlainText(content); reply->deleteLater(); } Since QNetworkReply inherits QIODevice, there are also other possibilities to read the contents of the reply including QDataStream or QTextStream to either read and interpret binary data or textual data. Here, as fourth command, QIODevice::readAll() is used to get the complete content of the requested file in a QByteArray. The responsibility for the transferred pointer to the corresponding QNetworkReply lies with us, so we need to delete it at the end of the slot. This would be the fifth line of code needed to download a file with Qt. However, be careful and do not call delete on the reply directly. Always use deleteLater() as the documentation suggests! Have a go hero – extending the basic file downloader If you haven't set up a localhost, just alter the URL in the source code to download another file. Of course, having to alter the source code in order to download another file is far from an ideal approach. So try to extend the dialog, by adding a line edit where you can specify the URL you want to download. Also, you can offer a file dialog to choose the location to where the downloaded file should be saved. Error handling If you do not see the content of the file, something went wrong. Just as in real life, this can always happen so we better make sure, that there is good error handling in such cases to inform the user what is going on. Time for action – displaying a proper error message Fortunately QNetworkReply offers several possibilities to do this. In the slot called downloadFinished() we first want to check if an error occurred: if (reply->error() != QNetworkReply::NoError) {/* error occurred */} The function QNetworkReply::error() returns the error that occurred while handling the request. The error is encoded as a value of type QNetworkReply::NetworkError. The two most common errors are probably these: Error code Meaning ContentNotFoundError This error indicates that the URL of the request could not be found. It is similar to the HTTP error code 404. ContentAccessDenied This error indicates that you do not have the permission to access the requested file. It is similar to the HTTP error 401. You can look up the other 23 error codes in the documentation. But normally you do not need to know exactly what went wrong. You only need to know if everything worked out—QNetworkReply::NoError would be the return value in this case—or if something went wrong. Since QNetworkReply::NoError has the value 0, you can shorten the test phrase to check if an error occurred to: if (reply->error()) { // an error occurred } To provide the user with a meaningful error description you can use QIODevice::errorString(). 
The text is already set up with the corresponding error message and we only have to display it: if (reply->error()) { const QString error = reply->errorString(); m_edit->setPlainText(error); return; } In our example, assuming we had an error in the URL and wrote versions.txt by mistake, the application would look like this: If the request was a HTTP request and the status code is of interest, it could be retrieved by QNetworkReply::attribute(): reply->attribute(QNetworkRequest::HttpStatusCodeAttribute) Since it returns QVariant, you can either use QVariant::toInt() to get the code as an integer or QVariant::toString() to get the number as a QString. Beside the HTTP status code you can query through attribute() a lot of other information. Have a look at the description of the enumeration QNetworkRequest::Attribute in the documentation. There you also will find QNetworkRequest::HttpReasonPhraseAttribute which holds a human readable reason phrase of the HTTP status code. For example "Not Found" if an HTTP error 404 occurred. The value of this attribute is used to set the error text for QIODevice::errorString(). So you can either use the default error description provided by errorString() or compose your own by interpreting the reply's attributes. If a download failed and you want to resume it or if you only want to download a specific part of a file, you can use the range header: QNetworkRequest req(QUrl("...")); req.setRawHeader("Range", "bytes=300-500"); QNetworkReply *reply = m_nam->get(req); In this example only the bytes 300 to 500 would be downloaded. However, the server must support this. Downloading files over FTP As simple as it is to download files over HTTP, as simple it is to download a file over FTP. If it is an anonymous FTP server for which you do not need an authentication, just use the URL like we did earlier. Assuming there is again a file called version.txt on the FTP server on localhost, type: m_nam->get(QNetworkRequest(QUrl("ftp://localhost/version.txt"))); That is all, everything else stays the same. If the FTP server requires an authentication you'll get an error, for example: Setting the user name and the user password to access an FTP server is likewise easy. Either write it in the URL or use QUrl functions setUserName() and setPassword(). If the server does not use a standard port, you can set the port explicitly with QUrl::setPort(). To upload a file to a FTP server use QNetworkAccessManager::put() which takes as first argument a QNetworkRequest, calling a URL that defines the name of the new file on the server, and as second argument the actual data, that should be uploaded. For small uploads, you can pass the content as a QByteArray. For larger contents, better use a pointer to a QIODevice. Make sure the device is open and stays available until the upload is done. Downloading files in parallel A very important note on QNetworkAccessManager: it works asynchronously. This means you can post a network request without blocking the main event loop and this is what keeps the GUI responsive. If you post more than one request, they are put on the manager's queue. Depending on the protocol used they get processed in parallel. If you are sending HTTP requests, normally up to six requests will be handled at a time. This will not block the application. Therefore, there is really no need to encapsulate QNetworkAccessManager in a thread, unfortunately, this unnecessary approach is frequently recommended all over the Internet. QNetworkAccessManager already threads internally. 
Really, don't move QNetworkAccessManager to a thread—unless you know exactly what you are doing. If you send multiple requests, the slot connected to the manager's finished() signal is called in an arbitrary order depending on how quickly a request gets a reply from the server. This is why you need to know to which request a reply belongs. This is one reason why every QNetworkReply carries its related QNetworkRequest. It can be accessed through QNetworkReply::request(). Even if the determination of the replies and their purpose may work for a small application in a single slot, it will quickly get large and confusing if you send a lot of requests. This problem is aggravated by the fact that all replies are delivered to only one slot. Since most probably there are different types of replies that need different treatments, it would be better to bundle them to specific slots, specialized for a special task. Fortunately this can be achieved very easily. QNetworkAccessManager::get() returns a pointer to the QNetworkReply which will get all information about the request you post with get(). By using this pointer, you can then connect specific slots to the reply's signals. For example if you have several URLs and you want to save all linked images from these sites to the hard drive, then you would request all web pages via QNetworkAccessManager::get() and connect their replies to a slot specialized for parsing the received HTML. If links to images are found, this slot would request them again with get(). However, this time the replies to these requests would be connected to a second slot, which is designed for saving the images to the disk. Thus you can separate the two tasks, parsing HTML and saving data to a local drive. The most important signals of QNetworkReply are. The finished signal The finished() signal is equivalent with the QNetworkAccessManager::finished() signal we used earlier. It is triggered as soon as a reply has been returned—successfully or not. After this signal has been emitted, neither the reply's data nor its metadata will be altered anymore. With this signal you are now able to connect a reply to a specific slot. This way you can realize the scenario outlined previously. However, one problem remains: if you post simultaneous requests, you do not know which one has finished and thus called the connected slot. Unlike QNetworkAccessManager::finished(), QNetworkReply::finished() does not pass a pointer to QNetworkReply; this would actually be a pointer to itself in this case. A quick solution to solve this problem is to use sender(). It returns a pointer to the QObject instance that has called the slot. Since we know that it was a QNetworkReply, we can write: QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender()); if (!reply) return; This was done by casting sender() to a pointer of type QNetworkReply. Whenever casting classes that inherit QObject, use qobject_cast. Unlike dynamic_cast it does not use RTTI and works across dynamic library boundaries. Although we can be pretty confident the cast will work, do not forget to check if the pointer is valid. If it is a null pointer, exit the slot. Time for action – writing OOP conform code by using QSignalMapper A more elegant way that does not rely on sender(), would be to use QSignalMapper and a local hash, in which all replies that are connected to that slot are stored. So whenever you call QNetworkAccessManager::get() store the returned pointer in a member variable of type QHash<int, QNetworkReply*> and set up the mapper. 
Let's assume that we have following member variables and that they are set up properly: QNetworkAccessManager *m_nam; QSignalMapper *m_mapper; QHash<int, QNetworkReply*> m_replies; Then you would connect the finished() signal of a reply this way: QNetworkReply *reply = m_nam->get(QNetworkRequest(QUrl(/*...*/))); connect(reply, SIGNAL(finished()), m_mapper, SLOT(map())); int id = /* unique id, not already used in m_replies*/; m_replies.insert(id, reply); m_mapper->setMapping(reply, id); What just happened? First we post the request and fetch the pointer to the QNetworkReply with reply. Then we connect the reply's finished signal to the mapper's slot map(). Next we have to find a unique ID which must not already be in use in the m_replies variable. One could use random numbers generated with qrand() and fetch numbers as long as they are not unique. To determine if a key is already in use, call QHash::contains(). It takes the key as an argument against which it should be checked. Or even simpler: count up another private member variable. Once we have a unique ID we insert the pointer to QNetworkReply in the hash using the ID as a key. Last, with setMapping(), we set up the mapper's mapping: the ID's value corresponds to the actual reply. At a prominent place, most likely the constructor of the class, we already have connected the mappers map() signal to a custom slot. For example: connect(m_mapper, SIGNAL(mapped(int)), this, SLOT(downloadFinished(int))); When the slot downloadFinished() is called, we can get the corresponding reply with: void SomeClass::downloadFinished(int id) { QNetworkReply *reply = m_replies.take(id); // do some stuff with reply here reply->deleteLater(); } QSignalMapper also allows to map with QString as an identifier instead of an integer as used above. So you could rewrite the example and use the URL to identify the corresponding QNetworkReply; at least as long as the URLs are unique. The error signal If you download files sequentially, you can swap the error handling out. Instead of dealing with errors in the slot connected to the finished() signal, you can use the reply's signal error() which passes the error of type QNetworkReply::NetworkError to the slot. After the error() signal has been emitted, the finished() signal will most likely also be emitted shortly. The readyRead signal Until now, we used the slot connected to the finished() signal to get the reply's content. That works perfectly if you deal with small files. However, this approach is unsuitable when dealing with large files since they would unnecessarily bind too many resources. For larger files it is better to read and save transferred data as soon as it is available. We get informed by QIODevice::readyRead() whenever new data is available to be read. So for large files you should type in the following: connect(reply, SIGNAL(readyRead()), this, SLOT(readContent())); file.open(QIODevice::WriteOnly); This will help you connect the reply's signal readyRead() to a slot, set up QFile and open it. In the connected slot, type in the following snippet: const QByteArray ba = reply->readAll(); file.write(ba); file.flush(); Now you can fetch the content, which was transferred so far, and save it to the (already opened) file. This way the needed resources are minimized. Don't forget to close the file after the finished() signal was emitted. In this context it would be helpful if one could know upfront the size of the file one wants to download. Therefore, we can use QNetworkAccessManager::head(). 
It behaves like the get() function, but does not transfer the content of the file. Only the headers are transferred. And if we are lucky, the server sends the "Content-Length" header, which holds the file size in bytes. To get that information, we type:

    reply->header(QNetworkRequest::ContentLengthHeader).toInt();

With this information, we could also check upfront if there is enough space left on the disk.

The downloadProgress method

Especially when a big file is being downloaded, the user usually wants to know how much data has already been downloaded and how long it will approximately take for the download to finish.

Time for action – showing the download progress

In order to achieve this, we can use the reply's downloadProgress() signal. As a first argument, it passes the information on how many bytes have already been received and, as a second argument, how many there are in total. This gives us the possibility to indicate the progress of the download with QProgressBar. As the passed arguments are of type qint64, we can't use them directly with QProgressBar since it only accepts int. So, in the connected slot, we first calculate the progress of the download as a fraction between 0 and 1:

    void SomeClass::downloadProgress(qint64 bytesReceived, qint64 bytesTotal)
    {
        qreal progress = (bytesTotal < 1) ? 1.0 : bytesReceived * 1.0 / bytesTotal;
        progressBar->setValue(progress * progressBar->maximum());
    }

What just happened?

With this fraction, we set the new value for the progress bar, where progressBar is the pointer to the bar. However, what value will progressBar->maximum() have and where do we set the range for the progress bar? What is nice is that you do not have to set it for every new download. It is only done once, for example, in the constructor of the class containing the bar. As range values, I would recommend:

    progressBar->setRange(0, 2048);

The reason is that if you take, for example, a range of 0 to 100 and the progress bar is 500 pixels wide, the bar would jump 5 pixels forward for every value change. This will look ugly. To get a smooth progression where the bar expands by 1 pixel at a time, a range of 0 to 99,999,999 would surely work but would be highly inefficient. This is because the current value of the bar would change a lot without any graphical depiction. So the best value for the range would be 0 to the actual bar's width in pixels. Unfortunately, the width of the bar can change depending on the actual widget width, and frequently querying the actual size of the bar every time the value changes is also not a good solution. Why 2048, then? The idea behind this value is the resolution of the screen. Full HD monitors normally have a width of 1920 pixels, so taking 2^11, aka 2048, ensures that the progress bar runs smoothly, even if it is fully expanded. So 2048 isn't the perfect number, but a fairly good compromise. If you are targeting smaller devices, choose a smaller, more appropriate number.

To be able to calculate the remaining time for the download to finish, you have to start a timer. In this case, use QElapsedTimer. After posting the request with QNetworkAccessManager::get(), start the timer by calling QElapsedTimer::start(). Assuming the timer is called m_timer, the calculation would be:

    qint64 total = m_timer.elapsed() / progress;
    qint64 remaining = (total - m_timer.elapsed()) / 1000;

QElapsedTimer::elapsed() returns the milliseconds counting from the moment when the timer was started. This value divided by the progress equals the estimated total download time.
If you subtract the elapsed time and divide the result by 1000, you'll get the remaining time in seconds. Using a proxy If you like to use a proxy you first have to set up a QNetworkProxy. You have to define the type of the proxy with setType(). As arguments you most likely want to pass QNetworkProxy::Socks5Proxy or QNetworkProxy::HttpProxy. Then set up the host name with setHostName(), the user name with setUserName() and the password with setPassword(). The last two properties are, of course, only needed if the proxy requires an authentication. Once the proxy is set up you can set it to the access manager via QNetworkAccessManager::setProxy(). Now, all new requests will use that proxy. Summary In this article you familiarized yourself with QNetworkAccessManager. This class is at the heart of your code whenever you want to download or upload files to the Internet. After having gone through the different signals that you can use to fetch errors, to get notified about new data or to show the progress, you should now know everything you need on that topic. Resources for Article: Further resources on this subject: GUI Components in Qt 5[article] Code interlude – signals and slots [article] Configuring Your Operating System [article]


Basic Image Processing

Packt
02 Jun 2015
8 min read
In this article, Ashwin Pajankar, the author of the book Raspberry Pi Computer Vision Programming, takes us through basic image processing in OpenCV. We will do this with the help of the following topics:

Image arithmetic operations: adding, subtracting, and blending images
Splitting color channels in an image
Negating an image
Performing logical operations on an image

This article is very short and easy to code, with plenty of hands-on activities.

(For more resources related to this topic, see here.)

Arithmetic operations on images

In this section, we will have a look at the various arithmetic operations that can be performed on images. Images are represented as matrices in OpenCV, so arithmetic operations on images are similar to arithmetic operations on matrices. Images must be of the same size for you to perform arithmetic operations on them, and these operations are performed on individual pixels.

cv2.add(): This function is used to add two images, where the images are passed as parameters.
cv2.subtract(): This function is used to subtract an image from another. We know that the subtraction operation is not commutative, so cv2.subtract(img1,img2) and cv2.subtract(img2,img1) will yield different results, whereas cv2.add(img1,img2) and cv2.add(img2,img1) will yield the same result, as the addition operation is commutative.

Both images have to be of the same size and type, as explained before. Check out the following code:

    import cv2
    img1 = cv2.imread('/home/pi/book/test_set/4.2.03.tiff',1)
    img2 = cv2.imread('/home/pi/book/test_set/4.2.04.tiff',1)
    cv2.imshow('Image1',img1)
    cv2.waitKey(0)
    cv2.imshow('Image2',img2)
    cv2.waitKey(0)
    cv2.imshow('Addition',cv2.add(img1,img2))
    cv2.waitKey(0)
    cv2.imshow('Image1-Image2',cv2.subtract(img1,img2))
    cv2.waitKey(0)
    cv2.imshow('Image2-Image1',cv2.subtract(img2,img1))
    cv2.waitKey(0)
    cv2.destroyAllWindows()

The preceding code demonstrates the usage of arithmetic functions on images.

Here's the output window of Image1:

Here is the output window of Addition:

The output window of Image1-Image2 looks like this:

Here is the output window of Image2-Image1:

Blending and transitioning images

The cv2.addWeighted() function calculates the weighted sum of two images. Because of the weight factor, it provides a blending effect to the images. Add the following lines of code before destroyAllWindows() in the previous code listing to see this function in action:

    # display the blended result
    cv2.imshow('Blending',cv2.addWeighted(img1,0.5,img2,0.5,0))
    cv2.waitKey(0)

In the preceding code, we passed the following five arguments to the addWeighted() function:

Img1: This is the first image.
Alpha: This is the weight factor for the first image (0.5 in the example).
Img2: This is the second image.
Beta: This is the weight factor for the second image (0.5 in the example).
Gamma: This is the scalar value (0 in the example).

The output image value is calculated with the following formula:

    output = (img1 * alpha) + (img2 * beta) + gamma

This operation is performed on every individual pixel.

Here is the output of the preceding code:

We can create a film-style transition effect on the two images by using the same function.
Check out the output of the following code, which creates a smooth transition from one image to another:

    import cv2
    import numpy as np
    import time

    img1 = cv2.imread('/home/pi/book/test_set/4.2.03.tiff',1)
    img2 = cv2.imread('/home/pi/book/test_set/4.2.04.tiff',1)

    for i in np.linspace(0,1,40):
        alpha = i
        beta = 1 - alpha
        print 'ALPHA =' + str(alpha) + ' BETA =' + str(beta)
        cv2.imshow('Image Transition',
                   cv2.addWeighted(img1,alpha,img2,beta,0))
        time.sleep(0.05)
        if cv2.waitKey(1) == 27:
            break

    cv2.destroyAllWindows()

Splitting and merging image colour channels

On several occasions, we may be interested in working separately with the red, green, and blue channels. For example, we might want to build a histogram for every channel of an image. Here, cv2.split() is used to split an image into three different intensity arrays for each color channel, whereas cv2.merge() is used to merge different arrays into a single multi-channel array, that is, a color image. The following example demonstrates this:

    import cv2
    img = cv2.imread('/home/pi/book/test_set/4.2.03.tiff',1)
    b,g,r = cv2.split(img)
    cv2.imshow('Blue Channel',b)
    cv2.imshow('Green Channel',g)
    cv2.imshow('Red Channel',r)
    img = cv2.merge((b,g,r))
    cv2.imshow('Merged Output',img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

The preceding program first splits the image into three channels (blue, green, and red) and then displays each one of them. The separate channels will only hold the intensity values of the particular color, and the images will essentially be displayed as grayscale intensity images. Then, the program merges all the channels back into an image and displays it.

Creating a negative of an image

In mathematical terms, the negative of an image is the inversion of colors. For a grayscale image, it is even simpler! The negative of a grayscale image is just the intensity inversion, which can be achieved by finding the complement of the intensity from 255. A pixel value ranges from 0 to 255, and therefore, negation involves subtracting the pixel value from the maximum value, that is, 255. The code for the same is as follows:

    import cv2
    img = cv2.imread('/home/pi/book/test_set/4.2.07.tiff')
    grayscale = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    negative = abs(255-grayscale)
    cv2.imshow('Original',img)
    cv2.imshow('Grayscale',grayscale)
    cv2.imshow('Negative',negative)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Here is the output window of Grayscale:

Here's the output window of Negative:

The negative of a negative will be the original grayscale image. Try this on your own by taking the image negative of the negative again.

Logical operations on images

OpenCV provides bitwise logical operation functions for images. We will have a look at the functions that provide the bitwise logical AND, OR, XOR (exclusive OR), and NOT (inversion) functionality. These functions can be better demonstrated visually with grayscale images. I am going to use barcode images in horizontal and vertical orientation for demonstration.
Let's have a look at the following code:

    import cv2
    import matplotlib.pyplot as plt

    img1 = cv2.imread('/home/pi/book/test_set/Barcode_Hor.png',0)
    img2 = cv2.imread('/home/pi/book/test_set/Barcode_Ver.png',0)
    not_out = cv2.bitwise_not(img1)
    and_out = cv2.bitwise_and(img1,img2)
    or_out = cv2.bitwise_or(img1,img2)
    xor_out = cv2.bitwise_xor(img1,img2)

    titles = ['Image 1','Image 2','Image 1 NOT','AND','OR','XOR']
    images = [img1,img2,not_out,and_out,or_out,xor_out]

    for i in xrange(6):
        plt.subplot(2,3,i+1)
        plt.imshow(images[i],cmap='gray')
        plt.title(titles[i])
        plt.xticks([]),plt.yticks([])
    plt.show()

We first read the images in grayscale mode and calculated the NOT, AND, OR, and XOR functionalities, and then, with matplotlib, we displayed those in a neat way. We leveraged the plt.subplot() function to display multiple images. Here, in the preceding example, we created a grid with two rows and three columns for our images and displayed each image in every part of the grid. You can modify this line and change it to plt.subplot(3,2,i+1) to create a grid with three rows and two columns.

Also, we can use the technique without a loop in the following way. For each image, you have to write the following statements. I will write the code for the first image only. Go ahead and write it for the rest of the five images:

    plt.subplot(2,3,1), plt.imshow(img1,cmap='gray'), plt.title('Image 1'), plt.xticks([]), plt.yticks([])

Finally, use plt.show() to display. This technique is used to avoid the loop when a very small number of images, usually 2 or 3 in number, have to be displayed.

The output of this is as follows:

Make a note of the fact that the logical NOT operation is the negative of the image.

Exercise

You may want to have a look at the functionality of cv2.copyMakeBorder(). This function is used to create borders and paddings for images, and many of you will find it useful for your projects. Exploring this function is left as an exercise for the reader. You can check the Python OpenCV API documentation at the following location:
http://docs.opencv.org/modules/refman.html

Summary

In this article, we learned how to perform arithmetic and logical operations on images and split images by their channels. We also learned how to display multiple images in a grid by using matplotlib.

Resources for Article:

Further resources on this subject:
Raspberry Pi and 1-Wire [article]
Raspberry Pi Gaming Operating Systems [article]
The Raspberry Pi and Raspbian [article]