
How-To Tutorials

6719 Articles

Getting started with F# for .Net Core application development [Tutorial]

Aaron Lazar
16 Aug 2018
17 min read
F# is Microsoft's purely functional programming language, that can be used along with the .NET Core framework. In this article, we will get introduced to F# to leverage .NET Core for our application development. This article is extracted from the book, .NET Core 2.0 By Example, written by Rishabh Verma and Neha Shrivastava. Basics of classes Classes are types of object which can contain functions, properties, and events. An F# class must have a parameter and a function attached like a member. Both properties and functions can use the member keyword. The following is the class definition syntax: type [access-modifier] type-name [type-params] [access-modifier] (parameter-list) [ as identifier ] = [ class ] [ inherit base-type-name(base-constructor-args) ] [ let-bindings ] [ do-bindings ] member-list [ end ] // Mutually recursive class definitions: type [access-modifier] type-name1 ... and [access-modifier] type-name2 ... Let’s discuss the preceding syntax for class declaration: type: In the F# language, class definition starts with a type keyword. access-modifier: The F# language supports three access modifiers—public, private, and internal. By default, it considers the public modifier if no other access modifier is provided. The Protected keyword is not used in the F# language, and the reason is that it will become object oriented rather than functional programming. For example, F# usually calls a member using a lambda expression and if we make a member type protected and call an object of a different instance, it will not work. type-name: It is any of the previously mentioned valid identifiers; the default access modifier is public. type-params: It defines optional generic type parameters. parameter-list: It defines constructor parameters; the default access modifier for the primary constructor is public. identifier: It is used with the optional as keyword, the as keyword gives a name to an instance variable which can be used in the type definition to refer to the instance of the type. Inherit: This keyword allows us to specify the base class for a class. let-bindings: This is used to declare fields or function values in the context of a class. do-bindings: This is useful for the execution of code to create an object member-list: The member-list comprises extra constructors, instance and static method declarations, abstract bindings, interface declarations, and event and property declarations. Here is an example of a class: type StudentName(firstName,lastName) = member this.FirstName = firstName member this.LastName = lastName In the previous example, we have not defined the parameter type. By default, the program considers it as a string value but we can explicitly define a data type, as follows: type StudentName(firstName:string,lastName:string) = member this.FirstName = firstName member this.LastName = lastName Constructor of a class In F#, the constructor works in a different way to any other .NET language. The constructor creates an instance of a class. A parameter list defines the arguments of the primary constructor and class. The constructor contains let and do bindings, which we will discuss next. We can add multiple constructors, apart from the primary constructor, using the new keyword and it must invoke the primary constructor, which is defined with the class declaration. The syntax of defining a new constructor is as shown: new (argument-list) = constructor-body Here is an example to explain the concept. 
In the following code, the StudentDetail class has two constructors: a primary constructor that takes two arguments and another constructor that takes no arguments: type StudentDetail(x: int, y: int) = do printfn "%d %d" x y new() = StudentDetail(0, 0) A let and do binding A let and do binding creates the primary constructor of a class and runs when an instance of a class is created. A function is compiled into a member if it has a let binding. If the let binding is a value which is not used in any function or member, then it is compiled into a local variable of a constructor; otherwise, it is compiled into a field of the class. The do expression executes the initialized code. As any extra constructors always call the primary constructor, let and do bindings always execute, irrespective of which constructor is called. Fields that are created by let bindings can be accessed through the methods and properties of the class, though they cannot be accessed from static methods, even if the static methods take an instance variable as a parameter: type Student(name) as self = let data = name do self.PrintMessage() member this.PrintMessage() = printf " Student name is %s" data Generic type parameters F# also supports a generic parameter type. We can specify multiple generic type parameters separated by a comma. The syntax of a generic parameter declaration is as follows: type MyGenericClassExample<'a> (x: 'a) = do printfn "%A" x The type of the parameter infers where it is used. In the following code, we call the MyGenericClassExample method and pass a sequence of tuples, so here the parameter type became a sequence of tuples: let g1 = MyGenericClassExample( seq { for i in 1 .. 10 -> (i, i*i) } ) Properties Values related to an object are represented by properties. In object-oriented programming, properties represent data associated with an instance of an object. The following snippet shows two types of property syntax: // Property that has both get and set defined. [ attributes ] [ static ] member [accessibility-modifier] [self- identifier.]PropertyName with [accessibility-modifier] get() = get-function-body and [accessibility-modifier] set parameter = set-function-body // Alternative syntax for a property that has get and set. [ attributes-for-get ] [ static ] member [accessibility-modifier-for-get] [self-identifier.]PropertyName = get-function-body [ attributes-for-set ] [ static ] member [accessibility-modifier-for-set] [self- identifier.]PropertyName with set parameter = set-function-body There are two kinds of property declaration: Explicitly specify the value: We should use the explicit way to implement the property if it has non-trivial implementation. We should use a member keyword for the explicit property declaration. Automatically generate the value: We should use this when the property is just a simple wrapper for a value. There are many ways of implementing an explicit property syntax based on need: Read-only: Only the get() method Write-only: Only the set() method Read/write: Both get() and set() methods An example is shown as follows: // A read-only property. member this.MyReadOnlyProperty = myInternalValue // A write-only property. member this.MyWriteOnlyProperty with set (value) = myInternalValue <- value // A read-write property. member this.MyReadWriteProperty with get () = myInternalValue and set (value) = myInternalValue <- value Backing stores are private values that contain data for properties. 
The keyword, member val instructs the compiler to create backing stores automatically and then gives an expression to initialize the property. The F# language supports immutable types, but if we want to make a property mutable, we should use get and set. As shown in the following example, the MyClassExample class has two properties: propExample1 is read-only and is initialized to the argument provided to the primary constructor, and propExample2 is a settable property initialized with a string value ".Net Core 2.0": type MyClassExample(propExample1 : int) = member val propExample1 = property1 member val propExample2 = ".Net Core 2.0" with get, set Automatically implemented properties don't work efficiently with some libraries, for example, Entity Framework. In these cases, we should use explicit properties. Static and instance properties There can be further categorization of properties as static or instance properties. Static, as the name suggests, can be invoked without any instance. The self-identifier is neglected by the static property while it is necessary for the instance property. The following is an example of the static property: static member MyStaticProperty with get() = myStaticValue and set(value) = myStaticValue <- value Abstract properties Abstract properties have no implementation and are fully abstract. They can be virtual. It should not be private and if one accessor is abstract all others must be abstract. The following is an example of the abstract property and how to use it: // Abstract property in abstract class. // The property is an int type that has a get and // set method [<AbstractClass>] type AbstractBase() = abstract Property1 : int with get, set // Implementation of the abstract property type Derived1() = inherit AbstractBase() let mutable value = 10 override this.Property1 with get() = value and set(v : int) = value <- v // A type with a "virtual" property. type Base1() = let mutable value = 10 abstract Property1 : int with get, set default this.Property1 with get() = value and set(v : int) = value <- v // A derived type that overrides the virtual property type Derived2() = inherit Base1() let mutable value2 = 11 override this.Property1 with get() = value2 and set(v) = value2 <- v Inheritance and casts In F#, the inherit keyword is used while declaring a class. The following is the syntax: type MyDerived(...) = inherit MyBase(...) In a derived class, we can access all methods and members of the base class, but it should not be a private member. To refer to base class instances in the F# language, the base keyword is used. Virtual methods and overrides  In F#, the abstract keyword is used to declare a virtual member. So, here we can write a complete definition of the member as we use abstract for virtual. F# is not similar to other .NET languages. Let's have a look at the following example: type MyClassExampleBase() = let mutable x = 0 abstract member virtualMethodExample : int -> int default u. virtualMethodExample (a : int) = x <- x + a; x type MyClassExampleDerived() = inherit MyClassExampleBase () override u. virtualMethodExample (a: int) = a + 1 In the previous example, we declared a virtual method, virtualMethodExample, in a base class, MyClassExampleBase, and overrode it in a derived class, MyClassExampleDerived. Constructors and inheritance An inherited class constructor must be called in a derived class. If a base class constructor contains some arguments, then it takes parameters of the derived class as input. 
In the following example, we will see how derived class arguments are passed in the base class constructor with inheritance: type MyClassBase2(x: int) = let mutable z = x * x do for i in 1..z do printf "%d " i type MyClassDerived2(y: int) = inherit MyClassBase2(y * 2) do for i in 1..y do printf "%d " i If a class has multiple constructors, such as new(str) or new(), and this class is inherited in a derived class, we can use a base class constructor to assign values. For example, DerivedClass, which inherits BaseClass, has new(str1,str2), and in place of the first string, we pass inherit BaseClass(str1). Similarly for blank, we wrote inherit BaseClass(). Let's explore the following example for more detail: type BaseClass = val string1 : string new (str) = { string1 = str } new () = { string1 = "" } type DerivedClass = inherit BaseClass val string2 : string new (str1, str2) = { inherit BaseClass(str1); string2 = str2 } new (str2) = { inherit BaseClass(); string2 = str2 } let obj1 = DerivedClass("A", "B") let obj2 = DerivedClass("A") Functions and lambda expressions A lambda expression is one kind of anonymous function, which means it doesn't have a name attached to it. But if we want to create a function which can be called, we can use the fun keyword with a lambda expression. We can pass the input parameter in the lambda function, which is created using the fun keyword. This function is quite similar to a normal F# function. Let's see a normal F# function and a lambda function: // Normal F# function let addNumbers a b = a+b // Evaluating values let sumResult = addNumbers 5 6 // Lambda function and evaluating values let sumResult = (fun (a:int) (b:int) -> a+b) 5 6 // Both the function will return value sumResult = 11 Handling data – tuples, lists, record types, and data manipulation F# supports many data types, for example: Primitive types: bool, int, float, string values. Aggregate type: class, struct, union, record, and enum Array: int[], int[ , ], and float[ , , ] Tuple: type1 * type2 * like (a,1,2,true) type is—char * int * int * bool Generic: list<’x>, dictionary < ’key, ’value> In an F# function, we can pass one tuple instead of multiple parameters of different types. Declaration of a tuple is very simple and we can assign values of a tuple to different variables, for example: let tuple1 = 1,2,3 // assigning values to variables , v1=1, v2= 2, v3=3 let v1,v2,v3 = tuple1 // if we want to assign only two values out of three, use “_” to skip the value. Assigned values: v1=1, //v3=3 let v1,_,v3 = tuple In the preceding examples, we saw that tuple supports pattern matching. These are option types and an option type in F# supports the idea that the value may or not be present at runtime. List List is a generic type implementation. An F# list is similar to a linked list implementation in any other functional language. It has a special opening and closing bracket construct, a short form of the standard empty list ([ ]) syntax: let empty = [] // This is an empty list of untyped type or we can say //generic type. Here type is: 'a list let intList = [10;20;30;40] // this is an integer type list The cons operator is used to prepend an item to a list using a double colon cons(prepend,::). 
To append another list to one list, we use the append operator—@: // prepend item x into a list let addItem xs x = x :: xs let newIntList = addItem intList 50 // add item 50 in above list //“intlist”, final result would be- [50;10;20;30;40] // using @ to append two list printfn "%A" (["hi"; "team"] @ ["how";"are";"you"]) // result – ["hi"; "team"; "how";"are";"you"] Lists are decomposable using pattern matching into a head and a tail part, where the head is the first item in the list and the tail part is the remaining list, for example: printfn "%A" newIntList.Head printfn "%A" newIntList.Tail printfn "%A" newIntList.Tail.Tail.Head let rec listLength (l: 'a list) = if l.IsEmpty then 0 else 1 + (listLength l.Tail) printfn "%d" (listLength newIntList) Record type The class, struct, union, record, and enum types come under aggregate types. The record type is one of them, it can have n number of members of any individual type. Record type members are by default immutable but we can make them mutable. In general, a record type uses the members as an immutable data type. There is no way to execute logic during instantiation as a record type don't have constructors. A record type also supports match expression, depending on the values inside those records, and they can also again decompose those values for individual handling, for example: type Box = {width: float ; height:int } let giftbox = {width = 6.2 ; height = 3 } In the previous example, we declared a Box with float a value width and an integer height. When we declare giftbox, the compiler automatically detects its type as Box by matching the value types. We can also specify type like this: let giftbox = {Box.width = 6.2 ; Box.height = 3 } or let giftbox : Box = {width = 6.2 ; height = 3 } This kind of type declaration is used when we have the same type of fields or field type declared in more than one type. This declaration is called a record expression. Object-oriented programming in F# F# also supports implementation inheritance, the creation of object, and interface instances. In F#, constructed types are fully compatible .NET classes which support one or more constructors. We can implement a do block with code logic, which can run at the time of class instance creation. The constructed type supports inheritance for class hierarchy creation. We use the inherit keyword to inherit a class. If the member doesn't have implementation, we can use the abstract keyword for declaration. We need to use the abstractClass attribute on the class to inform the compiler that it is abstract. If the abstractClass attribute is not used and type has all abstract members, the F# compiler automatically creates an interface type. Interface is automatically inferred by the compiler as shown in the following screenshot: The override keyword is used to override the base class implementation; to use the base class implementation of the same member, we use the base keyword. In F#, interfaces can be inherited from another interface. In a class, if we use the construct interface, we have to implement all the members in the interface in that class, as well. In general, it is not possible to use interface members from outside the class instance, unless we upcast the instance type to the required interface type. To create an instance of a class or interface, the object expression syntax is used. 
We need to override virtual members if we are creating a class instance and need member implementation for interface instantiation: type IExampleInterface = abstract member IntValue: int with get abstract member HelloString: unit -> string type PrintValues() = interface IExampleInterface with member x.IntValue = 15 member x.HelloString() = sprintf "Hello friends %d" (x :> IExampleInterface).IntValue let example = let varValue = PrintValues() :> IExampleInterface { new IExampleInterface with member x.IntValue = varValue.IntValue member x.HelloString() = sprintf "<b>%s</b>" (varValue.HelloString()) } printfn "%A" (example.HelloString()) Exception handling The exception keyword is used to create a custom exception in F#; these exceptions adhere to Microsoft best practices, such as constructors supplied, serialization support, and so on. The keyword raise is used to throw an exception. Apart from this, F# has some helper functions, such as failwith, which throws a failure exception at F# runtime, and invalidop, invalidarg, which throw the .NET Framework standard type invalid operation and invalid argument exception, respectively. try/with is used to catch an exception; if an exception occurred on an expression or while evaluating a value, then the try/with expression could be used on the right side of the value evaluation and to assign the value back to some other value. try/with also supports pattern matching to check an individual exception type and extract an item from it. try/finally expression handling depends on the actual code block. Let's take an example of declaring and using a custom exception: exception MyCustomExceptionExample of int * string raise (MyCustomExceptionExample(10, "Error!")) In the previous example, we created a custom exception called MyCustomExceptionExample, using the exception keyword, passing value fields which we want to pass. Then we used the raise keyword to raise exception passing values, which we want to display while running the application or throwing the exception. However, as shown here, while running this code, we don't get our custom message in the error value and the standard exception message is displayed: We can see in the previous screenshot that the exception message doesn't contain the message that we passed. In order to display our custom error message, we need to override the standard message property on the exception type. We will use pattern matching assignment to get two values and up-cast the actual type, due to the internal representation of the exception object. If we run this program again, we will get the custom message in the exception: exception MyCustomExceptionExample of int * string with override x.Message = let (MyCustomExceptionExample(i, s)) = upcast x sprintf "Int: %d Str: %s" i s raise (MyCustomExceptionExample(20, "MyCustomErrorMessage!")) Now, we will get the following error message: In the previous screenshot, we can see our custom message with integer and string values included in the output. We can also use the helper function, failwith, to raise a failure exception, as it includes our message as an error message, as follows: failwith "An error has occurred" The preceding error message can be seen in the following screenshot: Here is a detailed exception screenshot: An example of the invalidarg helper function follows. In this factorial function, we are checking that the value of x is greater than zero. 
For cases where x is less than 0, we call invalidarg, pass x as the parameter name that is invalid, and then some error message saying the value should be greater than 0. The invalidarg helper function throws an invalid argument exception from the standard system namespace in .NET: let rec factorial x = if x < 0 then invalidArg "x" "Value should be greater than zero" match x with | 0 -> 1 | _ -> x * (factorial (x - 1)) By now, you should be pretty familiar with the F# programming language, to use in your application development, alongside C#. If you found this tutorial helpful and you're interested in learning more, head over to this book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. .NET Core completes move to the new compiler – RyuJIT Applying Single Responsibility principle from SOLID in .NET Core Unit Testing in .NET Core with Visual Studio 2017 for better code quality
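As a quick recap of several of the pieces covered in the tutorial above — a class with a primary constructor, a record type, a list processed with higher-order functions, and pattern matching — here is a small, self-contained F# sketch. It is not code from the book; the names (Course, describe, the sample data) are illustrative assumptions, and it can be run as an .fsx script or as Program.fs in a .NET Core console project.

    // A class with a primary constructor, as covered above.
    type StudentName(firstName: string, lastName: string) =
        member this.FirstName = firstName
        member this.LastName = lastName
        member this.FullName = sprintf "%s %s" firstName lastName

    // A record type with immutable members.
    type Course = { Title: string; Hours: int }

    let student = StudentName("Ada", "Lovelace")

    // A list of records, processed with higher-order functions.
    let courses =
        [ { Title = "F# Basics"; Hours = 4 }
          { Title = ".NET Core"; Hours = 6 }
          { Title = "ASP.NET Core"; Hours = 8 } ]

    let totalHours =
        courses
        |> List.map (fun c -> c.Hours)
        |> List.sum

    // Pattern matching on a tuple, with a guard.
    let describe (name: string, hours: int) =
        match hours with
        | 0 -> sprintf "%s has no course hours" name
        | h when h > 10 -> sprintf "%s has a heavy load of %d hours" name h
        | h -> sprintf "%s has %d course hours" name h

    printfn "%s" (describe (student.FullName, totalHours))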


Multithreading in Rust using Crates [Tutorial]

Aaron Lazar
15 Aug 2018
17 min read
The crates.io ecosystem in Rust can make use of approaches to improve our development speed as well as the performance of our code. In this tutorial, we'll learn how to use the crates ecosystem to manipulate threads in Rust. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Using non-blocking data structures One of the issues we saw earlier was that if we wanted to share something more complex than an integer or a Boolean between threads and if we wanted to mutate it, we needed to use a Mutex. This is not entirely true, since one crate, Crossbeam, allows us to use great data structures that do not require locking a Mutex. They are therefore much faster and more efficient. Often, when we want to share information between threads, it's usually a list of tasks that we want to work on cooperatively. Other times, we want to create information in multiple threads and add it to a list of information. It's therefore not so usual for multiple threads to be working with exactly the same variables since as we have seen, that requires synchronization and it will be slow. This is where Crossbeam shows all its potential. Crossbeam gives us some multithreaded queues and stacks, where we can insert data and consume data from different threads. We can, in fact, have some threads doing an initial processing of the data and others performing a second phase of the processing. Let's see how we can use these features. First, add crossbeam to the dependencies of the crate in the Cargo.toml file. Then, we start with a simple example: extern crate crossbeam; use std::thread; use std::sync::Arc; use crossbeam::sync::MsQueue; fn main() { let queue = Arc::new(MsQueue::new()); let handles: Vec<_> = (1..6) .map(|_| { let t_queue = queue.clone(); thread::spawn(move || { for _ in 0..1_000_000 { t_queue.push(10); } }) }) .collect(); for handle in handles { handle.join().unwrap(); } let final_queue = Arc::try_unwrap(queue).unwrap(); let mut sum = 0; while let Some(i) = final_queue.try_pop() { sum += i; } println!("Final sum: {}", sum); } Let's first understand what this example does. It will iterate 1,000,000 times in 5 different threads, and each time it will push a 10 to a queue. Queues are FIFO lists, first input, first output. This means that the first number entered will be the first one to pop() and the last one will be the last to do so. In this case, all of them are a 10, so it doesn't matter. Once the threads finish populating the queue, we iterate over it and we add all the numbers. A simple computation should make you able to guess that if everything goes perfectly, the final number should be 50,000,000. If you run it, that will be the result, and that's not all. If you run it by executing cargo run --release, it will run blazingly fast. On my computer, it took about one second to complete. If you want, try to implement this code with the standard library Mutex and vector, and you will see that the performance difference is amazing. As you can see, we still needed to use an Arc to control the multiple references to the queue. This is needed because the queue itself cannot be duplicated and shared, it has no reference count. Crossbeam not only gives us FIFO queues. We also have LIFO stacks. LIFO comes from last input, first output, and it means that the last element you inserted in the stack will be the first one to pop(). 
Let's see the difference with a couple of threads: extern crate crossbeam; use std::thread; use std::sync::Arc; use std::time::Duration; use crossbeam::sync::{MsQueue, TreiberStack}; fn main() { let queue = Arc::new(MsQueue::new()); let stack = Arc::new(TreiberStack::new()); let in_queue = queue.clone(); let in_stack = stack.clone(); let in_handle = thread::spawn(move || { for i in 0..5 { in_queue.push(i); in_stack.push(i); println!("Pushed :D"); thread::sleep(Duration::from_millis(50)); } }); let mut final_queue = Vec::new(); let mut final_stack = Vec::new(); let mut last_q_failed = 0; let mut last_s_failed = 0; loop { // Get the queue match queue.try_pop() { Some(i) => { final_queue.push(i); last_q_failed = 0; println!("Something in the queue! :)"); } None => { println!("Nothing in the queue :("); last_q_failed += 1; } } // Get the stack match stack.try_pop() { Some(i) => { final_stack.push(i); last_s_failed = 0; println!("Something in the stack! :)"); } None => { println!("Nothing in the stack :("); last_s_failed += 1; } } // Check if we finished if last_q_failed > 1 && last_s_failed > 1 { break; } else if last_q_failed > 0 || last_s_failed > 0 { thread::sleep(Duration::from_millis(100)); } } in_handle.join().unwrap(); println!("Queue: {:?}", final_queue); println!("Stack: {:?}", final_stack); } As you can see in the code, we have two shared variables: a queue and a stack. The secondary thread will push new values to each of them, in the same order, from 0 to 4. Then, the main thread will try to get them back. It will loop indefinitely and use the try_pop() method. The pop() method can be used, but it will block the thread if the queue or the stack is empty. This will happen in any case once all values get popped since no new values are being added, so the try_pop() method will help not to block the main thread and end gracefully. The way it checks whether all the values were popped is by counting how many times it failed to pop a new value. Every time it fails, it will wait for 100 milliseconds, while the push thread only waits for 50 milliseconds between pushes. This means that if it tries to pop new values two times and there are no new values, the pusher thread has already finished. It will add values as they are popped to two vectors and then print the result. In the meantime, it will print messages about pushing and popping new values. You will understand this better by seeing the output: Note that the output can be different in your case, since threads don't need to be executed in any particular order. In this example output, as you can see, it first tries to get something from the queue and the stack but there is nothing there, so it sleeps. The second thread then starts pushing things, two numbers actually. After this, the queue and the stack will be [0, 1]. Then, it pops the first item from each of them. From the queue, it will pop the 0 and from the stack it will pop the 1 (the last one), leaving the queue as [1] and the stack as [0]. It will go back to sleep and the secondary thread will insert a 2 in each variable, leaving the queue as [1, 2] and the stack as [0, 2]. Then, the main thread will pop two elements from each of them. From the queue, it will pop the 1 and the 2, while from the stack it will pop the 2 and then the 0, leaving both empty. The main thread then goes to sleep, and for the next two tries, the secondary thread will push one element and the main thread will pop it, twice. 
It might seem a little bit complex, but the idea is that these queues and stacks can be used efficiently between threads without requiring a Mutex, and they accept any Send type. This means that they are great for complex computations, and even for multi-staged complex computations. The Crossbeam crate also has some helpers to deal with epochs and even some variants of the mentioned types. For multithreading, Crossbeam also adds a great utility: scoped threads. Scoped threads In all our examples, we have used standard library threads. As we have discussed, these threads have their own stack, so if we want to use variables that we created in the main thread we will need to send them to the thread. This means that we will need to use things such as Arc to share non-mutable data. Not only that, having their own stack means that they will also consume more memory and eventually make the system slower if they use too much. Crossbeam gives us some special threads that allow sharing stacks between them. They are called scoped threads. Using them is pretty simple and the crate documentation explains them perfectly; you will just need to create a Scope by calling crossbeam::scope(). You will need to pass a closure that receives the Scope. You can then call spawn() in that scope the same way you would do it in std::thread, but with one difference, you can share immutable variables among threads if they were created inside the scope or moved to it. This means that for the queues or stacks we just talked about, or for atomic data, you can simply call their methods without requiring an Arc! This will improve the performance even further. Let's see how it works with a simple example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..1_000_u64).into_iter().collect(); let mut results = Vec::new(); crossbeam::scope(|scope| { for num in &all_nums { results.push(scope.spawn(move || num * num + num * 5 + 250)); } }); let final_result: u64 = results.into_iter().map(|res| res.join()).sum(); println!("Final result: {}", final_result); } Let's see what this code does. It will first just create a vector with all the numbers from 0 to 1000. Then, for each of them, in a crossbeam scope, it will run one scoped thread per number and perform a supposedly complex computation. This is just an example, since it will just return a result of a simple second-order function. Interestingly enough, though, the scope.spawn() method allows returning a result of any type, which is great in our case. The code will add each result to a vector. This won't directly add the resulting number, since it will be executed in parallel. It will add a result guard, which we will be able to check outside the scope. Then, after all the threads run and return the results, the scope will end. We can now check all the results, which are guaranteed to be ready for us. For each of them, we just need to call join() and we will get the result. Then, we sum it up to check that they are actual results from the computation. This join() method can also be called inside the scope and get the results, but it will mean that if you do it inside the for loop, for example, you will block the loop until the result is generated, which is not efficient. The best thing is to at least run all the computations first and then start checking the results. If you want to perform more computations after them, you might find it useful to run the new computation in another loop or iterator inside the crossbeam scope. 
But, how does crossbeam allow you to use the variables outside the scope freely? Won't there be data races? Here is where the magic happens. The scope will join all the inner threads before exiting, which means that no further code will be executed in the main thread until all the scoped threads finish. This means that we can use the variables of the main thread, also called parent stack, due to the main thread being the parent of the scope in this case without any issue. We can actually check what is happening by using the println!() macro. If we remember from previous examples, printing to the console after spawning some threads would usually run even before the spawned threads, due to the time it takes to set them up. In this case, since we have crossbeam preventing it, we won't see it. Let's check the example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..10).into_iter().collect(); crossbeam::scope(|scope| { for num in all_nums { scope.spawn(move || { println!("Next number is {}", num); }); } }); println!("Main thread continues :)"); } If you run this code, you will see something similar to the following output: As you can see, scoped threads will run without any particular order. In this case, it will first run the 1, then the 0, then the 2, and so on. Your output will probably be different. The interesting thing, though, is that the main thread won't continue executing until all the threads have finished. Therefore, reading and modifying variables in the main thread is perfectly safe. There are two main performance advantages with this approach; Arc will require a call to malloc() to allocate memory in the heap, which will take time if it's a big structure and the memory is a bit full. Interestingly enough, that data is already in our stack, so if possible, we should try to avoid duplicating it in the heap. Moreover, the Arc will have a reference counter, as we saw. And it will even be an atomic reference counter, which means that every time we clone the reference, we will need to atomically increment the count. This takes time, even more than incrementing simple integers. Most of the time, we might be waiting for some expensive computations to run, and it would be great if they just gave all the results when finished. We can still add some more chained computations, using scoped threads, that will only be executed after the first ones finish, so we should use scoped threads more often than normal threads, if possible. Using thread pool So far, we have seen multiple ways of creating new threads and sharing information between them. Nevertheless, the ideal number of threads we should spawn to do all the work should be around the number of virtual processors in the system. This means we should not spawn one thread for each chunk of work. Nevertheless, controlling what work each thread does can be complex, since you have to make sure that all threads have work to do at any given point in time. Here is where thread pooling comes in handy. The Threadpool crate will enable you to iterate over all your work and for each of your small chunks, you can call something similar to a thread::spawn(). The interesting thing is that each task will be assigned to an idle thread, and no new thread will be created for each task. The number of threads is configurable and you can get the number of CPUs with other crates. Not only that, if one of the threads panics, it will automatically add a new one to the pool. 
To see an example, first, let's add threadpool and num_cpus as dependencies in our Cargo.toml file.  Then, let's see an example code: extern crate num_cpus; extern crate threadpool; use std::sync::atomic::{AtomicUsize, Ordering}; use std::sync::Arc; use threadpool::ThreadPool; fn main() { let pool = ThreadPool::with_name("my worker".to_owned(), num_cpus::get()); println!("Pool threads: {}", pool.max_count()); let result = Arc::new(AtomicUsize::new(0)); for i in 0..1_0000_000 { let t_result = result.clone(); pool.execute(move || { t_result.fetch_add(i, Ordering::Relaxed); }); } pool.join(); let final_res = Arc::try_unwrap(result).unwrap().into_inner(); println!("Final result: {}", final_res); } This code will create a thread pool of threads with the number of logical CPUs in your computer. Then, it will add a number from 0 to 1,000,000 to an atomic usize, just to test parallel processing. Each addition will be performed by one thread. Doing this with one thread per operation (1,000,000 threads) would be really inefficient. In this case, though, it will use the appropriate number of threads, and the execution will be really fast. There is another crate that gives thread pools an even more interesting parallel processing feature: Rayon. Using parallel iterators If you can see the big picture in these code examples, you'll have realized that most of the parallel work has a long loop, giving work to different threads. It happened with simple threads and it happens even more with scoped threads and thread pools. It's usually the case in real life, too. You might have a bunch of data to process, and you can probably separate that processing into chunks, iterate over them, and hand them over to various threads to do the work for you. The main issue with that approach is that if you need to use multiple stages to process a given piece of data, you might end up with lots of boilerplate code that can make it difficult to maintain. Not only that, you might find yourself not using parallel processing sometimes due to the hassle of having to write all that code. Luckily, Rayon has multiple data parallelism primitives around iterators that you can use to parallelize any iterative computation. You can almost forget about the Iterator trait and use Rayon's ParallelIterator alternative, which is as easy to use as the standard library trait! Rayon uses a parallel iteration technique called work stealing. For each iteration of the parallel iterator, the new value or values get added to a queue of pending work. Then, when a thread finishes its work, it checks whether there is any pending work to do and if there is, it starts processing it. This, in most languages, is a clear source of data races, but thanks to Rust, this is no longer an issue, and your algorithms can run extremely fast and in parallel. Let's look at how to use it for an example similar to those we have seen in this chapter. First, add rayon to your Cargo.toml file and then let's start with the code: extern crate rayon; use rayon::prelude::*; fn main() { let result = (0..1_000_000_u64) .into_par_iter() .map(|e| e * 2) .sum::<u64>(); println!("Result: {}", result); } As you can see, this works just as you would write it in a sequential iterator, yet, it's running in parallel. 
Of course, running this example sequentially will be faster than running it in parallel thanks to compiler optimizations, but when you need to process data from files, for example, or perform very complex mathematical computations, parallelizing the input can give great performance gains. Rayon implements these parallel iteration traits to all standard library iterators and ranges. Not only that, it can also work with standard library collections, such as HashMap and Vec. In most cases, if you are using the iter() or into_iter() methods from the standard library in your code, you can simply use par_iter() or into_par_iter() in those calls and your code should now be parallel and work perfectly. But, beware, sometimes parallelizing something doesn't automatically improve its performance. Take into account that if you need to update some shared information between the threads, they will need to synchronize somehow, and you will lose performance. Therefore, multithreading is only great if workloads are completely independent and you can execute one without any dependency on the rest. If you found this article useful and would like to learn more such tips, head over to pick up this book, Rust High Performance, authored by Iban Eguia Moraza. Rust 1.28 is here with global allocators, nonZero types and more Java Multithreading: How to synchronize threads to implement critical sections and avoid race conditions Multithreading with Qt
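Circling back to the earlier suggestion to reimplement the first queue example with the standard library: the sketch below (an illustration, not code from the book) performs the same 5 x 1,000,000 pushes, but into a Mutex-protected Vec instead of Crossbeam's MsQueue. Comparing its cargo run --release time with the MsQueue version above shows the cost of lock contention.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let data = Arc::new(Mutex::new(Vec::new()));

        let handles: Vec<_> = (1..6)
            .map(|_| {
                let t_data = Arc::clone(&data);
                thread::spawn(move || {
                    for _ in 0..1_000_000 {
                        // Every push has to acquire the lock, so the five
                        // threads constantly contend with each other.
                        t_data.lock().unwrap().push(10_u64);
                    }
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }

        let final_data = Arc::try_unwrap(data).unwrap().into_inner().unwrap();
        let sum: u64 = final_data.iter().sum();
        println!("Final sum: {}", sum);
    }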


Understanding functional reactive programming in Scala [Tutorial]

Fatema Patrawala
15 Aug 2018
6 min read
Like OOP (Object-Oriented Programming), Functional Programming (FP) is a programming paradigm. It is a style in which we write programs in terms of pure functions and immutable data, and treat a program as the evaluation of functions. Because we use pure functions and immutable data, we get many benefits for free; for instance, with immutable data we do not need to worry about shared mutable state, side effects, or thread safety. FP follows a declarative style, which means programs are written in terms of expressions rather than statements: where OOP or imperative programming relies on statements, FP treats everything as an expression. In this Scala functional programming tutorial, we will understand the principles and benefits of FP and why Functional Reactive Programming is the best fit for Reactive Programming in Scala. This tutorial is an extract from the book Scala Reactive Programming, written by Rambabu Posa.

Principles of functional programming

FP has the following principles:

- Pure functions
- Immutable data
- No side effects
- Referential transparency (RT)
- Functions as first-class citizens
- Functions of all kinds: anonymous functions, higher-order functions, combinators, partial functions, partially applied functions, function currying, and closures
- Tail recursion
- Function composability

A pure function always returns the same result for the same inputs, irrespective of how many times and where it is run. Immutable data gives us many benefits, for instance no shared data, no side effects, and thread safety for free. Just as an object is a first-class citizen in OOP, a function is a first-class citizen in FP. This means we can use a function as any of these:

- An object
- A value
- A piece of data
- A data type
- An operation

In simple words, in FP we treat functions and data the same way. We can compose functions in sequence so that even complex problems can be solved easily. Higher-Order Functions (HOFs) are functions that take one or more functions as parameters, return a function as their result, or both. For instance, map(), flatMap(), and filter() are some of the most important and frequently used higher-order functions. Consider the following example:

    map(x => x * x)

Here, the map() function is an example of a higher-order function because it takes an anonymous function as its parameter. The anonymous function x => x * x is of type Int => Int; it takes an Int as input and returns an Int as its result. An anonymous function is simply a function without a name.

Benefits of functional programming

FP provides us with many benefits:

- Thread-safe code
- Easy-to-write concurrent and parallel code
- Simple, readable, and elegant code
- Type safety
- Composability
- Support for declarative programming

Because we use pure functions and immutability in FP, we get thread safety for free. One of the greatest benefits of FP is function composability: we can compose multiple functions one after another and execute them either sequentially or in parallel, which gives us a great approach for solving complex problems easily.
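The following short Scala sketch is not from the book; it is just an illustration of the ideas above, and the names (square, applyTwice, squareThenDouble) are made up for the example. It shows a pure function, a higher-order function, function composition with andThen, and standard-library combinators on an immutable list.

    object FpBasics extends App {
      // A pure function: same output for the same input, no side effects.
      def square(x: Int): Int = x * x

      // A higher-order function: takes a function as a parameter.
      def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

      // Function composition: build a new function from two existing ones.
      val double: Int => Int = _ * 2
      val squareThenDouble: Int => Int = (square _).andThen(double)

      // Immutable data plus HOFs/combinators from the standard library.
      val numbers = List(1, 2, 3, 4, 5)
      val result = numbers.map(square).filter(_ % 2 == 1)

      println(applyTwice(double, 3)) // 12
      println(squareThenDouble(4))   // 32
      println(result)                // List(1, 9, 25)
    }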
Functional Reactive programming

The combination of FP and RP is known as Functional Reactive Programming or, for short, FRP. It is multiparadigm and combines the benefits and best features of two of the most popular programming paradigms: FP and RP. FRP is a new style of programming that uses the RP paradigm to support asynchronous, non-blocking data streaming with backpressure, and uses the FP paradigm to take advantage of its features (such as pure functions, immutability, no side effects, and RT) and its HOFs or combinators (such as map, flatMap, filter, reduce, fold, and zip). In simple words, FRP is a programming paradigm that supports RP using FP features and building blocks: FRP = FP + RP.

Today, we have many FRP solutions, frameworks, tools, and technologies. Here is a list of a few of them:

- Scala, Play Framework, and Akka Toolkit
- RxJS
- Reactive-banana
- Reactive
- Sodium
- Haskell

This book is dedicated to Lightbend's FRP technology stack—Lagom Framework, Scala, Play Framework, and Akka Toolkit (Akka Streams). FRP technologies are mainly useful for developing interactive programs, such as rich GUIs (graphical user interfaces), animations, multiplayer games, computer music, or robot controllers.

Types of Reactive Programming

Even though most projects and companies use the FP paradigm to develop their Reactive systems and solutions, there are a couple of ways to use RP. These are known as the types of RP:

- FRP (Functional Reactive Programming)
- OORP (Object-Oriented Reactive Programming)

However, FP is the best programming paradigm to combine with RP, because we get all the benefits of FP for free.

Why FP is the best fit for RP

When we combine RP with FP, we get the following benefits:

- Composability—we can compose multiple data streams using functional operations, so even complex problems can be solved easily
- Thread safety
- Readability
- Simple, concise, clear, and easy-to-understand code
- Easy-to-write asynchronous, concurrent, and parallel code
- Very flexible and easy-to-use operations
- Support for declarative programming
- Code that is easy to write, more scalable, highly available, and robust

In FP, we concentrate on what to do to fulfill a job, whereas in other programming paradigms, such as OOP or imperative programming (IP), we concentrate on how to do it. Declarative programming gives us the following benefits:

- No side effects
- Enforced immutability
- Concise and understandable code that is easy to write

The main property of RP is real-time data streaming, and the main property of FP is composability. If we combine these two paradigms, we get more benefits and can develop better solutions easily. In RP everything is a stream, while in FP everything is a function, and we can use those functions to perform operations on data streams.

We learnt the principles and benefits of Scala functional programming. To build fault-tolerant, robust, and distributed applications in Scala, grab the book Scala Reactive Programming today.

Introduction to the Functional Programming
Manipulating functions in functional programming
Why functional programming in Python matters: Interview with best selling author, Steven Lott
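To make the FRP idea concrete, here is a minimal Akka Streams sketch. It is an illustration rather than code from the book, and it assumes Akka 2.5.x with the akka-stream dependency on the classpath; the object and system names are arbitrary. A Source emits elements, functional combinators such as map and filter transform the stream, and a Sink consumes it, with backpressure handled by the library.

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{Sink, Source}

    object FrpSketch extends App {
      implicit val system: ActorSystem = ActorSystem("frp-sketch")
      implicit val materializer: ActorMaterializer = ActorMaterializer()
      import system.dispatcher

      // Everything is a stream: emit 1..10, transform it with pure functions,
      // and consume the results; backpressure is managed by Akka Streams.
      Source(1 to 10)
        .map(n => n * n)
        .filter(_ % 2 == 0)
        .runWith(Sink.foreach(n => println(s"Received: $n")))
        .onComplete(_ => system.terminate())
    }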


MongoDB Sharding: Sharding clusters and choosing the right shard key [Tutorial]

Fatema Patrawala
14 Aug 2018
9 min read
Sharding was one of the features that MongoDB offered from an early stage, since version 1.6 was released in August 2010. Sharding is the ability to horizontally scale out our database by partitioning our datasets across different servers—the shards. Foursquare and Bitly are two of the most famous early customers for MongoDB that were also using sharding from its inception all the way to the general availability release. In this article we will learn how to design a sharding cluster and how to make the single most important decision around it of choosing the unique shard key. This article is a MongoDB shard tutorial taken from the book Mastering MongoDB 3.x by Alex Giamas. Sharding setup in MongoDB Sharding is performed at the collection level. We can have collections that we don't want or need to shard for several reasons. We can leave these collections unsharded. These collections will be stored in the primary shard. The primary shard is different for each database in MongoDB. The primary shard is automatically selected by MongoDB when we create a new database in a sharded environment. MongoDB will pick the shard that has the least data stored at the moment of creation. If we want to change the primary shard at any other point, we can issue the following command: > db.runCommand( { movePrimary : "mongo_books", to : "UK_based" } ) We thus move the database named mongo_books to the shard named UK_based. Choosing the shard key Choosing our shard key is the most important decision we need to make. The reason is that once we shard our data and deploy our cluster, it becomes very difficult to change the shard key. First, we will go through the process of changing the shard key. Changing the shard key There is no command or simple procedure to change the shard key in MongoDB. The only way to change the shard key involves backing up and restoring all of our data, something that may range from being extremely difficult to impossible in high-load production environments. The steps if we want to change our shard key are as follows: Export all data from MongoDB. Drop the original sharded collection. Configure sharding with the new key. Presplit the new shard key range. Restore our data back into MongoDB. From these steps, step 4 is the one that needs some more explanation. MongoDB uses chunks to split data in a sharded collection. If we bootstrap a MongoDB sharded cluster from scratch, chunks will be calculated automatically by MongoDB. MongoDB will then distribute the chunks across different shards to ensure that there are an equal number of chunks in each shard. The only case in which we cannot really do this is when we want to load data into a newly sharded collection. The reasons are threefold: MongoDB creates splits only after an insert operation. Chunk migration will copy all of the data in that chunk from one shard to another. The floor(n/2) chunk migrations can happen at any given time, where n is the number of shards we have. Even with three shards, this is only a floor(1.5)=1 chunk migration at a time. These three limitations combined mean that letting MongoDB to figure it out on its own will definitely take much longer and may result in an eventual failure. This is why we want to presplit our data and give MongoDB some guidance on where our chunks should go. 
Considering our example of the mongo_books database and the books collection, this would be: > db.runCommand( { split : "mongo_books.books", middle : { id : 50 } } ) The middle command parameter will split our key space in documents that have id<=50 and documents that have id>50. There is no need for a document to exist in our collection with id=50 as this will only serve as the guidance value for our partitions. In this example, we chose 50 assuming that our keys follow a uniform distribution (that is, the same count of keys for each value) in the range of values from 0 to 100. We should aim to create at least 20-30 chunks to grant MongoDB flexibility in potential migrations. We can also use bounds and find instead of middle if we want to manually define the partition key, but both parameters need data to exist in our collection before applying them. Choosing the correct shard key After the previous section, it's now self-evident that we need to take into great consideration the choice of our shard key as it is something that we have to stick with. A great shard key has three characteristics: High cardinality Low frequency Non-monotonically changing in value We will go over the definitions of these three properties first to understand what they mean. High cardinality means that the shard key must have as many distinct values as possible. A Boolean can take only values of true/false, and so it is a bad shard key choice. A 64-bit long value field that can take any value from −(2^63) to 2^63 − 1 and is a good example in terms of cardinality. Low frequency directly relates to the argument about high cardinality. A low-frequency shard key will have a distribution of values as close to a perfectly random / uniform distribution. Using the example of our 64-bit long value, it is of little use to us if we have a field that can take values ranging from −(2^63) to 2^63 − 1 only to end up observing the values of 0 and 1 all the time. In fact, it is as bad as using a Boolean field, which can also take only two values after all. If we have a shard key with high frequency values, we will end up with chunks that are indivisible. These chunks cannot be further divided and will grow in size, negatively affecting the performance of the shard that contains them. Non-monotonically changing values mean that our shard key should not be, for example, an integer that always increases with every new insert. If we choose a monotonically increasing value as our shard key, this will result in all writes ending up in the last of all of our shards, limiting our write performance. If we want to use a monotonically changing value as the shard key, we should consider using hash-based sharding. In the next section, we will describe different sharding strategies and their advantages and disadvantages. Range-based sharding The default and the most widely used sharding strategy is range-based sharding. This strategy will split our collection's data into chunks, grouping documents with nearby values in the same shard. For our example database and collection, mongo_books and books respectively, we have: > sh.shardCollection("mongo_books.books", { id: 1 } ) This creates a range-based shard key on id with ascending direction. The direction of our shard key will determine which documents will end up in the first shard and which ones in the subsequent ones. This is a good strategy if we plan to have range-based queries as these will be directed to the shard that holds the result set instead of having to query all shards. 
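Building on the single split shown above, a common way to presplit is to loop over the key range from the mongos shell and create the 20-30 chunks mentioned earlier in one pass. The following sketch uses illustrative values (it assumes mongo_books.books has already been sharded on { id: 1 } and that id is roughly uniformly distributed between 0 and 1000); the boundary spacing is an assumption, not a recommendation from the book.

    // Run against mongos; creates chunk boundaries at id = 40, 80, ..., 960.
    for (var boundary = 40; boundary < 1000; boundary += 40) {
        sh.splitAt("mongo_books.books", { id: boundary });
    }
    sh.status();  // verify the chunk distribution across shards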
Hash-based sharding If we don't have a shard key (or can't create one) that achieves the three goals mentioned previously, we can use the alternative strategy of using hash-based sharding. In this case, we are trading data distribution with query isolation. Hash-based sharding will take the values of our shard key and hash them in a way that guarantees close to uniform distribution. This way we can be sure that our data will evenly distribute across shards. The downside is that only exact match queries will get routed to the exact shard that holds the value. Any range query will have to go out and fetch data from all shards. For our example database and collection (mongo_books and books respectively), we have: > sh.shardCollection("mongo_books.books", { id: "hashed" } ) Similar to the preceding example, we are now using the id field as our hashed shard key. Suppose we use fields with float values for hash-based sharding. Then we will end up with collisions if the precision of our floats is more that 2^53. These fields should be avoided where possible. Coming up with our own key Range-based sharding does not need to be confined to a single key. In fact, in most cases, we would like to combine multiple keys to achieve high cardinality and low frequency. A common pattern is to combine a low-cardinality first part (but still having as distinct values more than two times the number of shards that we have) with a high-cardinality key as its second field. This achieves both read and write distribution from the first part of the sharding key and then cardinality and read locality from the second part. On the other hand, if we don't have range queries, we can get away by using hash-based sharding on a primary key as this will exactly target the shard and document that we are going after. To make things more complicated, these considerations may change depending on our workload. A workload that consists almost exclusively (say 99.5%) of reads won't care about write distribution. We can use the built-in _id field as our shard key and this will only add 0.5% load in the last shard. Our reads will still be distributed across shards. Unfortunately, in most cases, this is not simple. Location-based data Due to government regulations and the desire to have our data as close to our users as possible, there is often a constraint and need to limit data in a specific data center. By placing different shards at different data centers, we can satisfy this requirement. To summarize we learned about MongoDB sharding and got to know techniques to choose the correct shard key. Get the expert guide Mastering MongoDB 3.x  today to build fault-tolerant MongoDB application. MongoDB 4.0 now generally available with support for multi-platform, mobile, ACID transactions and more MongoDB going relational with 4.0 release Indexing, Replicating, and Sharding in MongoDB [Tutorial]
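To illustrate the "coming up with our own key" pattern described above, here is a short sketch with illustrative collection and field names (reviews, ratings, country, user_id are not from the book). It pairs a lower-cardinality first field with a high-cardinality second field in a compound range-based key, and shows the hashed alternative for comparison.

    // Enable sharding for the database first.
    sh.enableSharding("mongo_books")

    // Compound range-based key: country has modest cardinality (but still
    // several times the number of shards), user_id adds high cardinality.
    sh.shardCollection("mongo_books.reviews", { country: 1, user_id: 1 })

    // Hashed key on _id: even write distribution, but range queries
    // on _id will have to fetch from every shard.
    sh.shardCollection("mongo_books.ratings", { _id: "hashed" })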

Modern Cloud Native architectures: Microservices, Containers, and Serverless - Part 2
Guest Contributor
14 Aug 2018
8 min read
This whitepaper is written by Mina Andrawos, an experienced engineer who has developed deep experience in the Go language and modern software architectures. He regularly writes articles and tutorials about the Go language, and also shares open source projects. Mina Andrawos has authored the book Cloud Native programming with Golang, which provides practical techniques, code examples, and architectural patterns required to build cloud native microservices in the Go language. He is also the author of the Mastering Go Programming and Modern Golang Programming video courses. We published Part 1 of this paper yesterday; Part 2 covers containers and serverless applications. Let us get started.

Containers

The technology of software containers is the next key technology that needs to be discussed to practically explain cloud native applications. A container is simply the idea of encapsulating some software inside an isolated user space, or "container." For example, a MySQL database can be isolated inside a container, where the environmental variables and the configurations that it needs will live. Software outside the container will not see the environmental variables or configuration contained inside the container by default. Multiple containers can exist on the same local virtual machine, cloud virtual machine, or hardware server.

Containers provide the ability to run numerous isolated software services, with all their configurations, software dependencies, runtimes, tools, and accompanying files, on the same machine. In a cloud environment, this ability translates into saved costs and effort, as the need for provisioning and buying server nodes for each microservice diminishes, since different microservices can be deployed on the same host without disrupting each other. Containers combined with microservices architectures are powerful tools to build modern, portable, scalable, and cost-efficient software. In a production environment, more than a single server node, combined with numerous containers, would be needed to achieve scalability and redundancy.

Containers also add more benefits to cloud native applications beyond microservices isolation. With a container, you can move your microservices, with all the configuration, dependencies, and environmental variables that they need, to fresh server nodes without the need to reconfigure the environment, achieving powerful portability. Due to the power and popularity of software container technology, some new operating systems, like CoreOS or Photon OS, are built from the ground up to function as hosts for containers.

One of the most popular software container projects in the software industry is Docker. Major organizations such as Cisco, Google, and IBM utilize Docker containers in their infrastructure as well as in their products. Another notable project in the software containers world is Kubernetes. Kubernetes is a tool that allows the automation of deployment, management, and scaling of containers. It was built by Google to facilitate the management of their containers, which number in the billions per week. Kubernetes provides some powerful features such as load balancing between containers, restarting failed containers, and orchestration of the storage utilized by the containers. The project is part of the Cloud Native Computing Foundation, along with Prometheus.
Container complexities

In the case of containers, the task of managing them can sometimes get rather complex, for the same reasons as managing expanding numbers of microservices. As containers or microservices grow in number, there needs to be a mechanism to identify where each container or microservice is deployed, what its purpose is, and what resources it needs to keep running.

Serverless applications

Serverless architecture is a new software architectural paradigm that was popularized with the AWS Lambda service. In order to fully understand serverless applications, we must first cover an important concept known as 'Function as a Service', or FaaS for short. Function as a Service, or FaaS, is the idea that a cloud provider such as Amazon, or even a local piece of software such as Fission.io or funktion, provides a service where a user can request a function to run remotely in order to perform a very specific task; after the function concludes, the results are returned to the user. No services or stateful data are maintained, and the function code is provided by the user to the service that runs the function.

The idea behind properly designed cloud native production applications that utilize the serverless architecture is that, instead of building multiple microservices expected to run continuously in order to carry out individual tasks, we build an application that has fewer microservices combined with FaaS, where FaaS covers tasks that don't need services running continuously. FaaS is a smaller construct than a microservice. For example, in the case of the event booking application we covered earlier, there were multiple microservices covering different tasks. If we use a serverless application model, some of those microservices would be replaced with a number of functions that serve their purpose. Here is a diagram that showcases the application utilizing a serverless architecture:

In this diagram, the event handler microservice as well as the booking handler microservice were replaced with a number of functions that produce the same functionality. This eliminates the need to run and maintain the two existing microservices. Serverless architectures have the advantage that no virtual machines and/or containers need to be provisioned to build the part of the application that utilizes FaaS. The computing instances that run the functions cease to exist, from the user's point of view, once their functions conclude. Furthermore, the number of microservices and/or containers that need to be monitored and maintained by the user decreases, saving cost, time, and effort. Serverless architectures provide yet another powerful software building tool in the hands of software engineers and architects to design flexible and scalable software. Well-known FaaS offerings are AWS Lambda by Amazon, Azure Functions by Microsoft, Cloud Functions by Google, and many more.

Another definition of serverless applications is applications that utilize the BaaS, or backend as a service, paradigm. BaaS is the idea that developers only write the client code of their application, which then relies on several pre-built software services hosted in the cloud and accessible via APIs. BaaS is popular in mobile app programming, where developers rely on a number of backend services to drive the majority of the functionality of the application. Examples of BaaS services are Firebase and Parse.
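To make the FaaS side of this a little more concrete, here is a minimal sketch of what one of those booking functions might look like as an AWS Lambda handler written in C#. The class names, the request/response shapes, and the omitted persistence logic are illustrative assumptions, not part of the whitepaper; the sketch assumes the Amazon.Lambda.Core and Amazon.Lambda.Serialization.Json NuGet packages:

using Amazon.Lambda.Core;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace EventBooking.Functions
{
    public class BookingRequest
    {
        public string EventId { get; set; }
        public string UserId { get; set; }
        public int Seats { get; set; }
    }

    public class BookingResult
    {
        public bool Confirmed { get; set; }
        public string Message { get; set; }
    }

    public class BookEventFunction
    {
        // Invoked by the FaaS platform; no server is kept running between calls.
        public BookingResult FunctionHandler(BookingRequest request, ILambdaContext context)
        {
            context.Logger.LogLine($"Booking {request.Seats} seat(s) for event {request.EventId}");

            // Seat validation and persistence would go here. Because the function
            // keeps no state between invocations, any state must be written to an
            // external store (a database, a queue, and so on).
            return new BookingResult { Confirmed = true, Message = "Booking accepted" };
        }
    }
}

Such a function would typically be wired to an API Gateway endpoint or another event source, rather than hosted inside one of the application's long-running services.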
Disadvantages of serverless applications

Similarly to microservices and cloud native applications, the serverless architecture is not suitable for every scenario. The functions provided by FaaS don't keep state by themselves, which means special considerations need to be observed when writing the function code. This is unlike a full microservice, where the developer has full control over the state. One approach to keeping state with FaaS, in spite of this limitation, is to propagate the state to a database or a memory cache like Redis.

The startup times for the functions are not always fast, since there is the time taken to send the request to the FaaS service provider and, in some cases, the time needed to start a computing instance that runs the function. These delays have to be accounted for when designing serverless applications. FaaS functions do not run continuously like microservices, which makes them unsuitable for any task that requires continuously running software. Serverless applications also have the same limitation as other cloud native applications, where portability of the application from one cloud provider to another, or from the cloud to a local environment, becomes challenging because of vendor lock-in.

Conclusion

Cloud computing architectures have opened avenues for developing efficient, scalable, and reliable software. This paper covered some significant concepts in the world of cloud computing, such as microservices, cloud native applications, containers, and serverless applications. Microservices are the building blocks for most scalable cloud native applications; they decouple the application tasks into various efficient services. Containers are how microservices can be isolated and deployed safely to production environments without polluting them. Serverless applications decouple application tasks into smaller constructs, mostly called functions, that can be consumed via APIs. Cloud native applications make use of all those architectural patterns to build scalable, reliable, and always available software.

You read Part 2 of Modern cloud native architectures, a white paper by Mina Andrawos. Also read Part 1, which covers microservices and cloud native applications with their advantages and disadvantages. If you are interested in learning more, check out Mina's Cloud Native programming with Golang to explore practical techniques for building cloud-native apps that are scalable, reliable, and always available.

About the author: Mina Andrawos

Mina Andrawos is an experienced engineer who has developed deep experience in Go from using it personally and professionally. He regularly authors articles and tutorials about the language, and also shares Go's open source projects. He has written numerous Go applications with varying degrees of complexity. Other than Go, he has skills in Java, C#, Python, and C++. He has worked with various databases and software architectures. He is also skilled in the agile methodology for software development. Besides software development, he has working experience of scrum mastering, sales engineering, and software product management.

Build Java EE containers using Docker [Tutorial]
Are containers the end of virtual machines?
Why containers are driving DevOps

Access application data with Entity Framework in .NET Core [Tutorial]
Aaron Lazar
14 Aug 2018
14 min read
In this tutorial, we will get started with using the Entity Framework and create a simple console application to perform CRUD operations. The intent is to get started with EF Core and understand how to use it. Before we dive into coding, let us see the two development approaches that EF Core supports:

Code-first
Database-first

These two paradigms have been supported for a very long time and therefore we will just look at them at a very high level. EF Core mainly targets the code-first approach and has limited support for the database-first approach, as there is no support for the visual designer or wizard for the database model out of the box. However, there are third-party tools and extensions that support this. The list of third-party tools and extensions can be seen at https://docs.microsoft.com/en-us/ef/core/extensions/. This tutorial has been extracted from the book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava.

In the code-first approach, we first write the code; that is, we first create the domain model classes and then, using these classes, EF Core APIs create the database and tables, using migration based on the convention and configuration provided. We will look at conventions and configurations a little later in this section. The following diagram illustrates the code-first approach:

In the database-first approach, as the name suggests, we have an existing database or we create a database first and then use EF Core APIs to create the domain and context classes. As mentioned, currently EF Core has limited support for it due to a lack of tooling. So, our preference will be for the code-first approach throughout our examples. The reader can discover the third-party tools mentioned previously to learn more about the EF Core database-first approach as well. The following image illustrates the database-first approach:

Building Entity Framework Core Console App

Now that we understand the approaches and know that we will be using the code-first approach, let's dive into coding our getting started with EF Core console app. Before we do so, we need to have SQL Express installed in our development machine. If SQL Express is not installed, download the SQL Express 2017 edition from https://www.microsoft.com/en-IN/sql-server/sql-server-downloads and run the setup wizard. We will do the Basic installation of SQL Express 2017 for our learning purposes, as shown in the following screenshot:

Our objective is to learn how to use EF Core and so we will not do anything fancy in our console app. We will just do simple Create Read Update Delete (CRUD) operations of a simple class called Person, as defined here:

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Gender { get; set; }
    public DateTime DateOfBirth { get; set; }

    public int Age
    {
        get
        {
            var age = DateTime.Now.Year - this.DateOfBirth.Year;
            if (DateTime.Now.DayOfYear < this.DateOfBirth.DayOfYear)
            {
                age = age - 1;
            }

            return age;
        }
    }
}

As we can see in the preceding code, the class has simple properties. To perform the CRUD operations on this class, let's create a console app by performing the following steps:

Create a new .NET Core console project named GettingStartedWithEFCore, as shown in the following screenshot:

Create a new folder named Models in the project node and add the Person class to this newly created folder. This will be our model entity class, which we will use for CRUD operations.

Next, we need to install the EF Core package.
Before we do that, it's important to know that EF Core provides support for a variety of databases. A few of the important ones are:

SQL Server
SQLite
InMemory (for testing)

The complete and comprehensive list can be seen at https://docs.microsoft.com/en-us/ef/core/providers/. We will be working with SQL Server on Windows for our learning purposes, so let's install the SQL Server package for Entity Framework Core. To do so, let's install the Microsoft.EntityFrameworkCore.SqlServer package from the NuGet Package Manager in Visual Studio 2017. Right-click on the project, select Manage Nuget Packages, and then search for Microsoft.EntityFrameworkCore.SqlServer. Select the matching result and click Install:

Next, we will create a class called Context, as shown here:

public class Context : DbContext
{
    public DbSet<Person> Persons { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        //// Get the connection string from configuration
        optionsBuilder.UseSqlServer(@"Server=.\SQLEXPRESS;Database=PersonDatabase;Trusted_Connection=True;");
    }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Person>().Property(nameof(Person.Name)).IsRequired();
    }
}

The class looks quite simple, but it has the following subtle and important things to make note of:

The Context class derives from DbContext, which resides in the Microsoft.EntityFrameworkCore namespace. DbContext is an integral part of EF Core and if you have worked with EF, you will already be aware of it. An instance of DbContext represents a session with the database and can be used to query and save instances of your entities. DbContext is a combination of the Unit of Work and Repository patterns. Typically, you create a class that derives from DbContext and contains Microsoft.EntityFrameworkCore.DbSet properties for each entity in the model. If the properties have a public setter, they are automatically initialized when the instance of the derived context is created.

It contains a property named Persons (plural of the model class Person) of type DbSet<Person>. This will map to the Persons table in the underlying database.

The class overrides the OnConfiguring method of DbContext and specifies the connection string to be used with the SQL Server database. The connection string should be read from the configuration file, appSettings.json, but for the sake of brevity and simplicity, it's hardcoded in the preceding code. The OnConfiguring method allows us to select and configure the data source to be used with a context using DbContextOptionsBuilder.

Let's look at the connection string. Server= specifies the server. It can be .\SQLEXPRESS, .\SQLSERVER, .\LOCALDB, or any other instance name based on the installation you have done. Database= specifies the database name that will be created. Trusted_Connection=True specifies that we are using integrated security or Windows authentication. An enthusiastic reader should read the official Microsoft Entity Framework documentation on configuring the context at https://docs.microsoft.com/en-us/ef/core/miscellaneous/configuring-dbcontext.

The OnModelCreating method allows us to configure the model using the ModelBuilder Fluent API. This is the most powerful method of configuration and allows configuration to be specified without modifying the entity classes. The Fluent API configuration has the highest precedence and will override conventions and data annotations.
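Before moving on, a quick note on that hardcoded connection string. In a real application it would typically be read from appsettings.json. The following is a minimal sketch of one way to do this; it assumes the Microsoft.Extensions.Configuration and Microsoft.Extensions.Configuration.Json packages and an appsettings.json file containing a ConnectionStrings:PersonDatabase entry, none of which are part of the original sample:

// Sketch only: shows just the OnConfiguring override reading the connection
// string from appsettings.json instead of hardcoding it. The rest of the
// Context class stays the same as above.
using System.IO;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;

public class Context : DbContext
{
    public DbSet<Person> Persons { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        IConfigurationRoot configuration = new ConfigurationBuilder()
            .SetBasePath(Directory.GetCurrentDirectory())   // folder that contains appsettings.json
            .AddJsonFile("appsettings.json")
            .Build();

        //// "PersonDatabase" is an assumed key under the ConnectionStrings section.
        optionsBuilder.UseSqlServer(configuration.GetConnectionString("PersonDatabase"));
    }
}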
The Fluent API statement in OnModelCreating has the same effect as the following data annotation on the Name property in the Person class:

[Required]
public string Name { get; set; }

The preceding point highlights the flexibility and configuration that EF Core brings to the table. EF Core uses a combination of conventions, attributes, and Fluent API statements to build a database model at runtime. All we have to do is perform actions on the model classes using a combination of these and they will automatically be translated into appropriate changes in the database. Before we conclude this point, let's have a quick look at each of the different ways to configure a database model:

EF Core conventions: The conventions in EF Core are comprehensive. They are the default rules by which EF Core builds a database model based on classes. A few of the simpler yet important default conventions are listed here:

EF Core creates database tables for all DbSet<TEntity> properties in a Context class, with the same name as that of the property. In the preceding example, the table name would be Persons based on this convention.

EF Core creates tables for entities that are not included as DbSet properties but are reachable through reference properties in other DbSet entities. If the Person class had a complex/navigation property, EF Core would have created a table for it as well.

EF Core creates columns for all the scalar read-write properties of a class, with the same name as the property by default. It uses the reference and collection properties for building relationships among the corresponding tables in the database. In the preceding example, the scalar properties of Person correspond to columns in the Persons table.

EF Core assumes a property named ID, or one that is suffixed with ID, to be the primary key. If the property is an integer type or Guid type, then EF Core also assumes it to be IDENTITY and automatically assigns a value when inserting data. This is precisely what we will make use of in our example while inserting or creating a new Person.

EF Core maps the data type of a database column based on the data type of the property defined in the C# class. A few of the mappings between C# data types and SQL Server column data types are listed in the following table:

C# data type      SQL Server data type
int               int
string            nvarchar(Max)
decimal           decimal(18,2)
float             real
byte[]            varbinary(Max)
datetime          datetime
bool              bit
byte              tinyint
short             smallint
long              bigint
double            float

There are many other conventions, and we can define custom conventions as well. For more details, please read the official Microsoft documentation at https://docs.microsoft.com/en-us/ef/core/modeling/.

Attributes: Conventions are often not enough to map the class to database objects. In such scenarios, we can use attributes called data annotation attributes to get the desired results. The [Required] attribute that we have just seen is an example of a data annotation attribute.

Fluent API: This is the most powerful way of configuring the model and can be used in addition to, or in place of, attributes. The code written in the OnModelCreating method is an example of a Fluent API statement.

If we check now, there is no PersonDatabase database. So, we need to create the database from the model by adding a migration. EF Core includes different migration commands to create or update the database based on the model.
To do so in Visual Studio 2017, go to Tools | Nuget Package Manager | Package Manager Console, as shown in the following screenshot:

This will open the Package Manager Console window. Select the Default Project as GettingStartedWithEFCore and type the following command:

add-migration CreatePersonDatabase

If you are not using Visual Studio 2017 and you are dependent on .NET Core CLI tooling, you can use the following command:

dotnet ef migrations add CreatePersonDatabase

We have not installed the Microsoft.EntityFrameworkCore.Design package, so it will give an error: Your startup project 'GettingStartedWithEFCore' doesn't reference Microsoft.EntityFrameworkCore.Design. This package is required for the Entity Framework Core Tools to work. Ensure your startup project is correct, install the package, and try again.

So let's first go to the NuGet Package Manager and install this package. After successful installation of this package, if we run the preceding command again, we should be able to run the migrations successfully. It will also tell us the command to undo the migration by displaying the message To undo this action, use Remove-Migration. We should see the new files added in the Solution Explorer in the Migrations folder, as shown in the following screenshot:

Although we have migrations applied, we have still not created a database. To create the database, we need to run the following commands.

In Visual Studio 2017:

update-database -verbose

In .NET Core CLI:

dotnet ef database update

If all goes well, we should have the database created with the Persons table (property of type DbSet<Person>) in the database. Let's validate the table and database by using SQL Server Management Studio (SSMS). If SSMS is not installed on your machine, you can also use Visual Studio 2017 to view the database and table. Let's check the created database. In Visual Studio 2017, click on the View menu and select Server Explorer, as shown in the following screenshot:

In Server Explorer, right-click on Data Connections and then select Add Connection. The Add Connection dialog will show up. Enter .\SQLEXPRESS in the Server name (since we installed SQL Express 2017) and select PersonDatabase as the database, as shown in the following screenshot:

On clicking OK, we will see the database named PersonDatabase and, if we expand the tables, we can see the Persons table as well as the _EFMigrationsHistory table. Notice that the properties in the Person class that had setters are the only properties that get transformed into table columns in the Persons table. The Age property is read-only in the class we created and therefore we do not see an Age column in the database table, as shown in the following screenshot:

This is the first migration to create a database. Whenever we add or update the model classes or configurations, we need to sync the database with the model using the add-migration and update-database commands. With this, we have our model class ready and the corresponding database created. The following image summarizes how the properties have been mapped from the C# class to the database table columns:

Now, we will use the Context class to perform CRUD operations. Let's go back to our Main.cs and write the following code.
The code is well commented, so please go through the comments to understand the flow:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Getting started with EF Core");
        Console.WriteLine("We will do CRUD operations on Person class.");

        //// Lets create an instance of Person class.
        Person person = new Person()
        {
            Name = "Rishabh Verma",
            Gender = true, //// For demo true = Male, false = Female. Prefer enum in real cases.
            DateOfBirth = new DateTime(2000, 10, 23)
        };

        using (var context = new Context())
        {
            //// Context has strongly typed property named Persons which refers to Persons table.
            //// It has methods Add, Find, Update, Remove to perform CRUD among many others.
            //// Use AddRange to add multiple persons in once.
            //// Complete set of APIs can be seen by using F12 on the Persons property below in Visual Studio IDE.
            var personData = context.Persons.Add(person);

            //// Though we have done Add, nothing has actually happened in database. All changes are in context only.
            //// We need to call save changes, to persist these changes in the database.
            context.SaveChanges();

            //// Notice above that Id is Primary Key (PK) and hence has not been specified in the person object passed to context.
            //// So, to know the created Id, we can use the below Id
            int createdId = personData.Entity.Id;

            //// If all goes well, person data should be persisted in the database.
            //// Use proper exception handling to discover unhandled exception if any. Not showing here for simplicity and brevity.
            //// createdId variable would now hold the id of created person.

            //// READ BEGINS
            Person readData = context.Persons.Where(j => j.Id == createdId).FirstOrDefault();

            //// We have the data of person where Id == createdId, i.e. details of Rishabh Verma.
            //// Lets update the person data all together just for demonstrating update functionality.
            //// UPDATE BEGINS
            person.Name = "Neha Shrivastava";
            person.Gender = false;
            person.DateOfBirth = new DateTime(2000, 6, 15);
            person.Id = createdId; //// For update cases, we need this to be specified.

            //// Update the person in context.
            context.Persons.Update(person);

            //// Save the updates.
            context.SaveChanges();

            //// DELETE the person object.
            context.Remove(readData);
            context.SaveChanges();
        }

        Console.WriteLine("All done. Please press Enter key to exit...");
        Console.ReadLine();
    }
}

With this, we have completed our sample app to get started with EF Core. I hope this simple example will set you up to start using EF Core with confidence and encourage you to start exploring it further. The detailed features of EF Core can be learned from the official Microsoft documentation available at https://docs.microsoft.com/en-us/ef/core/. If you're interested in learning more, head over to this book, .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava.

How to build a chatbot with Microsoft Bot framework
Working with Entity Client and Entity SQL
Get to know ASP.NET Core Web API [Tutorial]
Polymorphism and type-pattern matching in Python [Tutorial]
Aaron Lazar
13 Aug 2018
11 min read
Some functional programming languages offer clever approaches to the problem of working with statically typed function definitions. The problem is that many functions we'd like to write are entirely generic with respect to data type. For example, most of our statistical functions are identical for int or float numbers, as long as the division returns a value that is a subclass of numbers.Real (for example, Decimal, Fraction, or float). In many functional languages, sophisticated type or type-pattern matching rules are used by the compiler to make a single generic definition work for multiple data types. Python doesn't have this problem and doesn't need the pattern matching. In this article, we'll understand how to achieve polymorphism and type-pattern matching in Python. This Python tutorial is an extract taken from the 2nd edition of the bestseller, Functional Python Programming, authored by Steven Lott.

Instead of the (possibly) complex features of statically typed functional languages, Python changes the approach dramatically. Python uses dynamic selection of the final implementation of an operator based on the data types being used. In Python, we always write generic definitions. The code isn't bound to any specific data type. The Python runtime will locate the appropriate operations based on the types of the actual objects in use. The 3.3.7 Coercion rules section of the language reference manual and the numbers module in the library provide details on how this mapping from operation to special method name works. This means that the compiler doesn't certify that our functions are expecting and producing the proper data types. We generally rely on unit testing and the mypy tool for this kind of type checking.

In rare cases, we might need to have different behavior based on the types of data elements. We have two ways to tackle this:

We can use the isinstance() function to distinguish the different cases
We can create our own subclass of numbers.Number or NamedTuple and implement proper polymorphic special method names.

In some cases, we'll actually need to do both so that we can include appropriate data type conversions for each operation. Additionally, we'll also need to use the cast() function to make the types explicit to the mypy tool.

The ranking example in the previous section is tightly bound to the idea of applying rank-ordering to simple pairs. While this is the way the Spearman correlation is defined, a multivariate dataset has a need to do rank-order correlation among all the variables. The first thing we'll need to do is generalize our idea of rank-order information. The following is a NamedTuple value that handles a tuple of ranks and a raw data object:

from typing import NamedTuple, Tuple, Any

class Rank_Data(NamedTuple):
    rank_seq: Tuple[float]
    raw: Any

A typical use of this kind of class definition is shown in this example:

>>> data = {'key1': 1, 'key2': 2}
>>> r = Rank_Data((2, 7), data)
>>> r.rank_seq[0]
2
>>> r.raw
{'key1': 1, 'key2': 2}

The row of raw data in this example is a dictionary. There are two rankings for this particular item in the overall list. An application can get the sequence of rankings as well as the original raw data item.

We'll add some syntactic sugar to our ranking function. In many previous examples, we've required either an iterable or a concrete collection. The for statement is graceful about working with either one.
However, we don't always use the for statement, and for some functions, we've had to explicitly use iter() to make an iterable out of a collection. We can handle this situation with a simple isinstance() check, as shown in the following code snippet:

def some_function(seq_or_iter: Union[Sequence, Iterator]):
    if isinstance(seq_or_iter, Sequence):
        yield from some_function(iter(seq_or_iter), key)
        return
    # Do the real work of the function using the Iterator

This example includes a type check to handle the small difference between a Sequence object and an Iterator. Specifically, the function uses iter() to create an Iterator from a Sequence, and calls itself recursively with the derived value.

For rank-ordering, the Union[Sequence, Iterator] will be supported. Because the source data must be sorted for ranking, it's easier to use list() to transform a given iterator into a concrete sequence. The essential isinstance() check will be used, but instead of creating an iterator from a sequence (as shown previously), the following examples will create a sequence object from an iterator. In the context of our rank-ordering function, we can make the function somewhat more generic. The following two expressions define the inputs:

Source = Union[Rank_Data, Any]
Union[Sequence[Source], Iterator[Source]]

There are four combinations defined by these two types:

Sequence[Rank_Data]
Sequence[Any]
Iterator[Rank_Data]
Iterator[Any]

Handling four combination data types

Here's the rank_data() function with three cases for handling the four combinations of data types:

from typing import (
    Callable, Sequence, Iterator, Union, Iterable, TypeVar, cast, Union
)

K_ = TypeVar("K_")  # Some comparable key type used for ranking.
Source = Union[Rank_Data, Any]

def rank_data(
        seq_or_iter: Union[Sequence[Source], Iterator[Source]],
        key: Callable[[Rank_Data], K_] = lambda obj: cast(K_, obj)
    ) -> Iterable[Rank_Data]:
    if isinstance(seq_or_iter, Iterator):
        # Iterator? Materialize a sequence object
        yield from rank_data(list(seq_or_iter), key)
        return
    data: Sequence[Rank_Data]
    if isinstance(seq_or_iter[0], Rank_Data):
        # Collection of Rank_Data is what we prefer.
        data = seq_or_iter
    else:
        # Convert to Rank_Data and process.
        empty_ranks: Tuple[float] = cast(Tuple[float], ())
        data = list(
            Rank_Data(empty_ranks, raw_data)
            for raw_data in cast(Sequence[Source], seq_or_iter)
        )
    for r, rd in rerank(data, key):
        new_ranks = cast(
            Tuple[float],
            rd.rank_seq + cast(Tuple[float], (r,)))
        yield Rank_Data(new_ranks, rd.raw)

We've decomposed the ranking into three cases to cover the four different types of data. The following are the cases defined by the union of unions:

Given an Iterator (an object without a usable __getitem__() method), we'll materialize a list object to work with. This will work for Rank_Data as well as any other raw data type. This case covers objects which are Iterator[Rank_Data] as well as Iterator[Any].

Given a Sequence[Any], we'll wrap the unknown objects into Rank_Data tuples with an empty collection of rankings to create a Sequence[Rank_Data].

Finally, given a Sequence[Rank_Data], add yet another ranking to the tuple of ranks inside each Rank_Data container.

The first case calls rank_data() recursively. The other two cases both rely on a rerank() function that builds a new Rank_Data tuple with additional ranking values. This contains several rankings for a complex record of raw data values. Note that a relatively complex cast() expression is required to disambiguate the use of generic tuples for the rankings.
The mypy tool offers a reveal_type() function that can be incorporated to debug the inferred types.

The rerank() function follows a slightly different design to the example of the rank() function shown previously. It yields two-tuples with the rank and the original data object:

def rerank(
        rank_data_iter: Iterable[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    sorted_iter = iter(
        sorted(
            rank_data_iter, key=lambda obj: key(obj.raw)
        )
    )
    # Apply ranker to head, *tail = sorted(rank_data_iter)
    head = next(sorted_iter)
    yield from ranker(sorted_iter, 0, [head], key)

The idea behind rerank() is to sort a collection of Rank_Data objects. The first item, head, is used to provide a seed value to the ranker() function. The ranker() function can examine the remaining items in the iterable to see if they match this initial value; this allows computing a proper rank for a batch of matching items.

The ranker() function accepts a sorted iterable of data, a base rank number, and an initial collection of items of the minimum rank. The result is an iterable sequence of two-tuples with a rank number and an associated Rank_Data object:

def ranker(
        sorted_iter: Iterator[Rank_Data],
        base: float,
        same_rank_seq: List[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        value = next(sorted_iter)
    except StopIteration:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        return
    if key(value.raw) == key(same_rank_seq[0].raw):
        yield from ranker(
            sorted_iter, base, same_rank_seq+[value], key)
    else:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        yield from ranker(
            sorted_iter, base+dups, [value], key)

This starts by attempting to extract the next item from the sorted_iter collection of sorted Rank_Data items. If this fails with a StopIteration exception, there is no next item and the source has been exhausted. The final output is the final batch of equal-valued items in the same_rank_seq sequence.

If the sequence has a next item, the key() function extracts the key value. If this new value matches the keys in the same_rank_seq collection, it is accumulated into the current batch of same-valued keys. The final result is based on the rest of the items in sorted_iter, the current value for the rank, a larger batch of same_rank items that now includes the head value, and the original key() function.

If the next item's key doesn't match the current batch of equal-valued items, the final result has two parts. The first part is the batch of equal-valued items accumulated in same_rank_seq. This is followed by the reranking of the remainder of the sorted items. The base value for these is incremented by the number of equal-valued items, a fresh batch of equal-rank items is initialized with the distinct key, and the original key() extraction function is provided.

The output from ranker() depends on the yield_sequence() function, which looks as follows:

def yield_sequence(
        rank: float,
        same_rank_iter: Iterator[Rank_Data]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    head = next(same_rank_iter)
    yield rank, head
    yield from yield_sequence(rank, same_rank_iter)

We've written this in a way that emphasizes the recursive definition. For any practical work, this should be optimized into a single for statement. When doing Tail-Call Optimization to transform a recursion into a loop, define unit test cases first. Be sure the recursion passes the unit test cases before optimizing.
The following are some examples of using this function to rank (and rerank) data. We'll start with a simple collection of scalar values:

>>> scalars = [0.8, 1.2, 1.2, 2.3, 18]
>>> list(rank_data(scalars))
[Rank_Data(rank_seq=(1.0,), raw=0.8), Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(2.5,), raw=1.2), Rank_Data(rank_seq=(4.0,), raw=2.3),
 Rank_Data(rank_seq=(5.0,), raw=18)]

Each value becomes the raw attribute of a Rank_Data object. When we work with a slightly more complex object, we can also have multiple rankings. The following is a sequence of two-tuples:

>>> pairs = ((2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18))
>>> rank_x = list(rank_data(pairs, key=lambda x: x[0]))
>>> rank_x
[Rank_Data(rank_seq=(1.0,), raw=(2, 0.8)), Rank_Data(rank_seq=(2.0,), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0,), raw=(5, 1.2)), Rank_Data(rank_seq=(4.0,), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0,), raw=(11, 18))]
>>> rank_xy = list(rank_data(rank_x, key=lambda x: x[1]))
>>> rank_xy
[Rank_Data(rank_seq=(1.0, 1.0), raw=(2, 0.8)), Rank_Data(rank_seq=(2.0, 2.5), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0, 2.5), raw=(5, 1.2)), Rank_Data(rank_seq=(4.0, 4.0), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0, 5.0), raw=(11, 18))]

Here, we defined a collection of pairs. Then, we ranked the two-tuples, assigning the sequence of Rank_Data objects to the rank_x variable. We then ranked this collection of Rank_Data objects, creating a second rank value and assigning the result to the rank_xy variable.

The resulting sequence can be used for a slightly modified rank_corr() function to compute the rank correlations of any of the available values in the rank_seq attribute of the Rank_Data objects. We'll leave this modification as an exercise for you.

If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott's bestseller, Functional Python Programming.

Why functional programming in Python matters: Interview with best selling author, Steven Lott
Top 7 Python programming books you need to read
Members Inheritance and Polymorphism

Modern Cloud Native architectures: Microservices, Containers, and Serverless - Part 1
Guest Contributor
13 Aug 2018
9 min read
This whitepaper is written by Mina Andrawos, an experienced engineer who has developed deep experience in the Go language and modern software architectures. He regularly writes articles and tutorials about the Go language, and also shares open source projects. Mina Andrawos has authored the book Cloud Native programming with Golang, which provides practical techniques, code examples, and architectural patterns required to build cloud native microservices in the Go language. He is also the author of the Mastering Go Programming and Modern Golang Programming video courses.

This paper sheds some light on, and provides practical exposure to, some key topics in the modern software industry, namely cloud native applications. This includes microservices, containers, and serverless applications. The paper will cover the practical advantages and disadvantages of the technologies covered.

Microservices

The microservices architecture has gained a reputation as a powerful approach to architect modern software applications. So what are microservices? Microservices can be described as simply the idea of separating the functionality required from a software application into multiple independent small software services, or "microservices." Each microservice is responsible for an individual focused task. In order for microservices to collaborate together to form a large scalable application, they communicate and exchange data.

Microservices were born out of the need to tame the complexity and inflexibility of "monolithic" applications. A monolithic application is a type of application where all required functionality is coded together into the same service. For example, here is a diagram representing a monolithic events (like concerts, shows, and so on) booking application that takes care of the booking payment processing and event reservation:

The application can be used by a customer to book a concert or a show. A user interface will be needed. Furthermore, we will also need a search functionality to look for events, a bookings handler to process the user booking and then save it, and an events handler to help find the event, ensure it has seats available, and then link it to the booking. In a production-level application, more tasks will be needed, like payment processing for example, but for now let's focus on the four tasks outlined in the above figure.

This monolithic application will work well with a small to medium load. It will run on a single server, connect to a single database, and will probably be written in the same programming language. Now, what will happen if the business grows exponentially and hundreds of thousands or millions of users need to be handled and processed? Initially, the short-term solution would be to ensure that the server where the application runs has powerful hardware specifications to withstand higher loads, and if not, then add more memory, storage, and processing power to the server. This is called vertical scaling, which is the act of increasing the power of the hardware, like RAM and hard drive capacity, to run heavy applications. However, this is typically not sustainable in the long run as the load on the application continues to grow.

Another challenge with monolithic applications is the inflexibility caused by being limited to only one or two programming languages. This inflexibility can affect the overall quality and efficiency of the application.
For example, node.js is a popular JavaScript framework for building web applications, whereas R is popular for data science applications. A monolithic application will make it difficult to utilize both technologies, whereas in a microservices application, we can simply build a data science service written in R and a web service written in Node.js. The microservices version of the events application will take the below form:

This application will be capable of scaling among multiple servers, a practice known as horizontal scaling. Each service can be deployed on a different server with dedicated resources, or in separate containers (more on that later). The different services can be written in different programming languages, enabling greater flexibility, and different dedicated teams can focus on different services, achieving more overall quality for the application. Another notable advantage of using microservices is the ease of continuous delivery, which is the ability to deploy software often, and at any time. The reason why microservices make continuous delivery easier is that a new feature deployed to one microservice is less likely to affect other microservices, compared to monolithic applications.

Issues with Microservices

One notable drawback of relying heavily on microservices is the fact that they can become too complicated to manage in the long run as they grow in number and scope. There are approaches to mitigate this by utilizing monitoring tools such as Prometheus to detect problems, container technologies such as Docker to avoid pollution of the host environments, and avoiding over-designing the services. However, these approaches take effort and time.

Cloud native applications

Microservices architectures are a natural fit for cloud native applications. A cloud native application is simply defined as an application built from the ground up for cloud computing architectures. This simply means that our application is cloud native if we design it as if it is expected to be deployed on a distributed, scalable infrastructure. For example, building an application with a redundant microservices architecture (we'll see an example shortly) makes the application cloud native, since this architecture allows our application to be deployed in a distributed manner that allows it to be scalable and almost always available. A cloud native application does not need to always be deployed to a public cloud like AWS; we can deploy it to our own distributed cloud-like infrastructure instead, if we have one.

In fact, what makes an application fully cloud native is beyond just using microservices. Your application should employ continuous delivery, which is your ability to continuously deliver updates to your production applications without disruptions. Your application should also make use of services like message queues and technologies like containers and serverless (containers and serverless are important topics for modern software architectures, so we'll be discussing them in the next few sections). Cloud native applications assume access to numerous server nodes, access to pre-deployed software services like message queues or load balancers, ease of integration with continuous delivery services, among other things. If you deploy your cloud native application to a commercial cloud like AWS or Azure, your application gets the option to utilize cloud-only software services.
For example, DynamoDB is a powerful database engine that can only be used on Amazon Web Services for production applications. Another example is the DocumentDB database in Azure. There are also cloud-only message queues such as Amazon Simple Queue Service (SQS), which can be used to allow communication between microservices in the Amazon Web Services cloud.

As mentioned earlier, cloud native microservices should be designed to allow redundancy between services. If we take the events booking application as an example, the application will look like this:

Multiple server nodes would be allocated per microservice, allowing a redundant microservices architecture to be deployed. If the primary node or service fails for any reason, the secondary can take over, ensuring lasting reliability and availability for cloud native applications. This availability is vital for applications that cannot tolerate downtime, such as e-commerce platforms, where downtime translates into large amounts of lost revenue. Cloud native applications provide great value for developers, enterprises, and startups.

A notable tool worth mentioning in the world of microservices and cloud computing is Prometheus. Prometheus is an open source system monitoring and alerting tool that can be used to monitor complex microservices architectures and alert when an action needs to be taken. Prometheus was originally created by SoundCloud to monitor their systems, but then grew to become an independent project. The project is now a part of the Cloud Native Computing Foundation, which is a foundation tasked with building a sustainable ecosystem for cloud native applications.

Cloud native limitations

For cloud native applications, you will face some challenges if the need arises to migrate some or all of the applications. That is due to multiple reasons, depending on where your application is deployed. For example, if your cloud native application is deployed on a public cloud like AWS, cloud native APIs are not portable across cloud platforms. So, a DynamoDB database API utilized in an application will only work on AWS but not on Azure, since DynamoDB belongs exclusively to AWS. The API will also never work in a local environment because DynamoDB can only be utilized in AWS in production. Another reason is that some assumptions are made when some cloud native applications are built, like the fact that there will be a virtually unlimited number of server nodes to utilize when needed, and that a new server node can be made available very quickly. These assumptions are sometimes hard to guarantee in a local data center environment, where real servers, networking hardware, and wiring need to be purchased.

This brings us to the end of Part 1 of this whitepaper. Check out Part 2 tomorrow to learn about containers and serverless applications, along with their practical advantages and limitations.

About the author: Mina Andrawos

Mina Andrawos is an experienced engineer who has developed deep experience in Go from using it personally and professionally. He regularly authors articles and tutorials about the language, and also shares Go's open source projects. He has written numerous Go applications with varying degrees of complexity. Other than Go, he has skills in Java, C#, Python, and C++. He has worked with various databases and software architectures. He is also skilled in the agile methodology for software development. Besides software development, he has working experience of scrum mastering, sales engineering, and software product management.
Building microservices from a monolith Java EE app [Tutorial]
6 Ways to blow up your Microservices!
Have Microservices killed the monolithic architecture? Maybe not!

Building a Tic-tac-toe game in ASP.Net Core 2.0 [Tutorial]
Aaron Lazar
13 Aug 2018
28 min read
Learning is more fun if we do it while making games. With this thought, let's continue our quest to learn .NET Core 2.0 by writing a Tic-tac-toe game in .NET Core 2.0. We will develop the game in the ASP.NET Core 2.0 web app, using SignalR Core. We will follow a step-by-step approach and use Visual Studio 2017 as the primary IDE, but will list the steps needed while using the Visual Studio Code editor as well. Let's do the project setup first and then we will dive into the coding. This tutorial has been extracted from the book .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava.

Installing SignalR Core NuGet package

Create a new ASP.NET Core 2.0 MVC app named TicTacToeGame. With this, we will have a basic working ASP.NET Core 2.0 MVC app in place. However, to leverage SignalR Core in our app, we need to install the SignalR Core NuGet and the client packages. To install the SignalR Core NuGet package, we can perform one of the following two approaches in the Visual Studio IDE:

In the context menu of the TicTacToeGame project, click on Manage NuGet Packages. It will open the NuGet Package Manager for the project. In the Browse section, search for the Microsoft.AspNetCore.SignalR package and click Install. This will install SignalR Core in the app. Please note that currently the package is in the preview stage and hence the pre-release checkbox has to be ticked:

Edit the TicTacToeGame.csproj file, add the following code snippet in the ItemGroup code containing package references, and click Save. As soon as the file is saved, the tooling will take care of restoring the packages and in a while, the SignalR package will be installed. This approach can be used with Visual Studio Code as well. Although Visual Studio Code detects the unresolved dependencies and may prompt you to restore the package, it is recommended that immediately after editing and saving the file, you run the dotnet restore command in the terminal window at the location of the project:

<ItemGroup>
  <PackageReference Include="Microsoft.AspNetCore.All" Version="2.0.0" />
  <PackageReference Include="Microsoft.AspNetCore.SignalR" Version="1.0.0-alpha1-final" />
</ItemGroup>

Now we have server-side packages installed. We still need to install the client-side package of SignalR, which is available through npm. To do so, we need to first ascertain whether we have npm installed on the machine or not. If not, we need to install it. npm is distributed with Node.js, so we need to download and install Node.js from https://nodejs.org/en/. The installation is quite straightforward. Once this installation is done, open a Command Prompt at the project location and run the following command:

npm install @aspnet/signalr-client

This will install the SignalR client package. Just go to the package location (npm creates a node_modules folder in the project directory). The relative path from the project directory would be \node_modules\@aspnet\signalr-client\dist\browser. From this location, copy the signalr-client-1.0.0-alpha1-final.js file into the wwwroot\js folder. In the current version, the name is signalr-client-1.0.0-alpha1-final.js. With this, we are done with the project setup and we are ready to use SignalR goodness as well. So let's dive into the coding.

Coding the game

In this section, we will implement our gaming solution. The end output will be the working two-player Tic-Tac-Toe game.
We will do the coding in steps for ease of understanding:

In the Startup class, we modify the ConfigureServices method to add SignalR to the container, by writing the following code:

//// Adds SignalR to the services container.
services.AddSignalR();

In the Configure method of the same class, we configure the pipeline to use SignalR and intercept and wire up the request containing gameHub to our SignalR hub that we will be creating, with the following code:

//// Use - SignalR & let it know to intercept and map any request having gameHub.
app.UseSignalR(routes =>
{
    routes.MapHub<GameHub>("gameHub");
});

The following is the code for both methods, for the sake of clarity and completion. Other methods and properties are removed for brevity:

// This method gets called by the run-time. Use this method to add services to the container.
public void ConfigureServices(IServiceCollection services)
{
    services.AddMvc();

    //// Adds SignalR to the services container.
    services.AddSignalR();
}

// This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseDeveloperExceptionPage();
        app.UseBrowserLink();
    }
    else
    {
        app.UseExceptionHandler("/Home/Error");
    }

    app.UseStaticFiles();

    app.UseMvc(routes =>
    {
        routes.MapRoute(
            name: "default",
            template: "{controller=Home}/{action=Index}/{id?}");
    });

    //// Use - SignalR & let it know to intercept and map any request having gameHub.
    app.UseSignalR(routes =>
    {
        routes.MapHub<GameHub>("gameHub");
    });
}

The previous two steps set up SignalR for us. Now, let's start with the coding of the player registration form. We want the player to be registered with a name and a display picture. Later, the server will also need to know whether the player is playing, waiting for a move, searching for an opponent, and so on. Let's create the Player model in the Models folder in the app. The code comments are self-explanatory:

/// <summary>
/// The player class. Each player of Tic-Tac-Toe game would be an instance of this class.
/// </summary>
internal class Player
{
    /// <summary>
    /// Gets or sets the name of the player. This would be set at the time user registers.
    /// </summary>
    public string Name { get; set; }

    /// <summary>
    /// Gets or sets the opponent player. The player against whom the player would be playing.
    /// This is determined/set when the players click Find Opponent Button in the UI.
    /// </summary>
    public Player Opponent { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether the player is playing.
    /// This is set when the player starts a game.
    /// </summary>
    public bool IsPlaying { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether the player is waiting for opponent to make a move.
    /// </summary>
    public bool WaitingForMove { get; set; }

    /// <summary>
    /// Gets or sets a value indicating whether the player is searching for opponent.
    /// </summary>
    public bool IsSearchingOpponent { get; set; }

    /// <summary>
    /// Gets or sets the time when the player registered.
    /// </summary>
    public DateTime RegisterTime { get; set; }

    /// <summary>
    /// Gets or sets the image of the player.
    /// This would be set at the time of registration, if the user selects the image.
    /// </summary>
    public string Image { get; set; }

    /// <summary>
    /// Gets or sets the connection id of the player connection with the gameHub.
/// </summary> public string ConnectionId { get; set; } } Now, we need to have a UI in place so that the player can fill in the form and register. We also need to show the image preview to the player when he/she browses the image. To do so, we will use the Index.cshtml view of the HomeController class that comes with the default MVC template. We will refer to the following two .js files in the _Layout.cshtml partial view so that they are available to all the views. Alternatively, you could add these in the Index.cshtml view as well, but its highly recommended that common scripts should be added in _Layout.cshtml. The version of the script file may be different in your case. These are the currently available latest versions. Although jQuery is not required to be the library of choice for us, we will use jQuery to keep the code clean, simple, and compact. With these references, we have jQuery and SignalR available to us on the client side: <script src="~/lib/jquery/dist/jquery.js"></script> <!-- jQuery--> <script src="~/js/signalr-client-1.0.0-alpha1-final.js"></script> <!-- SignalR--> After adding these references, create the simple HTML UI for the image preview and registration, as follows: <div id="divPreviewImage"> <!-- To display the browsed image--> <fieldset> <div class="form-group"> <div class="col-lg-2"> <image src="" id="previewImage" style="height:100px;width:100px;border:solid 2px dotted; float:left" /> </div> <div class="col-lg-10" id="divOpponentPlayer"> <!-- To display image of opponent player--> <image src="" id="opponentImage" style="height:100px;width:100px;border:solid 2px dotted; float:right;" /> </div> </div> </fieldset> </div> <div id="divRegister"> <!-- Our Registration form--> <fieldset> <legend>Register</legend> <div class="form-group"> <label for="name" class="col-lg-2 control- label">Name</label> <div class="col-lg-10"> <input type="text" class="form-control" id="name" placeholder="Name"> </div> </div> <div class="form-group"> <label for="image" class="col-lg-2 control- label">Avatar</label> <div class="col-lg-10"> <input type="file" class="form-control" id="image" /> </div> </div> <div class="form-group"> <div class="col-lg-10 col-lg-offset-2"> <button type="button" class="btn btn-primary" id="btnRegister">Register</button> </div> </div> </fieldset> </div> When the player registers by clicking the Register button, the player's details need to be sent to the server. To do this, we will write the JavaScript to send details to our gameHub: let hubUrl = '/gameHub'; let httpConnection = new signalR.HttpConnection(hubUrl); let hubConnection = new signalR.HubConnection(httpConnection); var playerName = ""; var playerImage = ""; var hash = "#"; hubConnection.start(); $("#btnRegister").click(function () { //// Fires on button click playerName = $('#name').val(); //// Sets the player name with the input name. playerImage = $('#previewImage').attr('src'); //// Sets the player image variable with specified image var data = playerName.concat(hash, playerImage); //// The registration data to be sent to server. hubConnection.invoke('RegisterPlayer', data); //// Invoke the "RegisterPlayer" method on gameHub. }); $("#image").change(function () { //// Fires when image is changed. readURL(this); //// HTML 5 way to read the image as data url. }); function readURL(input) { if (input.files && input.files[0]) { //// Go in only if image is specified. 
var reader = new FileReader(); reader.onload = imageIsLoaded; reader.readAsDataURL(input.files[0]); } } function imageIsLoaded(e) { if (e.target.result) { $('#previewImage').attr('src', e.target.result); //// Sets the image source for preview. $("#divPreviewImage").show(); } }; The player now has a UI to input the name and image, see the preview image, and click Register. On clicking the Register button, we are sending the concatenated name and image to the gameHub on the server through hubConnection.invoke('RegisterPlayer', data);  So, it's quite simple for the client to make a call to the server. Initialize the hubConnection by specifying hub name as we did in the first three lines of the preceding code snippet. Start the connection by hubConnection.start();, and then invoke the server hub method by calling the invoke method, specifying the hub method name and the parameter it expects. We have not yet created the hub, so let's create the GameHub class on the server: /// <summary> /// The Game Hub class derived from Hub /// </summary> public class GameHub : Hub { /// <summary> /// To keep the list of all the connected players registered with the game hub. We could have /// used normal list but used concurrent bag as its thread safe. /// </summary> private static readonly ConcurrentBag<Player> players = new ConcurrentBag<Player>(); /// <summary> /// Registers the player with name and image. /// </summary> /// <param name="nameAndImageData">The name and image data sent by the player.</param> public void RegisterPlayer(string nameAndImageData) { var splitData = nameAndImageData?.Split(new char[] { '#' }, StringSplitOptions.None); string name = splitData[0]; string image = splitData[1]; var player = players?.FirstOrDefault(x => x.ConnectionId == Context.ConnectionId); if (player == null) { player = new Player { ConnectionId = Context.ConnectionId, Name = name, IsPlaying = false, IsSearchingOpponent = false, RegisterTime = DateTime.UtcNow, Image = image }; if (!players.Any(j => j.Name == name)) { players.Add(player); } } this.OnRegisterationComplete(Context.ConnectionId); } /// <summary> /// Fires on completion of registration. /// </summary> /// <param name="connectionId">The connectionId of the player which registered</param> public void OnRegisterationComplete(string connectionId) { //// Notify this connection id that the registration is complete. this.Clients.Client(connectionId). InvokeAsync(Constants.RegistrationComplete); } } The code comments make it self-explanatory. The class should derive from the SignalR Hub class for it to be recognized as Hub. There are two methods of interest which can be overridden. Notice that both the methods follow the async pattern and hence return Task: Task OnConnectedAsync(): This method fires when a client/player connects to the hub. Task OnDisconnectedAsync(Exception exception): This method fires when a client/player disconnects or looses the connection. We will override this method to handle the scenario where the player disconnects. 
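The body of this override is not listed at this point, so the following is only a minimal sketch of one possible implementation (not the author's exact code). It assumes a Constants.OpponentDisconnected value holding the name of the opponentDisconnected client method that the client registers later, and it reuses the players collection declared in the hub above:

/// <summary>
/// Sketch only: one possible way to handle a dropped connection.
/// </summary>
public override async Task OnDisconnectedAsync(Exception exception)
{
    //// Find the player that owns the connection which just dropped.
    var player = players.FirstOrDefault(x => x.ConnectionId == Context.ConnectionId);
    if (player?.Opponent != null && player.IsPlaying)
    {
        //// Constants.OpponentDisconnected is assumed to hold "opponentDisconnected".
        //// Tell the opponent, so the client can declare them the winner.
        await Clients.Client(player.Opponent.ConnectionId)
            .InvokeAsync(Constants.OpponentDisconnected, player.Name);
        player.Opponent.IsPlaying = false;
        player.Opponent.Opponent = null;
    }
    await base.OnDisconnectedAsync(exception);
}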
There are also a few properties that the hub class exposes: Context: This property is of type HubCallerContext and gives us access to the following properties: Connection: Gives access to the current connection User: Gives access to the ClaimsPrincipal of the user who is currently connected ConnectionId: Gives the current connection ID string Clients: This property is of type IHubClients and gives us the way to communicate to all the clients via the client proxy Groups: This property is of type IGroupManager and provides a way to add and remove connections to the group asynchronously To keep the things simple, we are not using a database to keep track of our registered players. Rather we will use an in-memory collection to keep the registered players. We could have used a normal list of players, such as List<Player>, but then we would need all the thread safety and use one of the thread safety primitives, such as lock, monitor, and so on, so we are going with ConcurrentBag<Player>, which is thread safe and reasonable for our game development. That explains the declaration of the players collection in the class. We will need to do some housekeeping to add players to this collection when they resister and remove them when they disconnect. We saw in previous step that the client invoked the RegisterPlayer method of the hub on the server, passing in the name and image data. So we defined a public method in our hub, named RegisterPlayer, accepting the name and image data string concatenated through #. This is just one of the simple ways of accepting the client data for demonstration purposes, we can also use strongly typed parameters. In this method, we split the string on # and extract the name as the first part and the image as the second part. We then check if the player with the current connection ID already exists in our players collection. If it doesn't, we create a Player object with default values and add them to our players collection. We are distinguishing the player based on the name for demonstration purposes, but we can add an Id property in the Player class and make different players have the same name also. After the registration is complete, the server needs to update the player, that the registration is complete and the player can then look for the opponent. To do so, we make a call to the OnRegistrationComplete method which invokes a method called  registrationComplete on the client with the current connection ID. Let's understand the code to invoke the method on the client: this.Clients.Client(connectionId).InvokeAsync(Constants.RegistrationComplete); On the Clients property, we can choose a client having a specific connection ID (in this case, the current connection ID from the Context) and then call InvokeAsync to invoke a method on the client specifying the method name and parameters as required. In the preceding case method, the name is registrationComplete with no parameters. Now we know how to invoke a server method from the client and also how to invoke the client method from the server. We also know how to select a specific client and invoke a method there. We can invoke the client method from the server, for all the clients, a group of clients, or a specific client, so rest of the coding stuff would be just a repetition of these two concepts. Next, we need to implement the registrationComplete method on the client. On registration completion, the registration form should be hidden and the player should be able to find an opponent to play against. 
To do so, we would write JavaScript code to hide the registration form and show the UI for finding the opponent. On clicking the Find Opponent button, we need the server to pair us against an opponent, so we need to invoke a hub method on server to find opponent. The server can respond us with two outcomes: It finds an opponent player to play against. In this case, the game can start so we need to simulate the coin toss, determine the player who can make the first move, and start the game. This would be a game board in the client-user interface. It doesn't find an opponent and asks the player to wait for another player to register and search for an opponent. This would be a no opponent found screen in the client. In both the cases, the server would do some processing and invoke a method on the client. Since we need a lot of different user interfaces for different scenarios, let's code the HTML markup inside div to make it easier to show and hide sections based on the server response. We will add the following code snippet in the body. The comments specify the purpose of each of the div elements and markup inside them: <div id="divFindOpponentPlayer"> <!-- Section to display Find Opponent --> <fieldset> <legend>Find a player to play against!</legend> <div class="form-group"> <input type="button" class="btn btn-primary" id="btnFindOpponentPlayer" value="Find Opponent Player" /> </div> </fieldset> </div> <div id="divFindingOpponentPlayer"> <!-- Section to display opponent not found, wait --> <fieldset> <legend>Its lonely here!</legend> <div class="form-group"> Looking for an opponent player. Waiting for someone to join! </div> </fieldset> </div> <div id="divGameInformation" class="form-group"> <!-- Section to display game information--> <div class="form-group" id="divGameInfo"></div> <div class="form-group" id="divInfo"></div> </div> <div id="divGame" style="clear:both"> <!-- Section where the game board would be displayed --> <fieldset> <legend>Game On</legend> <div id="divGameBoard" style="width:380px"></div> </fieldset> </div> The following client-side code would take care of Steps 7 and 8. Though the comments are self-explanatory, we will quickly see what all stuff is that is going on here. We handle the registartionComplete method and display the Find Opponent Player section. This section has a button to find an opponent player called btnFindOpponentPlayer. We define the event handler of the button to invoke the FindOpponent method on the hub. We will see the hub method implementation later, but we know that the hub method would either find an opponent or would not find an opponent, so we have defined the methods opponentFound and opponentNotFound, respectively, to handle these scenarios. In the opponentNotFound method, we just display a section in which we say, we do not have an opponent player. In the opponentFound method, we display the game section, game information section, opponent display picture section, and draw the Tic-Tac-Toe game board as a 3×3 grid using CSS styling. All the other sections are hidden: $("#btnFindOpponentPlayer").click(function () { hubConnection.invoke('FindOpponent'); }); hubConnection.on('registrationComplete', data => { //// Fires on registration complete. Invoked by server hub $("#divRegister").hide(); // hide the registration div $("#divFindOpponentPlayer").show(); // display find opponent player div. }); hubConnection.on('opponentNotFound', data => { //// Fires when no opponent is found. 
$('#divFindOpponentPlayer').hide(); //// hide the find opponent player section. $('#divFindingOpponentPlayer').show(); //// display the finding opponent player div. }); hubConnection.on('opponentFound', (data, image) => { //// Fires when opponent player is found. $('#divFindOpponentPlayer').hide(); $('#divFindingOpponentPlayer').hide(); $('#divGame').show(); //// Show game board section. $('#divGameInformation').show(); //// Show game information $('#divOpponentPlayer').show(); //// Show opponent player image. opponentImage = image; //// sets the opponent player image for display $('#opponentImage').attr('src', opponentImage); //// Binds the opponent player image $('#divGameInfo').html("<br/><span><strong> Hey " + playerName + "! You are playing against <i>" + data + "</i> </strong></span>"); //// displays the information of opponent that the player is playing against. //// Draw the tic-tac-toe game board, A 3x3 grid :) by proper styling. for (var i = 0; i < 9; i++) { $("#divGameBoard").append("<span class='marker' id=" + i + " style='display:block;border:2px solid black;height:100px;width:100px;float:left;margin:10px;'>" + i + "</span>"); } }); First we need to have a Game object to track a game, players involved, moves left, and check if there is a winner. We will have a Game class defined as per the following code. The comments detail the purpose of the methods and the properties defined: internal class Game { /// <summary> /// Gets or sets the value indicating whether the game is over. /// </summary> public bool IsOver { get; private set; } /// <summary> /// Gets or sets the value indicating whether the game is draw. /// </summary> public bool IsDraw { get; private set; } /// <summary> /// Gets or sets Player 1 of the game /// </summary> public Player Player1 { get; set; } /// <summary> /// Gets or sets Player 2 of the game /// </summary> public Player Player2 { get; set; } /// <summary> /// For internal housekeeping, To keep track of value in each of the box in the grid. /// </summary> private readonly int[] field = new int[9]; /// <summary> /// The number of moves left. We start the game with 9 moves remaining in a 3x3 grid. /// </summary> private int movesLeft = 9; /// <summary> /// Initializes a new instance of the <see cref="Game"/> class. /// </summary> public Game() { //// Initialize the game for (var i = 0; i < field.Length; i++) { field[i] = -1; } } /// <summary> /// Place the player number at a given position for a player /// </summary> /// <param name="player">The player number would be 0 or 1</param> /// <param name="position">The position where player number would be placed, should be between 0 and ///8, both inclusive</param> /// <returns>Boolean true if game is over and we have a winner.</returns> public bool Play(int player, int position) { if (this.IsOver) { return false; } //// Place the player number at the given position this.PlacePlayerNumber(player, position); //// Check if we have a winner. If this returns true, //// game would be over and would have a winner, else game would continue. return this.CheckWinner(); } } Now we have the entire game mystery solved with the Game class. We know when the game is over, we have the method to place the player marker, and check the winner. The following server side-code on the GameHub will handle Steps 7 and 8: /// <summary> /// The list of games going on. /// </summary> private static readonly ConcurrentBag<Game> games = new ConcurrentBag<Game>(); /// <summary> /// To simulate the coin toss. 
Like heads and tails, 0 belongs to one player and 1 to opponent. /// </summary> private static readonly Random toss = new Random(); /// <summary> /// Finds the opponent for the player and sets the Seraching for Opponent property of player to true. /// We will use the connection id from context to identify the current player. /// Once we have 2 players looking to play, we can pair them and simulate coin toss to start the game. /// </summary> public void FindOpponent() { //// First fetch the player from our players collection having current connection id var player = players.FirstOrDefault(x => x.ConnectionId == Context.ConnectionId); if (player == null) { //// Since player would be registered before making this call, //// we should not reach here. If we are here, something somewhere in the flow above is broken. return; } //// Set that player is seraching for opponent. player.IsSearchingOpponent = true; //// We will follow a queue, so find a player who registered earlier as opponent. //// This would only be the case if more than 2 players are looking for opponent. var opponent = players.Where(x => x.ConnectionId != Context.ConnectionId && x.IsSearchingOpponent && !x.IsPlaying).OrderBy(x =>x.RegisterTime).FirstOrDefault(); if (opponent == null) { //// Could not find any opponent, invoke opponentNotFound method in the client. Clients.Client(Context.ConnectionId) .InvokeAsync(Constants.OpponentNotFound); return; } //// Set both players as playing. player.IsPlaying = true; player.IsSearchingOpponent = false; //// Make him unsearchable for opponent search opponent.IsPlaying = true; opponent.IsSearchingOpponent = false; //// Set each other as opponents. player.Opponent = opponent; opponent.Opponent = player; //// Notify both players that they can play by invoking opponentFound method for both the players. //// Also pass the opponent name and opoonet image, so that they can visualize it. //// Here we are directly using connection id, but group is a good candidate and use here. Clients.Client(Context.ConnectionId) .InvokeAsync(Constants.OpponentFound, opponent.Name, opponent.Image); Clients.Client(opponent.ConnectionId) .InvokeAsync(Constants.OpponentFound, player.Name, player.Image); //// Create a new game with these 2 player and add it to games collection. games.Add(new Game { Player1 = player, Player2 = opponent }); } Here, we have created a games collection to keep track of ongoing games and a Random field named toss to simulate the coin toss. How FindOpponent works is documented in the comments and is intuitive to understand. Once the game starts, each player has to make a move and then wait for the opponent to make a move, until the game ends. The move is made by clicking on the available grid cells. Here, we need to ensure that cell position that is already marked by one of the players is not changed or marked. So, as soon as a valid cell is marked, we set its CSS class to notAvailable so we know that the cell is taken. While clicking on a cell, we will check whether the cell has notAvailablestyle. If yes, it cannot be marked. If not, the cell can be marked and we then send the marked position to the server hub. We also see the waitingForMove, moveMade, gameOver, and opponentDisconnected events invoked by the server based on the game state. The code is commented and is pretty straightforward. 
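The hub code above and in the steps that follow refers to a Constants class (Constants.RegistrationComplete, Constants.OpponentFound, and so on) whose listing is not shown. A minimal sketch of such a class appears below, assuming each constant simply holds the matching client-side method name; the WaitingForOpponent value in particular is an assumption, inferred from the waitingForMove handler registered on the client:

/// <summary>
/// Sketch only: the client method names invoked by the hub. The values are assumed from the
/// hubConnection.on(...) handlers registered in the client-side script.
/// </summary>
internal static class Constants
{
    public const string RegistrationComplete = "registrationComplete";
    public const string OpponentNotFound = "opponentNotFound";
    public const string OpponentFound = "opponentFound";
    public const string WaitingForOpponent = "waitingForMove"; //// Assumed mapping to the client handler name.
    public const string MoveMade = "moveMade";
    public const string GameOver = "gameOver";
    public const string OpponentDisconnected = "opponentDisconnected";
}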
The moveMade method in the following code makes use of the MoveInformation class, which we will define at the server for sharing move information with both players: //// Triggers on clicking the grid cell. $(document).on('click', '.marker', function () { if ($(this).hasClass("notAvailable")) { //// Cell is already taken. return; } hubConnection.invoke('MakeAMove', $(this)[0].id); //// Cell is valid, send details to hub. }); //// Fires when player has to make a move. hubConnection.on('waitingForMove', data => { $('#divInfo').html("<br/><span><strong> Your turn <i>" + playerName + "</i>! Make a winning move! </strong></span>"); }); //// Fires when move is made by either player. hubConnection.on('moveMade', data => { if (data.Image == playerImage) { //// Move made by player. $("#" + data.ImagePosition).addClass("notAvailable"); $("#" + data.ImagePosition).css('background-image', 'url(' + data.Image + ')'); $('#divInfo').html("<br/><strong>Waiting for <i>" + data.OpponentName + "</i> to make a move. </strong>"); } else { $("#" + data.ImagePosition).addClass("notAvailable"); $("#" + data.ImagePosition).css('background-image', 'url(' + data.Image + ')'); $('#divInfo').html("<br/><strong>Waiting for <i>" + data.OpponentName + "</i> to make a move. </strong>"); } }); //// Fires when the game ends. hubConnection.on('gameOver', data => { $('#divGame').hide(); $('#divInfo').html("<br/><span><strong>Hey " + playerName + "! " + data + " </strong></span>"); $('#divGameBoard').html(" "); $('#divGameInfo').html(" "); $('#divOpponentPlayer').hide(); }); //// Fires when the opponent disconnects. hubConnection.on('opponentDisconnected', data => { $("#divRegister").hide(); $('#divGame').hide(); $('#divGameInfo').html(" "); $('#divInfo').html("<br/><span><strong>Hey " + playerName + "! Your opponent disconnected or left the battle! You are the winner ! Hip Hip Hurray!!!</strong></span>"); }); After every move, both players need to be updated by the server about the move made, so that both players' game boards are in sync. So, on the server side we will need an additional model called MoveInformation, which will contain information on the latest move made by the player and the server will send this model to both the clients to keep them in sync: /// <summary> /// While playing the game, players would make moves. This class contains the information of those moves. /// </summary> internal class MoveInformation { /// <summary> /// Gets or sets the opponent name. /// </summary> public string OpponentName { get; set; } /// <summary> /// Gets or sets the player who made the move. /// </summary> public string MoveMadeBy { get; set; } /// <summary> /// Gets or sets the image position. The position in the game board (0-8) where the player placed his /// image. /// </summary> public int ImagePosition { get; set; } /// <summary> /// Gets or sets the image. The image of the player that he placed in the board (0-8) /// </summary> public string Image { get; set; } } Finally, we will wire up the remaining methods in the GameHub class to complete the game coding. The MakeAMove method is called every time a player makes a move. Also, we have overidden the OnDisconnectedAsync method to inform a player when their opponent disconnects. In this method, we also keep our players and games list current. The comments in the code explain the workings of the methods: /// <summary> /// Invoked by the player to make a move on the board. 
/// </summary> /// <param name="position">The position to place the player</param> public void MakeAMove(int position) { //// Lets find a game from our list of games where one of the player has the same connection Id as the current connection has. var game = games?.FirstOrDefault(x => x.Player1.ConnectionId == Context.ConnectionId || x.Player2.ConnectionId == Context.ConnectionId); if (game == null || game.IsOver) { //// No such game exist! return; } //// Designate 0 for player 1 int symbol = 0; if (game.Player2.ConnectionId == Context.ConnectionId) { //// Designate 1 for player 2. symbol = 1; } var player = symbol == 0 ? game.Player1 : game.Player2; if (player.WaitingForMove) { return; } //// Update both the players that move is made. Clients.Client(game.Player1.ConnectionId) .InvokeAsync(Constants.MoveMade, new MoveInformation { OpponentName = player.Name, ImagePosition = position, Image = player.Image }); Clients.Client(game.Player2.ConnectionId) .InvokeAsync(Constants.MoveMade, new MoveInformation { OpponentName = player.Name, ImagePosition = position, Image = player.Image }); //// Place the symbol and look for a winner after every move. if (game.Play(symbol, position)) { Remove<Game>(games, game); Clients.Client(game.Player1.ConnectionId) .InvokeAsync(Constants.GameOver, $"The winner is {player.Name}"); Clients.Client(game.Player2.ConnectionId) .InvokeAsync(Constants.GameOver, $"The winner is {player.Name}"); player.IsPlaying = false; player.Opponent.IsPlaying = false; this.Clients.Client(player.ConnectionId) .InvokeAsync(Constants.RegistrationComplete); this.Clients.Client(player.Opponent.ConnectionId) .InvokeAsync(Constants.RegistrationComplete); } //// If no one won and its a tame draw, update the players that the game is over and let them look for new game to play. if (game.IsOver && game.IsDraw) { Remove<Game>(games, game); Clients.Client(game.Player1.ConnectionId) .InvokeAsync(Constants.GameOver, "Its a tame draw!!!"); Clients.Client(game.Player2.ConnectionId) .InvokeAsync(Constants.GameOver, "Its a tame draw!!!"); player.IsPlaying = false; player.Opponent.IsPlaying = false; this.Clients.Client(player.ConnectionId) .InvokeAsync(Constants.RegistrationComplete); this.Clients.Client(player.Opponent.ConnectionId) .InvokeAsync(Constants.RegistrationComplete); } if (!game.IsOver) { player.WaitingForMove = !player.WaitingForMove; player.Opponent.WaitingForMove = !player.Opponent.WaitingForMove; Clients.Client(player.Opponent.ConnectionId) .InvokeAsync(Constants.WaitingForOpponent, player.Opponent.Name); Clients.Client(player.ConnectionId) .InvokeAsync(Constants.WaitingForOpponent, player.Opponent.Name); } } With this, we are done with the coding of the game and are ready to run the game app. So there you have it! You've just built your first game in .NET Core! The detailed source code can be downloaded from Github. If you're interested in learning more, head on over to get the book, .NET Core 2.0 By Example, by Rishabh Verma and Neha Shrivastava. Applying Single Responsibility principle from SOLID in .NET Core Unit Testing in .NET Core with Visual Studio 2017 for better code quality Get to know ASP.NET Core Web API [Tutorial]


Writing web services with functional Python programming [Tutorial]

Aaron Lazar
12 Aug 2018
18 min read
In this article we'll understand how functional programming can be applied to web services in Python. This article is an extract from the 2nd edition of the bestseller, Functional Python Programming, written by Steven Lott. We'll look at a RESTful web service, which can slice and dice a source of data and provide downloads as JSON, XML, or CSV files. We'll provide an overall WSGI-compatible wrapper. The functions that do the real work of the application won't be narrowly constrained to fit the WSGI standard. We'll use a simple dataset with four subcollections: the Anscombe Quartet. It's a small set of data but it can be used to show the principles of a RESTful web service. We'll split our application into two tiers: a web tier, which will be a simple WSGI application, and data service tier, which will be more typical functional programming. We'll look at the web tier first so that we can focus on a functional approach to provide meaningful results. We need to provide two pieces of information to the web service: The quartet that we want: this is a slice and dice operation. The idea is to slice up the information by filtering and extracting meaningful subsets. The output format we want. The data selection is commonly done through the request path. We can request /anscombe/I/ or /anscombe/II/ to pick specific datasets from the quartet. The idea is that a URL defines a resource, and there's no good reason for the URL to ever change. In this case, the dataset selectors aren't dependent on dates or some organizational approval status, or other external factors. The URL is timeless and absolute. The output format is not a first-class part of the URL. It's just a serialization format, not the data itself. In some cases, the format is requested through the HTTP Accept header. This is hard to use from a browser, but easy to use from an application using a RESTful API. When extracting data from the browser, a query string is commonly used to specify the output format. We'll use the ?form=json method at the end of the path to specify the JSON output format. A URL we can use will look like this: http://localhost:8080/anscombe/III/?form=csv This would request a CSV download of the third dataset. Creating the Web Server Gateway Interface First, we'll use a simple URL pattern-matching expression to define the one and only routing in our application. In a larger or more complex application, we might have more than one such pattern: import re path_pat= re.compile(r"^/anscombe/(?P<dataset>.*?)/?$") This pattern allows us to define an overall script in the WSGI sense at the top level of the path. In this case, the script is anscombe. We'll take the next level of the path as a dataset to select from the Anscombe Quartet. The dataset value should be one of I, II, III, or IV. We used a named parameter for the selection criteria. In many cases, RESTful APIs are described using a syntax, as follows: /anscombe/{dataset}/ We translated this idealized pattern into a proper, regular expression, and preserved the name of the dataset selector in the path. Here are some example URL paths that demonstrate how this pattern works: >>> m1 = path_pat.match( "/anscombe/I" ) >>> m1.groupdict() {'dataset': 'I'} >>> m2 = path_pat.match( "/anscombe/II/" ) >>> m2.groupdict() {'dataset': 'II'} >>> m3 = path_pat.match( "/anscombe/" ) >>> m3.groupdict() {'dataset': ''} Each of these examples shows the details parsed from the URL path. When a specific series is named, this is located in the path. 
When no series is named, then an empty string is found by the pattern. Here's the overall WSGI application: import traceback import urllib.parse def anscombe_app( environ: Dict, start_response: SR_Func ) -> Iterable[bytes]: log = environ['wsgi.errors'] try: match = path_pat.match(environ['PATH_INFO']) set_id = match.group('dataset').upper() query = urllib.parse.parse_qs(environ['QUERY_STRING']) print(environ['PATH_INFO'], environ['QUERY_STRING'], match.groupdict(), file=log) dataset = anscombe_filter(set_id, raw_data()) content_bytes, mime = serialize( query['form'][0], set_id, dataset) headers = [ ('Content-Type', mime), ('Content-Length', str(len(content_bytes))), ] start_response("200 OK", headers) return [content_bytes] except Exception as e: # pylint: disable=broad-except traceback.print_exc(file=log) tb = traceback.format_exc() content = error_page.substitute( title="Error", message=repr(e), traceback=tb) content_bytes = content.encode("utf-8") headers = [ ('Content-Type', "text/html"), ('Content-Length', str(len(content_bytes))), ] start_response("404 NOT FOUND", headers) return [content_bytes] This application will extract two pieces of information from the request: the PATH_INFO and the QUERY_STRING keys in the environment dictionary. The PATH_INFO request will define which set to extract. The QUERY_STRING request will specify an output format. It's important to note that query strings can be quite complex. Rather than assume it is simply a string like ?form=json, we've used the urllib.parse module to properly locate all of the name-value pairs in the query string. The value with the 'form' key in the dictionary extracted from the query string can be found in query['form'][0]. This should be one of the defined formats. If it isn't, an exception will be raised, and an error page displayed. After locating the path and query string, the application processing is highlighted in bold. These two statements rely on three functions to gather, filter, and serialize the results: The raw_data() function reads the raw data from a file. The result is a dictionary with lists of Pair objects. The anscombe_filter() function accepts a selection string and the dictionary of raw data and returns a single list of Pair objects. The list of pairs is then serialized into bytes by the serialize() function. The serializer is expected to produce byte's, which can then be packaged with an appropriate header, and returned. We elected to produce an HTTP Content-Length header as part of the result. This header isn't required, but it's polite for large downloads. Because we decided to emit this header, we are forced to create a bytes object with the serialization of the data so we can count the bytes. If we elected to omit the Content-Length header, we could change the structure of this application dramatically. Each serializer could be changed to a generator function, which would yield bytes as they are produced. For large datasets, this can be a helpful optimization. For the user watching a download, however, it might not be so pleasant because the browser can't display how much of the download is complete. A common optimization is to break the transaction into two parts. The first part computes the result and places a file into a Downloads directory. The response is a 302 FOUND with a Location header that identifies the file to download. Generally, most clients will then request the file based on this initial response. 
The file can be downloaded by Apache httpd or Nginx without involving the Python application. For this example, all errors are treated as a 404 NOT FOUND error. This could be misleading, since a number of individual things might go wrong. More sophisticated error handling could give more try:/except: blocks to provide more informative feedback. For debugging purposes, we've provided a Python stack trace in the resulting web page. Outside the context of debugging, this is a very bad idea. Feedback from an API should be just enough to fix the request, and nothing more. A stack trace provides too much information to potentially malicious users. Getting raw data Here's what we're using for this application: from Chapter_3.ch03_ex5 import ( series, head_map_filter, row_iter) from typing import ( NamedTuple, Callable, List, Tuple, Iterable, Dict, Any) RawPairIter = Iterable[Tuple[float, float]] class Pair(NamedTuple): x: float y: float pairs: Callable[[RawPairIter], List[Pair]] \ = lambda source: list(Pair(*row) for row in source) def raw_data() -> Dict[str, List[Pair]]: with open("Anscombe.txt") as source: data = tuple(head_map_filter(row_iter(source))) mapping = { id_str: pairs(series(id_num, data)) for id_num, id_str in enumerate( ['I', 'II', 'III', 'IV']) } return mapping The raw_data() function opens the local data file, and applies the row_iter() function to return each line of the file parsed into a row of separate items. We applied the head_map_filter() function to remove the heading from the file. The result created a tuple-of-list structure, which is assigned the variable data. This handles parsing the input into a structure that's useful. The resulting structure is an instance of the Pair subclass of the NamedTuple class, with two fields that have float as their type hints. We used a dictionary comprehension to build the mapping from id_str to pairs assembled from the results of the series() function. The series() function extracts (x, y) pairs from the input document. In the document, each series is in two adjacent columns. The series named I is in columns zero and one; the series() function extracts the relevant column pairs. The pairs() function is created as a lambda object because it's a small generator function with a single parameter. This function builds the desired NamedTuple objects from the sequence of anonymous tuples created by the series() function. Since the output from the raw_data() function is a mapping, we can do something like the following example to pick a specific series by name: >>> raw_data()['I'] [Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ... Given a key, for example, 'I', the series is a list of Pair objects that have the x, y values for each item in the series. Applying a filter In this application, we're using a simple filter. The entire filter process is embodied in the following function: def anscombe_filter( set_id: str, raw_data_map: Dict[str, List[Pair]] ) -> List[Pair]: return raw_data_map[set_id] We made this trivial expression into a function for three reasons: The functional notation is slightly more consistent and a bit more flexible than the subscript expression We can easily expand the filtering to do more We can include separate unit tests in the docstring for this function While a simple lambda would work, it wouldn't be quite as convenient to test. For error handling, we've done exactly nothing. We've focused on what's sometimes called the happy path: an ideal sequence of events. 
Any problems that arise in this function will raise an exception. The WSGI wrapper function should catch all exceptions and return an appropriate status message and error response content. For example, it's possible that the set_id method will be wrong in some way. Rather than obsess over all the ways it could be wrong, we'll simply allow Python to throw an exception. Indeed, this function follows the Python advice that, it's better to seek forgiveness than to ask permission. This advice is materialized in code by avoiding permission-seeking: there are no preparatory if statements that seek to qualify the arguments as valid. There is only forgiveness handling: an exception will be raised and handled in the WSGI wrapper. This essential advice applies to the preceding raw data and the serialization that we will see now. Serializing the results Serialization is the conversion of Python data into a stream of bytes, suitable for transmission. Each format is best described by a simple function that serializes just that one format. A top-level generic serializer can then pick from a list of specific serializers. The picking of serializers leads to the following collection of functions: Serializer = Callable[[str, List[Pair]], bytes] SERIALIZERS: Dict[str, Tuple[str, Serializer]]= { 'xml': ('application/xml', serialize_xml), 'html': ('text/html', serialize_html), 'json': ('application/json', serialize_json), 'csv': ('text/csv', serialize_csv), } def serialize( format: str, title: str, data: List[Pair] ) -> Tuple[bytes, str]: mime, function = SERIALIZERS.get( format.lower(), ('text/html', serialize_html)) return function(title, data), mime The overall serialize() function locates a specific serializer in the SERIALIZERS dictionary, which maps a format name to a two-tuple. The tuple has a MIME type that must be used in the response to characterize the results. The tuple also has a function based on the Serializer type hint. This function will transform a name and a list of Pair objects into bytes that will be downloaded. The serialize() function doesn't do any data transformation. It merely maps a name to a function that does the hard work of transformation. Returning a function permits the overall application to manage the details of memory or file-system serialization. Serializing to the file system, while slow, permits larger files to be handled. We'll look at the individual serializers below. The serializers fall into two groups: those that produce strings and those that produce bytes. A serializer that produces a string will need to have the string encoded as bytes for download. A serializer that produces bytes doesn't need any further work. For the serializers, which produce strings, we can use function composition with a standardized convert-to-bytes function. Here's a decorator that can standardize the conversion to bytes: from typing import Callable, TypeVar, Any, cast from functools import wraps def to_bytes( function: Callable[..., str] ) -> Callable[..., bytes]: @wraps(function) def decorated(*args, **kw): text = function(*args, **kw) return text.encode("utf-8") return cast(Callable[..., bytes], decorated) We've created a small decorator named @to_bytes. This will evaluate the given function and then encode the results using UTF-8 to get bytes. Note that the decorator changes the decorated function from having a return type of str to a return type of bytes. We haven't formally declared parameters for the decorated function, and used ... instead of the details. 
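As a quick, purely illustrative check (the greet() function below is hypothetical and not part of the application), applying the decorator to a trivial function shows the str-to-bytes change in action:

@to_bytes
def greet(name: str) -> str:
    return "Hello, " + name

print(greet("world"))  # prints b'Hello, world'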
We'll show how this is used with JSON, CSV, and HTML serializers. The XML serializer produces bytes directly and doesn't need to be composed with this additional function. We could also do the functional composition in the initialization of the serializers mapping. Instead of decorating the function definition, we could decorate the reference to the function object. Here's an alternative definition for the serializer mapping: SERIALIZERS = { 'xml': ('application/xml', serialize_xml), 'html': ('text/html', to_bytes(serialize_html)), 'json': ('application/json', to_bytes(serialize_json)), 'csv': ('text/csv', to_bytes(serialize_csv)), } This replaces decoration at the site of the function definition with decoration when building this mapping data structure. It seems potentially confusing to defer the decoration. Serializing data into JSON or CSV formats The JSON and CSV serializers are similar because both rely on Python's libraries to serialize. The libraries are inherently imperative, so the function bodies are strict sequences of statements. Here's the JSON serializer: import json @to_bytes def serialize_json(series: str, data: List[Pair]) -> str: """ >>> data = [Pair(2,3), Pair(5,7)] >>> serialize_json( "test", data ) b'[{"x": 2, "y": 3}, {"x": 5, "y": 7}]' """ obj = [dict(x=r.x, y=r.y) for r in data] text = json.dumps(obj, sort_keys=True) return text We created a list-of-dict structure and used the json.dumps() function to create a string representation. The JSON module requires a materialized list object; we can't provide a lazy generator function. The sort_keys=True argument value is helpful for unit testing. However, it's not required for the application and represents a bit of overhead. Here's the CSV serializer: import csv import io @to_bytes def serialize_csv(series: str, data: List[Pair]) -> str: """ >>> data = [Pair(2,3), Pair(5,7)] >>> serialize_csv("test", data) b'x,y\\r\\n2,3\\r\\n5,7\\r\\n' """ buffer = io.StringIO() wtr = csv.DictWriter(buffer, Pair._fields) wtr.writeheader() wtr.writerows(r._asdict() for r in data) return buffer.getvalue() The CSV module's readers and writers are a mixture of imperative and functional elements. We must create the writer, and properly create headings in a strict sequence. We've used the _fields attribute of the Pair namedtuple to determine the column headings for the writer. The writerows() method of the writer will accept a lazy generator function. In this case, we used the _asdict() method of each Pair object to return a dictionary suitable for use with the CSV writer. Serializing data into XML We'll look at one approach to XML serialization using the built-in libraries. This will build a document from individual tags. A common alternative approach is to use Python introspection to examine and map Python objects and class names to XML tags and attributes. Here's our XML serialization: import xml.etree.ElementTree as XML def serialize_xml(series: str, data: List[Pair]) -> bytes: """ >>> data = [Pair(2,3), Pair(5,7)] >>> serialize_xml( "test", data ) b'<series name="test"><row><x>2</x><y>3</y></row><row><x>5</x><y>7</y></row></series>' """ doc = XML.Element("series", name=series) for row in data: row_xml = XML.SubElement(doc, "row") x = XML.SubElement(row_xml, "x") x.text = str(row.x) y = XML.SubElement(row_xml, "y") y.text = str(row.y) return cast(bytes, XML.tostring(doc, encoding='utf-8')) We created a top-level element, <series>, and placed <row> sub-elements underneath that top element. 
Within each <row> sub-element, we've created <x> and <y> tags, and assigned text content to each tag. The interface for building an XML document using the ElementTree library tends to be heavily imperative. This makes it a poor fit for an otherwise functional design. In addition to the imperative style, note that we haven't created a DTD or XSD. We have not properly assigned a namespace to our tags. We also omitted the <?xml version="1.0"?> processing instruction that is generally the first item in an XML document. The XML.tostring() function has a type hint that states it returns str. This is generally true, but when we provide the encoding parameter, the result type changes to bytes. There's no easy way to formalize the idea of variant return types based on parameter values, so we use an explicit cast() to inform mypy of the actual type. A more sophisticated serialization library could be helpful here. There are many to choose from. Visit https://wiki.python.org/moin/PythonXml for a list of alternatives. Serializing data into HTML In our final example of serialization, we'll look at the complexity of creating an HTML document. The complexity arises because in HTML, we're expected to provide an entire web page with a great deal of context information. Here's one way to tackle this HTML problem: import string data_page = string.Template("""\ <html> <head><title>Series ${title}</title></head> <body> <h1>Series ${title}</h1> <table> <thead><tr><td>x</td><td>y</td></tr></thead> <tbody> ${rows} </tbody> </table> </body> </html> """) @to_bytes def serialize_html(series: str, data: List[Pair]) -> str: """ >>> data = [Pair(2,3), Pair(5,7)] >>> serialize_html("test", data) #doctest: +ELLIPSIS b'<html>...<tr><td>2</td><td>3</td></tr>\\n<tr><td>5</td><td>7</td></tr>... """ text = data_page.substitute( title=series, rows="\n".join( "<tr><td>{0.x}</td><td>{0.y}</td></tr>".format(row) for row in data) ) return text Our serialization function has two parts. The first part is a string.Template() function that contains the essential HTML page. It has two placeholders where data can be inserted into the template. The ${title} method shows where title information can be inserted, and the ${rows} method shows where the data rows can be inserted. The function creates individual data rows using a simple format string. These are joined into a longer string, which is then substituted into the template. While workable for simple cases like the preceding example, this isn't ideal for more complex result sets. There are a number of more sophisticated template tools to create HTML pages. A number of these include the ability to embed the looping in the template, separate from the function that initializes serialization. If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott's bestseller, Functional Python Programming. What is the difference between functional and object-oriented programming? Should you move to Python 3? 7 Python experts’ opinions Is Python edging R out in the data science wars?

Implementing RNN in TensorFlow for spam prediction [Tutorial]

Packt Editorial Staff
11 Aug 2018
11 min read
Artificial neural networks (ANN) are an abstract representation of the human nervous system, which contains a collection of neurons that communicate with each other through connections called axons. A recurrent neural network (RNN) is a class of ANN where connections between units form a directed cycle. RNNs make use of information from the past. That way, they can make predictions in data with high temporal dependencies. This creates an internal state of the network, which allows it to exhibit dynamic temporal behavior. In this article we will look at: Implementation of basic RNNs in TensorFlow. An example of how to implement an RNN in TensorFlow for spam predictions. Train a model that will learn to distinguish between spam and non-spam emails using the text of the email. This article is an extract taken from the book Deep Learning with TensorFlow – Second Edition, written by Giancarlo Zaccone, Md. Rezaul Karim. Implementing basic RNNs in TensorFlow TensorFlow has tf.contrib.rnn.BasicRNNCell and tf.nn.rnn_cell. BasicRNNCell, which provide the basic building blocks of RNNs. However, first let's implement a very simple RNN model, without using either of these. The idea is to have a better understanding of what goes on under the hood. We will create an RNN composed of a layer of five recurrent neurons using the ReLU activation function. We will assume that the RNN runs over only two-time steps, taking input vectors of size 3 at each time step. The following code builds this RNN, unrolled through two-time steps: n_inputs = 3 n_neurons = 5 X1 = tf.placeholder(tf.float32, [None, n_inputs]) X2 = tf.placeholder(tf.float32, [None, n_inputs]) Wx = tf.get_variable("Wx", shape=[n_inputs,n_neurons], dtype=tf. float32, initializer=None, regularizer=None, trainable=True, collections=None) Wy = tf.get_variable("Wy", shape=[n_neurons,n_neurons], dtype=tf. float32, initializer=None, regularizer=None, trainable=True, collections=None) b = tf.get_variable("b", shape=[1,n_neurons], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None) Y1 = tf.nn.relu(tf.matmul(X1, Wx) + b) Y2 = tf.nn.relu(tf.matmul(Y1, Wy) + tf.matmul(X2, Wx) + b) Then we initialize the global variables as follows: init_op = tf.global_variables_initializer() This network looks much like a two-layer feedforward neural network, but both layers share the same weights and bias vectors. Additionally, we feed inputs at each layer and receive outputs from each layer. X1_batch = np.array([[0, 2, 3], [2, 8, 9], [5, 3, 8], [3, 2, 9]]) # t = 0 X2_batch = np.array([[5, 6, 8], [1, 0, 0], [8, 2, 0], [2, 3, 6]]) # t = 1 These mini-batches contain four instances, each with an input sequence composed of exactly two inputs. At the end, Y1_val and Y2_val contain the outputs of the network at both time steps for all neurons and all instances in the mini-batch. Then we create a TensorFlow session and execute the computational graph as follows: with tf.Session() as sess:        init_op.run()        Y1_val, Y2_val = sess.run([Y1, Y2], feed_dict={X1:        X1_batch, X2: X2_batch}) Finally, we print the result: print(Y1_val) # output at t = 0 print(Y2_val) # output at t = 1 The following is the output: >>> [[ 0. 0. 0. 2.56200171 1.20286 ] [ 0. 0. 0. 12.39334488 2.7824254 ] [ 0. 0. 0. 13.58520699 5.16213894] [ 0. 0. 0. 9.95982838 6.20652485]] [[ 0. 0. 0. 14.86255169 6.98305273] [ 0. 0. 26.35326385 0.66462421 18.31009483] [ 5.12617588 4.76199865 20.55905533 11.71787453 18.92538261] [ 0. 0. 
19.75175095 3.38827515 15.98449326]] The network we created is simple, but if you run it over 100 time steps, for example, the graph is going to be very big. Implementing an RNN for spam prediction In this section, we will see how to implement an RNN in TensorFlow to predict spam/ham from texts. Data description and preprocessing The popular spam dataset from the UCI ML repository will be used, which can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip. The dataset contains texts from several emails, some of which were marked as spam. Here we will train a model that will learn to distinguish between spam and non-spam emails using only the text of the email. Let's get started by importing the required libraries and model: import os import re import io import requests import numpy as np import matplotlib.pyplot as plt import tensorflow as tf from zipfile import ZipFile from tensorflow.python.framework import ops import warnings Additionally, we can stop printing the warning produced by TensorFlow if you want: warnings.filterwarnings("ignore") os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' ops.reset_default_graph() Now, let's create the TensorFlow session for the graph: sess = tf.Session() The next task is setting the RNN parameters: epochs = 300 batch_size = 250 max_sequence_length = 25 rnn_size = 10 embedding_size = 50 min_word_frequency = 10 learning_rate = 0.0001 dropout_keep_prob = tf.placeholder(tf.float32) Let's manually download the dataset and store it in a text_data.txt file in the temp directory. First, we set the path: data_dir = 'temp' data_file = 'text_data.txt' if not os.path.exists(data_dir):    os.makedirs(data_dir) Now, we directly download the dataset in zipped format: if not os.path.isfile(os.path.join(data_dir, data_file)):    zip_url = 'http://archive.ics.uci.edu/ml/machine-learning- databases/00228/smsspamcollection.zip'    r = requests.get(zip_url)    z = ZipFile(io.BytesIO(r.content))    file = z.read('SMSSpamCollection') We still need to format the data: text_data = file.decode()    text_data = text_data.encode('ascii',errors='ignore')    text_data = text_data.decode().split('\n') Now, store in it the directory mentioned earlier in a text file: with open(os.path.join(data_dir, data_file), 'w') as file_conn:        for text in text_data:            file_conn.write("{}\n".format(text)) else:    text_data = []    with open(os.path.join(data_dir, data_file), 'r') as file_conn:        for row in file_conn:            text_data.append(row)    text_data = text_data[:-1] Let's split the words that have a word length of at least 2: text_data = [x.split('\t') for x in text_data if len(x)>=1] [text_data_target, text_data_train] = [list(x) for x in zip(*text_data)] Now we create a text cleaning function: def clean_text(text_string):    text_string = re.sub(r'([^\s\w]|_|[0-9])+', '', text_string)    text_string = " ".join(text_string.split())    text_string = text_string.lower()    return(text_string) We call the preceding method to clean the text: text_data_train = [clean_text(x) for x in text_data_train] Now we need to do one of the most important tasks, which is creating word embedding –changing text into numeric vectors: vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(max_sequence_length, min_frequency=min_word_frequency) text_processed = np.array(list(vocab_processor.fit_transform(text_data_train))) Now let's shuffle to make the dataset balance: text_processed = np.array(text_processed) text_data_target = 
np.array([1 if x=='ham' else 0 for x in text_data_target]) shuffled_ix = np.random.permutation(np.arange(len(text_data_target))) x_shuffled = text_processed[shuffled_ix] y_shuffled = text_data_target[shuffled_ix] Now that we have shuffled the data, we can split the data into a training and testing set: ix_cutoff = int(len(y_shuffled)*0.75) x_train, x_test = x_shuffled[:ix_cutoff], x_shuffled[ix_cutoff:] y_train, y_test = y_shuffled[:ix_cutoff], y_shuffled[ix_cutoff:] vocab_size = len(vocab_processor.vocabulary_) print("Vocabulary size: {:d}".format(vocab_size)) print("Training set size: {:d}".format(len(y_train))) print("Test set size: {:d}".format(len(y_test))) Following is the output of the preceding code: >>> Vocabulary size: 933 Training set size: 4180 Test set size: 1394 Before we start training, let's create placeholders for our TensorFlow graph: x_data = tf.placeholder(tf.int32, [None, max_sequence_length]) y_output = tf.placeholder(tf.int32, [None]) Let's create the embedding: embedding_mat = tf.get_variable("embedding_mat", shape=[vocab_size, embedding_size], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None) embedding_output = tf.nn.embedding_lookup(embedding_mat, x_data) Now it's time to construct our RNN. The following code defines the RNN cell: cell = tf.nn.rnn_cell.BasicRNNCell(num_units = rnn_size) output, state = tf.nn.dynamic_rnn(cell, embedding_output, dtype=tf.float32) output = tf.nn.dropout(output, dropout_keep_prob) Now let's define the way to get the output from our RNN sequence: output = tf.transpose(output, [1, 0, 2]) last = tf.gather(output, int(output.get_shape()[0]) - 1) Next, we define the weights and the biases for the RNN: weight = bias = tf.get_variable("weight", shape=[rnn_size, 2], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None) bias = tf.get_variable("bias", shape=[2], dtype=tf.float32, initializer=None, regularizer=None, trainable=True, collections=None) The logits output is then defined. It uses both the weight and the bias from the preceding code: logits_out = tf.nn.softmax(tf.matmul(last, weight) + bias) Now we define the losses for each prediction so that later on, they can contribute to the loss function: losses = tf.nn.sparse_softmax_cross_entropy_with_logits_v2(logits=logits_ou t, labels=y_output) We then define the loss function: loss = tf.reduce_mean(losses) We now define the accuracy of each prediction: accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits_out, 1), tf.cast(y_output, tf.int64)), tf.float32)) We then create the training_op with RMSPropOptimizer: optimizer = tf.train.RMSPropOptimizer(learning_rate) train_step = optimizer.minimize(loss) Now let's initialize all the variables using the global_variables_initializer() method: init_op = tf.global_variables_initializer() sess.run(init_op) Additionally, we can create some empty lists to keep track of the training loss, testing loss, training accuracy, and the testing accuracy in each epoch: train_loss = [] test_loss = [] train_accuracy = [] test_accuracy = [] Now we are ready to perform the training, so let's get started. The workflow of the training goes as follows: Shuffle the training data Select the training set and calculate generations Run training step for each batch Run loss and accuracy of training Run the evaluation steps. 
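The training code runs once per epoch, so it sits inside an outer loop over the epochs; assuming the epochs value set earlier, the enclosing loop header would look like the sketch below, with the shuffling, batching, training, and evaluation code that follows forming its body:

for epoch in range(epochs):
    # The shuffle, batch, train, and evaluate steps shown next form the loop body.
    ...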
The following code includes all of the aforementioned steps; note that they run inside a loop over the epochs:

for epoch in range(epochs):
    shuffled_ix = np.random.permutation(np.arange(len(x_train)))
    x_train = x_train[shuffled_ix]
    y_train = y_train[shuffled_ix]
    num_batches = int(len(x_train)/batch_size) + 1
    for i in range(num_batches):
        min_ix = i * batch_size
        max_ix = np.min([len(x_train), ((i+1) * batch_size)])
        x_train_batch = x_train[min_ix:max_ix]
        y_train_batch = y_train[min_ix:max_ix]
        train_dict = {x_data: x_train_batch, y_output: y_train_batch, dropout_keep_prob: 0.5}
        sess.run(train_step, feed_dict=train_dict)
        temp_train_loss, temp_train_acc = sess.run([loss, accuracy], feed_dict=train_dict)
    train_loss.append(temp_train_loss)
    train_accuracy.append(temp_train_acc)
    test_dict = {x_data: x_test, y_output: y_test, dropout_keep_prob: 1.0}
    temp_test_loss, temp_test_acc = sess.run([loss, accuracy], feed_dict=test_dict)
    test_loss.append(temp_test_loss)
    test_accuracy.append(temp_test_acc)
    print('Epoch: {}, Test Loss: {:.2}, Test Acc: {:.2}'.format(epoch+1, temp_test_loss, temp_test_acc))
print('\nOverall accuracy on test set (%): {}'.format(np.mean(temp_test_acc)*100.0))

Following is the output of the preceding code:

>>>
Epoch: 1, Test Loss: 0.68, Test Acc: 0.82
Epoch: 2, Test Loss: 0.68, Test Acc: 0.82
Epoch: 3, Test Loss: 0.67, Test Acc: 0.82
…
Epoch: 997, Test Loss: 0.36, Test Acc: 0.96
Epoch: 998, Test Loss: 0.36, Test Acc: 0.96
Epoch: 999, Test Loss: 0.35, Test Acc: 0.96
Epoch: 1000, Test Loss: 0.35, Test Acc: 0.96

Overall accuracy on test set (%): 96.19799256324768

Well done! The accuracy of the RNN is above 96%, which is outstanding. Now let's observe how the loss propagates across each iteration and over time:

epoch_seq = np.arange(1, epochs+1)
plt.plot(epoch_seq, train_loss, 'k--', label='Train Set')
plt.plot(epoch_seq, test_loss, 'r-', label='Test Set')
plt.title('RNN training/test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc='upper left')
plt.show()

Figure 1: a) RNN training and test loss per epoch b) test accuracy per epoch

We also plot the accuracy over time:

plt.plot(epoch_seq, train_accuracy, 'k--', label='Train Set')
plt.plot(epoch_seq, test_accuracy, 'r-', label='Test Set')
plt.title('Test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(loc='upper left')
plt.show()

We discussed the implementation of RNNs in TensorFlow. We saw how to make predictions with data that has a high temporal dependency, and how to develop real-life predictive models that make predictive analytics easier using RNNs. If you want to delve into neural networks and implement deep learning algorithms, check out the book Deep Learning with TensorFlow - Second Edition.

Top 5 Deep Learning Architectures
Understanding Sentiment Analysis and other key NLP concepts
Facelifting NLP with Deep Learning
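As a quick follow-up to the spam example above: if you want to try the trained graph on new messages, a minimal sketch along the following lines should work. It assumes that sess, vocab_processor, clean_text, x_data, dropout_keep_prob, and logits_out from the example are still in scope; the two sample messages are invented purely for illustration:

# Classify a couple of new messages with the graph trained above
new_texts = ["Congratulations! You have won a free prize, call now",
             "Are we still meeting for lunch tomorrow?"]
cleaned = [clean_text(t) for t in new_texts]
new_processed = np.array(list(vocab_processor.transform(cleaned)))
feed = {x_data: new_processed, dropout_keep_prob: 1.0}
scores = sess.run(logits_out, feed_dict=feed)
# Index 1 corresponds to 'ham' and 0 to 'spam' in the label encoding used earlier
predictions = np.argmax(scores, axis=1)
for text, pred in zip(new_texts, predictions):
    print('{} -> {}'.format(text, 'ham' if pred == 1 else 'spam'))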


Four IBM facial recognition patents in 2018, we found intriguing

Natasha Mathur
11 Aug 2018
10 min read
The media has gone into a frenzy over Google’s latest facial recognition patent that shows an algorithm can track you across social media and gather your personal details. We thought, we’d dive further into what other patents Google has applied for in facial recognition tehnology in 2018. What we discovered was an eye opener (pun intended). Google is only the 3rd largest applicant with IBM and Samsung leading the patents race in facial recognition. As of 10th Aug, 2018, 1292 patents have been granted in 2018 on Facial recognition. Of those, IBM received 53. Here is the summary comparison of leading companies in facial recognition patents in 2018. Read Also: Top four Amazon patents in 2018 that use machine learning, AR, and robotics IBM has always been at the forefront of innovation. Let’s go back about a quarter of a century, when IBM invented its first general-purpose computer for business. It built complex software programs that helped in launching Apollo missions, putting the first man on the moon. It’s chess playing computer, Deep Blue, back in 1997,  beat Garry Kasparov, in a traditional chess match (the first time a computer beat a world champion). Its researchers are known for winning Nobel Prizes. Coming back to 2018, IBM unveiled the world’s fastest supercomputer with AI capabilities, and beat the Wall Street expectations by making $20 billion in revenue in Q3 2018 last month, with market capitalization worth $132.14 billion as of August 9, 2018. Its patents are a major part of why it continues to be valuable highly. IBM continues to come up with cutting-edge innovations and to protect these proprietary inventions, it applies for patent grants. United States is the largest consumer market in the world, so patenting the technologies that the companies come out with is a standard way to attain competitive advantage. As per the United States Patent and Trademark Office (USPTO), Patent is an exclusive right to invention and “the right to exclude others from making, using, offering for sale, or selling the invention in the United States or “importing” the invention into the United States”. As always, IBM has applied for patents for a wide spectrum of technologies this year from Artificial Intelligence, Cloud, Blockchain, Cybersecurity, to Quantum Computing. Today we focus on IBM’s patents in facial recognition field in 2018. Four IBM facial recognition innovations patented in 2018 Facial recognition is a technology which identifies and verifies a person from a digital image or a video frame from a video source and IBM seems quite invested in it. Controlling privacy in a face recognition application Date of patent: January 2, 2018 Filed: December 15, 2015 Features: IBM has patented for a face-recognition application titled “Controlling privacy in a face recognition application”. Face recognition technologies can be used on mobile phones and wearable devices which may hamper the user privacy. This happens when a "sensor" mobile user identifies a "target" mobile user without his or her consent. The present mobile device manufacturers don’t provide the privacy mechanisms for addressing this issue. This is the major reason why IBM has patented this technology. Editor’s Note: This looks like an answer to the concerns raised over Google’s recent social media profiling facial recognition patent.   How it works? Controlling privacy in a face recognition application It consists of a privacy control system, which is implemented using a cloud computing node. 
The system uses a camera to find out information about the people, by using a face recognition service deployed in the cloud. As per the patent application “the face recognition service may have access to a face database, privacy database, and a profile database”. Controlling privacy in a face recognition application The facial database consists of one or more facial signatures of one or more users. The privacy database includes privacy preferences of target users. Privacy preferences will be provided by the target user and stored in the privacy database.The profile database contains information about the target user such as name, age, gender, and location. It works by receiving an input which includes a face recognition query and a digital image of a face. The privacy control system then detects a facial signature from the digital image. The target user associated with the facial signature is identified, and profile of the target user is extracted. It then checks the privacy preferences of the user. If there are no privacy preferences set, then it transmits the profile to the sensor user. But, if there are privacy preferences then the censored profile of the user is generated omitting out the private elements in the profile. There are no announcements, as for now, regarding when this technology will hit the market. Evaluating an impact of a user's content utilized in a social network Date of patent: January 30, 2018 Filed: April 11, 2015 Features:  IBM has patented for an application titled “Evaluating an impact of a user's content utilized in a social network”.  With so much data floating around on social network websites, it is quite common for the content of a document (e.g., e-mail message, a post, a word processing document, a presentation) to be reused, without the knowledge of an original author. Evaluating an impact of a user's content utilised in a social network Evaluating an impact of a user's content utilized in a social network Because of this, the original author of the content may not receive any credit, which creates less motivation for the users to post their original content in a social network. This is why IBM has decided to patent for this application. Evaluating an impact of a user's content utilized in a social network As per the patent application, the method/system/product  “comprises detecting content in a document posted on a social network environment being reused by a second user. The method further comprises identifying an author of the content. The method additionally comprises incrementing a first counter keeping track of a number of times the content has been adopted in derivative works”. There’s a processor, which generates an “impact score” which  represents the author's ability to influence other users to adopt the content. This is based on the number of times the content has been adopted in the derivative works. Also, “the method comprises providing social credit to the author of the content using the impact score”. Editor’s Note: This is particularly interesting to us as IBM, unlike other tech giants, doesn’t own a popular social network or media product. (Google has Google+, Microsoft has LinkedIn, Facebook and Twitter are social, even Amazon has stakes in a media entity in the form of Washington Post). No information is present about when or if this system will be used among social network sites. 
Spoof detection for facial recognition Date of patent: February 20, 2018 Filed: December 10, 2015 Features: IBM patented an application named “Spoof detection for facial recognition”.  It provides a method to determine whether the image is authentic or not. As per the patent “A facial recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source.” Editor’s Note: This seems to have a direct impact on the work around tackling deepFakes, which incidentally is something DARPA is very keen on. Could IBM be vying for a long term contract with the government? How it works? The patent consists of a system that helps detect “if a face in a facial recognition authentication system is a three-dimensional structure based on multiple selected images from the input video”.                                      Spoof detection for facial recognition There are four or more two-dimensional feature points which are located via an image processing device connected to the camera. Here the two-dimensional feature points do not lie on the same two-dimensional plane. The patent reads that “one or more additional images of the user's face can be received with the camera; and, the at least four two-dimensional feature points can be located on each additional image with the image processor. The image processor can identify displacements between the two-dimensional feature points on the additional image and the two-dimensional feature points on the first image for each additional image” Spoof detection for facial recognition There is also a processor connected to the image processing device that helps figure out whether the displacements conform to a three-dimensional surface model. The processor can then determine whether to authenticate the user depending on whether the displacements conform to the three-dimensional surface model. Facial feature location using symmetry line Date of patent: June 5, 2018 Filed: July 20, 2015 Features: IBM patented for an application titled “Facial feature location using symmetry line”. As per the patent, “In many image processing applications, identifying facial features of the subject may be desired. Currently, location of facial features require a search in four dimensions using local templates that match the target features. Such a search tends to be complex and prone to errors because it has to locate both (x, y) coordinates, scale parameter and rotation parameter”. Facial feature location using symmetry line Facial feature location using symmetry line The application consists of a computer-implemented method that obtains an image of the subject’s face. After that it automatically detects a symmetry line of the face in the image, where the symmetry line intersects at least a mouth region of the face. It then automatically locates a facial feature of the face using the symmetry line. There’s also a computerised apparatus with a processor which performs the steps of obtaining an image of a subject’s face and helps locate the facial feature.  Editor’s note: Atleast, this patent makes direct sense to us. IBM is majorly focusing on bring AI to healthcare. A patent like this can find a lot of use in not just diagnostics and patient care, but also in cutting edge areas like robotics enabled surgeries. IBM is continually working on new technologies to provide the world with groundbreaking innovations. 
Its big investments in facial recognition technology speak volumes about how well-versed IBM is with its endless possibilities. With progress in facial recognition technology come privacy fears, but IBM's facial recognition application patent has that covered, as it lets users set privacy preferences. This can be a great benchmark for IBM, as not many existing applications currently do this. The social credit evaluation patent can help give a voice back to users who post original content on social media platforms. The spoof detection application will help maintain authenticity by detecting forged images. Lastly, the facial feature detection patent can act as a great additional feature for image processing applications. There is no guarantee from IBM as to whether these patents will ever make it into practical applications, but they do say a lot about how the company thinks about the technology.

Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics
Facebook patents its news feed filter tool to provide more relevant news to its users
Google's new facial recognition patent uses your social network to identify you!


Time series modeling: What is it, Why it matters and How it's used

Sunith Shetty
10 Aug 2018
11 min read
A series can be defined as a number of events, objects, or people of a similar or related kind coming one after another; if we add the dimension of time, we get a time series. A time series can be defined as a series of data points in time order. In this article, we will understand what time series is and why it is one of the essential characteristics for forecasting. This article is an excerpt from a book written by Harish Gulati titled SAS for Finance. The importance of time series What importance, if any, does time series have and how will it be relevant in the future? These are just a couple of fundamental questions that any user should find answers to before delving further into the subject. Let's try to answer this by posing a question. Have you heard the terms big data, artificial intelligence (AI), and machine learning (ML)? These three terms make learning time series analysis relevant. Big data is primarily about a large amount of data that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interaction. AI is a kind of technology that is being developed by data scientists, computational experts, and others to enable processes to become more intelligent, while ML is an enabler that is helping to implement AI. All three of these terms are interlinked with the data they use, and a lot of this data is time series in its nature. This could be either financial transaction data, the behavior pattern of individuals during various parts of the day, or related to life events that we might experience. An effective mechanism that enables us to capture the data, store it, analyze it, and then build algorithms to predict transactions, behavior (and life events, in this instance) will depend on how big data is utilized and how AI and MI are leveraged. A common perception in the industry is that time series data is used for forecasting only. In practice, time series data is used for: Pattern recognition Forecasting Benchmarking Evaluating the influence of a single factor on the time series Quality control For example, a retailer may identify a pattern in clothing sales every time it gets a celebrity endorsement, or an analyst may decide to use car sales volume data from 2012 to 2017 to set a selling benchmark in units. An analyst might also build a model to quantify the effect of Lehman's crash at the height of the 2008 financial crisis in pushing up the price of gold. Variance in the success of treatments across time periods can also be used to highlight a problem, the tracking of which may enable a hospital to take remedial measures. These are just some of the examples that showcase how time series analysis isn't limited to just forecasting. In this chapter, we will review how the financial industry and others use forecasting, discuss what a good and a bad forecast is, and hope to understand the characteristics of time series data and its associated problems. Forecasting across industries Since one of the primary uses of time series data is forecasting, it's wise that we learn about some of its fundamental properties. To understand what the industry means by forecasting and the steps involved, let's visit a common misconception about the financial industry: only lending activities require forecasting. 
We need forecasting in order to grant personal loans, mortgages, overdrafts, or simply assess someone's eligibility for a credit card, as the industry uses forecasting to assess a borrower's affordability and their willingness to repay the debt. Even deposit products such as savings accounts, fixed-term savings, and bonds are priced based on some forecasts. How we forecast and the rationale for that methodology is different in borrowing or lending cases, however. All of these areas are related to time series, as we inevitably end up using time series data as part of the overall analysis that drives financial decisions. Let's understand the forecasts involved here a bit better. When we are assessing an individual's lending needs and limits, we are forecasting for a single person yet comparing the individual to a pool of good and bad customers who have been offered similar products. We are also assessing the individual's financial circumstances and behavior through industry-available scoring models or by assessing their past behavior, with the financial provider assessing the lending criteria. In the case of deposit products, as long as the customer is eligible to transact (can open an account and has passed know your customer (KYC), anti-money laundering (AML), and other checks), financial institutions don't perform forecasting at an individual level. However, the behavior of a particular customer is primarily driven by the interest rate offered by the financial institution. The interest rate, in turn, is driven by the forecasts the financial institution has done to assess its overall treasury position. The treasury is the department that manages the central bank's money and has the responsibility of ensuring that all departments are funded, which is generated through lending and attracting deposits at a lower rate than a bank lends. The treasury forecasts its requirements for lending and deposits, while various teams within the treasury adhere to those limits. Therefore, a pricing manager for a deposit product will price the product in such a way that the product will attract enough deposits to meet the forecasted targets shared by the treasury; the pricing manager also has to ensure that those targets aren't overshot by a significant margin, as the treasury only expects to manage a forecasted target. In both lending and deposit decisions, financial institutions do tend to use forecasting. A lot of these forecasts are interlinked, as we saw in the example of the treasury's expectations and the subsequent pricing decision for a deposit product. To decide on its future lending and borrowing positions, the treasury must have used time series data to determine what the potential business appetite for lending and borrowing in the market is and would have assessed that with the current cash flow situation within the relevant teams and institutions. Characteristics of time series data Any time series analysis has to take into account the following factors: Seasonality Trend Outliers and rare events Disruptions and step changes Seasonality Seasonality is a phenomenon that occurs each calendar year. The same behavior can be observed each year. A good forecasting model will be able to incorporate the effect of seasonality in its forecasts. Christmas is a great example of seasonality, where retailers have come to expect higher sales over the festive period. Seasonality can extend into months but is usually only observed over days or weeks. 
When looking at time series where the periodicity is hours, you may find a seasonality effect for certain hours of the day. Some of the reasons for seasonality include holidays, climate, and changes in social habits. For example, travel companies usually run far fewer services on Christmas Day, citing a lack of demand. During most holidays people love to travel, but this lack of demand on Christmas Day could be attributed to social habits, where people tend to stay at home or have already traveled. Social habit becomes a driving factor in the seasonality of journeys undertaken on Christmas Day. It's easier for the forecaster when a particular seasonal event occurs on a fixed calendar date each year; the issue comes when some popular holidays depend on lunar movements, such as Easter, Diwali, and Eid. These holidays may occur in different weeks or months over the years, which will shift the seasonality effect. Also, if some holidays fall closer to other holiday periods, it may lead to individuals taking extended holidays and travel sales may increase more than expected in such years. The coffee shop near the office may also experience lower sales for a longer period. Changes in the weather can also impact seasonality; for example, a longer, warmer summer may be welcome in the UK, but this would impact retail sales in the autumn as most shoppers wouldn't need to buy a new wardrobe. In hotter countries, sales of air-conditioners would increase substantially compared to the summer months' usual seasonality. Forecasters could offset this unpredictability in seasonality by building in a weather forecast variable. We will explore similar challenges in the chapters ahead. Seasonality shouldn't be confused with a cyclic effect. A cyclic effect is observed over a longer period of generally two years or more. The property sector is often associated with having a cyclic effect, where it has long periods of growth or slowdown before the cycle continues. Trend A trend is merely a long-term direction of observed behavior that is found by plotting data against a time component. A trend may indicate an increase or decrease in behavior. Trends may not even be linear, but a broad movement can be identified by analyzing plotted data. Outliers and rare events Outliers and rare events are terminologies that are often used interchangeably by businesses. These concepts can have a big impact on data, and some sort of outlier treatment is usually applied to data before it is used for modeling. It is almost impossible to predict an outlier or rare event but they do affect a trend. An example of an outlier could be a customer walking into a branch to deposit an amount that is 100 times the daily average of that branch. In this case, the forecaster wouldn't expect that trend to continue. Disruptions Disruptions and step changes are becoming more common in time series data. One reason for this is the abundance of available data and the growing ability to store and analyze it. Disruptions could include instances when a business hasn't been able to trade as normal. Flooding at the local pub may lead to reduced sales for a few days, for example. While analyzing daily sales across a pub chain, an analyst may have to make note of a disruptive event and its impact on the chain's revenue. Step changes are also more common now due to technological shifts, mergers and acquisitions, and business process re-engineering. When two companies announce a merger, they often try to sync their data. 
They might have been selling x and y quantities individually, but after the merger will expect to sell x + y + c (where c is the positive or negative effect of the merger). Over time, when someone plots sales data in this case, they will probably spot a step change in sales that happened around the time of the merger, as shown in the following screenshot: In the trend graph, we can see that online travel bookings are increasing. In the step change and disruptions chart, we can see that Q1 of 2012 saw a substantive increase in bookings, where Q1 of 2014 saw a substantive dip. The increase was due to the merger of two companies that took place in Q1 of 2012. The decrease in Q1 of 2014 was attributed to prolonged snow storms in Europe and the ash cloud disruption from volcanic activity over Iceland. While online bookings kept increasing after the step change, the disruption caused by the snow storm and ash cloud only had an effect on sales in Q1 of 2014. In this case, the modeler will have to treat the merger and the disruption differently while using them in the forecast, as disruption could be disregarded as an outlier and treated accordingly. Also note that the seasonality chart shows that Q4 of each year sees almost a 20% increase in travel bookings, and this pattern continues each calendar year. In this article, we defined time series and learned why it is important for forecasting. We also looked at the characteristics of time series data. To know more how to leverage the analytical power of SAS to perform financial analysis efficiently, you can check out the book SAS for Finance. Read more Getting to know SQL Server options for disaster recovery Implementing a simple Time Series Data Analysis in R Training RNNs for Time Series Forecasting
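As a quick follow-up to the concepts above: trend and seasonality become easy to see once a series is decomposed into components. The following minimal Python sketch (assuming pandas, matplotlib, and statsmodels 0.11 or later are installed) builds a purely synthetic monthly sales series and splits it into trend, seasonal, and residual parts:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly sales with an upward trend and a yearly seasonal pattern
idx = pd.date_range("2012-01-31", periods=72, freq="M")
trend = np.linspace(100, 200, len(idx))
seasonal = 10 * np.sin(2 * np.pi * idx.month / 12)
noise = np.random.normal(0, 3, len(idx))
sales = pd.Series(trend + seasonal + noise, index=idx)

# Split the series into trend, seasonal, and residual components
result = seasonal_decompose(sales, model="additive", period=12)
result.plot()
plt.show()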

How to send email Notifications using SendGrid

Packt Editorial Staff
10 Aug 2018
6 min read
SendGrid is one of the popular services that allow the audience to send emails for different purposes. In today’s tutorial we will explore to: Create SendGrid account Generate SendGrid API Key Configure SendGrid API key with Azure function app Send an email notification to the website administrator Here, we will learn how to create a SendGrid output binding and send an email notification to the administrator with a static content. In general there would be only administrators so we will be hard coding the email address of the administrator in the To address field of the SendGrid output binding Getting ready Create a SendGrid account API Key from the Azure Management Portal. Generate an API Key from the SendGrid Portal. Create a SendGrid account Navigate to Azure Management Portal and create a SendGrid Email Delivery account by searching for the same in the Marketplace shown as follows: In the SendGrid Email Delivery blade, click on Create button to navigate to the Create a new SendGrid Account. Please select Free tier in the Pricing tier and provide all other details and click on the Create button shown as follows: Once the account is created successfully, navigate to the SendGrid account. You can use the search box available in the top which is shown as follows: Navigate to the Settings, choose configurations and grab the username and SmtpServer from the Configurations blade. Generate SendGrid API key In order to utilize SendGrid account by the Azure Functions runtime, we need to provide the SendGrid API key as input to the Azure Functions. You can generate an API Key from the SendGrid portal. Let's navigate to the SendGrid portal by clicking on the Manage button in the Essentials blade of the SendGrid account shown as follows: In the SendGrid portal, click on the API Keys under Settings section of the Left hand side menu shown as follows: In the API Keys page, click on Create API Key shown as follows: In the Create API Key popup, provide a name and choose the API Key Permissions and click on Create & View button. After a moment you will be able to see the API key. Click on the key to copy the same to the clipboard: Configure SendGrid API key with Azure Function app Create a new app setting in the Azure Function app by navigating to the Application Settings blade under the Platform features section of the function app shown as follows: Click on Save button after adding the app settings in the preceding step. How to do it... Navigate to the Integrate tab of the RegisterUser function and click on New Output button to add a new output binding. Choose the SendGrid output binding and click on Select button to add the binding. Please provide the following parameters in the SendGrid output binding: Message parameter name - leave the default value - message. We will be using this parameter in the run method in a moment. SendGrid API key: Please provide the app settings key that you have created in the application settings. To address: Please provide the email address of the administrator. From address: Please provide the email address from where you would like to send the email. In general, it would be kind of [email protected]. Message subject: Please provide the subject that you would like to have in the email subject. Message Text: Please provide the email body text that you would like to have in the email body. Below is how the SendGrid output binding should look like after providing all the fields: Once you review the values, click on Save to save the changes. 
Navigate to the Run method and make the following changes:
Add a new reference for SendGrid and also the SendGrid.Helpers.Mail namespace.
Add a new out parameter message of type Mail.
Create an object of type Mail.

Following is the complete code of the Run method:

#r "Microsoft.WindowsAzure.Storage"
#r "SendGrid"
using System.Net;
using SendGrid.Helpers.Mail;
using Microsoft.WindowsAzure.Storage.Table;
using Newtonsoft.Json;

public static void Run(HttpRequestMessage req,
                       TraceWriter log,
                       CloudTable objUserProfileTable,
                       out string objUserProfileQueueItem,
                       out Mail message)
{
    var inputs = req.Content.ReadAsStringAsync().Result;
    dynamic inputJson = JsonConvert.DeserializeObject<dynamic>(inputs);
    string firstname = inputJson.firstname;
    string lastname = inputJson.lastname;
    string profilePicUrl = inputJson.ProfilePicUrl;

    objUserProfileQueueItem = profilePicUrl;

    UserProfile objUserProfile = new UserProfile(firstname, lastname, profilePicUrl);
    TableOperation objTblOperationInsert = TableOperation.Insert(objUserProfile);
    objUserProfileTable.Execute(objTblOperationInsert);

    message = new Mail();
}

public class UserProfile : TableEntity
{
    public UserProfile(string firstName, string lastName, string profilePicUrl)
    {
        this.PartitionKey = "p1";
        this.RowKey = Guid.NewGuid().ToString();
        this.FirstName = firstName;
        this.LastName = lastName;
        this.ProfilePicUrl = profilePicUrl;
    }

    public UserProfile() { }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string ProfilePicUrl { get; set; }
}

Now, let's test the functionality of sending the email by navigating to the RegisterUser function and submitting a request with some test values:

{
    "firstname": "Bill",
    "lastname": "Gates",
    "ProfilePicUrl": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Bill_Gates_June_2015.jpg/220px-Bill_Gates_June_2015.jpg"
}

How it works...
The aim here is to send a notification via email to an administrator, informing them that a new registration was created successfully. We have used one of the Azure Functions experimental templates, named SendGrid, to send the emails, hard coding the following properties in the SendGrid output binding:
From email address
To email address
Subject of the email
Body of the email
The SendGrid output binding uses the API key provided in the app settings to invoke the required APIs of the SendGrid library for sending the emails.
To summarize, we learnt about sending an email notification using the SendGrid service.
This article is an excerpt from the book, Azure Serverless Computing Cookbook, written by Praveen Kumar Sriram. It contains over 50 recipes to help you build applications hosted on Serverless architecture using Azure Functions.
5 reasons why your business should adopt cloud computing
Alibaba Cloud partners with SAP to provide a versatile, one-stop cloud computing environment
Top 10 IT certifications for cloud and networking professionals in 2018
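As a side note, if you want to verify that your SendGrid API key works before wiring it into the function app, a quick sketch with the official sendgrid Python package (version 3 or later) could look like the following; the addresses are placeholders and the key is read from an environment variable:

import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Placeholder addresses purely for illustration
message = Mail(
    from_email="donotreply@example.com",
    to_emails="admin@example.com",
    subject="New user registration",
    plain_text_content="A new user registered successfully.")

# The API key is read from an environment variable rather than hard-coded
sg = SendGridAPIClient(os.environ.get("SENDGRID_API_KEY"))
response = sg.send(message)
print(response.status_code)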


Visualizing data in R and Python using Anaconda [Tutorial]

Natasha Mathur
09 Aug 2018
7 min read
It is said that a picture is worth a thousand words. Through various pictures and graphical presentations, we can express many abstract concepts, theories, data patterns, or certain ideas much clearer. Data can be messy at times, and simply showing the data points would confuse audiences further. If we could have a simple graph to show its main characteristics, properties, or patterns, it would help greatly. In this tutorial, we explain why we should care about data visualization and then we will discuss techniques used for data visualization in R and Python. This article is an excerpt from a book 'Hands-On Data Science with Anaconda' written by Dr. Yuxing Yan, James Yan. Data visualization in R Firstly, let's see the simplest graph for R. With the following one-line R code, we draw a cosine function from -2π to 2π: > plot(cos,-2*pi,2*pi) The related graph is shown here: Histograms could also help us understand the distribution of data points. The previous graph is a simple example of this. First, we generate a set of random numbers drawn from a standard normal distribution. For the purposes of illustration, the first line of set.seed() is actually redundant. Its existence would guarantee that all users would get the same set of random numbers if the same seed was used ( 333 in this case). In other words, with the same set of input values, our histogram would look the same. In the next line, the rnorm(n) function draws n random numbers from a standard normal distribution. The last line then has the hist() function to generate a histogram: > set.seed(333) > data<-rnorm(5000) > hist(data) The associated histogram is shown here: Note that the code of rnorm(5000) is the same as rnorm(5000,mean=0,sd=1), which implies that the default value of the mean is 0 and the default value for sd is 1. The next R program would shade the left-tail for a standard normal distribution: x<-seq(-3,3,length=100) y<-dnorm(x,mean=0,sd=1) title<-"Area under standard normal dist & x less than -2.33" yLabel<-"standard normal distribution" xLabel<-"x value" plot(x,y,type="l",lwd=3,col="black",main=title,xlab=xLabel,ylab=yLabel) x<-seq(-3,-2.33,length=100) y<-dnorm(x,mean=0,sd=1) polygon(c(-4,x,-2.33),c(0,y,0),col="red") The related graph is shown here: Note that according to the last line in the preceding graph, the shaded area is red. In terms of exploring the properties of various datasets, the R package called rattle is quite useful. If the rattle package is not preinstalled, we could run the following code to install it: > install.packages("rattle") Then, we run the following code to launch it; > library(rattle) > rattle() After hitting the Enter key, we can see the following: As our first step, we need to import certain datasets. For the sources of data, we choose from seven potential formats, such as File, ARFF, ODBC, R Dataset, and RData File, and we can load our data from there. The simplest way is using the Library option, which would list all the embedded datasets in the rattle package. After clicking Library, we can see a list of embedded datasets. Assume that we choose acme:boot:Monthly Excess Returns after clicking Execute in the top left. We would then see the following: Now, we can study the properties of the dataset. After clicking Explore, we can use various graphs to view our dataset. Assume that we choose Distribution and select the Benford check box. We can then refer to the following screenshot for more details: After clicking Execute, the following would pop up. 
The top red line shows the frequencies for the Benford Law for each digits of 1 to 9, while the blue line at the bottom shows the properties of our data set. Note that if you don't have the reshape package already installed in your system, then this either won't run or will ask for permission to install the package to your computer: The dramatic difference between those two lines indicates that our data does not follow a distribution suggested by the Benford Law. In our real world, we know that many people, events, and economic activities are interconnected, and it would be a great idea to use various graphs to show such a multi-node, interconnected picture. If the qgraph package is not preinstalled, users have to run the following to install it: > install.packages("qgraph") The next program shows the connection from a to b, a to c, and the like: library(qgraph) stocks<-c("IBM","MSFT","WMT") x<-rep(stocks, each = 3) y<-rep(stocks, 3) correlation<-c(0,10,3,10,0,3,3,3,0) data <- as.matrix(data.frame(from =x, to =y, width =correlation)) qgraph(data, mode = "direct", edge.color = rainbow(9)) If the data is shown, the meaning of the program will be much clearer. The correlation shows how strongly those stocks are connected. Note that all those values are randomly chosen with no real-world meanings: > data from to width [1,] "IBM" "IBM" " 0" [2,] "IBM" "MSFT" "10" [3,] "IBM" "WMT" " 3" [4,] "MSFT" "IBM" "10" [5,] "MSFT" "MSFT" " 0" [6,] "MSFT" "WMT" " 3" [7,] "WMT" "IBM" " 3" [8,] "WMT" "MSFT" " 3" [9,] "WMT" "WMT" " 0" A high value for the third variable suggests a stronger correlation. For example, IBM is more strongly correlated with MSFT, with a value of 10, than its correlation with WMT, with a value of 3. The following graph shows how strongly those three stocks are correlated: The following program shows the relationship or interconnection between five factors: library(qgraph) data(big5) data(big5groups) title("Correlations among 5 factors",line = 2.5) qgraph(cor(big5),minimum = 0.25,cut = 0.4,vsize = 1.5, groups = big5groups,legend = TRUE, borders = FALSE,theme = 'gray') The related graph is shown here: Data visualization in Python The most widely used Python package for graphs and images is called matplotlib. The following program can be viewed as the simplest Python program to generate a graph since it has just three lines: import matplotlib.pyplot as plt plt.plot([2,3,8,12]) plt.show() The first command line would upload a Python package called matplotlib.pyplot and rename it to plt. Note that we could even use other short names, but it is conventional to use plt for the matplotlib package. The second line plots four points, while the last one concludes the whole process. The completed graph is shown here: For the next example, we add labels for both x and y, and a title. The function is the cosine function with an input value varying from -2π to 2π: import scipy as sp import matplotlib.pyplot as plt x=sp.linspace(-2*sp.pi,2*sp.pi,200,endpoint=True) y=sp.cos(x) plt.plot(x,y) plt.xlabel("x-value") plt.ylabel("Cosine function") plt.title("Cosine curve from -2pi to 2pi") plt.show() The nice-looking cosine graph is shown here: If we received $100 today, it would be more valuable than what would be received in two years. This concept is called the time value of money, since we could deposit $100 today in a bank to earn interest. 
The following Python program uses size to illustrate this concept: import matplotlib.pyplot as plt fig = plt.figure(facecolor='white') dd = plt.axes(frameon=False) dd.set_frame_on(False) dd.get_xaxis().tick_bottom() dd.axes.get_yaxis().set_visible(False) x=range(0,11,2) x1=range(len(x),0,-1) y = [0]*len(x); plt.annotate("$100 received today",xy=(0,0),xytext=(2,0.15),arrowprops=dict(facecolor='black',shrink=2)) plt.annotate("$100 received in 2 years",xy=(2,0),xytext=(3.5,0.10),arrowprops=dict(facecolor='black',shrink=2)) s = [50*2.5**n for n in x1]; plt.title("Time value of money ") plt.xlabel("Time (number of years)") plt.scatter(x,y,s=s); plt.show() The associated graph is shown here. Again, the different sizes show their present values in relative terms: To summarize, we discussed ways data visualization works in Python and R.  Visual presentations can help our audience understand data better. If you found this post useful, check out the book 'Hands-On Data Science with Anaconda' to learn about different types of visual representation written in languages such as R, Python, Julia, etc. A tale of two tools: Tableau and Power BI Anaconda Enterprise version 5.1.1 released! 10 reasons why data scientists love Jupyter notebooks
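As a quick follow-up: the R example earlier shaded the left tail of a standard normal distribution; a rough matplotlib and SciPy equivalent might look like the following sketch:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

x = np.linspace(-3, 3, 200)
y = norm.pdf(x, loc=0, scale=1)

plt.plot(x, y, color="black", lw=3)
# Shade the area under the curve where x is less than -2.33
x_fill = np.linspace(-3, -2.33, 100)
plt.fill_between(x_fill, norm.pdf(x_fill), color="red")
plt.title("Area under standard normal dist & x less than -2.33")
plt.xlabel("x value")
plt.ylabel("standard normal distribution")
plt.show()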