
Tech Guides

Do you write Python Code or Pythonic Code?

Aaron Lazar
08 Aug 2018
5 min read
If you're new to programming, and Python in particular, you might have heard the term Pythonic being brought up at tech conferences, meetups and even at your own office. You might also have wondered what the term means and whether people are simply talking about writing Python code. Here we're going to understand what the term Pythonic means and why you should be interested in learning how to not just write Python code, but write Pythonic code.

What does Pythonic mean?

When people talk about Pythonic code, they mean that the code uses Python idioms well, that it's natural or displays fluency in the language. In other words, it means using the idioms most widely adopted by the Python community. If someone says you are writing un-Pythonic code, they might actually mean that you are attempting to write Java/C++ code in Python, disregarding the Python idioms and performing a rough transcription rather than an idiomatic translation from the other language. Okay, now that you have a theoretical idea of what Pythonic (and unpythonic) means, let's have a look at some Pythonic code in practice.

Writing Pythonic Code

Before we get into some examples, you might be wondering if there's a defined way/method of writing Pythonic code. Well, there is, and it's called PEP 8. It's the official style guide for Python.

Example #1

x = [1, 2, 3, 4, 5, 6]
result = []
for idx in range(len(x)):
    result.append(x[idx] * 2)
result

Output: [2, 4, 6, 8, 10, 12]

Consider the above code, where you're trying to multiply each element of "x" by 2. What we did here was create an empty list to store the results, and then append the result of each computation to it. The result now contains a list in which every element is 2 multiplied by the corresponding element of x. Now, if you were to write the same code in a Pythonic way, you might want to simply use a list comprehension. Here's how:

x = [1, 2, 3, 4, 5, 6]
[element * 2 for element in x]

Output: [2, 4, 6, 8, 10, 12]

You might have noticed that we skipped the entire for loop!

Example #2

Let's make the previous example a bit more complex, and add a condition that the elements should be multiplied by 2 only if they are even.

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
result = []
for idx in range(len(x)):
    if x[idx] % 2 == 0:
        result.append(x[idx] * 2)
    else:
        result.append(x[idx])
result

Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20]

We've created an if-else statement to solve this problem, but there is a simpler, more Pythonic way of doing it:

[element * 2 if element % 2 == 0 else element for element in x]

Output: [1, 4, 3, 8, 5, 12, 7, 16, 9, 20]

Notice what we've done here: apart from skipping multiple lines of code, we used the if-else expression inside the comprehension itself. Now, if you wanted to perform filtering instead, you could do this:

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[element * 2 for element in x if element % 2 == 0]

Output: [4, 8, 12, 16, 20]

What we've done here is put the if condition after the for clause, and voila! We've achieved filtering. If you're using a good tool like Jupyter Notebook or an IDE like PyCharm, it will help you format your code as per the PEP 8 suggestions.

Why should you write Pythonic code?

Well, firstly, you're saving loads of time otherwise spent writing humongous piles of cowdung code, so you're obviously becoming a smarter and more productive programmer. Python is a pretty slow language, and when you write Python in a style carried over from another language like Java or C++, you're going to make things worse.
With idiomatic, Pythonic code, you're improving the speed of your programs. Moreover, idiomatic code is far easier for other developers working on the same codebase to comprehend, and it helps a great deal when you're trying to refactor someone else's code.

Fearing Pythonic idioms

I don't mean the idioms themselves are scary. Rather, quite a few developers and organisations have begun discriminating on the basis of whether someone can or cannot write Pythonic code. This is wrong because, at the end of the day, even though PEP 8 exists, the idea of what counts as Pythonic differs from person to person. To some it might mean picking up a new style guide and improving the way they code. To others, it might mean being succinct and not repeating themselves. It's time we stopped judging people on whether they can or can't write Pythonic code and instead appreciated when someone presents readable, easily maintainable and succinct code. If you find them writing a bit of clumsy code, you can choose to talk to them about improving their design considerations. And the world will be a better place!

If you're interested in learning how to write more succinct and concise Python code, check out these resources:

- Learning Python Design Patterns - Second Edition
- Python Design Patterns [Video]
- Python Tips, Tricks and Techniques [Video]


Cloud pricing comparison: AWS vs Azure

Guest Contributor
02 Feb 2019
11 min read
On average, businesses waste about 35% of their cloud spend by using their cloud resources inefficiently. This amounts to more than $10 billion in wasted cloud spend across just the top three public cloud providers. Although the unmatched compute power, data storage options and efficient content delivery systems of the leading public cloud providers can support incredible business growth, that convenience can breed complacency. It's easy to lose control of costs when your cloud provider appears to be keeping things running smoothly. To stop this from happening, it's essential to adopt a new approach to how we manage, and optimize, cloud spend. It's not an easy thing to do, as pricing structures can be complicated. However, in this post, we'll look at how both AWS and Azure structure their pricing, and how you can best determine what's right for you.

Different types of cloud pricing schemes

Broadly, cloud services are priced in one of a few ways. The first is a pure subscription-based model, where services are charged based on a cloud catalog and users are billed per month, per mailbox, or per app license ordered. In this case, subscribers are billed for all the resources to which they are subscribed, irrespective of whether they are used or not.

The second option is pay-as-you-go. This is where subscribers begin with a billing amount set at 0, which then grows with the services and resources they use. Amazon uses the pay-as-you-go model, charging a predetermined price for every hour of virtual machine resources used. The same model is also used by other leading cloud service providers, including Microsoft Azure and Google Cloud Platform.

Another variant of cloud pricing is an enterprise billing service. This is based on the number of active users assigned to a particular cloud subscription. Microsoft Azure is a leading cloud provider that offers this kind of subscription for its customers.

Most cloud providers offer varying combinations of the above three models, with attractive discount options built in; we'll cover the main ones below.

What free tier services do AWS and Azure offer?

Both AWS and Azure offer a 'free tier' service for new and initial subscribers, which lets potential long-term subscribers test out the service before committing for the long run. Amazon allows subscribers to try out most of AWS' services free for a year, including RDS, S3, EC2, Elastic Block Store (EBS), Elastic Load Balancing and other AWS services. For example, you can use EC2 and EBS on the free tier to host a website for a whole year. EBS pricing will be zero unless your usage exceeds the limit of 30 GB of storage. The free tier for EC2 includes 730 hours of a t2.micro instance.

Azure offers similar deals for new users. Azure services like App Service, Virtual Machines, Azure SQL Database, Blob Storage and Azure Kubernetes Service (AKS) are free for the initial period of 12 months. Additionally, Azure provides the 'Functions' compute service (for serverless) with 1 million requests free every month throughout the subscription. This is useful if you want to give serverless a try.

AWS and Azure's pay-as-you-go, on-demand pricing models

Under the pay-as-you-go model, AWS and Azure offer subscribers the option to simply settle their bills at the end of every month without any upfront investment. This is a good option if you want to avoid a long-term and binding contract. Most resources are available on demand and charged on a per-hour basis, with costs calculated based on the number of hours the resource was used.
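As a quick back-of-the-envelope illustration (the hourly rate below is an assumed figure, not an official price), estimating an on-demand monthly bill is just rate times hours. A minimal Scala sketch:

object OnDemandEstimate extends App {
  val hourlyRate    = 0.10   // USD per instance-hour (assumed rate for illustration)
  val hoursPerMonth = 730    // roughly 24 hours * 365 days / 12 months
  val monthlyCost   = hourlyRate * hoursPerMonth
  println(f"Estimated monthly cost: $$$monthlyCost%.2f")   // prints roughly $73.00
}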
For data storage and data transfer, the rates are generally calculated per gigabyte. Subscribers are notified 30 days in advance of any changes in the pay-as-you-go rates, as well as when new services are periodically added to the platform.

Reserve-and-pay-less pricing model

In addition to the on-demand pricing model, Amazon AWS has an alternative scheme called Reserved Instances (RI) that allows the subscriber to reserve capacity for specific products. RI offers discounted hourly rates and capacity reservation for its EC2 and RDS services. A subscriber can reserve a resource and save up to 75% of total billing costs in the long run. These discounted rates are automatically applied to the subscriber's AWS bills. Subscribers have the option to reserve instances for either a 1-year or a 3-year term.

Microsoft Azure offers to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when they sign up for one- to three-year terms for Windows and Linux virtual machines (VMs). Microsoft also allows for added flexibility: if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft for an early termination fee.

Use-more-and-pay-less pricing model

In addition to the above payment options, AWS offers subscribers one additional option. When it comes to data transfer and data storage services, AWS gives discounts based on the subscriber's usage. These volume-based discounts help subscribers realize critical savings as their usage increases. Subscribers can benefit from economies of scale, allowing their businesses to grow while costs are kept relatively under control. AWS also gives subscribers the option to sign up for services that help their growing business. As an example, AWS' storage services offer subscribers opportunities to lower pricing based on how frequently data is accessed and the performance needed in the retrieval process. For EC2, you can get a discount of up to 10% if you reserve more. The image below demonstrates the pricing of an AWS S3 bucket based on usage.

Comparing Cloud Pricing on Azure and AWS

The major cloud service providers – Amazon Web Services, Azure, Google Cloud Platform and IBM – continually decrease the prices of cloud instances, provide new and innovative discount options, add instance types, and drop billing increments. In some cases, notably Microsoft Azure, per-second billing has also been introduced. However, as costs decrease, the complexity increases, and it is paramount for subscribers to understand and efficiently navigate this complexity. We take a crack at it here.

Reserved Instance Pricing

Like Azure, AWS and GCP also offer publicly available Reserved Instance discounts, some reaching up to 75%, in exchange for signing up to use the services of the particular cloud service provider for a one- to three-year period. We've briefly covered this in the section above. Before signing up, however, subscribers need to understand the amount of usage they are committing to and how much usage to leave as an 'on-demand' option. To do this, subscribers need to consider many different factors:

- Historical usage, by region, instance type, and so on
- Steady-state vs. part-time usage
- An estimate of usage growth or decline
- The probability of switching cloud service providers
- The possibility of choosing alternative computing models like serverless, containers, etc.
On-Demand Instance Pricing

On-demand instances work best for applications that have short-term, irregular workloads but are critical enough that they should not be interrupted. For instance, if you're running cron jobs on a periodic basis that last for a few hours, you can move them to on-demand instances. Each on-demand instance is billed per instance-hour from the time it is launched until it is terminated. These are most useful during the testing or development phase of applications.

On-demand instances are available in many varying levels of computing power, designed for different tasks executed within the cloud environment. They have no binding contractual commitments and can be used as and when required. Generally, on-demand instances are among the most expensive purchasing options for instances. Each on-demand instance is billed per instance-hour from the time it is launched until it is stopped or terminated; if partial instance-hours are used, these are rounded up to the full hour during billing. The chart below shows the on-demand price per hour for AWS and Azure cloud services and the hourly price for each GB of RAM.

VM Type | AWS OD Hourly | Azure OD Hourly | AWS OD / GB RAM | Azure OD / GB RAM
Standard 2 vCPU w Local SSD | $0.133 | $0.100 | $0.018 | $0.013
Standard 2 vCPU no local disk | $0.100 | $0.100 | $0.013 | $0.013
Highmem 2 vCPU w Local SSD | $0.166 | $0.133 | $0.011 | $0.008
Highmem 2 vCPU no local disk | $0.133 | $0.133 | $0.009 | $0.008
Highcpu 2 vCPU w Local SSD | $0.105 | $0.085 | $0.028 | $0.021
Highcpu 2 vCPU no local disk | $0.085 | $0.085 | $0.021 | $0.021

The on-demand price of Azure instances is cheaper compared to AWS for certain VM types. The price difference is most evident for instances with local SSDs.

Discounted Cloud Instance Pricing

When it comes to discounted cloud pricing, it is important to remember that it comes with a lock-in period of 1 to 3 years. Therefore, it works best for organizations that are more stable, have a good idea of what their historical cloud usage is, and can fairly accurately predict what cloud services they will require over the next 12-month period. In the table below, we have looked at the annual costs of both AWS and Azure.

VM Type | AWS 1 Y RI Annual | Azure 1 Y RI Annual | AWS 1 Y RI Annual / GB RAM | Azure 1 Y RI Annual / GB RAM
Standard 2 vCPU w Local SSD | $867 | $508 | $116 | $64
Standard 2 vCPU no local disk | $622 | $508 | $78 | $64
Highmem 2 vCPU w Local SSD | $946 | $683 | $63 | $43
Highmem 2 vCPU no local disk | $850 | $683 | $56 | $43
Highcpu 2 vCPU w Local SSD | $666 | $543 | $178 | $136
Highcpu 2 vCPU no local disk | $543 | $543 | $136 | $136

Azure's rates are clearly better than Amazon's, and by a good margin. Azure offers better discounted rates for Standard, Highmem and Highcpu compute instances.

Optimizing Cloud Pricing

Subscribers need to move beyond short-term, one-time fixes and make use of automation to continuously monitor their spend, raise alerts for over- or under-use of services, and take automated action based on predetermined conditions. Here are some of the ways you can optimize your cloud spending.

Cloud Pricing Calculators

Cloud pricing tools enable you to list the different parameters of your AWS or Azure subscriptions and use them to calculate the approximate monthly cost that would likely be incurred.

AWS Simple Monthly Calculator

You can try the official cloud pricing calculators from AWS and Azure, or a third-party pricing calculator. Calculators help you to optimize your pricing based on your requirements.
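Before reaching for a full calculator, a rough reserved-versus-on-demand comparison is easy to script yourself. Here is a minimal Scala sketch using the AWS "Standard 2 vCPU w Local SSD" figures from the tables above (treat the output as illustrative only, since prices change):

object RIBreakEven extends App {
  val onDemandHourly = 0.133   // AWS on-demand $/hour, from the first table above
  val riAnnual       = 867.0   // AWS 1-year RI annual cost, from the second table above
  val hoursPerYear   = 8760
  val breakEvenHours = riAnnual / onDemandHourly
  val utilisation    = breakEvenHours / hoursPerYear * 100
  println(f"The RI pays off after $breakEvenHours%.0f on-demand hours (~$utilisation%.0f%% utilisation)")
}

In other words, at roughly three-quarters utilisation and above, the 1-year reservation becomes the cheaper option for this instance type; below that, on-demand wins.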
For example, if you have a long-term requirement for running instances and you're currently running them on on-demand pricing schemes, cloud calculators can offer better insights into reserved-instance schemes and other ways to improve your cloud expenditure. For instance, this Azure calculator by NetApp offers more price optimization options, including options to tier less frequently used data to storage objects like Azure Blob and to customize snapshot creation and storage efficiency. Zerto is another popular calculator for Azure and AWS with a simpler interface. However, note that the estimated cost is based on current pricing and is subject to change.

Price List API

Historically, narrowing down the final usage cost involved a considerable amount of manual rate checking: collecting price points, then checking and cross-referencing them by hand. In the case of AWS, the Price List API offers programmatic access, which is especially beneficial to developers, who can query the AWS price list instead of searching manually through the web. Better still, the queries can be constructed in simple code in any language. Azure offers a similar billing API to gain insights into your Azure usage programmatically.

Summary

Understanding and optimizing cloud pricing is somewhat challenging with AWS and Azure. This is partially because they offer hundreds of features with different pricing options, and new features are added to the pipeline every week. To untangle some of these complexities, we've covered some of the popular ways to tackle pricing in AWS and Azure. Here's a list of things we've covered:

- How cloud pricing works and the different pricing schemes in AWS and Azure
- A comparison of the different instance pricing options in AWS and Azure, including reserved instances, on-demand instances, and discounted instances
- Third-party tools, like calculators, for optimizing price
- The Price List APIs for AWS and Azure

If you have any thoughts to share, feel free to post them in the comments.

About the author

Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Oracle, Zend, CheckPoint and Ixia. Gilad is a 3-time winner of international technical communication awards, including the STC Trans-European Merit Award and the STC Silicon Valley Award of Excellence. Over the past 7 years, Gilad has headed Agile SEO, which performs strategic search marketing for leading technology brands. Together with his team, Gilad has done market research, developer relations, and content strategy in 39 technology markets, lending him a broad perspective on trends, approaches, and ecosystems across the tech industry.

- Cloud computing trends in 2019
- The 10 best cloud and infrastructure conferences happening in 2019
- Bo Weaver on Cloud security, skills gap, and software development in 2019


10 great tools to stay completely anonymous online

Guest Contributor
12 Jul 2018
8 min read
Everybody is facing a battle these days. Though it may not be immediately apparent, it is already affecting a majority of the global population. This battle is not fought with bombs, planes, or tanks, or with any physical weapons for that matter. This battle is for our online privacy.

A survey last year found that 69% of data breaches were related to identity theft. Another survey shows that the number of data breaches related to identity theft has steadily risen over the last 4 years worldwide. And it is likely to keep increasing as hackers gain easier access to more advanced tools. The EU's GDPR may curb this trend by imposing stricter data protection standards on data controllers and processors. These entities have been collecting and storing our data for years through ads that track our online habits -- another reason to protect our online anonymity. However, this new regulation has only been in force for a little over a month and only within the EU, so it's going to take some time before we feel its long-term effects.

The question is, what should we do when hackers out there try to steal and maliciously use our personal information? Simple: we defend ourselves with the tools at our disposal to keep ourselves completely anonymous online. So, here's a list you may find useful.

1. VPNs

A VPN helps you maintain anonymity by hiding your real IP and internet activity from prying eyes. Normally, your browser sends a query tagged with your IP every time you make an online search. Your ISP takes this query and sends it to a DNS server, which then points you to the correct website. Of course, your ISP (and all the servers your query had to go through) can, and likely will, view and monitor all the data you course through them -- including your personal information and IP address. This allows them to keep a tab on all your internet activity.

A VPN protects your identity by assigning you an anonymous IP and encrypting your data. This means that any query you send to your ISP will be encrypted and will no longer display your real IP. This is why using a VPN is one of the best ways to stay anonymous online. However, not all VPNs are created equal. You have to choose the best one if you want airtight security, and beware of free VPNs: most of them make money by selling your data to advertisers. You'll want to compare and contrast several VPNs to find the best one for you, but that's easier said than done with so many different VPNs out there. Look for reviews on trustworthy sites to find the best VPN for your needs.

2. TOR Browser

The Onion Router (TOR) is a browser that strengthens your online anonymity even more by using different layers of encryption, thereby protecting your internet activity, which includes "visits to Web sites, online posts, instant messages, and other communication forms". It works by first encasing your data in three layers of encryption. Your data is then bounced three times -- each bounce taking off one layer of encryption. Once your data gets to the right server, it "puts back on" each layer it has shed as it successively bounces back to your device. You can even improve TOR by using it in combination with a compatible VPN. It is important to note, though, that using TOR won't hide the fact that you're using it, and some sites may restrict access made through TOR.

3. Virtual machine

A virtual machine is basically a second computer within your computer. It lets you emulate another device through an application. This emulated computer can then be set up according to your preferences.
The best use for this tool, however, is for tasks that don't involve an internet connection -- for example, when you want to open a file and make sure no one is watching over your shoulder. After opening the file, you simply delete the virtual machine. You can try VirtualBox, which is available on Windows, Linux, and Mac.

4. Proxy servers

A proxy server is an intermediary between your device and the internet. It's basically another computer that you use to process internet requests. It's similar to a virtual machine in concept, but it's an entirely separate physical machine. It protects your anonymity in a similar way a VPN does (by hiding your IP), but it can also send a different user agent to keep your browser unidentifiable, and block or accept cookies while keeping them from passing to your device. Most VPN companies also offer proxy servers, so they're a good place to look for a reliable one.

5. Fake emails

A fake email is exactly what the name suggests: an email that isn't linked to your real identity. Fake emails aid your online anonymity not only by hiding your real identity but by keeping you safe from phishing emails or malware, which can easily be sent to you via email. Making a fake email can be as easy as signing up for an email without using your real information, or by using a fake email service.

6. Incognito mode

"Going incognito" is the easiest anonymity tool to come by. Your device will not store any data at all while in this mode, including your browsing history, cookies, site data, and information entered in forms. Most browsers have a privacy mode that you can easily use to hide your online activity from other users of the same device.

7. Ad blockers

Ads are everywhere these days. Advertising has been and always will be a lucrative business. That said, there is a difference between good ads and bad ads. Good ads are those that target a population as a whole. Bad ads (interest-based advertising, as the companies behind them like to call it) target each of us individually by tracking our online activity and location, which compromises our online privacy. Tracking algorithms aren't illegal, though, and have even been considered "clever". But the worst ads are those that contain malware that can infect your device and prevent you from using it. You can use ad blockers to combat these threats to your anonymity and security. Ad blockers usually come in the form of browser extensions which work instantly with no additional configuration needed. For Google Chrome, you can choose Adblock Plus, uBlock Origin, or AdBlock. For Opera, you can choose Opera Ad Blocker, Adblock Plus, or uBlock Origin.

8. Secure messaging apps

If you need to use an online messaging app, you should know that the popular ones aren't as secure as you'd like them to be. True, Facebook Messenger does have a "secret conversation" feature, but Facebook hasn't exactly been the most secure social network to begin with. Instead, use tools like Signal or Telegram. These apps use end-to-end encryption and can even be used to make voice calls.

9. File shredder

The right to be forgotten has surfaced in mainstream media with the onset of the EU's General Data Protection Regulation. This right basically requires data collecting or processing entities to completely remove a data subject's PII from their records. You can practice this same right on your own device by using a "file shredding" tool. But the thing is: completely removing sensitive files from your device is hard. Simply deleting a file and emptying your device's recycle bin doesn't actually remove it -- your device just treats the space it filled up as empty and available. These "dead" files can still haunt you when they are found by someone who knows where to look. You can use software like Dr. Cleaner (for Mac) or Eraser (for Windows) to "shred" your sensitive files by overwriting them several times with random patterns of random sets of data.
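The core idea behind these tools is simple to sketch. The following minimal Scala example (the file name is invented, and it assumes a small regular file on a traditional spinning disk; SSDs and journaling file systems may still keep copies elsewhere) overwrites a file with random bytes a few times before deleting it:

import java.io.RandomAccessFile
import java.nio.file.{Files, Paths}
import scala.util.Random

object Shred extends App {
  val path   = Paths.get("secret.txt")        // hypothetical file to shred
  val size   = Files.size(path).toInt         // sketch only; assumes a small file
  val file   = new RandomAccessFile(path.toFile, "rws")
  val buffer = new Array[Byte](size)
  for (_ <- 1 to 3) {                         // several passes of random data
    Random.nextBytes(buffer)
    file.seek(0)
    file.write(buffer)
  }
  file.close()
  Files.delete(path)                          // only now remove the directory entry
}

Dedicated shredding tools do considerably more than this (documented overwrite patterns, wiping file names and slack space), which is why they remain the safer choice.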
10. DuckDuckGo

DuckDuckGo is a search engine that doesn't track your behaviour, unlike Google and Bing, which use behavioural trackers to target you with ads. It emphasizes your privacy and avoids the filter bubble of personalized search results. It offers useful features like region-specific searching, Safe Search (to protect against explicit content), and an instant answer feature which shows an answer across the top of the screen, apart from the search results.

To sum it up: our online privacy is being attacked from all sides. Ads legally track our online activities and hackers steal our personal information. The GDPR may help in the long run, but that remains to be seen. What's important is what we do now. These tools will set you on the path to a more secure and private internet experience today.

About the Author

Dana Jackson is a U.S. expat living in Germany and the founder of PrivacyHub. She loves all things related to security and privacy. She holds a degree in Political Science, and loves to call herself a scientist. Dana also loves morning coffee and her dog Paw.

- Top 5 cybersecurity trends you should be aware of in 2018
- Twitter allegedly deleted 70 million fake accounts in an attempt to curb fake news
- Top 5 cybersecurity myths debunked


Common problems in Delphi parallel programming

Pavan Ramchandani
27 Jul 2018
12 min read
This tutorial explains how to find performance bottlenecks and apply the correct algorithm to fix them when working with Delphi. It also shows you how to improve your algorithms before taking you through parallel programming. The article is an excerpt from a book written by Primož Gabrijelčič, titled Delphi High Performance.

Never access the UI from a background thread

Let's start with the biggest source of hidden problems: manipulating a user interface from a background thread. This is, surprisingly, quite a common problem, even more so as all Delphi resources on multithreaded programming will simply say never to do that. Still, that advice doesn't seem to reach some programmers, and they will always try to find an excuse to manipulate a user interface from a background thread.

Indeed, there may be a situation where VCL or FireMonkey may be manipulated from a background thread, but you'll be treading on thin ice if you do that. Even if your code works with the current Delphi, nobody can guarantee that changes in graphical libraries introduced in future Delphis won't break your code. It is always best to cleanly decouple background processing from the user interface.

Let's look at an example which nicely demonstrates the problem. The ParallelPaint demo has a simple form, with eight TPaintBox components and eight threads. Each thread runs the same drawing code and draws a pattern into its own TPaintBox. As every thread accesses only its own Canvas, and no other user interface components, a naive programmer would assume that drawing into paint boxes directly from background threads would not cause problems. A naive programmer would be very much mistaken.

If you run the program, you will notice that although the code paints constantly into some of the paint boxes, others stop being updated after some time. You may even get a Canvas does not allow drawing exception. It is impossible to tell in advance which threads will continue painting and which will not. The following image shows an example of an output. The first two paint boxes in the first row, and the last one in the last row, were no longer being updated when I grabbed the image.

The lines are drawn in the DrawLine method. It does nothing special, just sets the color for that line and draws it. Still, that is enough to break the user interface when this is called from multiple threads at once, even though each thread uses its own Canvas:

procedure TfrmParallelPaint.DrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  Canvas.Pen.Color := color;
  Canvas.MoveTo(p1.X, p1.Y);
  Canvas.LineTo(p2.X, p2.Y);
end;

Is there a way around this problem? Indeed there is. Delphi's TThread class implements a method, Queue, which executes some code in the main thread. Queue takes a procedure or anonymous method as a parameter and sends it to the main thread. After some short time, the code is then executed in the main thread. It is impossible to tell how much time will pass before the code is executed, but that delay will typically be very short, in the order of milliseconds. As it accepts an anonymous method, we can use the magic of variable capturing and write the corrected code, as shown here:

procedure TfrmParallelPaint.QueueDrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  TThread.Queue(nil,
    procedure
    begin
      Canvas.Pen.Color := color;
      Canvas.MoveTo(p1.X, p1.Y);
      Canvas.LineTo(p2.X, p2.Y);
    end);
end;

In older versions of Delphi you don't have such a nice Queue method, but only a version of Synchronize that accepts a normal method.
If you have to use this method, you cannot count on anonymous method mechanisms to handle parameters. Rather, you have to copy them to fields and then Synchronize a parameterless method operating on these fields. The following code fragment shows how to do that:

procedure TfrmParallelPaint.SynchronizedDraw;
begin
  FCanvas.Pen.Color := FColor;
  FCanvas.MoveTo(FP1.X, FP1.Y);
  FCanvas.LineTo(FP2.X, FP2.Y);
end;

procedure TfrmParallelPaint.SyncDrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  FCanvas := canvas;
  FP1 := p1;
  FP2 := p2;
  FColor := color;
  TThread.Synchronize(nil, SynchronizedDraw);
end;

If you run the corrected program, the final result should always be similar to the following image, with all eight TPaintBox components showing a nicely animated image.

Simultaneous reading and writing

The next situation which I regularly see when looking at badly written parallel code is simultaneous reading and writing from/to a shared data structure, such as a list. The SharedList program demonstrates how things can go wrong when you share a data structure between threads. Actually, scrap that, it shows how things will go wrong if you do that.

This program creates a shared list, FList: TList<Integer>. Then it creates one background thread which runs the method ListWriter and multiple background threads, each running the ListReader method. Indeed, you can run the same code in multiple threads. This is perfectly normal behavior and is sometimes extremely useful.

The ListReader method is incredibly simple. It just reads all the elements in a list and does that over and over again. As I've mentioned before, the code in my examples makes sure that problems in multithreaded code really do occur, but because of that, my demo code most of the time also looks terribly stupid. In this case, the reader just reads and reads the data because that's the best way to expose the problem:

procedure TfrmSharedList.ListReader;
var
  i, j, a: Integer;
begin
  for i := 1 to CNumReads do
    for j := 0 to FList.Count - 1 do
      a := FList[j];
end;

The ListWriter method is a bit different. It also loops around, but it also sleeps a little inside each loop iteration. After the Sleep, the code either adds to the list or deletes from it. Again, this is designed so that the problem is quick to appear:

procedure TfrmSharedList.ListWriter;
var
  i: Integer;
begin
  for i := 1 to CNumWrites do begin
    Sleep(1);
    if FList.Count > 10 then
      FList.Delete(Random(10))
    else
      FList.Add(Random(100));
  end;
end;

If you start the program in a debugger and click on the Shared lists button, you'll quickly get an EArgumentOutOfRangeException exception. A look at the stack trace will show that it appears in the line a := FList[j];. In retrospect, this is quite obvious. The code in ListReader starts the inner for loop and reads FList.Count. At that time, FList has 11 elements, so Count is 11. At the end of the loop, the code tries to read FList[10], but in the meantime ListWriter has deleted one element and the list now only has 10 elements. Accessing element [10] therefore raises an exception.

We'll return to this topic later, in the section about locking. For now you should just keep in mind that sharing data structures between threads causes problems.

Sharing a variable

OK, so rule number two is "shared structures bad". What about sharing a simple variable? Nothing can go wrong there, right? Wrong! There are actually multiple ways something can go wrong. The program IncDec demonstrates one of the bad things that can happen.
The code contains two methods: IncValue and DecValue. The former increments a shared FValue: integer; some number of times, and the latter decrements it the same number of times:

procedure TfrmIncDec.IncValue;
var
  i: integer;
  value: integer;
begin
  for i := 1 to CNumRepeat do begin
    value := FValue;
    FValue := value + 1;
  end;
end;

procedure TfrmIncDec.DecValue;
var
  i: integer;
  value: integer;
begin
  for i := 1 to CNumRepeat do begin
    value := FValue;
    FValue := value - 1;
  end;
end;

A click on the Inc/Dec button sets the shared value to 0, runs IncValue, then DecValue, and logs the result:

procedure TfrmIncDec.btnIncDec1Click(Sender: TObject);
begin
  FValue := 0;
  IncValue;
  DecValue;
  LogValue;
end;

I know you can all tell what FValue will hold at the end of this program. Zero, of course. But what will happen if we run IncValue and DecValue in parallel? That is, actually, hard to predict! A click on the Multithreaded button does almost the same, except that it runs IncValue and DecValue in parallel. How exactly that is done is not important at the moment (but feel free to peek into the code if you're interested):

procedure TfrmIncDec.btnIncDec2Click(Sender: TObject);
begin
  FValue := 0;
  RunInParallel(IncValue, DecValue);
  LogValue;
end;

Running this version of the code may still sometimes put zero in FValue, but that will be extremely rare. You most probably won't be able to see that result unless you are very lucky. Most of the time, you'll just get a seemingly random number from the range -10,000,000 to 10,000,000 (which is the value of the CNumRepeat constant). In the following image, the first number is a result of the single-threaded code, while all the rest were calculated by the parallel version of the algorithm.

To understand what's going on, you should know that Windows (and all other operating systems) does many things at once. At any given time, there are hundreds of threads running in different programs and they are all fighting for the limited number of CPU cores. As our program is the active one (has focus), its threads will get most of the CPU time, but they will still sometimes be paused for some amount of time so that other threads can run.

Because of that, it can easily happen that IncValue reads the current value of FValue into value (let's say that the value is 100) and is then paused. DecValue reads the same value and then runs for some time, decrementing FValue. Let's say that it gets it down to -20,000. (That is just a number without any special meaning.) After that, the IncValue thread is awakened. It should increment the value to -19,999, but instead of that it adds 1 to 100 (stored in value), gets 101, and stores that into FValue. Ka-boom! In each repetition of the program, this will happen at different times and will cause a different result to be calculated.

You may complain that the problem is caused by the two-stage increment and decrement, but you'd be wrong. I dare you—go ahead, change the code so that it modifies FValue with Inc(FValue) and Dec(FValue), and it still won't work correctly. Well, I hear you say, so I shouldn't even modify one variable from two threads at the same time? I can live with that. But surely, it is OK to write into a variable from one thread and read from another? The answer, as you can probably guess given the general tendency of this section, is again—no, you may not.
There are some situations where this is OK (for example, when a variable is only one byte long) but, in general, even simultaneous reading and writing can be a source of weird problems. The ReadWrite program demonstrates this problem. It has a shared buffer, FBuf: Int64, and a pointer variable used to read and modify the data, FPValue: PInt64. At the beginning, the buffer is initialized to an easily recognized number and the pointer variable is set to point to the buffer:

FPValue := @FBuf;
FPValue^ := $7777777700000000;

The program runs two threads. One just reads from the location and stores all the read values into a list. This list is created with the Sorted and Duplicates properties set in a way that prevents it from storing duplicate values:

procedure TfrmReadWrite.Reader;
var
  i: integer;
begin
  for i := 1 to CNumRepeat do
    FValueList.Add(FPValue^);
end;

The second thread repeatedly writes two values into the shared location:

procedure TfrmReadWrite.Writer;
var
  i: integer;
begin
  for i := 1 to CNumRepeat do begin
    FPValue^ := $7777777700000000;
    FPValue^ := $0000000077777777;
  end;
end;

At the end, the contents of the FValueList list are logged on the screen. We would expect to see only two values—$7777777700000000 and $0000000077777777. In reality, we see four, as the following screenshot demonstrates.

The reason for that strange result is that Intel processors in 32-bit mode can't write a 64-bit number (as Int64 is) in one step. In other words, reading and writing 64-bit numbers in 32-bit code is not atomic. When multithreading programmers talk about something being atomic, they want to say that an operation will execute in one indivisible step. Any other thread will either see the state before the operation or the state after the operation, but never some undefined intermediate state.

How do the values $7777777777777777 and $0000000000000000 appear in the test application? Let's say that FPValue^ contains $7777777700000000. The code then starts writing $0000000077777777 into FPValue^ by first storing $77777777 into the bottom four bytes. After that it starts writing $00000000 into the upper four bytes of FPValue^, but in the meantime Reader reads the value and gets $7777777777777777. In a similar way, Reader will sometimes see $0000000000000000 in FPValue^.

We'll look into a way to solve this situation immediately, but in the meantime, you may wonder—when is it okay to read/write from/to a variable at the same time? Sadly, the answer is—it depends. Not even just on the CPU family (Intel and ARM processors behave completely differently), but also on the specific architecture used in a processor. For example, older and newer Intel processors may not behave the same in that respect.

You can always depend on access to byte-sized data being atomic, but that is all. Access (reads and writes) to larger quantities of data (words, integers) is atomic only if the data is correctly aligned. You can access word-sized data atomically if it is word aligned, and integer data if it is double-word aligned. If the code was compiled in 64-bit mode, you can also atomically access 64-bit data if it is quad-word aligned. When you are not using data packing (such as packed records), the compiler will take care of alignment and data access should automatically be atomic. You should, however, still check the alignment in code, if nothing else to prevent stupid programming errors.
If you want to write and read larger amounts of data, modify the data, or work on shared data structures, correct alignment will not be enough. You will need to introduce synchronization into your program. If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of high-performance programming with Delphi.

- Delphi: memory management techniques for parallel programming
- Parallel Programming Patterns
- Concurrency programming 101: Why do programmers hang by a thread?


Why ARC Welder is a good choice to run Android apps on desktop using the Chrome browser

Guest Contributor
21 Aug 2019
5 min read
Running Android apps on Chrome is a complicated task, especially when you are not using a Chromebook. However, Chrome now has a built-in tool that allows users to test Android-based applications in the browser: App Runtime for Chrome (ARC) Welder, launched by Google in 2015.

What is ARC Welder?

The ARC Welder tool allows Android applications to run on Google Chrome for Windows, OS X and Linux systems. ARC Welder is basically for app developers who want to test-run their Android applications within Chrome OS and catch any runtime errors or bugs. The tool was initially launched as an experimental concept for developers, but was later made available for everyone to download.

Main functions:

ARC Welder offers an easy and streamlined method for application testing. As a first step, the user is required to add the bundle into the existing application menu. Users are free to write to any file or folder that can be opened via the ARC software. A beginner can leave the settings page alone, as the settings will fall back to defaults if skipped or left unsaved.

Here's how to run the ARC Welder tool to run an Android application:

1. Download or upgrade to the latest version of the Google Chrome browser.
2. Download and run the ARC Welder application from the Chrome Web Store.
3. Add a third-party APK file host.
4. After downloading the APK file of the app on your laptop/PC, click Open.
5. Select either "Phone" or "Tablet" mode, whichever you wish to run the application in.
6. Lastly, click on the "Launch App" button.

Points to remember for running ARC Welder on Chrome:

The ARC Welder tool only works with APK files, which means that in order to run your Android applications on your laptop, you will need to download the APK files of the specific applications you wish to install on your desktop. You can find APK files in the following APK databases:

- APKMirror
- AndroidAPKsFree
- AndroidCrew
- APKPure

Points to remember before installing ARC Welder:

1. Only one application can be loaded at a single time.
2. Depending on your application, you will be required to select portrait or landscape mode manually.
3. Tablet and Phone mode specifications are necessary, as they have different outcomes.
4. ARC Welder is based on Android 4.4. This means that users are required to test applications that support Android 4.4 or above.

Note: points 1 and 2 can be considered limitations of ARC Welder.

Pros:

- Cross-platform, as it works on Windows, Linux, Mac and Chrome OS.
- Developed by Google, which means the software will evolve quickly, considering the upgrade pace of Android (also developed by Google).
- Allows application testing in the Google Chrome web browser.

Cons:

- Not all Google Play Services are supported by ARC Welder.
- ARC Welder only supports the "ARM" APK format.
- Keyboard input is spotty.
- Takes 2-3 minutes to install, as compared to other testing applications like BlueStacks (one-click install).
- No accelerometer simulation.
- Users are required to choose the "orientation" mode before getting into the detailed interface of ARC Welder.

There are competitors to ARC Welder, like BlueStacks, which is often preferred by a majority of developers due to its one-click install feature. Although ARC Welder gives much better performance, it still ranks 7th (BlueStacks stands at 6th). Despite these shortcomings, ARC Welder continues to evolve and to secure a faithful following, from beginners to expert developers.
In the next section, we'll have a look at a few alternatives to ARC Welder.

A few alternatives:

- Genymotion: An easy-to-use Android emulator for your computer. It works as a virtual machine and enables you to run mobile apps and games on your desktop or laptop efficiently.
- Andy: An operating system that works as an Android emulator for your computer. It allows you to open mobile apps and play mobile games in a version of the Android operating system on your Mac or Windows desktop.
- BlueStacks: A platform built to format mobile apps and make them compatible with desktop computers. It also helps to open up mobile gaming apps on computers and laptops.
- MEmu: A fast Android emulator that allows you to play mobile games on a PC for free. It is known for its performance and user experience, and supports most popular mobile apps and games as well as various system configurations.
- Koplayer: A free Android emulator for PC that supports video recording, multiple accounts, and keyboard input. Built on the x86 architecture, it is more stable and faster than BlueStacks.

Not to mention, it is very interesting to load Android apps in the Chrome browser on your computer or laptop, no matter which operating system you are using. Running Android apps in the Chrome browser can also be very useful when the Google Play Store and Apple App Store are prone to exploitation. Although right now we can run only a few apps using ARC Welder, one at a time, the developers will surely add more functionality and take this to the next level. So, are you ready to use mobile apps and play mobile games on your PC using ARC Welder? If you have any questions, leave them in the comment box and we'll respond.

Author Bio

Hilary is a writer and content manager at Androidcrew.com. She loves to share with others the knowledge and insights she has gained along the way.


Meet the famous 'Gang of Four' design patterns

Sugandha Lahoti
10 Jul 2018
14 min read
A design pattern is a reusable solution to a recurring problem in software design. It is not a finished piece of code but a template that helps to solve a particular problem or family of problems. In this article, we will talk about the Gang of Four design patterns. The gang of four, authors Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, initiated the concept of Design Pattern in software development. These authors are collectively known as the Gang of Four (GOF). We are going to focus on the design patterns from the Scala point of view. All different design patterns can be grouped into the following types:

- Creational
- Structural
- Behavioral

These three groups contain the famous Gang of Four design patterns. In the next few subsections, we will explain the main characteristics of the listed groups and briefly present the actual design patterns that fall under them. This article is an excerpt from Scala Design Patterns - Second Edition by Ivan Nikolov. In this book, you will learn how to write efficient, clean, and reusable code with Scala.

Creational design patterns

The creational design patterns deal with object creation mechanisms. Their purpose is to create objects in a way that is suitable to the current situation, which could lead to unnecessary complexity and the need for extra knowledge if they were not there. The main ideas behind the creational design patterns are as follows:

- Knowledge encapsulation about the concrete classes
- Hiding details about the actual creation and how objects are combined

We will be focusing on the following creational design patterns in this article:

- The abstract factory design pattern
- The factory method design pattern
- The lazy initialization design pattern
- The singleton design pattern
- The object pool design pattern
- The builder design pattern
- The prototype design pattern

The following few sections give a brief definition of what these patterns are.

The abstract factory design pattern

This is used to encapsulate a group of individual factories that have a common theme. When used, the developer creates a specific implementation of the abstract factory and uses its methods in the same way as in the factory design pattern to create objects. It can be thought of as another layer of abstraction that helps to instantiate classes.

The factory method design pattern

This design pattern deals with the creation of objects without explicitly specifying the actual class that the instance will have—it could be something that is decided at runtime based on many factors. Some of these factors can include operating systems, different data types, or input parameters. It gives developers the peace of mind of just calling a method rather than invoking a concrete constructor.

The lazy initialization design pattern

This design pattern is an approach to delay the creation of an object or the evaluation of a value until the first time it is needed. It is much more simplified in Scala than it is in an object-oriented language such as Java.

The singleton design pattern

This design pattern restricts the creation of a specific class to just one object. If more than one class in the application tries to use such an instance, then this same instance is returned for everyone. This is another design pattern that can be easily achieved with the use of basic Scala features.
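As a quick illustration of how little ceremony Scala needs for these last two patterns, here is a minimal sketch (the names are invented for the example): an object declaration is a language-level singleton, and a lazy val delays evaluation until first access:

object AppConfig {                                 // an 'object' is a singleton by definition
  lazy val settings: Map[String, String] = {       // evaluated only on first access
    println("loading settings...")
    Map("env" -> "dev")
  }
}

object SingletonAndLazyDemo extends App {
  println(AppConfig.settings("env"))   // prints "loading settings..." and then "dev"
  println(AppConfig.settings("env"))   // prints only "dev"; initialization ran exactly once
}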
The object pool design pattern

This design pattern uses a pool of objects that are already instantiated and ready for use. Whenever someone requires an object from the pool, it is returned, and after the user is finished with it, it puts it back into the pool manually or automatically. A common use for pools is database connections, which are generally expensive to create; hence, they are created once and then served to the application on request.

The builder design pattern

The builder design pattern is extremely useful for objects with many possible constructor parameters that would otherwise require developers to create many overrides for the different scenarios an object could be created in. This is different to the factory design pattern, which aims to enable polymorphism. Many of the modern libraries today employ this design pattern. As we will see later, Scala can achieve this pattern really easily.

The prototype design pattern

This design pattern allows object creation using a clone() method from an already created instance. It can be used in cases when a specific resource is expensive to create or when the abstract factory pattern is not desired.

Structural design patterns

Structural design patterns exist in order to help establish the relationships between different entities in order to form larger structures. They define how each component should be structured so that it has very flexible interconnecting modules that can work together in a larger system. The main features of structural design patterns include the following:

- The use of composition to combine the implementations of multiple objects
- Helping to build a large system made of various components by maintaining a high level of flexibility

In this article, we will focus on the following structural design patterns:

- The adapter design pattern
- The decorator design pattern
- The bridge design pattern
- The composite design pattern
- The facade design pattern
- The flyweight design pattern
- The proxy design pattern

The next subsections will shed some light on what these patterns are about.

The adapter design pattern

The adapter design pattern allows the interface of an existing class to be used from another interface. Imagine that there is a client who expects your class to expose a doWork() method. You might have the implementation ready in another class, but the method is called differently and is incompatible. It might require extra parameters too. This could also be a library that the developer doesn't have access to for modifications. This is where the adapter can help by wrapping the functionality and exposing the required methods. The adapter is useful for integrating existing components. In Scala, the adapter design pattern can be easily achieved using implicit classes.
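To make that last point concrete, here is a minimal sketch of an adapter built with an implicit class (the class and method names are made up for the example):

class LegacyLogger {                               // existing class we cannot modify
  def logMessage(severity: Int, msg: String): Unit = println(s"[$severity] $msg")
}

object AdapterDemo extends App {
  // The implicit class adds the interface the client expects without touching LegacyLogger.
  implicit class LoggerAdapter(val underlying: LegacyLogger) {
    def info(msg: String): Unit  = underlying.logMessage(1, msg)
    def error(msg: String): Unit = underlying.logMessage(3, msg)
  }

  val logger = new LegacyLogger
  logger.info("adapted call")     // resolved through the implicit class
  logger.error("also adapted")
}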
The decorator design pattern

Decorators are a flexible alternative to subclassing. They allow developers to extend the functionality of an object without affecting other instances of the same class. This is achieved by wrapping an object of the extended class into one that extends the same class and overrides the methods whose functionality is supposed to be changed. Decorators in Scala can be built much more easily using another design pattern called stackable traits.

The bridge design pattern

The purpose of the bridge design pattern is to decouple an abstraction from its implementation so that the two can vary independently. It is useful when the class and its functionality vary a lot. The bridge reminds us of the adapter pattern, but the difference is that the adapter pattern is used when something is already there and you cannot change it, while the bridge design pattern is used when things are being built. It helps us to avoid ending up with multiple concrete classes that will be exposed to the client. You will get a clearer understanding when we delve deeper into the topic, but for now, let's imagine that we want to have a FileReader class that supports multiple different platforms. The bridge will help us end up with FileReader, which will use a different implementation, depending on the platform. In Scala, we can use self-types in order to implement a bridge design pattern.

The composite design pattern

The composite is a partitioning design pattern that represents a group of objects that are to be treated as only one object. It allows developers to treat individual objects and compositions uniformly and to build complex hierarchies without complicating the source code. An example of a composite could be a tree structure where a node can contain other nodes, and so on.

The facade design pattern

The purpose of the facade design pattern is to hide the complexity of a system and its implementation details by providing the client with a simpler interface to use. This also helps to make the code more readable and to reduce the dependencies of the outside code. It works as a wrapper around the system that is being simplified and, of course, it can be used in conjunction with some of the other design patterns mentioned previously.

The flyweight design pattern

The flyweight design pattern provides an object that is used to minimize memory usage by sharing it throughout the application. This object should contain as much data as possible. A common example given is a word processor, where each character's graphical representation is shared with the other same characters. The local information then is only the position of the character, which is stored internally.

The proxy design pattern

The proxy design pattern allows developers to provide an interface to other objects by wrapping them. They can also provide additional functionality, for example, security or thread-safety. Proxies can be used together with the flyweight pattern, where the references to shared objects are wrapped inside proxy objects.
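Before moving on to the behavioral group, here is a minimal sketch of the decorator pattern described above, using Scala's stackable traits (the greeter example and its names are invented for illustration):

trait Greeter { def greet(name: String): String }

class BasicGreeter extends Greeter {
  def greet(name: String): String = s"Hello, $name"
}

trait Shouting extends Greeter {                   // a stackable "decoration"
  abstract override def greet(name: String): String = super.greet(name).toUpperCase
}

trait Exclaiming extends Greeter {                 // another decoration, freely combinable
  abstract override def greet(name: String): String = super.greet(name) + "!"
}

object DecoratorDemo extends App {
  val greeter = new BasicGreeter with Shouting with Exclaiming
  println(greeter.greet("world"))   // HELLO, WORLD!
}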
The main features of behavioral design patterns are as follows: What is being described is a process or flow The flows are simplified and made understandable They accomplish tasks that would be difficult or impossible to achieve with objects In this article, we will focus our attention on the following behavioral design patterns: The value object design pattern The null object design pattern The strategy design pattern The command design pattern The chain of responsibility design pattern The interpreter design pattern The iterator design pattern The mediator design pattern The memento design pattern The observer design pattern The state design pattern The template method design pattern The visitor design pattern The following subsections will give brief definitions of the aforementioned behavioral design patterns. The value object design pattern Value objects are immutable and their equality is based not on their identity, but on their fields being equal. They can be used as data transfer objects, and they can represent dates, colors, money amounts, numbers, and more. Their immutability makes them really useful in multithreaded programming. The Scala programming language promotes immutability, and value objects are something that naturally occur there. The null object design pattern Null objects represent the absence of a value and they define a neutral behavior. This approach removes the need to check for null references and makes the code much more concise. Scala adds the concept of optional values, which can replace this pattern completely. The strategy design pattern The strategy design pattern allows algorithms to be selected at runtime. It defines a family of interchangeable encapsulated algorithms and exposes a common interface to the client. Which algorithm is chosen could depend on various factors that are determined while the application runs. In Scala, we can simply pass a function as a parameter to a method, and depending on the function, a different action will be performed. The command design pattern This design pattern represents an object that is used to store information about an action that needs to be triggered at a later time. The information includes the following: The method name The owner of the method Parameter values The client then decides which commands need to be executed and when by the invoker. This design pattern can easily be implemented in Scala using the by-name parameters feature of the language. The chain of responsibility design pattern The chain of responsibility is a design pattern where the sender of a request is decoupled from its receiver. This way, it makes it possible for multiple objects to handle the request and to keep logic nicely separated. The receivers form a chain where they pass the request and, if possible, they process it, and if not, they pass it to the next receiver. There are variations where a handler might dispatch the request to multiple other handlers at the same time. This somehow reminds us of function composition, which in Scala can be achieved using the stackable traits design pattern. The interpreter design pattern The interpreter design pattern is based on the ability to characterize a well-known domain with a language with a strict grammar. It defines classes for each grammar rule in order to interpret sentences in the given language. These classes are likely to represent hierarchies as grammar is usually hierarchical as well. Interpreters can be used in different parsers, for example, SQL or other languages. 
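Most of these patterns become noticeably lighter in Scala than in classic object-oriented languages. As a quick illustration of the strategy design pattern described above, where the interchangeable algorithm is simply a function passed as a parameter, here is a minimal sketch; the Person case class and the two parsing strategies are invented purely for this example and are not taken from the book.

object StrategyExample {

  case class Person(name: String, age: Int)

  // Two interchangeable strategies for parsing one line of input.
  val csvStrategy: String => Person = line => {
    val Array(name, age) = line.split(",").map(_.trim)
    Person(name, age.toInt)
  }

  val tsvStrategy: String => Person = line => {
    val Array(name, age) = line.split("\t").map(_.trim)
    Person(name, age.toInt)
  }

  // The client depends only on the function type, not on a class hierarchy.
  def parse(line: String, strategy: String => Person): Person = strategy(line)

  def main(args: Array[String]): Unit = {
    println(parse("Ada, 36", csvStrategy))  // Person(Ada,36)
    println(parse("Alan\t41", tsvStrategy)) // Person(Alan,41)
  }
}

Because functions are first-class values in Scala, no separate strategy interface or subclass hierarchy is needed; whichever function is supplied at runtime determines the behavior, which is exactly the intent of the pattern.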
The iterator design pattern The iterator design pattern is when an iterator is used to traverse a container and access its elements. It helps to decouple containers from the algorithms performed on them. What an iterator should provide is sequential access to the elements of an aggregate object without exposing the internal representation of the iterated collection. The mediator design pattern This pattern encapsulates the communication between different classes in an application. Instead of interacting directly with each other, objects communicate through the mediator, which reduces the dependencies between them, lowers the coupling, and makes the overall application easier to read and maintain. The memento design pattern This pattern provides the ability to roll back an object to its previous state. It is implemented with three objects—originator, caretaker, and memento. The originator is the object with the internal state; the caretaker will modify the originator, and a memento is an object that contains the state that the originator returns. The originator knows how to handle a memento in order to restore its previous state. The observer design pattern This design pattern allows the creation of publish/subscribe systems. There is a special object called subject that automatically notifies all the observers when there are any changes in the state. This design pattern is popular in various GUI toolkits and generally where event handling is needed. It is also related to reactive programming, which is enabled by libraries such as Akka. We will see an example of this towards the end of this book. The state design pattern This design pattern is similar to the strategy design pattern, and it uses a state object to encapsulate different behavior for the same object. It improves the code's readability and maintainability by avoiding the use of large conditional statements. The template method design pattern This design pattern defines the skeleton of an algorithm in a method and then passes some of the actual steps to the subclasses. It allows developers to alter some of the steps of an algorithm without having to modify its structure. An example of this could be a method in an abstract class that calls other abstract methods, which will be defined in the children. The visitor design pattern The visitor design pattern represents an operation to be performed on the elements of an object structure. It allows developers to define a new operation without changing the original classes. Scala can minimize the verbosity of this pattern compared to the pure object-oriented way of implementing it by passing functions to methods. Choosing a design pattern As we already saw, there are a huge number of design patterns. In many cases, they are suitable to be used in combinations as well. Unfortunately, there is no definite answer regarding how to choose the concept of designing our code. There are many factors that could affect the final decision, and you should ask yourselves the following questions: Is this piece of code going to be fairly static or will it change in the future? Do we have to dynamically decide what algorithms to use? Is our code going to be used by others? Do we have an agreed interface? What libraries are we planning to use, if any? Are there any special performance requirements or limitations? This is by no means an exhaustive list of questions. There is a huge amount of factors that could dictate our decision in how we build our systems. 
It is, however, really important to have a clear specification, and if something seems missing, it should always be checked first. By now, we have a fair idea about what a design pattern is and how it can affect the way we write our code. We've iterated through the most famous Gang of Four design patterns out there, and we have outlined the main differences between them. To learn more about how to incorporate functional patterns effectively in real-life applications, read our book Scala Design Patterns - Second Edition.
Implementing 5 Common Design Patterns in JavaScript (ES8)
An Introduction to Node.js Design Patterns

What software stack does Netflix use?

Richard Gall
03 Sep 2017
5 min read
Netflix is a company that has grown at an incredible pace. In July 2017 it reached 100 million subscribers around the world - for a company that started life as a DVD subscription service, Netflix has proved to be adaptable, constantly one step ahead of changes in the market and changes in user behaviour. It’s an organization that has been able to scale, while maintaining a strong focus on user experience. This flexibility and adaptability has been driven - or enabled by its approach to software. But what software does Netflix use, exactly? How and why has it made decisions about its software stack? Netflix's front end development tools User experience is critical for Netflix. That’s why React is such a valuable framework for the engineering team. The team moved to React in 2014 as a means to completely overhaul their UI - essentially to make it fit for purpose for the future. As they outline in this piece from January 2015, their core considerations were startup speed, runtime performance and modularity. Or to summarize, to manage scale in terms of users, content, and devices. The piece goes on to explain why React fit the bill. It mentions how important isomorphic JavaScript is to the way they work: React enabled us to build JavaScript UI code that can be executed in both server (e.g. Node.js) and client contexts. To improve our start up times, we built a hybrid application where the initial markup is rendered server-side and the resulting UI elements are subsequently manipulated as done in a single-page application. How Netflix manages microservices at scale A lot has been written about Netflix and microservices (we recommend this blog post as a good overview of how microservices have been used by Netflix over the last decade). A big challenge for a platform that has grown like Netflix has is managing those microservices. To do this, the team built their own tool - called Netflix Conductor. “In a microservices world, a lot of business process automations are driven by orchestrating across services. Conductor enables orchestration across services while providing control and visibility into their interactions.” To get microservices to interact with one another, the team write Shell/Python scripts - however this ran into issues, largely due to scale. To combat this, the engineers developed a tool called Scriptflask.Scriptflask ‘exposes the functionality of utilities as REST endpoints.’ This was developed using Flask (hence the name Scriptflask) - which offered good interoperability with Python, a language used across a wide range of Netflix applications. The team also use Node.js with Docker to manage services - it’s well worth watching this video from Node.JS Interactive in December 2016 where Yunong Xiao, Principal Software Engineer at Netflix talks ‘slaying monoliths’. Netflix and AWS migration Just as we saw with Airbnb, AWS has proven crucial to Netflix success. In fact for the last few years, it has been undergoing a huge migration project to move the bulk of its architecture into AWS. This migration was finally complete at the start of 2016. “We chose Amazon Web Services (AWS) as our cloud provider because it provided us with the greatest scale and the broadest set of services and features,” writes Yuri Izrailevsky, VP of cloud and platform engineering at Netflix. Netflix and big data The move to AWS has, of course, been driven by data related challenges. In fact, the team use Amazon S3 as their data warehouse. 
The scale of data is stunning - it’s been said that the data warehouse is 60 petabytes. This post from InfoQ elaborates on the Netflix big data infrastructure. How Netflix does DevOps For a company that has proven itself so adaptable at a macro level, it’s unsurprising that the way the engineering teams at Netflix build code is incredibly flexible and agile too. It’s worth quoting this from the team - it says a lot about the culture: The Netflix culture of freedom and responsibility empowers engineers to craft solutions using whatever tools they feel are best suited to the task. In our experience, for a tool to be widely accepted, it must be compelling, add tremendous value, and reduce the overall cognitive load for the majority of Netflix engineers. Clearly the toolchain that supports development teams is open ended - it’s constantly subject to revision and change. But we wanted to flag some of the key tools that helps keep the culture running. First, there is gradle - the team write that “Gradle was chosen because it was easy to write testable plugins, while reducing the size of a project’s build file.” It also makes sense in that Java makes up such a large proportion of the Netflix codebase too. To provide additional support, the team also developed something called Nebula - “an opinionated set of plugins for the Gradle build system, to help with the heavy lifting around building applications”. When it comes to integration, Jenkins is essential for Netflix. “We started with a single massive Jenkins master in our datacenter and have evolved to running 25 Jenkins masters in AWS”. This is just a snapshot of the tools used by the team to build and deploy code - for a much deeper exploration we recommend this post on Medium.

6 most commonly used Java Machine learning libraries

Fatema Patrawala
10 Sep 2018
15 min read
There are over 70 Java-based open source machine learning projects listed on the MLOSS.org website and probably many more unlisted projects live at university servers, GitHub, or Bitbucket. In this article, we will review the major machine learning libraries and platforms in Java, the kind of problems they can solve, the algorithms they support, and the kind of data they can work with. This article is an excerpt taken from Machine learning in Java, written by Bostjan Kaluza and published by Packt Publishing Ltd. Weka Weka, which is short for Waikato Environment for Knowledge Analysis, is a machine learning library developed at the University of Waikato, New Zealand, and is probably the most well-known Java library. It is a general-purpose library that is able to solve a wide variety of machine learning tasks, such as classification, regression, and clustering. It features a rich graphical user interface, command-line interface, and Java API. You can check out Weka at http://www.cs.waikato.ac.nz/ml/weka/. At the time of writing this book, Weka contains 267 algorithms in total: data pre-processing (82), attribute selection (33), classification and regression (133), clustering (12), and association rules mining (7). Graphical interfaces are well-suited for exploring your data, while Java API allows you to develop new machine learning schemes and use the algorithms in your applications. Weka is distributed under GNU General Public License (GNU GPL), which means that you can copy, distribute, and modify it as long as you track changes in source files and keep it under GNU GPL. You can even distribute it commercially, but you must disclose the source code or obtain a commercial license. In addition to several supported file formats, Weka features its own default data format, ARFF, to describe data by attribute-data pairs. It consists of two parts. The first part contains header, which specifies all the attributes (that is, features) and their type; for instance, nominal, numeric, date, and string. The second part contains data, where each line corresponds to an instance. The last attribute in the header is implicitly considered as the target variable, missing data are marked with a question mark. For example, the Bob instance written in an ARFF file format would be as follows: @RELATION person_dataset @ATTRIBUTE `Name`  STRING @ATTRIBUTE `Height`  NUMERIC @ATTRIBUTE `Eye color`{blue, brown, green} @ATTRIBUTE `Hobbies`  STRING @DATA 'Bob', 185.0, blue, 'climbing, sky diving' 'Anna', 163.0, brown, 'reading' 'Jane', 168.0, ?, ? The file consists of three sections. The first section starts with the @RELATION <String> keyword, specifying the dataset name. The next section starts with the @ATTRIBUTE keyword, followed by the attribute name and type. The available types are STRING, NUMERIC, DATE, and a set of categorical values. The last attribute is implicitly assumed to be the target variable that we want to predict. The last section starts with the @DATA keyword, followed by one instance per line. Instance values are separated by comma and must follow the same order as attributes in the second section. Weka's Java API is organized in the following top-level packages: weka.associations: These are data structures and algorithms for association rules learning, including Apriori, predictive apriori, FilteredAssociator, FP-Growth, Generalized Sequential Patterns (GSP), Hotspot, and Tertius. weka.classifiers: These are supervised learning algorithms, evaluators, and data structures. 
The package is further split into the following components: weka.classifiers.bayes: This implements Bayesian methods, including naive Bayes, Bayes net, Bayesian logistic regression, and so on weka.classifiers.evaluation: These are supervised evaluation algorithms for nominal and numerical prediction, such as evaluation statistics, confusion matrix, ROC curve, and so on weka.classifiers.functions: These are regression algorithms, including linear regression, isotonic regression, Gaussian processes, support vector machine, multilayer perceptron, voted perceptron, and others weka.classifiers.lazy: These are instance-based algorithms such as k-nearest neighbors, K*, and lazy Bayesian rules weka.classifiers.meta: These are supervised learning meta-algorithms, including AdaBoost, bagging, additive regression, random committee, and so on weka.classifiers.mi: These are multiple-instance learning algorithms, such as citation k-nn, diverse density, MI AdaBoost, and others weka.classifiers.rules: These are decision tables and decision rules based on the separate-and-conquer approach, Ripper, Part, Prism, and so on weka.classifiers.trees: These are various decision tree algorithms, including ID3, C4.5, M5, functional tree, logistic tree, random forest, and so on weka.clusterers: These are clustering algorithms, including k-means, Clope, Cobweb, DBSCAN, hierarchical clustering, and farthest-first. weka.core: These are various utility classes, data presentations, configuration files, and so on. weka.datagenerators: These are data generators for classification, regression, and clustering algorithms. weka.estimators: These are various data distribution estimators for discrete/nominal domains, conditional probability estimations, and so on. weka.experiment: These are a set of classes supporting the necessary configuration, datasets, model setups, and statistics to run experiments. weka.filters: These are attribute-based and instance-based selection algorithms for both supervised and unsupervised data preprocessing. weka.gui: These are the graphical interfaces implementing the explorer, experimenter, and knowledge flow applications. Explorer allows you to investigate datasets and algorithms, as well as their parameters, and to visualize datasets with scatter plots and other visualizations. Experimenter is used to design batches of experiments, but it can only be used for classification and regression problems. Knowledge flow implements a visual drag-and-drop user interface to build data flows, for example, load data, apply filter, build classifier, and evaluate. Java-ML for machine learning Java machine learning library, or Java-ML, is a collection of machine learning algorithms with a common interface for algorithms of the same type. It only features a Java API; therefore, it is primarily aimed at software engineers and programmers. Java-ML contains algorithms for data preprocessing, feature selection, classification, and clustering. In addition, it features several Weka bridges to access Weka's algorithms directly through the Java-ML API. It can be downloaded from http://java-ml.sourceforge.net, where the latest release was in 2012 (at the time of writing this book). Java-ML is also a general-purpose machine learning library. 
Compared to Weka, it offers more consistent interfaces and implementations of recent algorithms that are not present in other packages, such as an extensive set of state-of-the-art similarity measures and feature-selection techniques, for example, dynamic time warping, random forest attribute evaluation, and so on. Java-ML is also available under the GNU GPL license. Java-ML supports any type of file as long as it contains one data sample per line and the features are separated by a symbol such as comma, semi-colon, and tab. The library is organized around the following top-level packages: net.sf.javaml.classification: These are classification algorithms, including naive Bayes, random forests, bagging, self-organizing maps, k-nearest neighbors, and so on net.sf.javaml.clustering: These are clustering algorithms such as k-means, self-organizing maps, spatial clustering, Cobweb, AQBC, and others net.sf.javaml.core: These are classes representing instances and datasets net.sf.javaml.distance: These are algorithms that measure instance distance and similarity, for example, Chebyshev distance, cosine distance/similarity, Euclidian distance, Jaccard distance/similarity, Mahalanobis distance, Manhattan distance, Minkowski distance, Pearson correlation coefficient, Spearman's footrule distance, dynamic time wrapping (DTW), and so on net.sf.javaml.featureselection: These are algorithms for feature evaluation, scoring, selection, and ranking, for instance, gain ratio, ReliefF, Kullback-Liebler divergence, symmetrical uncertainty, and so on net.sf.javaml.filter: These are methods for manipulating instances by filtering, removing attributes, setting classes or attribute values, and so on net.sf.javaml.matrix: This implements in-memory or file-based array net.sf.javaml.sampling: This implements sampling algorithms to select a subset of dataset net.sf.javaml.tools: These are utility methods on dataset, instance manipulation, serialization, Weka API interface, and so on net.sf.javaml.utils: These are utility methods for algorithms, for example, statistics, math methods, contingency tables, and others Apache Mahout The Apache Mahout project aims to build a scalable machine learning library. It is built atop scalable, distributed architectures, such as Hadoop, using the MapReduce paradigm, which is an approach for processing and generating large datasets with a parallel, distributed algorithm using a cluster of servers. Mahout features console interface and Java API to scalable algorithms for clustering, classification, and collaborative filtering. It is able to solve three business problems: item recommendation, for example, recommending items such as people who liked this movie also liked…; clustering, for example, of text documents into groups of topically-related documents; and classification, for example, learning which topic to assign to an unlabeled document. Mahout is distributed under a commercially-friendly Apache License, which means that you can use it as long as you keep the Apache license included and display it in your program's copyright notice. 
Mahout features the following libraries: org.apache.mahout.cf.taste: These are collaborative filtering algorithms based on user-based and item-based collaborative filtering and matrix factorization with ALS org.apache.mahout.classifier: These are in-memory and distributed implementations, including logistic regression, naive Bayes, random forest, hidden Markov models (HMM), and multilayer perceptron org.apache.mahout.clustering: These are clustering algorithms such as canopy clustering, k-means, fuzzy k-means, streaming k-means, and spectral clustering org.apache.mahout.common: These are utility methods for algorithms, including distances, MapReduce operations, iterators, and so on org.apache.mahout.driver: This implements a general-purpose driver to run main methods of other classes org.apache.mahout.ep: This is the evolutionary optimization using the recorded-step mutation org.apache.mahout.math: These are various math utility methods and implementations in Hadoop org.apache.mahout.vectorizer: These are classes for data presentation, manipulation, and MapReduce jobs Apache Spark Apache Spark, or simply Spark, is a platform for large-scale data processing built atop Hadoop, but, in contrast to Mahout, it is not tied to the MapReduce paradigm. Instead, it uses in-memory caches to extract a working set of data, process it, and repeat the query. This is reported to be up to ten times as fast as a Mahout implementation that works directly with disk-stored data. It can be grabbed from https://spark.apache.org. There are many modules built atop Spark, for instance, GraphX for graph processing, Spark Streaming for processing real-time data streams, and MLlib, a machine learning library featuring classification, regression, collaborative filtering, clustering, dimensionality reduction, and optimization. Spark's MLlib can use a Hadoop-based data source, for example, Hadoop Distributed File System (HDFS) or HBase, as well as local files. The supported data types include the following: Local vector is stored on a single machine. Dense vectors are presented as an array of double-typed values, for example, (2.0, 0.0, 1.0, 0.0); while a sparse vector is presented by the size of the vector, an array of indices, and an array of values, for example, [4, (0, 2), (2.0, 1.0)]. Labeled point is used for supervised learning algorithms and consists of a local vector labeled with a double-typed class value. The label can be a class index, a binary outcome, or a list of multiple class indices (multiclass classification). For example, a labeled dense vector is presented as [1.0, (2.0, 0.0, 1.0, 0.0)]. Local matrix stores a dense matrix on a single machine. It is defined by matrix dimensions and a single double array arranged in a column-major order. Distributed matrix operates on data stored in Spark's Resilient Distributed Dataset (RDD), which represents a collection of elements that can be operated on in parallel. There are three presentations: row matrix, where each row is a local vector that can be stored on a single machine and row indices are meaningless; indexed row matrix, which is similar to row matrix, but the row indices are meaningful, that is, rows can be identified and joins can be executed; and coordinate matrix, which is used when a row cannot be stored on a single machine and the matrix is very sparse. 
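As a quick illustration (not part of the original article) of the local data types just described, the following Scala snippet constructs the dense vector, sparse vector, and labeled point from the examples above; it assumes the spark-mllib dependency is available on the classpath.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object MLlibLocalDataTypes {
  def main(args: Array[String]): Unit = {
    // Dense vector: every value is stored explicitly.
    val dense = Vectors.dense(2.0, 0.0, 1.0, 0.0)

    // Sparse vector: size 4, with non-zero values at indices 0 and 2.
    val sparse = Vectors.sparse(4, Array(0, 2), Array(2.0, 1.0))

    // Labeled point: a local vector paired with a double-typed label,
    // as used by the supervised learning algorithms.
    val labeled = LabeledPoint(1.0, dense)

    println(dense)   // [2.0,0.0,1.0,0.0]
    println(sparse)  // (4,[0,2],[2.0,1.0])
    println(labeled) // (1.0,[2.0,0.0,1.0,0.0])
  }
}

Note that these local types can be created without starting a SparkContext; the distributed matrix types, by contrast, are built on top of RDDs and require a running Spark application.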
Spark's MLlib API library provides interfaces to various learning algorithms and utilities as outlined in the following list: org.apache.spark.mllib.classification: These are binary and multiclass classification algorithms, including linear SVMs, logistic regression, decision trees, and naive Bayes org.apache.spark.mllib.clustering: These are k-means clustering org.apache.spark.mllib.linalg: These are data presentations, including dense vectors, sparse vectors, and matrices org.apache.spark.mllib.optimization: These are the various optimization algorithms used as low-level primitives in MLlib, including gradient descent, stochastic gradient descent, update schemes for distributed SGD, and limited-memory BFGS org.apache.spark.mllib.recommendation: These are model-based collaborative filtering implemented with alternating least squares matrix factorization org.apache.spark.mllib.regression: These are regression learning algorithms, such as linear least squares, decision trees, Lasso, and Ridge regression org.apache.spark.mllib.stat: These are statistical functions for samples in sparse or dense vector format to compute the mean, variance, minimum, maximum, counts, and nonzero counts org.apache.spark.mllib.tree: This implements classification and regression decision tree-learning algorithms org.apache.spark.mllib.util: These are a collection of methods to load, save, preprocess, generate, and validate the data Deeplearning4j Deeplearning4j, or DL4J, is a deep-learning library written in Java. It features a distributed as well as a single-machinedeep-learning framework that includes and supports various neural network structures such as feedforward neural networks, RBM, convolutional neural nets, deep belief networks, autoencoders, and others. DL4J can solve distinct problems, such as identifying faces, voices, spam or e-commerce fraud. Deeplearning4j is also distributed under Apache 2.0 license and can be downloaded from http://deeplearning4j.org. The library is organized as follows: org.deeplearning4j.base: These are loading classes org.deeplearning4j.berkeley: These are math utility methods org.deeplearning4j.clustering: This is the implementation of k-means clustering org.deeplearning4j.datasets: This is dataset manipulation, including import, creation, iterating, and so on org.deeplearning4j.distributions: These are utility methods for distributions org.deeplearning4j.eval: These are evaluation classes, including the confusion matrix org.deeplearning4j.exceptions: This implements exception handlers org.deeplearning4j.models: These are supervised learning algorithms, including deep belief network, stacked autoencoder, stacked denoising autoencoder, and RBM org.deeplearning4j.nn: These are the implementation of components and algorithms based on neural networks, such as neural network, multi-layer network, convolutional multi-layer network, and so on org.deeplearning4j.optimize: These are neural net optimization algorithms, including back propagation, multi-layer optimization, output layer optimization, and so on org.deeplearning4j.plot: These are various methods for rendering data org.deeplearning4j.rng: This is a random data generator org.deeplearning4j.util: These are helper and utility methods MALLET Machine Learning for Language Toolkit (MALLET), is a large library of natural language processing algorithms and utilities. It can be used in a variety of tasks such as document classification, document clustering, information extraction, and topic modeling. 
It features command-line interface as well as Java API for several algorithms such as naive Bayes, HMM, Latent Dirichlet topic models, logistic regression, and conditional random fields. MALLET is available under Common Public License 1.0, which means that you can even use it in commercial applications. It can be downloaded from http://mallet.cs.umass.edu. MALLET instance is represented by name, label, data, and source. However, there are two methods to import data into the MALLET format, as shown in the following list: Instance per file: Each file, that is, document, corresponds to an instance and MALLET accepts the directory name for the input. Instance per line: Each line corresponds to an instance, where the following format is assumed: the instance_name label token. Data will be a feature vector, consisting of distinct words that appear as tokens and their occurrence count. The library comprises the following packages: cc.mallet.classify: These are algorithms for training and classifying instances, including AdaBoost, bagging, C4.5, as well as other decision tree models, multivariate logistic regression, naive Bayes, and Winnow2. cc.mallet.cluster: These are unsupervised clustering algorithms, including greedy agglomerative, hill climbing, k-best, and k-means clustering. cc.mallet.extract: This implements tokenizers, document extractors, document viewers, cleaners, and so on. cc.mallet.fst: This implements sequence models, including conditional random fields, HMM, maximum entropy Markov models, and corresponding algorithms and evaluators. cc.mallet.grmm: This implements graphical models and factor graphs such as inference algorithms, learning, and testing. For example, loopy belief propagation, Gibbs sampling, and so on. cc.mallet.optimize: These are optimization algorithms for finding the maximum of a function, such as gradient ascent, limited-memory BFGS, stochastic meta ascent, and so on. cc.mallet.pipe: These are methods as pipelines to process data into MALLET instances. cc.mallet.topics: These are topics modeling algorithms, such as Latent Dirichlet allocation, four-level pachinko allocation, hierarchical PAM, DMRT, and so on. cc.mallet.types: This implements fundamental data types such as dataset, feature vector, instance, and label. cc.mallet.util: These are miscellaneous utility functions such as command-line processing, search, math, test, and so on. To design, build, and deploy your own machine learning applications by leveraging key Java machine learning libraries, check out this book Machine learning in Java, published by Packt Publishing. 5 JavaScript machine learning libraries you need to know A non programmer’s guide to learning Machine learning Why use JavaScript for machine learning?  

Building a scalable PostgreSQL solution 

Natasha Mathur
14 Apr 2019
12 min read
The term  Scalability means the ability of a software system to grow as the business using it grows. PostgreSQL provides some features that help you to build a scalable solution but, strictly speaking, PostgreSQL itself is not scalable. It can effectively utilize the following resources of a single machine: It uses multiple CPU cores to execute a single query faster with the parallel query feature When configured properly, it can use all available memory for caching The size of the database is not limited; PostgreSQL can utilize multiple hard disks when multiple tablespaces are created; with partitioning, the hard disks could be accessed simultaneously, which makes data processing faster However, when it comes to spreading a database solution to multiple machines, it can be quite problematic because a standard PostgreSQL server can only run on a single machine.  In this article, we will look at different scaling scenarios and their implementation in PostgreSQL. The requirement for a system to be scalable means that a system that supports a business now, should also be able to support the same business with the same quality of service as it grows. This article is an excerpt taken from the book  'Learning PostgreSQL 11 - Third Edition' written by Andrey Volkov and Salahadin Juba. The book explores the concepts of relational databases and their core principles.  You’ll get to grips with using data warehousing in analytical solutions and reports and scaling the database for high availability and performance. Let's say a database can store 1 GB of data and effectively process 100 queries per second. What if with the development of the business, the amount of data being processed grows 100 times? Will it be able to support 10,000 queries per second and process 100 GB of data? Maybe not now, and not in the same installation. However, a scalable solution should be ready to be expanded to be able to handle the load as soon as it is needed. In scenarios where it is required to achieve better performance, it is quite common to set up more servers that would handle additional load and copy the same data to them from a master server. In scenarios where high availability is required, this is also a typical solution to continuously copy the data to a standby server so that it could take over in case the master server crashes.  Scalable PostgreSQL solution Replication can be used in many scaling scenarios. Its primary purpose is to create and maintain a backup database in case of system failure. This is especially true for physical replication. However, replication can also be used to improve the performance of a solution based on PostgreSQL. Sometimes, third-party tools can be used to implement complex scaling scenarios. Scaling for heavy querying Imagine there's a system that's supposed to handle a lot of read requests. For example, there could be an application that implements an HTTP API endpoint that supports the auto-completion functionality on a website. Each time a user enters a character in a web form, the system searches in the database for objects whose name starts with the string the user has entered. The number of queries can be very big because of the large number of users, and also because several requests are processed for every user session. To handle large numbers of requests, the database should be able to utilize multiple CPU cores. In case the number of simultaneous requests is really large, the number of cores required to process them can be greater than a single machine could have. 
The same applies to a system that is supposed to handle multiple heavy queries at the same time. You don't need a lot of queries, but when the queries themselves are big, using as many CPUs as possible would offer a performance benefit—especially when parallel query execution is used. In such scenarios, where one database cannot handle the load, it's possible to set up multiple databases, set up replication from one master database to all of them, making each them work as a hot standby, and then let the application query different databases for different requests. The application itself can be smart and query a different database each time, but that would require a special implementation of the data-access component of the application, which could look as follows: Another option is to use a tool called Pgpool-II, which can work as a load-balancer in front of several PostgreSQL databases. The tool exposes a SQL interface, and applications can connect there as if it were a real PostgreSQL server. Then Pgpool-II will redirect the queries to the databases that are executing the fewest queries at that moment; in other words, it will perform load-balancing: Yet another option is to scale the application together with the databases so that one instance of the application will connect to one instance of the database. In that case, the users of the application should connect to one of the many instances. This can be achieved with HTTP load-balancing: Data sharding When the problem is not the number of concurrent queries, but the size of the database and the speed of a single query, a different approach can be implemented. The data can be separated into several servers, which will be queried in parallel, and then the result of the queries will be consolidated outside of those databases. This is called data sharding. PostgreSQL provides a way to implement sharding based on table partitioning, where partitions are located on different servers and another one, the master server, uses them as foreign tables. When performing a query on a parent table defined on the master server, depending on the WHERE clause and the definitions of the partitions, PostgreSQL can recognize which partitions contain the data that is requested and would query only these partitions. Depending on the query, sometimes joins, grouping and aggregation could be performed on the remote servers. PostgreSQL can query different partitions in parallel, which will effectively utilize the resources of several machines. Having all this, it's possible to build a solution when applications would connect to a single database that would physically execute their queries on different database servers depending on the data that is being queried. It's also possible to build sharding algorithms into the applications that use PostgreSQL. In short, applications would be expected to know what data is located in which database, write it only there, and read it only from there. This would add a lot of complexity to the applications. Another option is to use one of the PostgreSQL-based sharding solutions available on the market or open source solutions. They have their own pros and cons, but the common problem is that they are based on previous releases of PostgreSQL and don't use the most recent features (sometimes providing their own features instead). One of the most popular sharding solutions is Postgres-XL, which implements a shared-nothing architecture using multiple servers running PostgreSQL. 
The system has several components: Multiple data nodes: Store the data A single global transaction monitor (GTM): Manages the cluster, provides global transaction consistency Multiple coordinator nodes: Supports user connections, builds query-execution plans, and interacts with the GTM and the data nodes Postgres-XL implements the same API as PostgreSQL, therefore the applications don't need to treat the server in any special way. It is ACID-compliant, meaning it supports transactions and integrity constraints. The COPY command is also supported. The main benefits of using Postgres-XL are as follows: It can scale to support more reading operations by adding more data nodes It can scale for to support more writing operations by adding more coordinator nodes The current release of Postgres-XL (at the time of writing) is based on PostgreSQL 10, which is relatively new The main downside of Postgres-XL is that it does not provide any high-availability features out of the box. When more servers are added to a cluster, the probability of the failure of any of them increases. That's why you should take care with backups or implement replication of the data nodes themselves. Postgres-XL is open source, but commercial support is available. Another solution worth mentioning is Greenplum. It's positioned as an implementation of a massive parallel-processing database, specifically designed for data warehouses. It has the following components: Master node: Manages user connections, builds query execution plans, manages transactions Data nodes: Store the data and perform queries Greenplum also implements the PostgreSQL API, and applications can connect to a Greenplum database without any changes. It supports transactions, but support for integrity constraints is limited. The COPY command is supported. The main benefits of Greenplum are as follows: It can scale to support more reading operations by adding more data nodes. It supports column-oriented table organization, which can be useful for data-warehousing solutions. Data compression is supported. High-availability features are supported out of the box. It's possible (and recommended) to add a secondary master that would take over in case a primary master crashes. It's also possible to add mirrors to the data nodes to prevent data loss. The drawbacks are as follows: It doesn't scale to support more writing operations. Everything goes through the single master node and adding more data nodes does not make writing faster. However, it's possible to import data from files directly on the data nodes. It uses PostgreSQL 8.4 in its core. Greenplum has a lot of improvements and new features added to the base PostgreSQL code, but it's still based on a very old release; however, the system is being actively developed. Greenplum doesn't support foreign keys, and support for unique constraints is limited. There are commercial and open source editions of Greenplum. Scaling for many numbers of connections Yet another use case related to scalability is when the number of database connections is great.  However, when a single database is used in an environment with a lot of microservices and each has its own connection pool, even if they don't perform too many queries, it's possible that hundreds or even thousands of connections are opened in the database. Each connection consumes server resources and just the requirement to handle a great number of connections can already be a problem, without even performing any queries. 
If applications don't use connection pooling and open connections only when they need to query the database and close them afterwards, another problem could occur. Establishing a database connection takes time—not too much, but when the number of operations is great, the total overhead will be significant. There is a tool named PgBouncer that implements connection-pool functionality. It can accept connections from many applications as if it were a PostgreSQL server and then open a limited number of connections towards the database. It would reuse the same database connections for multiple applications' connections. The process of establishing a connection from an application to PgBouncer is much faster than connecting to a real database because PgBouncer doesn't need to initialize a database backend process for the session. PgBouncer can create multiple connection pools that work in one of three modes: Session mode: A connection to a PostgreSQL server is used for the lifetime of a client connection to PgBouncer. Such a setup could be used to speed up the connection process on the application side. This is the default mode. Transaction mode: A connection to PostgreSQL is used for a single transaction that a client performs. That could be used to reduce the number of connections at the PostgreSQL side when only a few transactions are performed simultaneously. Statement mode: A database connection is used for a single statement. Then it is returned to the pool and a different connection is used for the next statement. This mode is similar to the transaction mode, though more aggressive. Note that multi-statement transactions are not possible when statement mode is used. Different pools can be set up to work in different modes. It's possible to let PgBouncer connect to multiple PostgreSQL servers, thus working as a reverse proxy. The way PgBouncer could be used is as follows: PgBouncer establishes several connections to the database. When an application connects to PgBouncer and starts a transaction, PgBouncer assigns an existing database connection to that application, forwards all SQL commands to the database, and delivers the results back. When the transaction is finished, PgBouncer will dissociate the connections, but not close them. If another application starts a transaction, the same database connection could be used. Such a setup requires configuring PgBouncer to work in transaction mode. PostgreSQL provides several ways to implement replication that would maintain a copy of the data from a database on another server or servers. This can be used as a backup or a standby solution that would take over in case the main server crashes. Replication can also be used to improve the performance of a software system by making it possible to distribute the load on several database servers. In this article, we discussed the problem of building scalable solutions based on PostgreSQL utilizing the resources of several servers. We looked at scaling for querying, data sharding, as well as scaling for large numbers of connections. If you enjoyed reading this article and want to explore other topics, be sure to check out the book 'Learning PostgreSQL 11 - Third Edition'. Handling backup and recovery in PostgreSQL 10 [Tutorial] Understanding SQL Server recovery models to effectively backup and restore your database Saving backups on cloud services with ElasticSearch plugins

Building Better Bundles: Why process.env.NODE_ENV Matters for Optimized Builds

Mark Erikson
14 Nov 2016
5 min read
JavaScript developers are keenly aware of the need to reduce the size of deployed assets, especially in today's world of single-page apps. This usually means running increasingly complex JavaScript codebases through build steps that produce a minified bundle for deployment. However, if you read a typical tutorial on setting up a build tool like Browserify or Webpack, you'll see numerous references to a variable called process.env.NODE_ENV. Tutorials always talk about how this needs to be set to a value like "production" in order to produce a properly optimized bundle, but most articles never really spell out why this value matters and how it relates to build optimization. Here's an explanation of why process.env.NODE_ENV is used and how it fits into the typical build process. Operating system environment variables are widely used as a method of configuring applications, especially as a way to activate behavior based on different deployment environments (such as development vs testing vs production). Node.js exposes the current process's environment variables to the script as an object called process.env. From there, the Express web server framework popularized using an environment variable called NODE_ENV as a flag to indicate whether the server should be running in "development" mode vs "production" mode. At runtime, the script looks up that value by checking process.env.NODE_ENV. Because it was used within the Node ecosystem, browser-focused libraries also started using it to determine what environment they were running in, and using it to control optimizations and debug mode behavior. For example, React uses it as the equivalent of a C preprocessor #ifdef to act as conditional checking for debug logging and perf tracking, roughly like this: function someInternalReactFunction() { // do actual work part 1 if(process.env.NODE_ENV === "development") { // do debug-only work, like recording perf stats } // do actual work part 2 } If process.env.NODE_ENV is set to "production", all those if clauses will evaluate to false, and the potentially expensive debug code won't run. In addition, in conjunction with a tool like UglifyJS that does minification and removal of dead code blocks, a clause that is surrounded with if(process.env.NODE_ENV === "development") will become dead code in a production build and be stripped out, thus reducing bundled code size and execution time. However, because the NODE_ENV environment variable and the corresponding process.env.NODE_ENV runtime field are normally server-only concepts, by default those values do not exist in client-side code. This is where build tools such as Webpack's DefinePlugin or the Browserify Envify transform come in, which perform search-and-replace operations on the original source code. Since these build tools are doing transformation of your code anyway, they can force the existence of global values such as process.env.NODE_ENV. (It's also important to note that because DefinePlugin in particular does a direct text replacement, the value given to DefinePlugin must include actual quotes inside of the string itself. Typically, this is done either with alternate quotes, such as '"production"', or by using JSON.stringify("production")). Here's the key: the build tool could set that value to anything, based on any condition that you want, as you're defining your build configuration. 
For example, I could have a webpack.production.config.js Webpack config file that always uses the DefinePlugin to set that value to "production" throughout the client-side bundle. It wouldn't have to be checking the actual current value of the "real" process.env.NODE_ENV variable while generating the Webpack config, because as the developer I would know that any time I'm doing a "production" build, I would want to set that value in the client code to "production'. This is where the "code I'm running as part of my build process" and "code I'm outputting from my build process" worlds come together. Because your build script is itself most likely to be JavaScript code running under Node, it's going to have process.env.NODE_ENV available to it as it runs. Because so many tools and libraries already share the convention of using that field's value to determine their dev-vs-production status, the common convention is to use the current value of that field inside the build script as it's running to also determine the value of that field as applied to the client code being transformed. Ultimately, it all comes down to a few key points: NODE_ENV is a system environment variable that Node exposes into running scripts. It's used by convention to determine dev-vs-prod behavior, by both server tools, build scripts, and client-side libraries. It's commonly used inside of build scripts (such as Webpack config generation) as both an input value and an output value, but the tie between the two is still just convention. Build tools generally do a transform step on the client-side code, replace any references to process.env.NODE_ENV with the desired value, and the resulting code will contain dead code blocks as debug-only code is now inside of an if(false)-type condition, ensuring that code doesn't execute at runtime. Minifier tools such as UglifyJS will strip out the dead code blocks, leaving the production bundle smaller. So, the next time you see process.env.NODE_ENV mentioned in a build script, hopefully you'll have a much better idea why it's there. About the author Mark Erikson is a software engineer living in southwest Ohio, USA, where he patiently awaits the annual heartbreak from the Reds and the Bengals. Mark is author of the Redux FAQ, maintains the React/Redux Links list and Redux Addons Catalog, and occasionally tweets at @acemarke. He can be usually found in the Reactiflux chat channels, answering questions about React and Redux. He is also slightly disturbed by the number of third-person references he has written in this bio!

What is the difference between functional and object oriented programming?

Antonio Cucciniello
17 Sep 2017
5 min read
There are two very popular programming paradigms in software development that developers design and program to. They are known as object oriented programming and functional programming. You've probably heard of these terms before, but what exactly are they and what is the difference between functional and object oriented programming? Let's take a look. What is object oriented programming? Object oriented programming is a programming paradigm in which you program using objects to represent things you are programming about (sometimes real world things). These objects could be data structures. The objects hold data about them in attributes. The attributes in the objects are manipulated through methods or functions that are given to the object. For instance, we might have a Person object that represents all of the data a person would have: weight, height, skin color, hair color, hair length, and so on. Those would be the attributes. Then the person object would also have things that it can do such as: pick box up, put box down, eat, sleep, etc. These would be the functions that play with the data the object stores. Engineers who program using object oriented design say that it is a style of programming that allows you to model real world scenarios much simpler. This allows for a good transition from requirements to code that works like the customer or user wants it to. Some examples of object oriented languages include C++, Java, Python, C#, Objective-C, and Swift. Want to learn object oriented programming? We recommend you start with Learning Object Oriented Programming. What is functional programming? Functional programming is the form of programming that attempts to avoid changing state and mutable data. In a functional program, the output of a function should always be the same, given the same exact inputs to the function. This is because the outputs of a function in functional programming purely relies on arguments of the function, and there is no magic that is happening behind the scenes. This is called eliminating side effects in your code. For example, if you call function getSum() it calculates the sum of two inputs and returns the sum. Given the same inputs for x and y, we will always get the same output for sum. This allows the function of a program to be extremely predictable. Each small function does its part and only its part. It allows for very modular and clean code that all works together in harmony. This is also easier when it comes to unit testing. Some examples of Functional Programming Languages include Lisp, Clojure, and F#. Problems with object oriented programming There are a few problems with object oriented programing. Firstly, it is known to be not as reusable. Because some of your functions depend on the class that is using them, it is hard to use some functions with another class. It is also known to be typically less efficient and more complex to deal with. Plenty of times, some object oriented designs are made to model large architectures and can be extremely complicated. Problems with functional programming Functional programming is not without its flaws either. It really takes a different mindset to approach your code from a functional standpoint. It's easy to think in object oriented terms, because it is similar to how the object being modeled happens in the real world. Functional programming is all about data manipulation. Converting a real world scenario to just data can take some extra thinking. 
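To make the getSum() and Person examples above a little more concrete, here is a small sketch written in Scala (chosen here purely for illustration, since it supports both styles); the attribute and method names are invented for this example.

object ParadigmSketch {
  // Functional style: getSum depends only on its arguments, so the same
  // inputs always produce the same output and there are no side effects.
  def getSum(x: Int, y: Int): Int = x + y

  // Object oriented style: a Person object keeps its data in attributes and
  // exposes methods that read or change that internal state.
  class Person(var weightKg: Double, var heightCm: Double) {
    private var holdingBox: Boolean = false
    def pickBoxUp(): Unit = { holdingBox = true }
    def putBoxDown(): Unit = { holdingBox = false }
    def isHoldingBox: Boolean = holdingBox
  }

  def main(args: Array[String]): Unit = {
    println(getSum(2, 3)) // always 5 for these inputs
    val p = new Person(80.0, 180.0)
    p.pickBoxUp()
    println(p.isHoldingBox) // true, because of the earlier method call
  }
}

getSum(2, 3) is always 5 no matter what has happened elsewhere in the program, while the result of isHoldingBox depends on which methods have already been called on that particular object. Thinking in the first style, where everything is a transformation of values, is exactly the mental shift described above.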
Problems with object oriented programming

There are a few problems with object oriented programming. Firstly, it is known to be less reusable: because some of your functions depend on the class that is using them, it is hard to use those functions with another class. It is also known to be typically less efficient and more complex to deal with; plenty of object oriented designs are made to model large architectures and can become extremely complicated.

Problems with functional programming

Functional programming is not without its flaws either. It really takes a different mindset to approach your code from a functional standpoint. It's easy to think in object oriented terms, because it is similar to how the object being modeled behaves in the real world. Functional programming, by contrast, is all about data manipulation, and converting a real-world scenario into pure data transformation can take some extra thinking. Due to this difficulty, fewer people program in this style, which can make it hard to collaborate with someone else or to learn from others, because there will naturally be less information on the topic.

A comparison between functional and object oriented programming

Both programming concepts share the goal of creating easily understandable programs that are free of bugs and can be developed quickly. Both have different methods for storing data and for manipulating it. In object oriented programming, you store data in the attributes of objects and have functions that belong to the object do the manipulation. In functional programming, we view everything as a data transformation: data is not stored in objects; it is transformed by creating new versions of it using one of the many functions.

I hope you now have a clearer picture of the difference between functional and object oriented programming. They can be used separately or mixed to some degree to suit your needs. Ultimately, you should take into consideration the advantages and disadvantages of both before making that decision.

Antonio Cucciniello is a Software Engineer from New Jersey with a background in C, C++, and JavaScript (Node.js). His most recent project, Edit Docs, is an Amazon Echo skill that allows users to edit Google Drive files using their voice. He loves building cool things with software, and reading books on self-help and improvement, finance, and entrepreneurship. Follow him on Twitter @antocucciniello, and follow him on GitHub here.


Top 7 libraries for geospatial analysis

Aarthi Kumaraswamy
22 May 2018
12 min read
The term geospatial refers to finding information that is located on the earth's surface. This can include, for example, the position of a cellphone tower, the shape of a road, or the outline of a country. Geospatial data often associates some piece of information with a particular location. Geospatial development is the process of writing computer programs that can access, manipulate, and display this type of information.

Internally, geospatial data is represented as a series of coordinates, often in the form of latitude and longitude values. Additional attributes, such as temperature, soil type, height, or the name of a landmark, are also often present. There can be many thousands (or even millions) of data points for a single set of geospatial data.

In addition to the prosaic tasks of importing geospatial data from various external file formats and translating data from one projection to another, geospatial data can also be manipulated to solve various interesting problems. Obvious examples include calculating the distance between two points, calculating the length of a road, or finding all data points within a given radius of a selected point. We use libraries to solve all of these problems and more. Today we will look at the major libraries used to process and analyze geospatial data:

GDAL/OGR
GEOS
Shapely
Fiona
Python Shapefile Library (pyshp)
pyproj
Rasterio
GeoPandas

This is an excerpt from the book Mastering Geospatial Analysis with Python by Paul Crickard, Eric van Rees, and Silas Toms.

Geospatial Data Abstraction Library (GDAL) and the OGR Simple Features Library

The Geospatial Data Abstraction Library (GDAL)/OGR Simple Features Library combines two separate libraries that are generally downloaded together as GDAL. This means that installing the GDAL package also gives access to OGR functionality. The reason GDAL is covered first is that other packages were written after GDAL, so chronologically it comes first. As you will notice, some of the packages covered in this post extend GDAL's functionality or use it under the hood.

GDAL was created in the 1990s by Frank Warmerdam and saw its first release in June 2000. Later, development of GDAL was transferred to the Open Source Geospatial Foundation (OSGeo). Technically, GDAL is a little different from your average Python package, as the GDAL package itself was written in C and C++, meaning that in order to use it in Python you need to compile GDAL and its associated Python bindings. However, using conda and Anaconda makes it relatively easy to get started quickly. Because it was written in C and C++, the online GDAL documentation is written for the C++ version of the libraries. For Python developers this can be challenging, but many functions are documented and can be consulted with the built-in pydoc utility, or by using the help function within Python.

Because of its history, working with GDAL in Python also feels a lot like working in C++ rather than pure Python. For example, the naming convention in OGR is different from Python's, since you use uppercase for functions instead of lowercase. These differences explain the choice for some of the other Python libraries covered in this post, such as Rasterio and Shapely, which have been written from a Python developer's perspective but offer the same GDAL functionality.

GDAL is a massive and widely used library for raster data. It supports the reading and writing of many raster file formats, with the latest version supporting up to 200 different formats. Because of this, it is indispensable for geospatial data management and analysis. Used together with other Python libraries, GDAL enables some powerful remote sensing functionality. It is also an industry standard and is present in commercial and open source GIS software.
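As a minimal sketch of what the GDAL Python bindings look like in practice (the file name is hypothetical and error handling is omitted), note the C++-style uppercase method names mentioned above:

from osgeo import gdal

# Open a raster dataset and inspect its dimensions.
dataset = gdal.Open("elevation.tif")
print(dataset.RasterXSize, dataset.RasterYSize, dataset.RasterCount)

# Read the first band into a NumPy array (bands are 1-indexed).
band = dataset.GetRasterBand(1)
elevation = band.ReadAsArray()
print(elevation.min(), elevation.max())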
The OGR library is used to read and write vector-format geospatial data, supporting many different formats. OGR uses a consistent model to be able to manage many different vector data formats: you can use OGR to do vector reprojection, vector data format conversion, vector attribute data filtering, and more. The GDAL/OGR libraries are not only useful for Python programmers but are also used by many GIS vendors and open source projects. The latest GDAL version at the time of writing is 2.2.4, which was released in March 2018.

GEOS

The Geometry Engine Open Source (GEOS) is the C/C++ port of a subset of the Java Topology Suite (JTS) and selected functions. GEOS aims to contain the complete functionality of JTS in C++. It can be compiled on many platforms, including Python. As you will see later on, the Shapely library uses functions from the GEOS library; in fact, there are many applications using GEOS, including PostGIS and QGIS. GeoDjango also uses GEOS, as well as GDAL, among other geospatial libraries. GEOS can also be compiled with GDAL, giving OGR all of its capabilities.

The JTS is an open source geospatial computational geometry library written in Java. It provides various functionalities, including a geometry model, geometric functions, spatial structures and algorithms, and I/O capabilities. Using GEOS, you have access to the following capabilities: geospatial functions (such as within and contains), geospatial operations (union, intersection, and many more), spatial indexing, Open Geospatial Consortium (OGC) well-known text (WKT) and well-known binary (WKB) input/output, the C and C++ APIs, and thread safety.

Shapely

Shapely is a Python package for the manipulation and analysis of planar features, using functions from the GEOS library (the engine of PostGIS) and a port of the JTS. Shapely is not concerned with data formats or coordinate systems, but it can be readily integrated with packages that are. Shapely only deals with analyzing geometries and offers no capabilities for reading and writing geospatial files. It was developed by Sean Gillies, who was also the person behind Fiona and Rasterio.

Shapely supports eight fundamental geometry types that are implemented as classes in the shapely.geometry module: points, multipoints, linestrings, multilinestrings, linearrings, multipolygons, polygons, and geometrycollections. Apart from representing these geometries, Shapely can be used to manipulate and analyze them through a number of methods and attributes.

Shapely has largely the same classes and functions as OGR when dealing with geometries. The difference between Shapely and OGR is that Shapely has a more Pythonic and intuitive interface, is better optimized, and has well-developed documentation. With Shapely, you're writing pure Python, whereas with GEOS, you're writing C++ in Python. For data munging, a term used for data management and analysis, you're better off writing in pure Python rather than C++, which explains why these libraries were created. For more information on Shapely, consult the documentation.
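Here is a minimal Shapely sketch of the kind of geometry-only analysis described above; the coordinates are made up for illustration:

from shapely.geometry import Point, Polygon

# Pure geometry: no files and no coordinate systems involved.
square = Polygon([(0, 0), (0, 2), (2, 2), (2, 0)])
pt = Point(1, 1)

print(square.contains(pt))                 # True
print(square.area)                         # 4.0
print(pt.buffer(0.5).intersects(square))   # True

# Geometries can be combined with set-style operations.
union = square.union(pt.buffer(3.0))
print(union.area)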
The documentation page also has detailed information on installing Shapely for different platforms and on how to build Shapely from source for compatibility with other modules that depend on GEOS. This refers to the fact that installing Shapely will require you to upgrade NumPy and GEOS if these are already installed.

Fiona

Fiona is a Python API for OGR. It can be used for reading and writing data formats. The main reason for using it instead of OGR is that it's closer to Python than OGR, as well as more dependable and less error-prone. It makes use of two well-known representations, WKT and WKB, for spatial information in vector data. As such, it can be combined well with other Python libraries such as Shapely: you would use Fiona for input and output, and Shapely for creating and manipulating geospatial data. While Fiona is Python compatible and our recommendation, users should also be aware of some of its disadvantages. It is more dependable than OGR because it uses Python objects for copying vector data instead of C pointers, but this also means that it uses more memory, which affects performance.

Python Shapefile Library (pyshp)

The Python Shapefile Library (pyshp) is a pure Python library used to read and write shapefiles. The pyshp library's sole purpose is to work with shapefiles, and it only uses the Python standard library. You cannot use it for geometric operations. If you're only working with shapefiles, this one-file-only library is simpler than using GDAL.

pyproj

pyproj is a Python package that performs cartographic transformations and geodetic computations. It is a Cython wrapper that provides Python interfaces to PROJ.4 functions, meaning you can access an existing library of C code from Python. PROJ.4 is a projection library that transforms data among many coordinate systems and is also available through GDAL and OGR. The reason that PROJ.4 is still popular and widely used is two-fold:

Firstly, it supports so many different coordinate systems.
Secondly, there are the routes it provides to do this: Rasterio and GeoPandas, two Python libraries covered next, both use pyproj, and thus PROJ.4 functionality, under the hood.

The difference between using PROJ.4 on its own rather than through a package such as GDAL is that it enables you to re-project individual points, something packages that use PROJ.4 internally do not offer. The pyproj package offers two classes: the Proj class, which performs cartographic computations, and the Geod class, which performs geodetic computations.
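Here is a minimal sketch of the two pyproj classes mentioned above; exact call signatures vary slightly between pyproj versions, so treat this as illustrative rather than definitive:

from pyproj import Proj, Geod

# Proj: cartographic computation -- project a lon/lat pair into UTM zone 33N.
utm33 = Proj(proj="utm", zone=33, ellps="WGS84")
x, y = utm33(12.4924, 41.8902)   # Rome, given as (longitude, latitude)
print(x, y)

# Geod: geodetic computation -- great-circle distance between Rome and Paris.
geod = Geod(ellps="WGS84")
az_fwd, az_back, dist_m = geod.inv(12.4924, 41.8902, 2.3522, 48.8566)
print(dist_m / 1000, "km")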
Therefore, Rasterio was designed to be a Python package at the top, with extension modules (using Cython) in the middle, and a GDAL shared library at the bottom. Other requirements for the raster library were the ability to read and write NumPy ndarrays to and from data files, and the use of Python types, protocols, and idioms instead of C or C++, freeing programmers from having to code in two languages. For georeferencing, Rasterio follows the lead of pyproj. There are a couple of capabilities added on top of reading and writing, one of them being a features module. Reprojection of geospatial data can be done with the rasterio.warp module. Rasterio's project homepage can be found on GitHub.

GeoPandas

GeoPandas is a Python library for working with vector data. It is based on the pandas library, which is part of the SciPy stack. SciPy is popular for data inspection and analysis, but unfortunately it cannot read spatial data. GeoPandas was created to fill this gap, taking pandas data objects as a starting point. The library also adds functionality from geographical Python packages. GeoPandas offers two data objects: a GeoSeries object that is based on a pandas Series object, and a GeoDataFrame that is based on a pandas DataFrame object but adds a geometry column for each row. Both GeoSeries and GeoDataFrame objects can be used for spatial data processing, similar to spatial databases. Read and write functionality is provided for almost every vector data format. Also, because both Series and DataFrame objects are subclasses of pandas data objects, you can use the same properties to select or subset data, for example .loc or .iloc.

GeoPandas is a library that employs the capabilities of newer tools, such as Jupyter Notebooks, particularly well. Whereas GDAL enables you to interact with data records inside vector and raster datasets through Python code, GeoPandas takes a more visual approach by loading all records into a GeoDataFrame so that you can see them all together on your screen. The same goes for plotting data. These functionalities were lacking in Python 2, when developers depended on IDEs without extensive data visualization capabilities; such capabilities are now available with Jupyter Notebooks.

We've provided an overview of the most important open source packages for processing and analyzing geospatial data. The question then becomes when to use a certain package and why. GDAL, OGR, and GEOS are indispensable for geospatial processing and analysis, but they were not written in Python, so they require Python bindings for Python developers. Fiona, Shapely, and pyproj were written to solve these problems, as was the newer Rasterio library. For a more Pythonic approach, these newer packages are preferable to the older C++ packages with Python bindings (although they're used under the hood).

Now that you have an idea of what options are available for a certain use case and why one package is preferable over another, here's something you should always remember: as is often the way in programming, there might be multiple solutions for one particular problem. For example, when dealing with shapefiles, you could use pyshp, GDAL, Shapely, or GeoPandas, depending on your preference and the problem at hand.
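As a closing illustration of that point, here is a minimal GeoPandas sketch; the file name and the population column are hypothetical, and the same shapefile could just as well be read with pyshp, OGR, or Fiona:

import geopandas as gpd

# Read a (hypothetical) shapefile of countries into a GeoDataFrame.
countries = gpd.read_file("countries.shp")
print(countries.head())

# Pandas-style subsetting works because GeoDataFrame subclasses DataFrame;
# the 'population' column is assumed to exist in this hypothetical file.
big = countries.loc[countries["population"] > 100_000_000]

# Every row carries a geometry, so spatial operations are one-liners.
print(big.geometry.area.sum())
big.plot()   # renders nicely inside a Jupyter Notebook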
Read more:
Introduction to Data Analysis and Libraries
15 Useful Python Libraries to make your Data Science tasks Easier
"Pandas is an effective tool to explore and analyze data": An interview with Theodore Petrou
Using R to implement Kriging – A Spatial Interpolation technique for Geostatistics data


A Brief History of Minecraft Modding

Aaron Mills
03 Jun 2015
7 min read
Minecraft modding has been around since nearly the beginning. During that time it has gone through several transformations, or "eras." The early days and early mods looked very different from today. I first became involved in the community during mid-Beta, so everything that happened before then is second-hand knowledge. A great deal has been lost to the sands of time, but the important stops along the way are remembered, as we shall explore.

Minecraft has gone through several development stages over the years. Interestingly, these stages also correspond to the various "eras" of Minecraft modding. Minecraft Survival was first experienced as Survival Test during Classic, then again in the Indev stage, which gave way to Infdev, then to Alpha and Beta before finally reaching Release. But before all that was Classic.

Classic was released in May of 2009 and development continued into September of that year. Classic saw the introduction of Survival and Multiplayer. During this period of Minecraft's history, modding was in its infancy. On the one hand, server modding thrived during this stage, with several different server mods available. (These mods were the predecessors to Bukkit, which we will cover later.) Generally, the purpose of these mods was to give server admins more tools for maintaining their servers. On the other hand, client-side mods, ones that add new content, didn't really start appearing until the Alpha stage.

Alpha was released in late June of 2010, and it would continue for the rest of the year. Prior to Alpha came Indev and Infdev, but there isn't much evidence of any mods during that time period, possibly because of the lack of Multiplayer in Indev and Infdev. Alpha brought the return of Multiplayer, and during this time Minecraft began to see its first simple client mods. Initially it was just simple modification of existing content: adding support for higher-resolution textures, new arrow types, bug fixes, compass modifications, and so on. The mods were simple and small.

This began to change with the creation of the Minecraft Coder Pack, which was later renamed the Mod Coder Pack, commonly known as MCP. (One of the primary creators of MCP, Michael "Searge" Stoyke, now actually works for Mojang.) MCP saw its first release for Alpha 1.1.2_01 sometime in mid-2010. Minecraft's code was easy to decompile, but it was also obfuscated. Obfuscation is when you take all the meaningful names and words in the code and replace them with non-human-readable nonsense. The computer can still make sense of it just fine, but humans have a hard time. MCP resolved this limitation by applying meaningful names to the code, making modding significantly easier than ever before.

At the same time, but developing completely independently, was the server mod hMod, which gave some simple but absolutely necessary tools to server admins. However, hMod was in trouble as the main dev was MIA. This situation eventually led to the creation of Bukkit, a server mod designed from the ground up to support "plugins" and do everything that hMod couldn't do. Bukkit was created by a group of people who were also eventually hired by Mojang: Nathan 'Dinnerbone' Adams, Erik 'Grum' Broes, Warren 'EvilSeph' Loo, and Nathan 'Tahg' Gilbert. Bukkit went on to become possibly the most popular Minecraft mod ever created. Many in fact believe its existence is largely responsible for the popularity of online Minecraft servers. However, it would remain largely incompatible with client-side mods for some time.
Not to be left behind, the client saw another major development late in the year: Risugami's ModLoader. ModLoader was transformational. Prior to its existence, if you wanted to use two mods, you had to manually merge the code, line by line, yourself. There were many common tasks that couldn't be done without editing Minecraft's base code, things such as adding new blocks and items. ModLoader changed that by creating a framework where simple mods could hook into ModLoader code to perform common tasks that previously required base edits. It was simple, and it would never really expand beyond its original scope. Still, it led modding into a new era.

Minecraft Beta, what many call the "Golden Age" of modding, was released just before Christmas in 2010 and would continue through 2011. Beta saw the rise of many familiar mods that are still recognized today, including my own mod, Railcraft. IndustrialCraft, Buildcraft, Redpower, and Better than Wolves all saw their start during this period as well. These were major mods that added many new blocks and features to Minecraft. Additionally, the massive Aether mod, which recently received a modern reboot, was also released during Beta. These mods and more redefined the meaning of "Minecraft mods". They existed on a completely new scale, sometimes completely changing the game.

But there were still flaws. Mods were still painful to create and painful to use. You couldn't use IndustrialCraft and Buildcraft at the same time; they simply edited too many of the same base files. ModLoader only covered the most common base edits, barely touching the code, and not enough for a major mod. Additionally, to use a mod you still had to manually insert code into the Minecraft jar, a task that turned many players off of modding.

Seeing that their mods couldn't be used together, the creators of several major mods launched a new project. They would call it Minecraft Forge. Started by Eloraam of Redpower and SpaceToad of Buildcraft, it saw rapid adoption by many of the major mods of the time. Forge built on top of ModLoader, greatly expanding the number of base hooks and allowing many more mods to work together than was previously possible. This ushered in the true "Golden Age" of modding, which would continue from Beta into Release.

Minecraft 1.0 was released in November of 2011, heralding Minecraft's "official" release. Around the same time, client modding was undergoing a shift. Many of the most prominent developers were moving on to other things, including the entire Forge team. For the most part, their mods would survive without them, but some would not; Redpower, for example, ceased all development in late 2012. Eloraam, SpaceToad, and Flowerchild handed the reins of Forge off to LexManos, a relatively unknown name at the time. The "Golden Age" was at an end, but it was replaced by an explosion of new mods, and modding became even more popular than ever.

The new Forge team, consisting mainly of LexManos and cpw, would bring many new innovations to modding. Eventually they even developed a replacement for Risugami's ModLoader, naming it ForgeModLoader and incorporating it into Forge. Users would no longer be required to muck around with Minecraft's internals to install mods. Innovation has continued to the present day, and mods for Minecraft have become too numerous to count. However, the picture for server mods hasn't been so rosy. Bukkit, the long-dominant server mod, suffered a killing blow in 2014.
Licensing conflicts developed between the original creators and maintainers, largely revolving around who "owned" the project after the primary maintainers resigned. Ultimately, one of the most prolific maintainers used a technicality to invalidate the project's rights to use his code, effectively killing the entire project. A replacement has yet to emerge, leaving the server community limping along on increasingly outdated code.

But one shouldn't be too concerned about the future. There have been challenges in the past, but nearly every time a project died, it was soon replaced by something even better. Minecraft has one of the largest, most vibrant, and most mainstream modding communities ever to exist. It's had a long and varied history, and this has been just a brief glimpse into that heritage. There are many more events, both large and small, that have helped shape the community. May the future of Minecraft continue to be as interesting.

About the Author

Aaron Mills was born in 1983 and lives in the Pacific Northwest, which is a land rich in lore, trees, and rain. He has a Bachelor's Degree in Computer Science and studied at Washington State University Vancouver. He is best known for his work on the Minecraft mod Railcraft, but he has also contributed significantly to the Minecraft mods Forestry and Buildcraft, as well as making some contributions to the Minecraft Forge project.

How to create a web designer resume that lands you a Job

Guest Contributor
19 Jul 2018
7 min read
Clearly, there's a lot of competition for web designer jobs (with salaries rising each year), so it's crucial you find a way to make your resume stand out. You need to balance creativity with professionalism, all while making sure your experience, personality, and skills shine through. Over the years, people have created numerous resumes for web designers, and what it takes to get the job is well understood. Follow this guide to write a creative and attention-grabbing web designer resume.

Note: All images in this article are courtesy of the zety.com resume templates page and guide.

Format is a window to your mind

Because you're applying for a design position, the look and format of your resume are very important; they give prospective employers a sense of your design philosophy. Use white space and clear, legible fonts to help a hiring manager easily find your information. To stand apart from the crowd, avoid using a word processor to create your resume. Instead, use InDesign or Illustrator to design something creative and less cookie-cutter. Submit your resume in PDF format to avoid formatting errors that would ruin the look of your document. Sometimes a job posting will specifically forbid submitting a PDF, though, so watch out for that.

Highlight your experience

The key to writing a good experience section for a web designer is keeping it brief and relevant while highlighting your career achievements. Add no more than three to five bullet points with measurable achievements per past position. Don't just list your past company and when you worked there. "Discuss what you did and include some tangible accomplishments. If you created a custom graphic set for clients, mention that, and also what percentage were satisfied (hopefully it's very high). Prove you have the necessary experience to do the job well," advises Terrence Wood, resume proofreader at Paper Fellows.

Education

Your education is not nearly as important as your experience, but you still need to present it right. That means using this section to talk about your strong points. Include coursework and achievements that are relevant to the job description. Maybe you wrote a column about web design for your college newspaper; things like this help you stand out to a hiring manager, especially if you are just starting your career in web design. Also include your GPA here.

Showcase your skills

Everyone is going to list their skills; if you want that interview, you need to do something to make yourself seem exceptional. The first thing to do is take a good look at the job description and note what skills and responsibilities it mentions. Now you know which skills to include, but including them is not enough: you need to prove you have them by giving examples of times you used them at past jobs. Don't just list that you are proficient in Adobe Creative Suite; prove it by describing how you used it to tackle web design for 90% of client projects. Up the ante even more and include samples of your past work. This is a good spot to include your portfolio, any certifications that you have, or anything else that lets them know just how good you are at your job and confirms your skills. If you've had a predominantly freelance career, list the companies or individuals that you've worked for and include examples of work for each of them. You can do the same thing if you've had a '9 to 5' career: simply list your previous jobs and show examples of your best work.
It's also a good idea to let potential employers know about any future skills you plan to acquire.

Use infographics

You can give your resume a really unique look by using infographics while still keeping it professional. Divide your resume layout into a grid with two columns and four or five rows. Now place one category of data into each square of your grid. Next, transform each section into an infographic. Use icons to represent different skills or awards. You can use your design software's shape tools to create charts and graphs. Programs like Adobe InDesign can be used to create your infographics; you can also use Canva or Visme.

Keep it professional

It's a great idea to inject some creativity into your resume. You want to stand out, and after all, it is a design resume. But it's also important to balance that creativity with professionalism. A hiring manager will make some judgements about your personality based on how your resume looks. Be subtle in your creativity. Use colors that are easy on the eyes, and keep fonts reasonable. There can be a lot of beauty in simplicity. Stick to the basics and place content in an order familiar to recruiters, to avoid making them work for the information they need. Remember that your primary goal is to communicate your information clearly.

Write a cover letter

Some people say it's not really necessary, but it's your chance to stand out. Maybe there is something that isn't on your resume, or you want to seem more appealing and human: a cover letter is a good place to do all of that. It's a great place to elaborate on how you'll be able to meet the employer's needs, and a good opportunity to show them that you have done your research and know their company.

Writing resources for your resume

ViaWriting and Writing Populist: Grammar resources you can use to check your resume for grammatical mistakes.
Resume Service: A resume service you can use to improve the quality of your web designer resume.
Boomessays and EssayRoo: Online proofreading tools, suggested by Revieweal, that you can use to make sure your resume is polished and free of errors.
My Writing Way and Academadvisor: Career writing blogs with tips and ideas on how to improve your resume. You'll find posts here by people who have written web designer resumes before.
OXEssays and UKWritings: Editing tools, recommended by UKWritings review, that you can use to go over your resume for typos and other errors.
StateofWriting and SimpleGrad: Writing guides with suggestions on how to improve your resume. Even experienced writers can benefit from some extra guidance now and then.
ResumeLab: Learn what to include in a cover letter.

The job market for web designers is competitive. Make sure you lead with your best qualities and skills, and be sure they fit the job description as closely as possible. Be creative, but ensure you keep your resume professional as well. Now go have fun using this guide to write a creative and attention-grabbing web design resume.

Author bio

Grace Carter is a resume proofreader at Assignment Writing Service and at Australian Help, where she helps with CV editing and cover letter proofreading. Grace also teaches business writing at the Academized educational website.

Read more:
Is your web design responsive?
"Be objective, fight for the user, and test with real users on the go!" – Interview with design purist, Will Grant
Tips and tricks to optimize your responsive web design


Top 5 tools for reinforcement learning

Pravin Dhandre
21 May 2018
4 min read
After deep learning, reinforcement learning (RL) is the hottest branch of artificial intelligence, and it is finding speedy adoption in tech-driven companies. Simply put, reinforcement learning is all about algorithms that track previous actions or behaviour and provide optimized decisions using the trial-and-error principle. Read How Reinforcement Learning works to know more.

It might sound theoretical, but gigantic firms like Google and Uber have tested out this mechanism and have been highly successful in cutting-edge applied robotics fields such as self-driving vehicles. Other top giants, including Amazon, Facebook, and Microsoft, have centered their innovations around deep reinforcement learning across automotive, supply chain, networking, finance, and robotics.

With such humongous achievements, reinforcement learning libraries have caught the AI developer community's eye and have gained prime interest for training agents and reinforcing the behavior of the trained agents. In fact, researchers believe in the tremendous potential of reinforcement learning to address unsolved real-world challenges like material discovery, space exploration, and drug discovery, and to build much smarter artificial intelligence solutions. In this article, we will have a look at the most promising open source tools and libraries to start building your reinforcement learning projects on.

OpenAI Gym

OpenAI Gym, the most popular environment for developing and comparing reinforcement learning models, is completely compatible with high-computation libraries like TensorFlow. The Python-based AI simulation environment offers support for training agents on classic games like Atari, as well as for other branches of science like robotics and physics via simulators such as Gazebo and MuJoCo. The Gym environment also offers APIs which facilitate feeding observations along with rewards back to agents. OpenAI has also recently released a new platform, Gym Retro, made up of 58 varied and specific scenarios from the Sonic the Hedgehog, Sonic the Hedgehog 2, and Sonic 3 games. Reinforcement learning enthusiasts and AI game developers can register for this competition. Read: How to build a cartpole game using OpenAI Gym

TensorFlow

This is another well-known open source library by Google, followed by more than 95,000 developers every day in areas of natural language processing, intelligent chatbots, robotics, and more. The TensorFlow community has developed an extended version called TensorLayer, providing popular RL modules that can be easily customized and assembled for tackling real-world machine learning challenges. The TensorFlow community allows for framework development in most popular languages such as Python, C, Java, JavaScript, and Go. Google and its TensorFlow team are in the process of coming up with a Swift-compatible version to enable machine learning on Apple environments. Read How to implement Reinforcement Learning with TensorFlow

Keras

Keras presents simplicity in implementing neural networks with just a few lines of code and faster execution. It provides senior developers and principal scientists with a high-level interface to a heavy tensor computation framework, TensorFlow, and lets them concentrate on the model architecture. So, if you have any existing RL models written in TensorFlow, just pick the Keras framework and you can transfer the learning to the related machine learning problem.
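To give a flavour of how an agent interacts with a Gym environment, here is a minimal random-agent loop; this is a sketch using the classic gym API of that era, and newer gym/gymnasium releases return extra values from reset() and step():

import gym

# Create the classic CartPole environment and run one episode
# with a random policy -- observations and rewards flow back to the agent.
env = gym.make("CartPole-v1")
observation = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()              # random action
    observation, reward, done, info = env.step(action)
    total_reward += reward

print("Episode reward:", total_reward)
env.close()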
DeepMind Lab

DeepMind Lab is a customizable 3D platform from Google for agent-based AI research. It is used to study how self-sufficient artificial agents learn complicated tasks in large, partially observed environments. DeepMind captured the public's attention in early 2016 with the victory of its AlphaGo program against professional Go players. With its three hubs spread across London, Canada, and France, the DeepMind team is focusing on core AI fundamentals, which includes building a single AI system backed by state-of-the-art methods and distributional reinforcement learning. To know more about how DeepMind Lab works, read How Google's DeepMind is creating images with artificial intelligence.

PyTorch

PyTorch, open sourced by Facebook, is another well-known deep learning library adopted by many reinforcement learning researchers. It was recently preferred almost unanimously by the top 10 finishers in a Kaggle competition. With dynamic neural networks and strong GPU acceleration, RL practitioners use it extensively to conduct experiments on implementing policy-based agents and to create new adventures. One notable research project is Playing GridWorld, where PyTorch showed off its capabilities with renowned RL algorithms like policy gradient and the simplified Actor-Critic method.

Summing It Up

There you have it, the top tools and libraries for reinforcement learning. The list doesn't end here, as there is a lot of work happening in developing platforms and libraries for scaling reinforcement learning. Frameworks like RL4J and RLlib are already in development and very soon will be fully available for developers to simulate their models in their preferred coding language.
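Circling back to the PyTorch entry above, here is a minimal sketch of the kind of policy network a policy-gradient agent would use; the network size and the CartPole-style observation and action dimensions are illustrative assumptions, not taken from any particular project:

import torch
import torch.nn as nn
from torch.distributions import Categorical

# A tiny policy network: maps a 4-dimensional observation
# (CartPole-style) to a distribution over 2 discrete actions.
class Policy(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # raw logits

policy = Policy()
obs = torch.randn(4)                     # stand-in for an environment observation
dist = Categorical(logits=policy(obs))   # action distribution
action = dist.sample()
log_prob = dist.log_prob(action)         # used later to weight the policy-gradient loss
print(action.item(), log_prob.item())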