
Tech News - Data

1208 Articles

Oracle releases GraphPipe: An open source tool that standardizes machine learning model deployment

Bhagyashree R
16 Aug 2018
3 min read
Oracle has released GraphPipe, an open source tool that simplifies and standardizes the deployment of machine learning (ML) models. Developing ML models is difficult, but deploying them for customers to use is equally hard. While model development keeps improving, people often don't think about deployment. This is where GraphPipe comes into the picture.

What are the key challenges GraphPipe aims to solve?

No standard way to serve APIs: The lack of a standard for model-serving APIs limits you to whatever the framework gives you. A business application will typically need an auto-generated client just to talk to your deployed model. The situation becomes more difficult when you use multiple frameworks: you have to write custom code to create ensembles of models from different frameworks.
Building a model server is complicated: Out-of-the-box solutions for deployment are few because deployment gets less attention than training.
Existing solutions are not efficient enough: Many of the currently used solutions don't focus on performance, so for certain use cases they fall short.

Here's how the current situation looks (Source: GraphPipe's User Guide).

How does GraphPipe solve these problems?

GraphPipe uses flatbuffers as the message format for a predict request. Flatbuffers are similar to Google protocol buffers, with the added benefit of avoiding a memory copy during the deserialization step. A request message defined by the flatbuffer schema includes:

Input tensors
Input names
Output names

The request message is accepted by the GraphPipe remote model, which returns one tensor per requested output name, along with metadata about the types and shapes of the inputs and outputs it supports. Here's how the deployment situation looks with GraphPipe (Source: GraphPipe's User Guide).

What features does it come with?

Provides a minimalist machine learning transport specification based on flatbuffers, an efficient cross-platform serialization library for C++, C#, C, Go, Java, JavaScript, Lobster, Lua, TypeScript, PHP, and Python.
Comes with simplified implementations of clients and servers that make deploying and querying machine learning models from any framework considerably easier.
Its efficient servers can serve models built in TensorFlow, PyTorch, MXNet, CNTK, or Caffe2.
Provides efficient client implementations in Go, Python, and Java.
Includes guidelines for serving models consistently according to the flatbuffer definitions.

You can find plenty of documentation and examples at https://oracle.github.io/graphpipe. The GraphPipe flatbuffer spec can be found on Oracle's GitHub along with servers that implement the spec for Python and Go.

Oracle reveals issues in Object Serialization. Plans to drop it from core Java.
What Google, RedHat, Oracle, and others announced at KubeCon + CloudNativeCon 2018
Why Oracle is losing the Database Race
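For illustration, here is a minimal sketch of querying a GraphPipe-served model from the Python client, assuming the graphpipe package is installed and a model server is already listening locally on port 9000; the server address and input shape are hypothetical.

import numpy as np
from graphpipe import remote

# Hypothetical input: a batch of one 28x28 grayscale image.
request = np.random.rand(1, 28, 28).astype(np.float32)

# remote.execute sends a flatbuffer predict request to the server
# and returns the model's output tensor as a NumPy array.
prediction = remote.execute("http://127.0.0.1:9000", request)
print(prediction.shape)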


What’s in the upcoming SQLite 3.25.0 release: window functions, better query optimizer and more

Savia Lobo
16 Aug 2018
3 min read
The SQLite community has released a sneak peek at what users can expect in the upcoming version, SQLite 3.25.0, which could be released next month. The SQLite 3.25.0 draft, which the community published on its official website yesterday, includes a list of upcoming features and bug fixes. The primary updates are support for window functions and improvements to the query optimizer.

Expectations from SQLite 3.25.0

Support for window functions

This release will add window function support. The SQLite developers used the PostgreSQL window function documentation as their primary reference for how window functions ought to behave, and the community has run several test cases against PostgreSQL to ensure that window functions operate the same way in both SQLite and PostgreSQL.

Improvements in the query optimizer

Unnecessary loads of columns in an aggregate query are avoided when those columns appear neither within an aggregate function nor in the GROUP BY clause.
The IN-early-out optimization: when doing a look-up on a multi-column index and an IN operator is used on a column other than the left-most column, if no rows match the first IN value, the existence of rows matching the columns to the right is checked before continuing with the next IN value.
The transitive property can be used to propagate constant values within the WHERE clause. For example, "a=99 AND b=a" is converted into "a=99 AND b=99".

Separate mutex on every inode

In SQLite 3.25.0, a separate mutex is used on every inode in the Unix VFS, rather than a single mutex shared among them all. This results in better concurrency in multi-threaded environments.

Improved PRAGMA integrity_check command

The PRAGMA integrity_check command will be enhanced for improved detection of problems on the page freelist. The integrity_check pragma looks for out-of-order records, missing pages, malformed records, missing index entries, and UNIQUE, CHECK, and NOT NULL constraint errors.

Infinity output in the .dump command

This version will show infinity as 1e999 in the output of the ".dump" command of the command-line shell.

Bug fixes in the upcoming SQLite 3.25.0

A fix for ticket 79cad5e4b2e219dd197242e9e: on an UPSERT, when the order of constraint checks is rearranged, ensure that the affinity transformations on the inserted content occur before any of the constraint checks.
A fix for ticket 7be932dfa60a8a6b3b26bcf76: avoid using a prepared statement for the ".stats on" command of the CLI after it has been closed by the ".eqp full" logic.

To know more about SQLite Release 3.25.0 visit its release log draft.

How to use SQLite with Ionic to store data?
Introduction to SQL and SQLite
NHibernate 3.0: Testing Using NHibernate Profiler and SQLite
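As a quick illustration of the window function support described above, here is a hedged sketch that runs a hypothetical ranking query through Python's sqlite3 module; it assumes the underlying SQLite library is version 3.25.0 or newer, since older builds will reject the OVER clause.

import sqlite3

print(sqlite3.sqlite_version)  # needs to be 3.25.0 or newer for window functions

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales(region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('north', 10), ('north', 30), ('south', 20);
""")

# RANK() OVER (...) is a window function: it ranks rows within each region
# without collapsing them the way GROUP BY would.
for row in conn.execute("""
    SELECT region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
    FROM sales
"""):
    print(row)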


TensorFlow 2.0 is coming. Here's what we can expect.

Richard Gall
15 Aug 2018
3 min read
The last couple of months have seen TensorFlow releases coming thick and fast. Clearly, the Google team are working hard to ship new updates for a framework that seems to be defining deep learning as we know it. But TensorFlow 2.0 remains on the horizon - and that, really, is the release we've all been waiting for. Amid speculation and debate, we now have the first inkling of what we can expect, thanks to a post by Google Brain engineer Martin Wicke. In a somewhat unassuming post on Google Groups, Wicke said that work was underway on TensorFlow 2.0, with a preview version expected later this year.

The big changes that the team are working towards include:

Making TensorFlow easier to learn and use by putting eager execution (TensorFlow's programming environment) at the center of the new release
Support for more platforms and languages
Removing deprecated APIs

How you can support the TensorFlow 2.0 design process

Wicke writes that TensorFlow 2.0 still needs to go through a public review process. To do this, the project will be running a number of public design reviews that run through the proposed changes in detail and give users the opportunity to give feedback and communicate their views.

What TensorFlow 2.0 means for the TensorFlow project

Once TensorFlow 2.0 is released, migration will be essential - Wicke explains that "We do not anticipate any further feature development on TensorFlow 1.x once a final version of TensorFlow 2.0 is released" and that the project "will continue to issue security patches for the last TensorFlow 1.x release for one year after TensorFlow 2.0’s release date."

The end of tf.contrib?

TensorFlow 2.0 will bring an end (of sorts) to tf.contrib, the repository where code contributed to TensorFlow sits, waiting to be merged. "TensorFlow’s contrib module has grown beyond what can be maintained and supported in a single repository," Wicke writes. "Larger projects are better maintained separately, while we will incubate smaller extensions along with the main TensorFlow code." However, Wicke promises that TensorFlow will help the owners of contributed code to migrate appropriately. Some modules could be integrated into the core project, others moved into another, separate repository, and others simply removed entirely.

If you have any questions about TensorFlow 2.0 you can get in touch with the team directly by emailing [email protected]. TensorFlow has also set up a mailing list for anyone interested in regular updates - simply subscribe to [email protected].

Read next
Why Twitter (finally!) migrated to Tensorflow
Python, Tensorflow, Excel and more – Data professionals reveal their top tools
Can a production ready Pytorch 1.0 give TensorFlow a tough time?
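To give a flavour of the eager execution model mentioned above, here is a minimal sketch using the TensorFlow 1.x API that was current at the time; in 2.0, eager execution is expected to be the default, so the opt-in call would no longer be needed.

import tensorflow as tf

tf.enable_eager_execution()  # opt in to eager mode on TensorFlow 1.x

# Operations run immediately and return concrete values,
# with no explicit graph building or session.run() step.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())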


Statistical model compression to help reduce footprint in Alexa’s NLU models, allowing offline use

Bhagyashree R
15 Aug 2018
5 min read
With the release of the Alexa Auto Software Development Kit (SDK), integrating Alexa into in-vehicle infotainment systems will become easier for developers. Currently, the SDK assumes that automotive systems will have access to the cloud all the time, but it would be better if Alexa-enabled vehicles retained some core functions even when they're offline. This means reducing the size of the underlying machine-learning models so they can fit in local memory. At this year's Interspeech, Grant Strimel and his colleagues will present a new technique for compressing machine-learning models that could reduce their memory footprints by 94% while leaving their performance almost unchanged.

What is the aim behind statistical model compression?

Amazon Alexa and Google Assistant support skills built by external developers. These skills have Natural Language Understanding (NLU) models that extend the functionality of the main NLU models. Because there are numerous skills, their NLU models are loaded on demand, only when they are needed to process a request. If a skill's NLU model is large, loading it into memory adds significant latency to utterance recognition. To provide quick NLU responses and a good customer experience, small-footprint NLU models are important. Also, cloud-based NLU is unsuitable for local deployment without appropriate compression because of its large memory footprint. To solve this, Strimel and his colleagues have designed an algorithm that takes large statistical NLU models and produces models that are equally predictive but have a smaller memory footprint.

What are the techniques introduced?

Alexa's NLU systems use several different types of machine learning models, but they all share some common traits. One common trait is extracting features (strings of text with particular predictive value) from input utterances. Another is that each feature has a set of associated weights, which determines how large a role it should play in different types of computation. The weights of millions of features have to be stored, making the ML models memory intensive. Two techniques are proposed to perform statistical model compression:

Quantization

The first technique for compressing an ML model is to quantize the feature weights:

Take the total range of weights
Divide the range into equal intervals
Finally, round each weight off to the nearest boundary value for its interval

Currently, 256 intervals are used, allowing every weight in the model to be represented with a single byte of data, with minimal effect on the network's accuracy. An additional benefit is that low weights are discarded because they are rounded off to zero.

Perfect hashing

This technique uses hashing to map a particular feature to the memory location of the corresponding weight. For example, given "play 'Yesterday,' by the Beatles," we want the system to pull up the weights associated with the feature "the Beatles" and not the weights associated with "Adele", "Elton John", and the rest. A hash function maps arbitrary-sized inputs to fixed-sized outputs that have no predictable relationship to the inputs. One side effect of hashing is that it sometimes produces collisions, which means two inputs hash to the same output.

The collision problem is addressed through perfect hashing (Source: Amazon Alexa):

1. We first assume access to a family of conventional hash functions, all of which produce random hashes. For this, the hash function MurmurHash is used, seeded with a succession of different values.
2. We represent the number of input strings to be hashed by N. We begin with an array of N 0's and apply our first hash function, Hash1. We change a 0 in the array to a 1 for every string that yields a unique hash value.
3. Next, a new array of 0's is built for only the strings that yielded collisions under Hash1, and a different hash function is applied to those strings. As in step 2, we toggle the 0's to 1's for collision-free hashes.
4. This process is repeated until every input string has a corresponding 1 in some array. All these arrays are then combined into one giant array. The position of a 1 in this giant array indicates the unique memory location assigned to the corresponding string.

When the trained system receives an unseen input string, it applies Hash1 to each of the input's substrings and, if it finds a 1 in the first array, goes to the associated address. If it finds a 0, it applies Hash2 and repeats the process. This does cause a slight performance cost, but it's a penalty that's only paid when a collision occurs.

To know more about statistical model compression you can visit the Amazon Alexa page and also check out the technical paper by the researchers.

Amazon Alexa and AWS helping NASA improve their efficiency
Amazon Echo vs Google Home: Next-gen IoT war
Diffractive Deep Neural Network (D2NN): UCLA-developed AI device can identify objects at the speed of light
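As a rough illustration of the weight-quantization step described above, here is a minimal NumPy sketch that buckets weights into 256 equal intervals so each can be stored as a single byte; the interval count and example weights are for illustration only and do not reflect Amazon's actual implementation.

import numpy as np

def quantize_weights(weights, num_intervals=256):
    """Map float weights to uint8 bucket indices plus the info needed to decode them."""
    lo, hi = weights.min(), weights.max()
    step = (hi - lo) / (num_intervals - 1)
    # Round each weight to the nearest interval boundary and store only the index.
    indices = np.round((weights - lo) / step).astype(np.uint8)
    return indices, lo, step

def dequantize(indices, lo, step):
    return lo + indices.astype(np.float32) * step

weights = np.random.randn(1_000_000).astype(np.float32)  # hypothetical feature weights
idx, lo, step = quantize_weights(weights)
approx = dequantize(idx, lo, step)
print(idx.nbytes, weights.nbytes)        # one byte per weight instead of four
print(np.abs(weights - approx).max())    # small quantization error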


IBM Files Patent for "Managing a Database Management System using a Blockchain Database"

Melisha Dsouza
15 Aug 2018
3 min read
IBM has added yet another achievement to its kitty by receiving a patent application grant for "Managing a Database Management System using a Blockchain Database." The patent is solely for the purpose of developing a database tampering detection system [DT-DS]. It's no secret that IBM deals with huge amounts of data that are highly confidential and sensitive in nature, given the various services it provides its consumers. It is for this reason that the patent was filed on 22nd December, 2017 (as per the records of the U.S. Patent and Trademark Office (USPTO)).

What the patent states

The proposed system would detect tampering of any kind with data stored in a central database. A partial copy of the same data is stored on the blockchain database. The patent states, "Aspects of the disclosure include a method, system, and computer program product for managing a database management system (DBMS)." It further adds, "A central database to include a set of central data may be structured with respect to the DBMS. A blockchain database which is linked with the central database may be constructed with respect to the DBMS. A set of blockchain data may be established in the blockchain database corresponding to the set of central data of the central database."

The DBMS should receive an access request to enable the system to be accessed by the sender. Once the DBMS receives the access request, both the central database and the blockchain database would be maintained simultaneously.

This initiative on IBM's part to leverage blockchain technology reflects its growing interest in the technology's potential. The groundwork began with its contribution to Fabric, a permissioned blockchain framework aimed at integration projects. IBM already offers IBM Db2, which supports database management, operational databases, data warehouses, data lakes, and fast data. A step to append blockchain to existing systems would definitely assist IBM in resolving issues related to data inconsistencies and security loopholes.

The internet is abuzz with the wonders of blockchain technology and IBM seems to completely concur. IBM has always trusted the blockchain to bring about a new generation of transactional applications that strengthen trust, accountability, and transparency. We couldn't agree more! Want to know more about the patent? Head over to cnn.com for more in-depth coverage.

Four IBM facial recognition patents in 2018, we found intriguing
Four interesting Amazon patents in 2018 that use machine learning, AR, and robotics
IBM’s DeepLocker: The Artificial Intelligence powered sneaky new breed of Malware
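The patent language above is abstract, so here is a heavily simplified, hedged sketch of the general idea of detecting tampering by keeping fingerprints of central-database records on an append-only ledger; the record layout, ledger structure, and hashing scheme are illustrative assumptions, not details from IBM's patent.

import hashlib
import json

def fingerprint(record: dict) -> str:
    """Deterministic hash of a database record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

# Central database (mutable) and a ledger of fingerprints (append-only stand-in).
central_db = {1: {"owner": "alice", "balance": 100}}
ledger = {1: fingerprint(central_db[1])}

# Someone tampers with the central copy...
central_db[1]["balance"] = 1_000_000

# ...and a later integrity check catches the mismatch.
for key, record in central_db.items():
    if fingerprint(record) != ledger[key]:
        print(f"Tampering detected on record {key}")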


Vitalik Buterin's new consensus algorithm to make Ethereum 99% fault tolerant

Prasad Ramesh
14 Aug 2018
3 min read
Vitalik Buterin, co-founder of the blockchain platform Ethereum, posted a paper on a new kind of consensus algorithm last week. The algorithm requires only 1% of the nodes to be honest for the network to be secure, resulting in 99% fault tolerance and greatly increased security.

How can Ethereum achieve 99% fault tolerance?

The new consensus algorithm introduces a new kind of validator node: "independent observer nodes". These observer nodes watch the chain in real time to filter out any inconsistencies in the network. The original idea was published in 1982 by Turing Award-winning computer scientist Leslie Lamport, and the new algorithm is Vitalik's attempt to reformulate Lamport's algorithm in a simplified form for Ethereum.

The algorithm lets a node add its own signature as a bump on the timeout of a message. This guarantees that an honest node saw the message on time, and ensures that every other node sees the message on time as well. The definition of "on time" increments by more than the network latency with every added signature. On implementation, the algorithm will render 51% attacks useless.

What are the benefits?

Ethereum developer Conrad Barski states that there are several benefits to introducing this new protocol:

"Usually, all blockchain consensus [algorithms] care about is what the validators (i.e. miners) of a chain do. Vitalik is proposing that if an independent observer of the network traffic (i.e. just the blockchain client a user is running, not a miner/validator) watches what’s happening in real time and pays attention to when messages appear, they can detect ‘foul play’ by miners performing a 51% attack and this can provide additional safety guarantees that can protect against such an attack. This is somewhat similar to how merchants are already checking for 51% attacks themselves, only Vitalik’s version is more large-scale and complex."

As of now, the Ethereum blockchain works on proof-of-work (PoW). The new protocol is expected to be put into action when Ethereum shifts to proof-of-stake (PoS). The exact timeline for the PoS switch is not known; it might happen next year. In PoS, the more coins/tokens a user has, the greater his/her mining capacity. PoS will reduce the Ethereum block reward by 80%, from the existing 3 ETH per block to 0.6 ETH. With the implementation of this consensus network, Ethereum might become the most secure blockchain network in the public domain. You can read Vitalik's paper for a detailed explanation of the implementation.

Microsoft Azure’s new governance DApp: An enterprise blockchain without mining
How to set up an Ethereum development environment [Tutorial]
Everything you need to know about Ethereum
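The "signature as a bump on the timeout" idea can be made a little more concrete with a small, hedged sketch: every signature a message has already collected extends the deadline by which the next honest node must still accept it. This is only an illustration of the dependent-timeout rule as summarised in this article, not Vitalik's actual specification; the latency bound and message format are made up for the example.

LATENCY_BOUND = 2.0   # assumed upper bound on network latency, in seconds

def on_time(receive_time, broadcast_time, num_signatures):
    """A message counts as 'on time' if it arrives within a window that grows
    with every signature already attached to it."""
    deadline = broadcast_time + (num_signatures + 1) * LATENCY_BOUND
    return receive_time <= deadline

def relay(message, my_key):
    """An honest node that saw the message on time signs it before forwarding,
    bumping the timeout for everyone downstream."""
    message["signatures"].append(my_key)
    return message

msg = {"payload": "block 1234", "signatures": ["validator_a"]}
print(on_time(receive_time=3.5, broadcast_time=0.0, num_signatures=len(msg["signatures"])))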

DeepMind Artificial Intelligence can spot over 50 sight-threatening eye diseases with expert accuracy

Sugandha Lahoti
14 Aug 2018
3 min read
DeepMind's Health division has achieved a major milestone by developing an artificial intelligence system that can detect over 50 sight-threatening eye diseases with the accuracy of an expert doctor. The system can quickly interpret eye scans and correctly recommend how patients should be referred for treatment. It is the result of a collaboration with Moorfields Eye Hospital; the partnership was announced in 2016 to jointly address some of the most pressing eye conditions.

How Artificial Intelligence beats current OCT scanners

Currently, eyecare doctors use optical coherence tomography (OCT) scans to help diagnose eye conditions. OCT scans are often hard to read and require time to be interpreted by experts. The time required can cause long delays between scan and treatment, which can be troublesome if someone needs urgent care. DeepMind's AI system can automatically detect the features of eye diseases within seconds. It can also prioritize patients by recommending whether they should be referred for treatment urgently.

System architecture

The system uses an easily interpretable representation sandwiched between two different neural networks. The first neural network, known as the segmentation network, analyses the OCT scan and provides a map of the different types of eye tissue and the features of the disease it observes. The second network, known as the classification network, analyses the map to present eyecare professionals with diagnoses and a referral recommendation. The system expresses the referral recommendation as a percentage, allowing clinicians to assess the system's confidence.

AI-powered dataset

DeepMind has also developed one of the best AI-ready databases for eye research in the world. The original dataset held by Moorfields was suitable for clinical use, but not for machine learning research. The improved database is a non-commercial public asset owned by Moorfields and is currently being used by hospital researchers for nine separate studies into a wide range of conditions.

DeepMind's initial research has yet to be turned into a usable product, which will then need to undergo rigorous clinical trials and regulatory approval before being used in practice. Once validated for general use, the system would be used for free across all 30 of Moorfields' UK hospitals and community clinics for an initial period of five years.

You can read more about the announcement on the DeepMind Health blog. You can also read the paper in Nature Medicine.

Reinforcement learning optimizes brain cancer treatment to improve patient quality of life
AI beats Chinese doctors in a tumor diagnosis competition
23andMe shares 5mn client genetic data with GSK for drug target discovery
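To make the two-stage architecture described above a little more concrete, here is a heavily simplified, hypothetical sketch of chaining a segmentation stage and a classification stage; the shapes, class counts, and random stand-in models are placeholders, not DeepMind's actual networks.

import numpy as np

def segmentation_network(oct_scan):
    """Stand-in for the first network: maps a raw OCT volume to a tissue map
    (one of, say, 15 hypothetical tissue/pathology classes per voxel)."""
    return np.random.randint(0, 15, size=oct_scan.shape)

def classification_network(tissue_map):
    """Stand-in for the second network: maps the tissue map to probabilities
    over 4 hypothetical referral decisions (urgent, semi-urgent, routine, observation)."""
    logits = np.random.randn(4)
    return np.exp(logits) / np.exp(logits).sum()

scan = np.zeros((64, 128, 128))            # hypothetical OCT volume
tissue_map = segmentation_network(scan)    # interpretable intermediate representation
referral_probs = classification_network(tissue_map)
print("Referral recommendation confidence:", referral_probs.max())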


ML.NET 0.4 is here with support for SymSGD, F#, and word embeddings transform!

Natasha Mathur
13 Aug 2018
2 min read
After the release of ML.NET 0.1 at //Build 2018 back in May, the Microsoft team announced ML.NET 0.4 last week. The latest release includes features such as the word embeddings transform, the SymSGD learner, and improvements to the F# API and samples for ML.NET. ML.NET is a cross-platform, open source machine learning framework for .NET developers. Let's have a quick look at the major features in ML.NET 0.4.

Word embeddings transform for text scenarios

Word embeddings is a method that maps words to numeric vectors in order to capture their meaning, for use in visualization or model training. With ML.NET 0.4, the word embeddings transform is added to ML.NET, allowing you to use pre-trained or existing word embedding models in pipelines. Several pretrained models such as GloVe, fastText, and SSWE are available. Adding this transform alongside the existing transforms helps improve the model's metrics.

SymSGD learner for binary classification

SymSGD is now available in ML.NET 0.4 for binary classification. SymSGD is a parallel SGD algorithm that retains the sequential semantics of SGD while offering much better performance through multithreading: it is fast, scales well on multiple cores, and achieves the same accuracy as sequential SGD. SymSGD can be applied to any linear learner whose update rule is linear, such as binary classification or linear regression. This is how you can add a SymSGD binary classifier learner to the pipeline:

pipeline.Add(new SymSgdBinaryClassifier() { NumberOfThreads = 1 });

Even though there is no multithreading enabled in SymSGD at the moment, it can be helpful in cases where you want to try many different learners and limit each of them to a single thread.

Improvements to F# API and samples for ML.NET

Previously, ML.NET did not provide support for F# records. With the ML.NET 0.4 release, you can use property-based row classes in F#.

To get more coverage, check out the official Microsoft blog.

Microsoft Open Sources ML.NET, a cross-platform machine learning framework
Create machine learning pipelines using unsupervised AutoML [Tutorial]
Top AutoML libraries for building your ML pipelines


BBC experiments with speed reading on smart watches

Prasad Ramesh
13 Aug 2018
3 min read
A surfeit of speed reading apps was released a few years back for computers and smartphones. Now, the BBC has teamed up with a start-up called Spritz to experiment with speed reading on smart watches. Spritz was founded in 2011 and displays one word at a time on your screen. The average human reading speed is 200 wpm (words per minute). With the plethora of information in circulation nowadays, it is a task to read and keep up. On top of this, people read most of their news on smartphones, whose screens are already smaller than what we are accustomed to: newspapers and books.

What's the fuss with speed reading?

Speed reading displays a single word at a time on the screen. One of the letters in the word is highlighted and is intended to be the focus point of that word. The word on screen changes at a variable speed. As a result, the time taken to move your eyes across words is eliminated, which increases the reading speed. The BBC is looking to put this idea into action on smart watches. Imagine reading a whole news article much faster than your average speed without even taking out your smartphone. The BBC blog states: "Sherlock Holmes and Spock from Star Trek might be fictional characters, but the idea of having a superhuman ability to process information quickly is an exciting one."

BBC's take on speed reading

Cyrus Saihan, Head of Digital Partnerships at the BBC, states in the blog: "We are reading more and more on mobile phones, but the screen sizes and text sizes of mobiles are smaller than what we have traditionally been used to with books and magazines. Technologies such as this therefore have the potential to make it much easier for us to read on mobile phones. This way of reading could also possibly be useful on devices such as smart watches, which have even smaller screen sizes."

The demonstration videos on the blog show the method being used on BBC articles at 300, 400, and 800 wpm. It takes some time to get used to, but once you do, you can read at much faster speeds; it gets relatively easy after a couple of articles. The feature is still in the internal testing phase and the BBC isn't planning on rolling it out anytime soon. It sure is an interesting idea that could save us reading time every day. For more information and examples, head on to the BBC website.

Read next
Using your smart watch to control networked LEDs
The Risk of Wearables – How Secure is Your Smartwatch
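As a playful illustration of the one-word-at-a-time technique described above, here is a minimal terminal sketch that flashes words at a chosen speed; the focus-letter choice and timing are simplified assumptions, not Spritz's actual algorithm.

import sys
import time

def spritz(text, wpm=300):
    """Show one word at a time, roughly at the requested words per minute."""
    delay = 60.0 / wpm
    for word in text.split():
        focus = len(word) // 3            # crude stand-in for Spritz's focus letter
        marked = word[:focus] + "[" + word[focus] + "]" + word[focus + 1:]
        sys.stdout.write("\r" + marked.ljust(30))
        sys.stdout.flush()
        time.sleep(delay)
    print()

spritz("The BBC is experimenting with speed reading on smart watches", wpm=300)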


Android 9 Pie's Smart Linkify: How Android's new machine learning based feature works

Natasha Mathur
13 Aug 2018
4 min read
Last week, Google launched Android 9 Pie, the latest machine-learning-based Android operating system, succeeding Android Oreo. One of the features in Android 9 Pie is Smart Linkify, a new version of the existing Android Linkify API that adds clickable links when it identifies entities such as dates, flights, and addresses in content or text input, via the TextClassifier API.

Smart Linkify

The Smart Linkify API is trained in TensorFlow and uses small feedforward neural networks. This enables it to figure out whether or not a series of numbers or words is a phone number or address, just like Android Oreo's Smart Text Selection feature. What's different with this new feature is that instead of just making it easier to highlight and copy the associated text manually, it adds a relevant actionable link, allowing users to take action immediately with just a click.

How does Smart Linkify work?

Smart Linkify follows three basic steps:

Locating entities in an input text
Processing the input text
Training the network

Let's have a quick look at each of these steps.

Finding entities in an input text

Detecting entities within text is not an easy task. It poses many problems because people write addresses and phone numbers in different ways. There can also be confusion about the type of entity. For instance, "Confirmation number: 857-555-3556" can look like a phone number even though it's not. To fix this problem, the Android team designed an inference algorithm with two small feedforward neural networks that look at the context surrounding words and perform all kinds of entity chunking beyond just addresses and phone numbers. First, the input text is split into words and all possible combinations of consecutive words, named "candidates", are analyzed. Each candidate is assigned a validity score, and any overlapping candidates are removed, favoring the ones with the higher score. After this, the second neural network takes over and assigns a type to each entity, as either a phone number, an address or, in some cases, a non-entity.

Smart Linkify finding entities in a string of text

Processing the input text

After the entities have been located in the text, it's time to process them. The neural networks determine whether a given entity candidate in the input text is valid or not; knowing the context surrounding the entity, the network classifies it. With the help of machine learning, the input text is split into several parts and each is fed to the network separately.

Smart Linkify processing the input text

Google uses character n-grams and a binary capitalization feature to "represent the individual words as real vectors suitable as an input of the neural network". Character n-grams represent the word as a set of all character subsequences of a certain length; Google used lengths 1 to 5. The binary feature indicates whether the word starts with a capital letter. This is important as the capitalization in postal addresses is quite distinct, helping the networks to differentiate.

Training the network

Google has a training algorithm in place for datasets. It involves collecting lists of addresses, phone numbers and named entities (such as product, place, and business names), which are then used to synthesize data for training the neural networks. "We take the entities as they are and generate random textual contexts around them (from the list of random words on Web). Additionally, we add phrases like 'Confirmation number:' or 'ID:' to the negative training data for phone numbers, to teach the network to suppress phone number matches in these contexts", says the Google team.

There are a couple of other techniques that Google used for training the network, such as:

Quantizing the embedding matrix to 8-bit integers
Sharing embedding matrices between the selection and classification networks
Varying the size of the context before/after the entities
Creating artificial negative examples out of the positive ones for the classification network

Currently, Smart Linkify offers support for 16 languages and plans to support more languages in the future. Google still relies on traditional techniques using standard regular expressions for flight numbers, dates, times, IBANs, etc, but it plans to include ML models for these in the future. For more coverage on Smart Linkify, be sure to check out the official Google AI blog.

All new Android apps on Google Play must target API Level 26 (Android Oreo) or higher
Android P Beta 4 is here, stable Android P expected in the coming weeks!
Is Google planning to replace Android with Project Fuchsia?
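To illustrate the character n-gram and capitalization features described above, here is a small hedged sketch of how a word might be turned into such features; the hashing into a fixed-size vocabulary is an assumption for illustration, not Google's exact featurization.

def char_ngrams(word, min_n=1, max_n=5):
    """All character substrings of lengths 1 to 5, as described in the article."""
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

def featurize(word, vocab_size=1024):
    """Bag of hashed n-gram ids plus a binary 'starts with a capital letter' feature.
    Note: Python's built-in hash() is randomized per run; it stands in for a fixed hash."""
    ngram_ids = sorted(hash(g) % vocab_size for g in char_ngrams(word))
    starts_capitalized = 1 if word[:1].isupper() else 0
    return ngram_ids, starts_capitalized

print(featurize("Beatles"))
print(featurize("857-555-3556"))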

Google's censored Chinese search engine is a stupid, stupid move, says former exec Lokman Tsui

Richard Gall
13 Aug 2018
3 min read
Google's mission is famously 'Don't be evil', but with its latest venture - a pre-censored search engine that complies with Chinese regulations - it looks like it could be compromising on those values. And one former senior executive, Lokman Tsui, has spoken out, calling it a "stupid, stupid move."

News of the search engine, named Project Dragonfly, was first reported at the start of August. Some information about the project was leaked, leading to considerable anger among Google employees. One employee told The Intercept "our internal meme site and Google Plus are full of talk, and people are a.n.g.r.y." However, Tsui's intervention is notable because of his position as Google's 'Head of Freedom of Expression' for Asia and the Pacific between 2011 and 2014.

Tsui contrasts the new project with Google shutting down its previous Chinese search engine in 2010 over concerns about significant cyber attacks from within the country. Speaking to The Intercept, he said "Google made a grand statement in 2010. The message was that 'We care about human rights and we care about free expression, we are the champions of this, we have responsibility, we don't want to self-censor any more."

For Tsui, returning to China with a new search product has a real symbolic impact, legitimizing and accepting the Chinese government's record of online censorship. What's also important, according to Tsui, is that the situation in China has deteriorated since 2010. The move would, he says, "be a moral victory for Beijing... I can't see a way to operate Google search in China without violating widely held international human rights standards."

Tsui claims Google will lose employees over China issue

Tsui believes that Google could lose employees over the move. In compromising its core principles - of which "don't be evil" is just one - it could lose "the hearts and minds of people working for it." However, it's not just Google employees - past and present - who are concerned about the project. A number of U.S. senators have raised concerns, along with human rights organizations including Amnesty International, Human Rights Watch and Reporters Without Borders. Google's move could, these groups argue, lead to bigger issues than just censorship, because Google's servers would be located on the Chinese mainland, making them potentially accessible to the Chinese government, which could use data from those servers to closely monitor the activities of anyone who voices criticism.

Read next
Google's new facial recognition patent uses your social network to identify you!
Google's second innings in China: Exploring cloud partnerships with Tencent and others
Google Cloud Next: Fei-Fei Li reveals new AI tools for developers


JPEG committee wants to apply blockchain to image sharing

Prasad Ramesh
13 Aug 2018
3 min read
The Joint Photographic Experts Group (JPEG) held its 78th quarterly meeting earlier this year, between January and February. The press release that followed mentioned blockchain in relation to security and privacy, but mainly DRM. There wasn't much coverage of this, but it could have serious implications. JPEG thinks it can implement Digital Rights Management (DRM) for JPEG images: automated copy protection and access control with the help of blockchain. This might actually make DRM for images work, which as of now it practically doesn't. The press release contains this text:

"JPEG explores blockchain and distributed ledger technologies

During the 78th JPEG meeting in Rio de Janeiro, the JPEG committee organized a special session on blockchain and distributed ledger technologies and their impact on JPEG standards. As a result, the committee decided to explore use cases and standardization needs related to blockchain technology in a multimedia context. Use cases will be explored in relation to the recently launched JPEG Privacy and Security, as well as in the broader landscape of imaging and multimedia applications. To that end, the committee created an ad hoc group with the aim to gather input from experts to define these use cases and to explore eventual needs and advantages to support a standardization effort focused on imaging and multimedia applications. To get involved in the discussion, interested parties can register to the ad hoc group's mailing list."

Then, after six months of collaboration, the ad-hoc group produced a white paper. In the 80th meeting's press release, from July 2018, the committee stated: "Fake news, copyright violation, media forensics, privacy and security are emerging challenges for digital media. JPEG has determined that blockchain technology has great potential as a technology component to address these challenges in transparent and trustable media transactions."

The white paper lists some challenges and opportunities in the media industry, such as access and distribution, global distribution, combating piracy, and others. JPEG isn't just working on image compression standards; it has also been exploring ways to provide per-image access control. But in the case of images, the image can simply be screenshotted, or a picture of it can be taken at any point. And in the general sense, DRM-protected content is perceived to be of bad quality. After six months of working on this, the white paper states, "A formal call for proposals will be issued if there are enough interests and requirements of a standard or protocol are identified." JPEG plans a free public workshop at its 81st meeting in Vancouver, to be held in October. You can read a more detailed coverage for more information.

Read next
LedgerConnect: A blockchain app store by IBM, CLS, Barclays, Citi and 7 other banks in the trials
Google Cloud Launches Blockchain Toolkit to help developers build apps easily
Packt Supports Day Against DRM 2017


Reinforcement learning model optimizes brain cancer treatment, reduces dosing cycles and improves patient quality of life

Melisha Dsouza
13 Aug 2018
6 min read
Researchers at MIT have come up with an intriguing approach to combat glioblastoma, a malignant tumor of the brain or spinal cord, using machine learning techniques. By reducing the toxic chemotherapy and radiotherapy involved in treating this cancer, the researchers aim to improve patients' quality of life while also reducing the treatment's side effects, using reinforcement learning techniques. While the prognosis for adults is no more than five years, medical professionals try to shrink the tumor by administering drug doses in safe amounts. However, the pharmaceuticals are so strong that patients end up suffering from their side effects. Enter machine learning and artificial intelligence to save the day. While it's no hidden truth that machine learning is being incorporated into healthcare on a huge scale, the MIT researchers have taken this to the next level.

Using reinforcement learning as the big idea to train the model

Media Lab researcher Gregory Yauney will present a paper next week at the 2018 Machine Learning for Healthcare conference at Stanford University. The paper details how the MIT Media Lab researchers have come up with a model that could make dosing cycles less toxic but still effective. Incorporating a "self-learning" machine-learning technique, the model studies the treatment regimens currently in use and iteratively adjusts the doses. In the end, it finds an optimal treatment design suited to the patient, and it has proven to reduce tumor sizes to a degree almost identical to that of the original medical regimens. The model simulated trials of 50 patients and designed treatments that either reduced dosages to twice a year or skipped them altogether, keeping in mind that the model has to shrink the size of the tumor while ensuring that reduced dosages do not lead to harmful side effects.

The model uses reinforcement learning (RL), which comprises artificially intelligent "agents" that complete "actions" in an unpredictable, complex environment to reach a desired outcome. The model's agent goes through traditionally administered regimens that use a combination of the drugs temozolomide (TMZ) and procarbazine, lomustine, and vincristine (PVC), administered to patients over weeks or months. These regimens are based on protocols that have been used clinically for ages and on both animal testing and various clinical tests and scenarios. The protocols are then used by oncologists to predict how many doses patients should be administered based on weight.

As the model explores the regimen, it decides on one of two actions:

Initiate a dose
Withhold a dose

If it does administer a dose, it then has to decide whether the patient needs the entire dose or only a portion. After a decision is taken, the model checks with another clinical model to see whether the tumor's size has changed. If the tumor's size has reduced, the model receives a reward; otherwise it is penalised. Rewards and penalties are essentially positive and negative numbers, say +1 or -1. The researchers also had to ensure that the model does not over-dose or simply give out the maximum number of doses to reduce the mean diameter of the tumor. Therefore, the model is programmed in such a way that whenever it chooses to administer all full doses, it gets penalized, forcing it to administer fewer, smaller doses.

Pratik Shah, a principal investigator at the Media Lab who supervised the research, further stresses that, compared to traditional RL models that work toward a single outcome, such as winning a game, and take any and all actions that maximize that outcome, the model implemented by the MIT researchers is an "unorthodox RL model that weighs potential negative consequences of actions (doses) against an outcome (tumor reduction)". The model is strikingly wired to find a dose that does not necessarily maximize tumor reduction, but instead establishes a balance between maximum tumor reduction and low toxicity for the patients.

The training and testing methodology used

The model was trained on 50 simulated patients, randomly selected from a large database of glioblastoma patients who had previously undergone traditional treatments. The model conducted about 20,000 trial-and-error test runs for every patient. Once training was complete, the model had learned the parameters for optimal regimens. The model was then tested on 50 new simulated patients and used the learned parameters to formulate new regimens based on various constraints that the researchers provided. The model's treatment regimens were compared to the results of a conventional regimen using both TMZ and PVC, and the outcome was practically similar to the results obtained when human counterparts administered treatments. The model was also able to treat each patient individually, as well as in a single cohort, and achieved similar results (medical data for each patient was available to the researchers). In short, the model has helped to generate precision-medicine-based treatments by conducting one-person trials using unorthodox machine-learning architectures.

Nicholas J. Schork, a professor and director of human biology at the J. Craig Venter Institute and an expert in clinical trial design, explains: "Humans don't have the in-depth perception that a machine looking at tons of data has, so the human process is slow, tedious, and inexact." He further adds, "Here, you're just letting a computer look for patterns in the data, which would take forever for a human to sift through, and use those patterns to find optimal doses."

To sum it all up, machine learning is again proving to be an essential asset in the medical field, helping both researchers and patients to view medical treatments in an all new perspective. If you would like to know more about the progress made so far, head over to MIT News.

23andMe shares 5mn client genetic data with GSK for drug target discovery
Machine learning for genomics is bridging the gap between research and clinical trials
6 use cases of Machine Learning in Healthcare
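The reward logic described above can be sketched roughly as follows; the reward values and the penalty for full doses are illustrative assumptions based on the article's description (+1/-1 style rewards plus a penalty for aggressive dosing), not the actual MIT implementation.

def dose_reward(tumor_shrunk, dose_fraction, full_dose_penalty=0.5):
    """Reward tumor reduction, but penalise aggressive dosing.

    tumor_shrunk:  True if the clinical model reports a smaller tumor.
    dose_fraction: 0.0 (withhold) to 1.0 (full dose), chosen by the agent.
    """
    reward = 1.0 if tumor_shrunk else -1.0
    # Discourage the agent from defaulting to maximum doses.
    reward -= full_dose_penalty * dose_fraction
    return reward

# The agent's two top-level actions: initiate a (possibly partial) dose, or withhold it.
print(dose_reward(tumor_shrunk=True, dose_fraction=1.0))   # shrinks, but full dose: 0.5
print(dose_reward(tumor_shrunk=True, dose_fraction=0.25))  # shrinks with a small dose: 0.875
print(dose_reward(tumor_shrunk=False, dose_fraction=0.0))  # no dose, no shrinkage: -1.0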

TensorFlow 1.10 arrives, cmake out, Bazel in

Pravin Dhandre
10 Aug 2018
2 min read
Open source contributors from the TensorFlow community have released TensorFlow 1.10, loaded with numerous features, multiple bug fixes and improvements. Let's have a look at the key changes to the framework.

New features and improvements

The tf.lite runtime now supports complex64
tf.data gets Bigtable integration
tf.estimator.train_and_evaluate is enhanced with improved local run behaviour
Added restriction support in RunConfig for speeding up training and ensuring clean shutdown
Moved Distributions and Bijectors from tf.contrib.distributions to TensorFlow Probability (TFP)
Added new endpoints such as tf.debugging, tf.dtypes, tf.image, tf.io, tf.linalg, tf.manip, tf.math, tf.quantization, tf.strings

Breaking changes

Deprecation of tf.contrib.distributions is in progress; it is to be removed by the end of the year
Official support for cmake is being dropped
Bazel will be the supported build system from TensorFlow 1.11 onwards

Bug fixes and miscellaneous changes

tf.contrib.data.group_by_reducer() is now available via the public API
Added a drop_remainder argument to tf.data.Dataset.batch() and tf.data.Dataset.padded_batch()
Custom savers for Estimator included in EstimatorSpec, useful during export
Supports sparse_combiner in canned linear Estimators
Added batch normalization to DNNClassifier, DNNRegressor, and DNNEstimator
Added ranking support and center bias option for boosted trees

You can visit the official TensorFlow release page on GitHub to review the full release notes for the complete list of added features and changes.

Why Twitter (finally!) migrated to Tensorflow
Build and train an RNN chatbot using TensorFlow
Implementing feedforward networks with TensorFlow
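As a quick example of one of the listed changes, the new drop_remainder argument on tf.data.Dataset.batch() discards a final, smaller batch so that every batch has the same static shape. This minimal sketch assumes TensorFlow 1.10 with eager execution enabled.

import tensorflow as tf

tf.enable_eager_execution()

dataset = tf.data.Dataset.range(10).batch(4, drop_remainder=True)

# Only two full batches of 4 are produced; the trailing batch of 2 is dropped.
for batch in dataset:
    print(batch.numpy())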


Apache Flink version 1.6.0 released!

Savia Lobo
10 Aug 2018
4 min read
The Apache Flink community released version 1.6.0 yesterday. Apache Flink 1.6.0 is the seventh major release in the 1.x.y series and is API-compatible with previous 1.x.y releases for APIs annotated with the @Public annotation. Apache Flink 1.6.0 enables users to seamlessly run fast data processing and to build data-driven, data-intensive applications effortlessly.

Features and improvements in Apache Flink 1.6.0

In this version, the Flink community has added a Jepsen-based test suite (FLINK-9004) that validates the behavior of Flink's distributed cluster components under real-world faults. It is the community's first step towards higher test coverage for Flink's fault tolerance mechanisms. The other major features include:

An improved state support for Flink

The state TTL feature allows one to specify a time-to-live (TTL) for Flink state. Once the TTL expires, Flink will no longer give access to the respective state values. The expired data is cleaned up on access so that the operator keyed state doesn't grow indefinitely, and it won't be included in subsequent checkpoints. This feature fully complies with new data protection regulations (e.g. GDPR).

With scalable timers based on RocksDB, Flink's timer state can now be stored in RocksDB, allowing the technology to support significantly bigger timer state since it can go out of core/spill to disk. One can perform fast timer deletions with Flink's improved internal timer data structure, which reduces the deletion complexity from O(n) to O(log n). This significantly improves Flink jobs using timers.

Extended deployment options in Flink 1.6.0

Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster. Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed. With a fully RESTified job submission, the Flink client can now send all job-relevant content via a single POST call to the server. This allows much easier integration with cluster management frameworks and container environments, since opening custom ports is no longer necessary.

SQL and Table API enhancements

The SQL Client CLI now supports the registration of user-defined functions, which improves the CLI's expressiveness, since SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions. Apache Flink 1.6.0 also supports batch queries in the SQL Client CLI, INSERT INTO statements in the SQL Client CLI, and SQL Avro. In this release, table sinks can be defined in a YAML file using string-based properties, without having to write a single line of code. The new Kafka table sink uses the new unified APIs and supports both JSON and Avro formats. The expressiveness of SQL and the Table API has been improved: SQL aggregate functions support the DISTINCT keyword, so queries such as COUNT(DISTINCT column) are supported for windowed and non-windowed aggregations. Both SQL and the Table API now include more built-in functions such as MD5, SHA1, SHA2, LOG, and UNNEST for multisets.

Hardened CEP library

The CEP operator's internal NFA state is now backed by Flink state, supporting larger use cases.

More expressive DataStream joins

Flink 1.6.0 adds support for interval joins in the DataStream API. With this feature it is now possible to join together events from different streams.

Intra-cluster mutual authentication

Flink's cluster components now enforce mutual authentication with their peers. This allows only Flink components to talk to each other, making it difficult for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.

Read more about this release in detail in the Apache Flink 1.6.0 release notes.

Implementing fault-tolerance in Spark Streaming data processing applications with Apache Kafka
How to get started with Azure Stream Analytics and 7 reasons to choose it
Performing Vehicle Telemetry job analysis with Azure Stream Analytics tools