Modern cyber attacks, such as botnets and ransomware, increasingly depend on (seemingly) randomly generated domain names. Malware uses those domains to reach its Command & Control servers, a technique called Domain Fluxing. The recent WannaCry ransomware was famously stopped simply by registering a random-looking domain name that its code checked for.
The ability to quickly classify a domain name as *safe* or *malicious* is critical in the cybersecurity world. It can alert security experts to suspicious activity or even block that activity outright. Such a system has two requirements:
- It needs to be accurate: you don’t want to block your users from accessing safe websites
- It needs to be scalable: able to handle thousands of transactions per second
There are plenty of approaches to this problem, especially in the academic world (S. Yadav – 2010, J. Munro – 2013). The fine folks at H2O.ai have also published an excellent code sample. This blog post briefly describes how H2O’s implementation works and how you can deploy and scale it on Algorithmia.
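Classifiers like these typically score a domain on lexical features, since algorithmically generated names tend to look statistically different from human-chosen ones. As a minimal sketch, assuming we only examine the leftmost label of the domain and use a few commonly cited features (length, Shannon entropy, vowel and digit ratios), feature extraction might look like this. The function names and feature set are illustrative assumptions, not necessarily what H2O’s sample uses; a trained model would consume these values to make the actual *safe*/*malicious* call.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the characters in s, in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def domain_features(domain: str) -> dict:
    """Extract simple lexical features from a domain name.

    Assumption (illustrative only): we look at the leftmost label,
    e.g. 'google' in 'google.com', and ignore the rest.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    vowels = sum(ch in "aeiou" for ch in label)
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": n,
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowels / n if n else 0.0,
        "digit_ratio": digits / n if n else 0.0,
    }


# A human-chosen name vs. a random-looking one (made-up example domain).
for d in ["google.com", "xjw3kq9zpl2v.net"]:
    print(d, domain_features(d))
```

The random-looking label has higher entropy, fewer vowels, and more digits than `google`, which is the kind of separation a downstream classifier can learn from.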
This post summarizes a talk Diego Oppenheimer recently gave at the 2017 GeekWire Tech Cloud Summit, titled “Building an Operating System for AI”. You can listen to the original talk here. For a better view of the slides, you can follow along here.
The operating system on your laptop runs tens or hundreds of processes concurrently. It gives each process just the resources it needs (RAM, CPU, I/O). It isolates them in their own virtual address spaces, locks them down to a set of predefined permissions, allows them to communicate with one another, and allows you, the user, to safely monitor and control them. The operating system abstracts away the hardware layer (writing to a flash drive is the same as writing to a hard drive), and it doesn’t care what programming language or technology stack you used to write those apps – it just runs them, smoothly and consistently.
As machine learning penetrates the enterprise, companies will soon find themselves productionizing more and more models, at an ever faster clip. Deployment efficiency, resource scaling, monitoring, and auditing will become harder and more expensive to sustain over time. Data scientists in different corners of the company will each have their own preferred technology stacks (R, Python, Julia, TensorFlow, Caffe, deeplearning4j, H2O, etc.), and data center strategies will shift from a single cloud to hybrid. Running, scaling, and monitoring heterogeneous models in a cloud-agnostic way is a responsibility analogous to an operating system’s – and that’s what we want to talk about.
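One piece of that operating-system analogy is a single calling convention that hides which language or framework a model was written in. The sketch below illustrates that idea in-process: every model is wrapped as a callable and addressed by a name and version. The `ModelRegistry` class, its methods, and the `(name, version)` addressing are hypothetical illustrations, not Algorithmia’s API; a real platform would also isolate models in separate processes or containers and handle scaling, permissions, and monitoring.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Hypothetical convention: each model, whatever its framework, is wrapped
# as a callable that takes JSON-like input and returns JSON-like output.
Handler = Callable[[Any], Any]


@dataclass
class ModelRegistry:
    """Minimal in-process registry of models keyed by (name, version)."""

    models: Dict[Tuple[str, str], Handler] = field(default_factory=dict)

    def register(self, name: str, version: str, handler: Handler) -> None:
        key = (name, version)
        if key in self.models:
            # Published versions are immutable, so callers get reproducible results.
            raise ValueError(f"{name}@{version} is already registered")
        self.models[key] = handler

    def call(self, name: str, version: str, payload: Any) -> Any:
        try:
            handler = self.models[(name, version)]
        except KeyError:
            raise LookupError(f"no model registered as {name}@{version}") from None
        return handler(payload)


registry = ModelRegistry()
# Toy stand-ins for models that might really be an R script or a TensorFlow graph.
registry.register("sentiment", "1.0.0", lambda text: {"positive": "good" in text.lower()})
registry.register("doubler", "0.1.0", lambda xs: [2 * x for x in xs])

print(registry.call("sentiment", "1.0.0", "A good day"))
print(registry.call("doubler", "0.1.0", [1, 2, 3]))
```

Callers only need a name, a version, and a payload; swapping the implementation behind a version (or adding a new version) doesn’t change how the model is invoked.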
TL;DR A resilient Data Science Platform is a necessity for every centralized data science team within a large corporation. It helps them centralize, reuse, and productionize their models at petascale. We’ve built Algorithmia Enterprise for that purpose.
You’ve built that R/Python/Java model. It works well. Now what?
Sharing, reusing, and running models at petascale is not part of the data scientist’s workflow. This inefficiency is amplified in a corporate environment, where data scientists need to coordinate every move with IT, continuous deployment is a mess (if not impossible), reusability is low, and the pain snowballs as different corners of the company start to “Googlify their business”.