The history of computing features a succession of organisations that looked, for a while at least, as if they were so deeply embedded in our lives that we’d never do without them.
IBM looked like that, and Microsoft did too. More recently it’s been Google and Facebook.
Sometimes they look unassailable because, in the narrow territory they occupy, they are.
When they do fall, it isn’t because somebody storms that territory; they fall because the ground beneath them shifts.
For years and years Linux enthusiasts proclaimed “this will be the year that Linux finally competes with Windows on the desktop!”, and every year it wasn’t.
But Linux, under the brand name Android, eventually smoked Microsoft when ‘Desktop’ gave way to ‘Mobile’.
Google has been the 800-pound gorilla of web search since the late 1990s and all attempts to out-Google it have failed. Its market share is rock solid and it’s seen off all challengers, from lumbering tech leviathans to nimble and disruptive startups.
Google will not cede its territory to a Google clone but it might one day find that its territory is not what it was.
The web is getting deeper and darker and Google, Bing and Yahoo don’t actually search most of it.
They don’t search the sites on anonymous, encrypted networks like Tor and I2P (the so-called Dark Web) and they don’t search the sites that have either asked to be ignored or that can’t be found by following links from other websites (the vast, virtual wasteland known as the Deep Web).
The big search engines don’t ignore the Deep Web because there’s some impenetrable technical barrier that prevents them from indexing it – they do it because they’re commercial entities and the costs and benefits of searching beyond their current horizons don’t stack up.
That’s fine for most of us, most of the time, but it means that there are a lot of sites that go un-indexed and lots of searches that the current crop of engines are very bad at.
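Those opt-outs aren’t exotic, by the way: a site that wants to stay off the index just publishes a robots.txt file, and well-behaved crawlers check it before fetching anything. Here’s a minimal sketch using Python’s standard urllib.robotparser – the site name and the rules are made up for illustration:

```python
# A site opting out of indexing via robots.txt (hypothetical content;
# real crawlers fetch this from http://example.com/robots.txt).
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Anything under /private/ is off-limits to every crawler.
print(parser.can_fetch("MyCrawler", "http://example.com/public/page.html"))   # True
print(parser.can_fetch("MyCrawler", "http://example.com/private/page.html"))  # False
```

A crawler that honours those rules simply never sees the disallowed pages – which is one of the two ways a site ends up in the Deep Web.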
That’s why the US’s Defense Advanced Research Projects Agency (DARPA) built a search engine for the Deep Web called Memex.
Memex is designed to go beyond the one-size-fits-all approach of Google and deliver the domain-specific searches that are the very best solution for narrow interests.
In its first year it’s been tackling the problems of human trafficking and slavery – things that, according to DARPA, have a significant presence beyond the gaze of commercial search engines.
When we first reported on Memex in February, we knew that it would have potential far beyond that. What we didn’t know was that parts of it would become available more widely, to the likes of you and me.
A lot of the project is still somewhat murky and most of the 17 technology partners involved are still unnamed, but the plan seems to be to lift the veil, at least partially, over the next two years, starting this Friday.
That’s when an initial tranche of Memex components, including software from a team called Hyperion Gray, will be listed on DARPA’s Open Catalog.
The Hyperion Gray team described their work to Forbes as:
Advanced web crawling and scraping technologies, with a dose of Artificial Intelligence and machine learning, with the goal of being able to retrieve virtually any content on the internet in an automated way.
Eventually our system will be like an army of robot interns that can find stuff for you on the web, while you do important things like watch cat videos.
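“Crawling”, by the way, means nothing more mysterious than following links from page to page – which is also why pages that nothing links to stay invisible. A toy sketch (the miniature “web” below is a made-up dict, so it runs without a network connection, and none of this is Hyperion Gray’s actual code):

```python
# A link-following crawler in miniature: visit every page reachable
# from a seed by following links, breadth-first.
from collections import deque

# Hypothetical miniature web: page -> links it contains.
FAKE_WEB = {
    "/home":        ["/about", "/blog"],
    "/about":       ["/home"],
    "/blog":        ["/blog/post-1"],
    "/blog/post-1": [],
    "/orphan":      [],  # nothing links here, so the crawler never finds it
}

def crawl(seed):
    """Return every page reachable from `seed` by following links."""
    seen, queue = {seed}, deque([seed])
    while queue:
        page = queue.popleft()
        for link in FAKE_WEB.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

print(sorted(crawl("/home")))  # → ['/about', '/blog', '/blog/post-1', '/home']
```

Note that "/orphan" never turns up: a link-following crawler can only index what is linked, which is exactly the Deep Web problem described above.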
More components will follow in December and, by the time the project wraps, a “general purpose technology” will be available.
Memex and Google don’t overlap much: they solve different problems, serve different needs and are funded in very different ways.
But so were Linux and Microsoft.
The tools that DARPA releases at the end of the project probably won’t be direct competitors to Google, but I expect they will be mature and better suited to certain government and business applications than Google is.
That might not matter to Google but there are three reasons why Memex might catch its eye.
The first is not news but it’s true nonetheless – the web is changing and so is internet use.
When Google started there was no Snapchat, Bitcoin or Facebook. Nobody cared about the Deep Web because it was hard enough to find the things you actually wanted and nobody cared about the Dark Web (remember FreeNet?) because nobody knew what it was for.
The second is this statement made by Christopher White, the man heading up the Memex team at DARPA, who’s clearly thinking big:
The problem we're trying to address is that currently access to web content is mediated by a few very large commercial search engines – Google, Microsoft Bing, Yahoo – and essentially it's a one-size-fits-all interface...
We've started with one domain, the human trafficking domain ... In the end we want it to be useful for any domain of interest.
That's our ambitious goal: to enable a new kind of search engine, a new way to access public web content.
And the third is what we’ve just discovered – Memex isn’t just for spooks and G-Men, it’s for the rest of us to use and, more importantly, to play with.
It’s one thing to use software and quite another to be able to change it. The beauty of open source software is that people are free to take it in new directions – just like Google did when it picked up Linux and turned it into Android.
Image of torch searchlight courtesy of Shutterstock.
19 comments on “Is DARPA’s Memex search engine a Google-killer?”
Will this mean then that I’ll get relevant search returns instead of what Google (other search engines are available) thinks I want?
If the “Deep Web” becomes searchable, is it still deep?
I guess this is a rhetorical question but so our readers are clear – the Deep Web is everything beyond the ‘event horizon’ of commercial search engines.
Erm no. The true deep web is deeper than that. It contains sites which cannot even be connected to unless you are on an anonymized distributed network and the sites have a TLD of .onion.
At Naked Security we use Deep Web and Dark Web as follows (because we think it’s how they’re most commonly understood):
Deep Web refers to sites on the World Wide Web that could be indexed by commercial search engines but aren’t because there are no references to them or because they’ve elected not to be indexed.
Dark Web refers to the collection of websites on encrypted networks like I2P and Tor that you’re referring to.
The Dark Web is a subset of Deep Web – it is true that sites on Tor are beyond the search horizon of commercial search engines but it is not true that all sites beyond the search horizon of commercial search engines are on Tor.
Interestingly it’s a bit muddier than that – some .onion site content is available on the WWW, and indexable by Google, as a result of Tor2Web services like Onion City.
That’s deep, man.
NO ITS DARK!
Nice article!!!
Had to read three pages to be informed that DARPA is building a deep-web search engine. I already knew that.
Always good to have multiple sources to reinforce your knowledge, eh 🙂
“Advanced web crawling and scraping technologies, with a dose of Artificial Intelligence and machine learning, with the goal of being able to retrieve virtually any content on the internet in an automated way.”
How long until it’s capable of watching video streams and understanding pictures? It sounds like the basis to a God AI. Something out of Eagle Eye or Person of Interest. Especially if it actually gets smarter based on what it “reads.”
“You are being watched. The government has a secret system: a machine that spies on you every hour of every day. I know, because I built it. I designed the machine to detect acts of terror, but it sees everything. Violent crimes involving ordinary people; people like you. Crimes the government considered ‘irrelevant’. They wouldn’t act, so I decided I would. But I needed a partner, someone with the skills to intervene. Hunted by the authorities, we work in secret. You’ll never find us, but victim or perpetrator, if your number’s up… we’ll find you”
— Harold Finch
Didn’t the name ‘Memex’ first appear on a high-performance database engine which, I think, came out of Glasgow University in the 1980s? Any connection? Check out this web page:
No. The word memex (a portmanteau or word-combination of “memory” and “index”) was the subject of a 1945 paper by Vannevar Bush envisioning a future device very similar to today’s Apple or Android tablets.
Since DARPA is funded by … well, you know who, it would seem to me this would just be another means of “TRACKING” folks. The “Web absent of Light” supposedly contains a large number of “Obscure” web sites with sometimes nefarious subject matter, and I can’t help but think that if an individual were to search for said subject matter it would tag that search for follow-up. I’m just not buying into a Government funded agency creating a Search Engine that should theoretically allow the user to be “Anonymous”.
Government is a massive, many-headed beast. Don’t forget that the US Navy and DARPA developed and funded both the ARPANET (from which the Deep Web grew) and Tor (which makes up most of the Dark Web).
By your own logic they wouldn’t need Memex to track people because they created the environments that Memex will be tracking you in.
Memex components will be open source, like Tor, which means you’ll be free to find and remove any tracking you discover and give other people your tracking-free versions.
But then, let me ask you this: Could not Google, MS, et al.…
Access MEMEX with their own resources….
And then, could they not make it (the data) available publicly, after you “click/sign” some waiver?
Maybe even have to pay some trifle sum (say through Paypal, or whomever) for the privilege….
So now you could use the data on all of Google’s other properties. Gmail, Youtube, G+, etc…. And maybe use it on all of Microsoft’s properties.
The plot could be easily thickened.
Because your “search” would then NOT be your search, but say a Google search… Good for paranoids.
What DO you think, Sophos?
Google, MS et al will be free to access, use and reuse Memex components, and so will you. Tech start-ups will make hay with this.