Self-hosted search option is a new approach to bursting the filter bubble

If you’re worried about Google’s attempts to track you more closely than ever before, there’s another approach you can take to online search engines: host your own.

Google came under fire recently for its super-intrusive proposal to track our in-store purchases. Privacy groups are doing their best to fight it in the courts, but in the meantime Google's users seem doomed to live under its ever-watchful eye.

The company knows an awful lot about you, as Naked Security has previously detailed. Sure, you can delete all of your cookies, sign out of Google when conducting sensitive searches and use Tor for anonymity. But let’s be honest – we don’t really make the effort, do we?

“Most users search Google while signed in, so all of the information on their online life is available: YouTube searches, emails and past search history,” says Adam Tauber.

Tauber is the founder of Searx, an alternative search engine that prides itself on protecting user privacy. Unlike many other search engines, Searx doesn’t monetize its users. Users don’t even have to use someone else's hosted instance to take advantage of it.

Written in Python, Searx is a meta-search engine, pulling in search results from a wide variety of sources. It is designed to be self-hosted: you can use one of several public instances run by other people, or install and run your own.
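To make "meta-search" concrete, here is a minimal Python sketch of how ranked lists from several engines might be merged into a single page of results. It is not Searx's actual code: the function name `merge_results` and the 1/rank scoring rule are hypothetical choices for illustration, under which links that several engines rank highly rise to the top.

```python
from collections import defaultdict
from typing import Dict, List


def merge_results(engine_results: Dict[str, List[str]]) -> List[str]:
    """Merge ranked URL lists from several engines into one list.

    Hypothetical scoring: each URL earns 1/rank from every engine that
    returns it, so results agreed on by several engines score higher.
    """
    scores: Dict[str, float] = defaultdict(float)
    for urls in engine_results.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest combined score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda url: (-scores[url], url))
```

For example, if one engine returns `a`, `b`, `c` and another returns `b`, `d`, the page found by both engines (`b`) comes first, followed by `a`, `d` and `c`.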

Tauber conceived the idea five years ago at a camp organized by a hackerspace in his native Hungary.

“During a discussion around the campfire we realised that we don’t really have any option to search privately on the web without tons of browser add-ons/operating system hardening/VPN tunnels/etc,” he told Naked Security. Two weeks after the camp, he launched an alpha version on GitHub.

Bursting the filter bubble

Searx doesn’t crawl and index the web itself. Instead, it sends searches to about 70 supported search engines, and supports custom integrations with others of your choosing (this is open source software, so you can code them yourself). It submits searches without cookies or identifying information, meaning that the engines, including Google, can’t tell who is searching.

“When using Searx, the IP address of Searx, a random User-Agent and a search query is sent to Google by default,” he says. “Of course, you can customize Searx to forward other extra parameters like search language or the page number of the requested result page.”
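The following Python sketch illustrates the idea Tauber describes: the proxy builds a brand-new request containing only the query (plus optional language and page number) and a randomly chosen User-Agent, so the user's cookies and browser headers never reach the upstream engine, and the engine sees the proxy's IP address rather than the user's. It is not Searx's implementation; the `USER_AGENTS` pool, the function `build_upstream_request` and the `q`/`hl`/`start` parameter names are hypothetical assumptions.

```python
import random
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical pool of browser User-Agent strings to rotate through
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Firefox/55.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0",
]


def build_upstream_request(base_url: str, query: str,
                           language: Optional[str] = None,
                           page: int = 1) -> Request:
    """Build the request a privacy-preserving proxy might send upstream.

    The proxy creates a fresh request itself, so nothing from the user's
    browser (cookies, headers) is forwarded; only the listed fields are.
    """
    params = {"q": query}
    if language:
        params["hl"] = language  # hypothetical language parameter name
    if page > 1:
        params["start"] = (page - 1) * 10  # hypothetical: 10 results per page
    url = f"{base_url}?{urlencode(params)}"
    # A random User-Agent per request; no Cookie header is ever set
    return Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
```

Passing the result to `urllib.request.urlopen` from the server would send it from the server's own IP address, which is why the upstream engine can only see the instance, not the person behind it.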

Cloaking the user from Google in this way has its upsides and downsides. For those who don’t buy the “nothing to hide” argument, it stops the search engine from invading your privacy by tracking what you’re doing. On the other hand, if the engine doesn’t know anything about you, then it can’t return the localized, more relevant results that search engines use to add value for users. Searching for restaurants nearby won’t focus the lens on local eateries. Says Tauber:

In this case Google cannot access any personal information or preferences. [It] only selects results for you based on the IP address of the Searx instance you use. By not having all the possible information on you, it makes it harder to tailor results to your taste.

For him, this is a feature worth having. “Thus, [the] filter bubble can be escaped.”

That’s a big deal for those worried about the echo chambers created by the likes of Google, Facebook and other large search and social media hubs. Some worry that by personalizing what they show us, these companies limit alternative perspectives. If a search engine only ever shows you stories about football and fashion, it will limit your potential to grow and expand your horizons.

More worryingly, if you lean one way politically and search engines only show you content that supports your viewpoint, you risk losing the ability to see someone else’s perspective and think critically. That has far-reaching implications, especially for younger people who grew up not knowing any different. You might argue that this has contributed to our current polarization problem.

Six years ago, when Eli Pariser wrote the defining book on this problem, The Filter Bubble, he said that Google used 57 signals about you to personalize your content. How many might there be now?

Alternative search engines

There have been other attempts to wrest users away from Google. Startpage in effect acts as a proxy for Google, while Disconnect offers private search as part of its broader privacy protection and tracker blocking service.

Then there’s DuckDuckGo, which draws search results from third-party sites such as Bing and Yandex without tracking you. Privacy is a key selling point for DuckDuckGo, which doesn’t log IP addresses, cookies, or search history. It includes a Tor exit relay to help speed up search results for users of the anonymizing network.

DuckDuckGo uses its community members to enhance its results. DuckDuckHack lets them code their own search engine responses, pulling from third-party databases online.

The search engine makes its money from advertising but avoids the intricate user tracking that you see on sites like Facebook and Google, arguing that serving ads based on keywords is enough.

While DuckDuckGo monetizes its users, Tauber wants to continue with his purist non-profit vision. He has big plans for Searx in the future.

Apart from implementing features requested by the community on GitHub, it is planned to make hosting and running Searx instances more accessible to everyone.

The Searx community has already packaged the software for the Debian Linux distribution, and it also plans an administrative interface and images that can be preinstalled on other platforms, in effect creating a baked-in, self-hosted search engine. “This way not only tech-savvy people are able to run their own instances,” Tauber concludes.

Most people will continue using data-slurping search engines. DuckDuckGo doesn’t even feature in one global market-share ranking, for example, which shows Google with more than four-fifths of the market. Users generally either take the time to adjust their privacy settings on Google, don’t bother, or simply don’t know or think about them.

For the privacy-conscious, however, a handful of options ranging from DuckDuckGo through to the ultra-private Searx are viable alternatives.