Decentralization: I Want to Believe

Note: this is a very old post! It’s from 2014, before ZeroTier launched, and describes some of the rationale for its design and what it was attempting to achieve.

My current project is a network virtualization engine called ZeroTier One. It has a new network protocol that I designed from scratch, and in the process of doing so I put a lot of thought and research into the question of just how decentralized I could make it.

There are two reasons I cared. One is sort of techno-political: I wanted to help swing the pendulum back in the direction of “personal” computing. Indeed, one of the first communities I showed ZeroTier to and got feedback from was called redecentralize.org. The second reason is more pragmatic: the more peer-to-peer and the less centralized I could make the basic protocol, the less it would cost to run.

The design I settled on is ultimately rather boring. I built a peer-to-peer protocol with a central hub architecture composed of multiple redundant, shared-nothing anchor nodes at geographically diverse points on the global Internet.

I designed the protocol to be capable of evolving toward a more decentralized design in the future without disrupting existing users, but that’s where it stands today.

Why didn’t I go further? Why didn’t I go for something more sexy and radical, like what MaidSafe and TeleHash and others are trying to accomplish? It certainly would have netted me more attention. As it stands, there’s a cadre of users that dismiss ZeroTier when they learn there are central points.

I stopped because I got discouraged. I’m not convinced it’s possible, at least not without sacrifices that would be unacceptable to most users.

Defining the Problem

I began the technical design of ZeroTier with a series of constraints and goals for the underlying peer-to-peer network that would host the Ethernet virtualization layer. They were and are in my head, so let me try to articulate them now. These are in order of importance, from most to least.

1. Any device must be able, by way of a persistent address, to contact any other connected device in the world at any time. Initialization of connectivity should take no longer than one second on average, ideally as close as possible to underlying network latency. (Non-coincidentally, this is what IP originally did before mobility broke the “static” part and NAT and other kinds of poorly conceived fail broke IP generally.)

2. It must just work. It must be “zero configuration.” The underlying design must enable a user experience that does not invite the ghost of Steve Jobs to appear in my dreams and berate me.

3. If the underlying network location of a peer changes, such as by leaving hotel WiFi and joining a 4G access point, connectivity and reachability as seen by any arbitrary device in the world should not lapse for longer than ten seconds. (This is just an aspect of goal two, but it’s worth stating separately because it will have implications later in the discussion.)

4. It must work for users obtaining their Internet service via all kinds of real-world networks, including misconfigured ones, double-NAT and other horrors, firewalls whose clueless administrators think blocking everything but http and https does anything but inconvenience their users more than their attackers, etc.

5. Communication should be private, encrypted, authenticated, and generally secure. Security should be end to end, with secret keys not requiring central escrow (see the sketch just after this list). The network must be robust against “split brain” and other fragmentations of the address space (whether intentionally induced or not) and against Sybil attacks.

6. The overall network should be as decentralized as possible. Peer-to-peer connectivity should always be preferred, and centralized points of control should be minimized or eliminated.
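
To make goal 5 concrete, here is a minimal sketch of what “end to end, with secret keys not requiring central escrow” looks like in practice. It uses PyNaCl’s Curve25519 Box purely as an illustration (an assumption for the example, not ZeroTier’s actual protocol); the only things that would ever need to pass through a central directory are public keys.

```python
# Illustrative only: end-to-end key agreement with no central key escrow.
# Assumes the PyNaCl library (pip install pynacl); not ZeroTier's wire protocol.
from nacl.public import PrivateKey, Box

# Each peer generates its key pair locally; secret keys never leave the device.
alice_secret = PrivateKey.generate()
bob_secret = PrivateKey.generate()

# Only the public halves would ever be published to a directory or anchor node.
alice_public = alice_secret.public_key
bob_public = bob_secret.public_key

# Each side derives the same shared key from its own secret key and the
# other's public key (Curve25519), so no central party can read the traffic.
alice_box = Box(alice_secret, bob_public)
bob_box = Box(bob_secret, alice_public)

ciphertext = alice_box.encrypt(b"hello over the virtual wire")  # nonce handled by the library
assert bob_box.decrypt(ciphertext) == b"hello over the virtual wire"
```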

There are other technical constraints, but they’re not pertinent to this discussion.

The important thing is the ordering. Decentralization comes in dead last. Even security comes in second to last.

Why?

Because if the first two items are not achieved, nobody will use this and I’m wasting my time by developing it.

After a lot of reading and some experimentation with my own code and the work of others, I concluded that the first five are almost totally achievable at the expense of decentralization.

I came up with a triangle relationship: efficiency, security, decentralization, pick two.

Security here is equivalent to reliability, since reliability is just security against “attacks” that do not originate in human agency.

Why?

It’s not a government conspiracy, and it’s not market forces… at least not directly. This is absolutely a technical problem. If it looks like either of the other two, that’s a side effect of technical difficulty.

People want decentralization. Any time someone attempts it seriously, it gets bumped to the very top on sites like Reddit and Hacker News. Even top venture capitalists are talking about it and investing in decentralized systems when investable opportunities in the area present themselves. Governments even want it, at least in certain domains. For years military R&D labs have attempted to build highly robust decentralized mesh networks. They want them for the same reason hacktivists and cyber-libertarians do: censorship resistance. In their case the censorship they want to avoid is adversarial interference with battlefield communication networks.

From a market point of view the paucity of decentralized alternatives is a supply problem, not a demand problem. People want speed, security, and an overall positive user experience. Most don’t care at all how these things are achieved, but enough high-influence users are biased toward decentralization that in a contest between equals their influence would likely steer the whole market in that direction. There aren’t any decentralized alternatives to Facebook, Google, or Twitter because nobody’s been able to ship one.

This paper, which I found during my deep dive into distributed systems, reaches more or less the same conclusion I did in the first section and elaborates upon it mathematically. It speaks in terms of queueing and phase transitions, but it’s not hard to leap from that to more concrete factors like speed, stability, and scalability.

I think the major culprit might be the CAP theorem. A distributed database cannot simultaneously provide consistency, availability, and partition tolerance. You have to pick two, which sounds familiar. I haven’t tried to do this rigorously, but my intuition is that my trichotomy on distributed networks is probably a domain-specific restatement of CAP.

Every major decentralized system we might want to build involves some kind of distributed database.

Let’s say we want to make the ZeroTier peer-to-peer network completely decentralized, eliminating the central fixed anchor nodes. Now we need a fast, lightweight distributed database in which changes in network topology can be rapidly recorded and made available to everyone at the same time. We also need a way for peers to reliably find helpers, and for helpers to advertise themselves, for jobs like NAT traversal and relaying.
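
As a thought experiment, here is roughly the kind of record such a distributed directory would need to keep consistent, available, and partition-tolerant all at once. The structure and field names are hypothetical, not ZeroTier’s actual data model.

```python
# Hypothetical sketch of the entry a fully decentralized directory would have
# to replicate quickly and verifiably; not ZeroTier's actual format.
from dataclasses import dataclass
import time

@dataclass
class LocationRecord:
    address: str        # persistent address (goal 1): never changes
    endpoint: str       # current physical IP:port: changes whenever the peer moves
    updated_at: float   # freshness: staleness beyond ~10 seconds violates goal 3
    signature: bytes    # signed with the address's own key pair to resist spoofing and Sybils

record = LocationRecord(
    address="89e92ceee5",
    endpoint="198.51.100.4:9993",
    updated_at=time.time(),
    signature=b"(signature produced by the peer's secret key)",
)
```

Every time any peer roams, a record like this has to change and propagate worldwide within seconds, which is exactly where the CAP trade-offs start to bite.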

If we pick consistency and availability, the network can’t tolerate Internet weather or edge nodes popping up and shutting down. That’s a non-starter for a decentralized system built out of unreliable consumer devices and cloud servers that are created and destroyed at a whim. If we pick availability and partition tolerance we have a system trivially vulnerable to Sybil attacks, not to mention failing to achieve goals #2 and #3. If we pick consistency and partition tolerance, the network goes down all the time and is horrendously slow.

But Bitcoin!

Nope.

Bitcoin operates within the constraints of “efficiency, security, decentralization, pick two.” Bitcoin picks security and decentralization at the expense of efficiency.

A centralized alternative to Bitcoin would be a simple SQL database with a schema representing standard double-entry accounting and some meta-data fields. The entire transaction volume of the Bitcoin network could be handled by a Raspberry Pi in a shoebox.

Instead, we have a transaction volume of a few hundred thousand database entries a day being handled by a compute cluster comparable to those used to simulate atomic bomb blasts with physically realistic voxel models or probabilistically describe a complete relationship graph for the human proteome.
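
To make the comparison concrete, here is a minimal sketch of that hypothetical centralized alternative: SQLite plus a standard double-entry schema. The table and column names are mine, purely for illustration.

```python
# A toy centralized ledger: double-entry accounting plus a little metadata.
# Illustrative only; any real system would add indexes, auth, and balance checks.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE account (
    id     INTEGER PRIMARY KEY,
    pubkey TEXT UNIQUE NOT NULL   -- the account holder's public key
);
CREATE TABLE tx (
    id   INTEGER PRIMARY KEY,
    ts   INTEGER NOT NULL,        -- unix timestamp
    memo TEXT                     -- optional metadata
);
CREATE TABLE entry (
    tx_id      INTEGER NOT NULL REFERENCES tx(id),
    account_id INTEGER NOT NULL REFERENCES account(id),
    amount     INTEGER NOT NULL   -- debits negative, credits positive; entries per tx sum to zero
);
""")

# Back-of-envelope load: a few hundred thousand entries per day is trivial.
entries_per_day = 300_000
print(f"{entries_per_day / 86_400:.1f} writes per second")   # ~3.5/s
```

A few writes per second is nothing; the entire point of the cluster is to eliminate the trusted operator of this database.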

Bitcoin gets away with this by riding on a speculative frenzy, and by compensating people for their compute power via mining. If every Bitcoin transaction included a fee equal to the energy and amortized hardware cost required to complete, verify, and record it, I’m not convinced Bitcoin would be any cheaper than conventional banking with all its political, regulatory, and personnel overhead.

I am not bashing Bitcoin. It’s arguably one of the most impressive and creative hacks in the history of cryptography and distributed systems. It’s a genuine innovation. It’s useful, but it’s not the holy grail.

(A few more of my thoughts on Bitcoin can be found here. TL;DR: I also think Bitcoin, when considered in totality as a system with multiple social, economic, and technical components, has many hidden informal points of central coordination and control. It’s not truly “headless.”)

The holy grail would be something at least lightweight enough (in bandwidth and CPU) for a mobile phone, cryptographically secure, as reliable as the underlying network, and utterly devoid of a center.

I want to believe, but I’ve become a skeptic. Hackers of the world: please prove me wrong.

Blind Idiot Gods

Perhaps we should stop letting the perfect be the enemy of the good.

Facebook, Twitter, Google, the feudal “app store” model of the mobile ecosystem, and myriad other closed silos are clearly a lot more centralized than many of us would want. All that centralization comes at a cost: monopoly rents, mass surveillance, functionality gatekeeping, uncompensated monetization of our content, and diminished opportunities for new entrepreneurship and innovation.

The title of the Tsitsiklis/Xu paper I linked above is “On the Power of (even a little) Centralization in Distributed Processing.”

The question we should be asking is: how little centralization can we get away with? Do we have to dump absolutely everything into the gaping maw of “the cloud” to achieve security, speed, and a good user experience, or can we move most of the intelligence in the network to the edges?

This is the reasoning process that guided the design of the ZeroTier network. After concluding that total decentralization couldn’t (barring future theoretical advances) be achieved without sacrificing the other things users cared about, I asked myself how I might minimize centralization. The design I came up with relegates the “supernodes” in the center to simple peer location lookup and dumb packet relaying. I was also able to architect the protocol so there isn’t much in the way of design asymmetry, allowing supernodes to run exactly the same software as regular peers. Instead of client-server, I came up with a design that is peer to (super-)peer to peer.
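
Here is a toy model of that peer to (super-)peer to peer shape, reduced to its essentials. The names and structure are mine, for illustration only; this is not ZeroTier’s code.

```python
# Toy in-memory model: the central node only answers "where is address X?"
# and blindly relays packets when a direct path can't be established.

class Supernode:
    def __init__(self):
        self.peers = {}                                # persistent address -> (peer, directly reachable?)

    def announce(self, peer, reachable):
        self.peers[peer.address] = (peer, reachable)

    def locate(self, address):
        return self.peers.get(address)

    def relay(self, address, packet):
        entry = self.peers.get(address)
        if entry:
            entry[0].receive(packet)                   # dumb forwarding, no inspection


class Peer:
    def __init__(self, address, supernode, directly_reachable=True):
        self.address = address
        self.supernode = supernode
        supernode.announce(self, directly_reachable)

    def send(self, dst_address, packet):
        entry = self.supernode.locate(dst_address)     # 1. ask the hub where dst is
        if entry and entry[1]:
            entry[0].receive(packet)                   # 2. preferred: direct peer-to-peer path
        else:
            self.supernode.relay(dst_address, packet)  # 3. fallback: relay through the hub

    def receive(self, packet):
        print(f"{self.address} received {packet!r}")


hub = Supernode()
a = Peer("a1b2c3d4e5", hub)
b = Peer("f6a7b8c9d0", hub, directly_reachable=False)  # e.g., stuck behind a hostile NAT
a.send("f6a7b8c9d0", b"hello")                         # delivered via the hub's relay
```

In the real design the supernodes also run the same software as ordinary peers; the toy above only captures the division of labor, not that symmetry.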

Can we go further? If we start with a client-server silo and decentralize until it hurts, what’s left?

Unless I’m missing it, I don’t think the Tsitsiklis/Xu paper establishes a precise definition for exactly what kind of centralization is needed. It does provide a mathematical model that can tell us something about thresholds, but I don’t see a precise demarcation of responsibility and character.

Once we know that, perhaps we can create a blind idiot God for the Internet.

I’m borrowing the phrase from H.P. Lovecraft, who wrote of a “blind idiot God in the center of the universe” whose “piping” dictates the motion of all. The thing I’m imagining is a central database that leverages cryptography to achieve zero knowledge centralization sufficient to allow maximally decentralized systems to achieve the phase transition described in the Tsitsiklis/Xu paper. Around this system could be built all sorts of things. It would coordinate, but it would have no insight. It would be blind.
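
Here is a back-of-the-napkin sketch of what “coordination without insight” could look like: a hub that files opaque ciphertext under hashed rendezvous keys and can match and forward, but never read. It is entirely hypothetical, a thought experiment rather than a design.

```python
# Hypothetical "blind idiot god": stores and matches opaque blobs under hashed
# keys, with no ability to read anything it coordinates.
import hashlib

class BlindHub:
    def __init__(self):
        self.mailboxes = {}                          # sha256(rendezvous key) -> ciphertexts

    def post(self, hashed_key: bytes, ciphertext: bytes) -> None:
        self.mailboxes.setdefault(hashed_key, []).append(ciphertext)

    def fetch(self, hashed_key: bytes) -> list:
        return self.mailboxes.pop(hashed_key, [])

def rendezvous_id(shared_topic: bytes) -> bytes:
    # Peers agree on the topic out of band; the hub only ever sees its digest.
    return hashlib.sha256(shared_topic).digest()

hub = BlindHub()
rid = rendezvous_id(b"alice<->bob")
hub.post(rid, b"(ciphertext encrypted end to end by the peers)")
print(hub.fetch(rid))                                # the hub never had the plaintext
```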

Could such a provably minimal hub be built? Could it be cheap enough to support through donations, like Wikipedia? Could it scale sufficiently to allow us to build decentralized alternatives to the entire closed silo ecosystem?

It’s one of the things I’d like to spend a bit of time on in the future. For now I’m concentrating on improving ZeroTier and getting something like a “real business” under it. Maybe then I can do R&D in this area and get paid for it. After all, some of these questions are synonymous with “how can I scale ZeroTier out to >100,000,000 users without eating it on bandwidth costs and performance degradation?”

In the meantime, please do your own hacking and ask your own questions. If you come up with different answers, let me know. The feudal model is closing in. We need alternatives.
