  1. Denmark

    October 15, 2016 by xudifsd

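    Over this National Day holiday I took a trip around Europe, mainly because a classmate who works in Denmark is about to move back to China and I wanted to visit before he left. Denmark itself does not seem to have much to see: there are hardly any sights, and the one Little Mermaid statue has been rated the most disappointing tourist attraction. But living there is very comfortable, and the air and the environment are excellent.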

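    For me the most interesting part of the trip was getting a taste of what living in Denmark is like: I stayed at my classmate's place for a few days, went to the supermarket every morning to buy breakfast, went out for the day, and in the afternoon bought groceries at the supermarket or in Chinatown and cooked at home. It was a small taste of living in another part of the world.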

    Diversity

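    Denmark has people of many ethnic backgrounds, and you see all kinds of people on the street: black people, blond white people, dark-haired white people, red-haired Eastern Europeans, brown-skinned people and East Asians. I used to think only the United States was this mixed, so seeing such variety on Danish streets felt remarkable. China really is probably one of the few countries in the world where people rarely come into contact with other ethnic groups; most people never meet someone of another ethnicity in their whole life.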

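    It seems that getting by in Denmark with English alone does not quite work. English is fine for most conversations, but the goods in supermarkets, train tickets and many signs are in Danish only, and if you cannot read it you are effectively illiterate. Copenhagen is also quite close to Sweden, so my classmate and I took a train there; the train station likewise had almost no English, only Swedish. Crossing a bridge and finding yourself in another country with a completely different language is still hard for me to get my head around. I am told Swedish and Danish are very similar, with just two extra letters, but since I know neither I cannot judge.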

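    According to my classmate, no two people in his group come from the same country, which I guess is fairly common in Europe and America. Denmark appears to be a quite diverse country; I had previously assumed it was all Nordic white people. By comparison China, and eastern China in particular, has essentially no diversity: a single ethnicity, a single culture and language. Because China's population is large and its market big, many companies can do very well serving the domestic market alone, so the vast majority have no incentive to expand overseas, let alone hire people from other countries. And because of the household registration (hukou) system, and because China is the hardest country in the world in which to get a green card, I suspect most foreigners who come to China are foreign-company executives on short-term assignments. In addition, because of the blocking in place, most foreign companies are unlikely to reach the Chinese market, so consumers actually have few chances to encounter foreign products. This feels exactly like the self-sufficiency of the Ming and Qing dynasties. On this point I think even Taiwan does better than the mainland.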

    Five Chinese people in different situations

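    Although I did not stay in Denmark long, I met five Chinese people in different situations there: my classmate, who works in Denmark but does not have permanent residency; his landlord, who works in Denmark and does have permanent residency; another of his housemates, a Chinese girl adopted by Danes; and an undergraduate studying in Denmark, whose boyfriend is a second-generation Chinese immigrant whose father has a very Chinese profession: professor at the University of Copenhagen.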

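    First, the adopted girl. We met the first time I walked into the house; when she said hello she asked me, in English, whether I spoke English, and I remember thinking: how would I answer if I didn't? She has an East Asian face, so at first I assumed she was Chinese, but she could not follow us when we chatted in Chinese and she only spoke English with us, so I took her to be from a Southeast Asian country. On the last day I got too curious and asked, and she told me she was Chinese and had been adopted. I had not considered that possibility, and adoption is not something you can easily pry into, so I changed the subject. My guess is she was adopted very young, which is why she cannot speak or understand Chinese; her school even offers Chinese classes, and after school she comes back and asks the landlord to teach her Chinese. A very interesting girl; if I could spend more time around her I would like to understand how she sees the world.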

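    Next, the other girl, who is studying in Denmark, and her boyfriend. We met at the University of Copenhagen, and they showed me around campus. Both study CS, but their courses are much more interesting than those in China: Chinese universities generally teach the workaday "code monkey" languages, whereas they are taught haskell and erlang; in fact I got to know her because she asked me a haskell question. Seen this way, our universities teach in a rather utilitarian way, purely so that it is usable after graduation. After the tour the guy went home to play Dota, and the girl and I went for pizza. Most students who go to Northern Europe to study hope to stay and get permanent residency, and she is no exception. Jobs here are generally quite relaxed; citizens who lose their job get a government benefit that is not much lower than a programmer's salary; and children of citizens and permanent residents pay no university tuition and receive 5K kroner a month from the government. So apart from buying a house or a car, a normal life does not take much money. A high-welfare state in the true sense.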

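    Finally, my classmate's landlord. He is a very interesting person who also came here as a student, then found work and got permanent residency. Having been in Denmark a long time, he would fill me in on all sorts of Danish anecdotes and culture while we were out; I get the feeling he would never be out of work as a tour guide. Since he has permanent residency, I asked him something I had long been curious about: does a high-welfare society make people lazy and less competitive? His answer was that high welfare actually ensures that what people do is what they want to do, rather than something they do just to survive; and even with high welfare, living conditions for the unemployed and the employed are very different, so anyone with some ambition would not want to be unemployed. I think you could infer from this that a high-welfare society does not produce useless inventions? Still, incentives in a high-welfare society are much weaker, and society develops a lot more slowly as a result. So I think Northern Europe may live very well but develop slowly, and once it slows past a certain point there will be mass unemployment.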

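    Every trip abroad makes me sigh that China is actually still quite poor. The development of recent years has been only in infrastructure and manufacturing, both of which take a heavy toll on the environment. And because China has not really integrated with the rest of the world, many domestic companies are simply not globally competitive, let alone globally influential. The internet industry is comparatively well integrated internationally, but it still falls short in many ways. Western capital can easily hire talent from all over the world, whereas domestic capital has neither that ability nor that appeal; domestic private capital is itself weak, and the strong capital is all state capital, which is only suited to building infrastructure abroad.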


  2. Lease Activating

    August 30, 2016 by xudifsd

    Leases are a very useful and pervasive technique in distributed systems; they can be used to grant authority to other nodes in the system. For example, in master election, follower nodes use a lease to promise the current master that they will not elect another master until the lease expires. But the problem is: how can you make sure that the current master will not think it still holds the lease after the lease grantors think otherwise?

    This is not an easy question in practice, because you have to cope with clock skew and an asynchronous network. So you cannot grant a lease based on absolute time, and you have to assume your packets may be delayed arbitrarily.

    Normally, we can make a conservative guess about network delay. To be absolutely safe, we have to be very conservative. So we might build a system whose lease is valid for 60 seconds, in which the master has to assume the packet has been delayed by, say, 30 seconds, and refreshes its lease every 10 seconds. This looks good if the lease duration is long enough, but it won’t work if the lease expires quickly. The benefit of a shorter lease is higher availability, because a long-lived lease keeps the system from working for longer if the master crashes.
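
    The arithmetic behind that conservative guess can be sketched as follows. This is only one reading of the scheme: the function name and the idea that the master simply subtracts the assumed delay from the lease duration are assumptions made for illustration.

```python
def usable_lease_seconds(lease_duration: int, assumed_max_delay: int) -> int:
    """Time the master may safely act on a lease (illustrative helper).

    Assumption: the master treats the grant as if it had already spent
    `assumed_max_delay` seconds in the network, so only the remainder
    of the lease is safe to use.
    """
    return max(0, lease_duration - assumed_max_delay)


# Numbers from the text: 60 s lease, 30 s assumed delay, refresh every 10 s.
usable = usable_lease_seconds(60, 30)      # 30 seconds are safely usable
refreshes_in_window = usable // 10         # 3 refresh attempts fit in that window

# A short lease with the same conservative guess leaves nothing usable.
short = usable_lease_seconds(2, 30)        # 0 seconds
```

    With a fixed guess, any lease shorter than the assumed delay is unusable, which is why this approach rules out short leases.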

    I didn’t have a very good solution to this until I read a paper. The paper proposes a novel way to perform reads with high throughput and low latency in a Paxos system without sacrificing consistency. It is especially useful in wide-area scenarios. Apart from its main topic, the paper also describes a new way to grant and refresh leases without depending on external clock synchronization.

    [Figure: lease]

    As the picture above shows, it uses a guard to bound the promise duration: if the grantor does not receive promise_ACK during t_guard, the lease expires at T3 + t_guard + t_lease. If the holder does not receive the promise during t_guard, the lease won’t be activated at all. Receipt of promise_ACK only shortens the lease duration. When renewing active leases, there is no need to send the guard anymore; the most recent promise_ACK plays the role of the guard.
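
    The timing rules above can be sketched with each side using only its own local clock. This is a minimal sketch, not the paper's code: the function and parameter names are hypothetical, times are integer milliseconds, and two details are assumptions rather than statements from the text (the holder counts t_lease from the moment the promise arrives, and the grantor counts t_lease from the moment promise_ACK arrives).

```python
from typing import Optional


def grantor_expiry(t3: int, t_guard: int, t_lease: int,
                   promise_ack_at: Optional[int] = None) -> int:
    """Grantor-local time (ms) after which the grantor treats the lease as expired.

    t3 is the grantor-local time at which the promise was sent.
    """
    worst_case = t3 + t_guard + t_lease          # from the text: no promise_ACK seen
    if promise_ack_at is None:
        return worst_case
    # Assumption: with an ACK, t_lease is counted from the ACK's arrival.
    # min() guarantees the ACK can only shorten the lease, never extend it.
    return min(worst_case, promise_ack_at + t_lease)


def holder_expiry(guard_received_at: int, promise_received_at: int,
                  t_guard: int, t_lease: int) -> Optional[int]:
    """Holder-local time (ms) until which the lease is usable, or None if it
    never activates because the promise arrived outside the guard window."""
    if promise_received_at - guard_received_at > t_guard:
        return None
    # Assumption: the holder counts t_lease from the promise's arrival.
    return promise_received_at + t_lease
```

    Both functions compare only durations measured on a single machine, so no shared wall clock is needed; the sketch does assume the two clocks tick at roughly the same rate.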

    With this protocol, we can use very short leases because we make no guesses at all. In the evaluation section of the paper, the authors set the lease duration to 2 seconds and let the grantor renew the current lease after 500 ms. In this case, if the holder crashes or becomes unavailable, the lease won’t prevent the system from working for more than 2 seconds.


  3. chubby & zookeeper: different consistency levels

    June 12, 2016 by xudifsd

    Many people know zookeeper, a widely used open source project. But few people know chubby, a service used only internally at Google; the public can learn its details only through the paper Google published. Zookeeper provides almost the same functionality as chubby, but there are a few subtle differences between them, and I will discuss those differences in this post. So the next time you read a Google paper saying they used chubby as underlying infrastructure, do not treat it as a zookeeper equivalent.

    Zookeeper is a distributed process coordinator, as O’Reilly puts it. Chubby is a distributed lock service that provides strong consistency. Although both have a file-system-like API from the user’s perspective, they provide different levels of consistency. You can get a clue from their descriptions: "coordinator" is a much weaker word than "lock service".

    Like most other systems, chubby sees few write operations, so its network traffic is dominated by reads and by heartbeats between client and server. Because chubby provides coarse-grained locking, caching is necessary; and since chubby is intended to provide strong consistency, it must ensure that a client never sees stale data. It achieves this by caching data in the client library: if anyone wants to change the data, chubby ensures that all caches are invalidated before the change takes effect. According to the paper, chubby has a rather heavy client library, and the library is an essential part of the system, providing not only connection and session management but also the cache management discussed above. It is this cache management that makes the big difference between zookeeper and chubby. Many users think zookeeper provides strong consistency, only to find out later that they were wrong.
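
    The "invalidate every cache before the change takes effect" rule can be illustrated with a toy in-memory model. This is a sketch only: `ToyServer` and `CachingClient` are hypothetical names, nothing here is chubby's real API, and sessions, leases and failures are ignored.

```python
class ToyServer:
    """Holds the authoritative data and knows every caching client."""

    def __init__(self):
        self.data = {}
        self.clients = []

    def write(self, key, value):
        # Step 1: invalidate every client cache and wait for each acknowledgement.
        for client in self.clients:
            acked = client.invalidate(key)
            assert acked, "write must not proceed until every cache is invalidated"
        # Step 2: only now does the change take effect.
        self.data[key] = value


class CachingClient:
    """Client library that serves reads from a local cache when possible."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        server.clients.append(self)

    def read(self, key):
        if key not in self.cache:                  # cache miss: ask the server
            self.cache[key] = self.server.data.get(key)
        return self.cache[key]

    def invalidate(self, key):
        self.cache.pop(key, None)
        return True                                # acknowledgement


server = ToyServer()
reader = CachingClient(server)
server.write("config", "v1")
first = reader.read("config")      # fetched from the server, then cached
server.write("config", "v2")       # cache entry is dropped before "v2" lands
second = reader.read("config")     # cache miss again, so the new value is read
```

    The cost is visible in `write`: it cannot finish until every client has acknowledged, so one slow client slows down every writer.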

    Unlike chubby, zookeeper does not provide such strong consistency; in fact, the native zookeeper client library has no cache at all. What difference can a cache make, you ask? Here is an example: if two zookeeper clients want to agree on a single volatile value, they use the same path for a file whose content is the value of the given key. Suppose client A wants to know the latest value and client B wants to change it. Client A can either poll the file regularly or use the watch mechanism provided by the native zookeeper library; let’s assume A uses a watch since it is more efficient. When B changes the value, zookeeper would respond to the change call before notifying client A. Once B gets the response, it may assume that every other client watching that value has been notified, but that is not the case: A may not have received the notification yet because of a slow network, so it is possible for two clients to see different values at the same time. This is usually not a problem if users do not require strong consistency, but if they do, they have to invent some buggy ad-hoc solution to work around it. The author of the zookeeper book argued that having the library manage a cache would have made zookeeper’s design more complex and could cause zookeeper operations to stall while waiting for a client to acknowledge a cache invalidation request. It is true that strong consistency comes at a cost, but people would find the system easier to use if strong consistency were guaranteed.
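
    The window in which A and B disagree can be reproduced with a toy model in which the write is acknowledged first and notifications are delivered later. This is a sketch under simplifying assumptions: `ToyStore` and `WatchingClient` are hypothetical names, not the zookeeper API, and unlike real zookeeper watches the notification here carries the new value and the watch stays registered.

```python
from collections import deque


class ToyStore:
    """Acknowledges writes immediately; notifications sit in a queue
    that stands in for a slow network."""

    def __init__(self):
        self.value = None
        self.watchers = []
        self.pending = deque()

    def watch(self, callback):
        self.watchers.append(callback)

    def write(self, value):
        self.value = value
        for callback in self.watchers:
            self.pending.append((callback, value))   # queued, not yet delivered
        return "ok"                                  # writer is answered right away

    def deliver_notifications(self):
        while self.pending:
            callback, value = self.pending.popleft()
            callback(value)


class WatchingClient:
    """Client A: remembers the last value it was told about."""

    def __init__(self, store):
        self.seen = store.value
        store.watch(self.on_change)

    def on_change(self, value):
        self.seen = value


store = ToyStore()
store.write("old")
client_a = WatchingClient(store)

ack = store.write("new")                 # client B's write returns "ok"
stale_view = client_a.seen               # A still believes the value is "old"
store.deliver_notifications()            # the delayed notification finally arrives
fresh_view = client_a.seen               # now A sees "new"
```

    Between the acknowledgement and the delivery, B believes the value is "new" while A still acts on "old"; that is the gap the paragraph above describes.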

    To remedy this, Netflix created the curator library, which later moved to the Apache foundation. This library provides commonly used functionality as well as cache management. This additional layer on top of zookeeper allows it to provide the strong consistency needed by some users. So whenever you want to use zookeeper, use the curator library instead of the native library unless you know what you are doing.


  4. Use percentiles in performance stats

    May 8, 2016 by xudifsd

    Here is a fun story of mine: last year, when I was preparing for Google Summer of Code with the finagle project, I encountered a class I didn’t understand. It is not very long, but from the source and comments alone I could not work out how it is used or why it is useful. Since it is not a very important class, I skipped it and tried to read other parts of the project. Unfortunately, finagle was not accepted as an organization in GSoC 2015 (what a shame), so I stopped reading.

    This year I accepted a full-time job working on a distributed system. This is my favorite field in computer programming, and I have worked very hard to do my best. Recently I read the post "Notes on Distributed Systems for Young Bloods", which covers many things that newcomers to distributed systems should pay attention to. It has a lot of valuable advice; in particular, it emphasizes the importance of metrics: to better understand the behaviour of the system, we, as system engineers, should expose metrics, and when doing so we should expose not only averages but also percentiles. This reminded me that the mysterious class I met in finagle has something to do with metrics and stats, so I read that class again and found that I can now fully understand it and its usage.
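
    A small example shows why an average alone can mislead. The latency numbers below are made up for illustration, and `percentile` is a plain nearest-rank helper written for this sketch, not a function from finagle or from the post mentioned above.

```python
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p percent of the
    samples are less than or equal to it. Expects a non-empty list.
    """
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100      # ceil(p * n / 100) using integers
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [1000] * 2

average = mean(latencies_ms)           # 29.8 ms: looks healthy
p50 = percentile(latencies_ms, 50)     # 10 ms: the typical request
p99 = percentile(latencies_ms, 99)     # 1000 ms: the tail the average hides
```

    The average suggests requests take about 30 ms, while one request in fifty actually takes a full second; only the high percentile makes that visible.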

    From "The Tail at Scale" I understood that it is critical for a system to keep tail latency low; otherwise the overall tail latency makes the system intolerably slow. But I had failed to see the implication: that we should expose latency data, especially tail latency data, so that people can tune the system. No wonder finagle needs that class.

    While searching for a better algorithm to implement percentile metrics, I found another interesting paper, "Effective Computation of Biased Quantiles over Data Streams". It proposes a way to calculate quantiles over streams with finer error guarantees at higher ranks. It is very efficient, since its space bound is O(log n), but the algorithm still does not fit the normal needs of performance stats: a server runs for a long time, so the number of metric samples is unbounded, and the algorithm would consume too much memory if the server kept running long enough. From this point of view, BucketedHistogram in finagle performs great: we specify the error we can tolerate through the bucket limits, and if we want more precise data at higher ranks we can add more slots among the higher limits. It is very easy to understand and, what’s more, it uses constant space, although the constant is a little large.
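
    The idea of a fixed set of bucket limits can be sketched as below. This is not finagle's BucketedHistogram: `ToyBucketedHistogram`, its methods and the example limits are all assumptions made for illustration.

```python
import bisect


class ToyBucketedHistogram:
    """Counts samples per bucket; memory depends only on the number of limits."""

    def __init__(self, limits):
        self.limits = sorted(limits)                   # upper bound of each bucket
        self.counts = [0] * (len(self.limits) + 1)     # last slot is the overflow bucket
        self.total = 0

    def add(self, value):
        # bisect_left finds the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.limits, value)] += 1
        self.total += 1

    def percentile(self, p):
        """Upper bound of the bucket holding the p-th percentile (integer p in 1..100)."""
        if self.total == 0:
            return None
        rank = max(1, (p * self.total + 99) // 100)    # ceil(p * total / 100)
        seen = 0
        for index, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                if index < len(self.limits):
                    return self.limits[index]
                return float("inf")                    # sample fell past the last limit


# Hypothetical limits in milliseconds; denser limits give finer answers.
histogram = ToyBucketedHistogram([1, 2, 5, 10, 20, 50, 100, 200, 500, 1000])
for _ in range(98):
    histogram.add(7)
for _ in range(2):
    histogram.add(800)

p50 = histogram.percentile(50)     # 10: the bucket (5, 10] contains the median
p99 = histogram.percentile(99)     # 1000: the bucket (500, 1000] contains the tail
```

    The answer is only as precise as the bucket it lands in, so the spacing of the limits is the error bound; adding more limits where precision matters costs a few more counters, not memory that grows with the number of samples.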


  5. Taiwan travel notes

    February 24, 2016 by xudifsd

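    Before coming to Taiwan I had heard a saying: "Chinese students should travel to three places: New York, Taiwan and Japan. Go to New York to see what the world's financial center looks like and how far behind we are; go to Taiwan to see another way China could have developed; go to Japan to see what a former enemy country has become."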

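    I did not come to Taiwan because of that saying, but I do think there is something to it, so while traveling I paid attention to the differences between Taiwan and the mainland and to what causes them.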

    Internationalization and level of development

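    Last year I visited the United States and felt it really could stand for the whole world: in the city there were bookstores run by Japanese people selling Japanese and English manga, pizzerias run by Italians, and hot pot restaurants run by Chinese people. You can meet people from almost any country there, speaking any of the corresponding languages. I was not especially struck at the time, since it is after all a country of immigrants and the world's only superpower. But the internationalization I saw in Taiwan did strike me. Taiwan is still not as international as the US, but it does better than any city on the mainland. I used to think Shanghai was fairly international and developed, but I now think it is still far behind Taipei. Restrooms across Taipei have accessible stalls and nursing rooms, staircases have accessible routes alongside them, and the high-speed rail has the corresponding facilities too, whereas the mainland does poorly here.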

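    Another thing is the number of foreigners. That Taipei has many is no surprise, but even Kaohsiung has many, a lot of them tourists from Japan and Korea, so facilities and stop announcements come in Japanese and Korean as well.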

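    On the mainland, facilities of all kinds are incomplete, and the culture is uniform and lacks vitality. That is true even of Beijing, Shanghai and Guangzhou, never mind other cities. I once saw an article circulating online claiming that China had not honored the vast majority of its WTO accession commitments. I have not verified the rumor, but I am inclined to believe it; thinking back to all the publicity around the turn of the millennium about "integrating with the world" now seems especially laughable.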

    Democracy and freedom

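    Democracy is usually hard to experience as a tourist, but while we were walking around Taipei the city was building the "Little Dome" (小巨蛋) stadium for the Universiade. Passing by, we saw all sorts of posters, something like big-character posters, presumably put up by civic groups or individuals, listing reasons why the stadium was unnecessary, why it would do more harm than good, and so on; the government's own slogans had been altered into sarcastic ones. On the mainland you could probably only have seen scenes like this around 1988 or 1989?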

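    There are a lot of Falun Gong practitioners here (mainlanders nickname them "wheels"). We saw them at many tourist sites in Taipei, presumably to get their message to mainland visitors. One older lady at the Chiang Kai-shek Memorial Hall spotted us, probably figuring that only mainland students would visit a place the locals never go (laughs), and started her pitch. She mentioned Falun Gong, but what she actually said had almost nothing to do with it. I asked, and she was a Taiwan local who had been to the mainland a few times; she talked a lot about the need for democracy and hoped both sides of the strait would do well. My guess is she was someone promoting democracy under the cover of Falun Gong. Below Taipei 101 we also saw anti-Falun Gong protesters holding signs telling them to get out of Taiwan, while the practitioners next to them stayed silent. From other conversations with locals it is also clear that Falun Gong is not popular in Taiwan, since it is after all a cult and its publicity is often misleading; but that someone like that lady has to borrow its cover to promote democracy is probably a matter of having no better option.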

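    The counterpart of democracy is freedom, and the most important part of that is freedom of speech and of the press. We visited many Eslite bookstores in Taiwan, which carry lots of original foreign-language books, Japanese and English, something you cannot find in any bookstore on the mainland. A big reason is probably the mainland's restrictions on publishers; many foreign-language books that have nothing to do with politics are kept outside the wall too. My copy of "The Google Planet" was a pirated edition I bought back then at a roadside stall on campus. I think this is the same kind of lock on knowledge as the GFW; forget integrating with the world, this is outright isolationism.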

    International image

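    The mainland's international image is really terrible. Outside the wall I often do not want to discuss the subject with anyone at all; one way to avoid awkwardness is to talk about travel sights and the weather instead.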

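    In Taipei, someone took us to meet some Taiwanese online friends, who brought along two Western exchange students. The Taiwanese girl had traveled widely and had been an exchange student at Peking University. During the conversation she "introduced" (read: griped about) the mainland to the two Western students: the Peking University dorms were poor, it was hard to shower every day, the bathhouse had no partitions and no privacy at all, hospital nurses had a bad attitude, and so on. Luckily the two exchange students had never been to the mainland and knew little about China. I would have liked to gripe about these things myself, but on the principle of not airing dirty laundry in public I could only explain, rather awkwardly, about water shortages in the north, customs, and population.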

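    Also, in a shop, an older lady pointed at Tsai Ing-wen on the TV and asked us: "Can you curse your president on the mainland? If she does a bad job, we can curse Tsai Ing-wen." She looked very proud of it. This is probably how ordinary people understand democracy, and it also reflects how others perceive speech control in China.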

    Summary

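    Our conclusion was that since the founding of the People's Republic, the mainland has had hardly a few years in which it was not on a detour: the Great Leap Forward in the 1950s, the Cultural Revolution in the 1960s and 70s. Even the later reform and opening-up opened only a small slice of the economy; most of it is still subject to all kinds of restrictions and controls. Add the very tough restrictions on speech and publishing, and I do not really think the mainland has left the detour.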

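    With so much time wasted struggling along detours, the other side of the strait has already reached a developed-country level of per-capita GDP and is actively integrating into the world economy. Coming to Taiwan to make the comparison was really worth it.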