• Official journal of the China Computer Federation (CCF)
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Papers •

A hybrid big data platform based on private cloud VMs and bare metals

WANG Yong-kun (王永坤)1, LUO Xuan (罗萱)1, JIN Yao-hui (金耀辉)1,2

  1. Network and Information Center, Shanghai Jiao Tong University, Shanghai 200240, China
  2. State Key Laboratory of Advanced Optical Communication System and Network, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2017-09-03 Revised: 2017-10-30 Online: 2018-02-25 Published: 2018-02-25
  • Supported by:

    National Natural Science Foundation of China (61371084)

Abstract:

Big data analytics cannot be widely applied without the support of big data platforms, and building such platforms has become an important need for many enterprises and institutions. Building a big data platform requires complex, system-level engineering, in which performance and scalability are the two main concerns. As data volumes and user demands keep growing, the scale planned for a platform at design time may no longer meet changing needs. We therefore design a hybrid big data platform that uses both bare-metal servers and private cloud virtual machines (VMs), balancing performance and scalability. Because bare-metal servers generally outperform cloud VMs, a big data platform built on bare metal generally performs better than one built on a private cloud. Because cloud servers can be launched from the private cloud quickly and conveniently, the platform's computing and storage nodes can be expanded elastically into the private cloud, so that the platform retains sufficient processing capacity during peak periods. We implemented this hybrid design in a production environment, and tests in that environment demonstrate its effectiveness.

Key words: big data, private cloud, big data analysis, big data processing, data platform, Hadoop, OpenStack