With the advent of cloud computing, software developers gained the opportunity to bring scale and proximity to their applications, vastly improving performance as more and more users became connected. This presented unique challenges to infrastructure engineers and cloud hosting environments: how to dynamically allocate compute, serverless, and storage resources on demand in locations that may be far-flung or hard to reach.
The most popular web applications of today are storing ever more data, petabytes an hour, and the need to get that data as close to the end user as possible to improve performance is greater still. Since this article’s focus is Ceph, we’ll look at how you might solve the infrastructure engineer’s challenge of providing reliable, scalable storage across geographically disparate locales.
What is Geo-location?
Geo-location in technology can take on many definitions, depending on the technologies being discussed. Geo-location can refer to the technology that’s used to determine a user’s location based on GPS data, IP address, or self-declared location information. Geo-location data can be used to identify where a user is connecting from, and it can also establish other more marketable metrics such as demographics and income data. Geo-location data can also be as simple as a user’s IP address when used by upstream router BGP routing tables via Anycast to route a user to the closest endpoint.
Geo-location data is important when talking about Geo-replication, as it’s the prime data source for determining user location when placing users on certain geographically distinct servers for servicing requests.
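To make the idea of placing users on the closest server concrete, here is a minimal Python sketch. It assumes the user's latitude and longitude are already known (from GPS, an IP lookup, or self-declared data) and picks the nearest endpoint by great-circle distance. The endpoint names and coordinates are hypothetical, and real deployments often rely on Anycast or DNS-based routing instead of application code like this.

```python
import math

# Hypothetical endpoints: names and (latitude, longitude) values are illustrative only.
ENDPOINTS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_endpoint(user_location, endpoints=ENDPOINTS):
    """Return the name of the endpoint closest to the user's (lat, lon)."""
    return min(endpoints, key=lambda name: haversine_km(user_location, endpoints[name]))
```

For example, `nearest_endpoint((48.85, 2.35))` (roughly Paris) selects the hypothetical `eu-west` endpoint.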
What is Geo-replication?
Geo-replication is the replication of user or other data used by a web, database, storage, or serverless application to multiple geographically distinct hosts on an internet-connected network. Its purpose is to provide a consistent set of data to end users in multiple geographically distinct locations.
As previously discussed, in today’s hyper-connected world of cloud computing, developers are demanding more of their infrastructure, which includes bringing data processing and other application components as close to the end user as possible. The trouble with distributed and geographically disparate applications is keeping a consistent set of user data in each location, so that the user experience anywhere on the globe is the same regardless of the edge compute environment selected.
Geo-replication solves this problem for a number of application components, including databases, sessions, images, videos, and many other pieces of data that need to be accessed from anywhere the end user is accessing the application. By replicating this data using built-in utilities or custom scripts, application developers are guaranteed a consistent environment for their code, regardless of the endpoint.
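One way to picture what "a consistent set of data" means is to compare content digests of every object across sites. The sketch below is only an illustration under simple assumptions: each site is modeled as an in-memory dictionary of key to bytes, one site is treated as the source of truth, and the function name `find_stale_sites` is hypothetical. It is not how any particular replication tool verifies its data.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an object's contents."""
    return hashlib.sha256(data).hexdigest()


def find_stale_sites(replicas, source):
    """Compare every site against the source site.

    replicas maps site name -> {object key -> bytes}.
    Returns {site: [keys that are missing or differ from the source]},
    listing only sites that have at least one such key.
    """
    reference = {key: digest(value) for key, value in replicas[source].items()}
    stale = {}
    for site, objects in replicas.items():
        if site == source:
            continue
        bad = sorted(
            key
            for key, expected in reference.items()
            if key not in objects or digest(objects[key]) != expected
        )
        if bad:
            stale[site] = bad
    return stale
```

An empty result means every other site holds the same content as the source for every key the source has.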
Geo-replication in Ceph
Ceph, as we’ve discussed many times before, is an open source distributed storage solution offering tremendous flexibility and adaptability. Its power lies in its customizable nature and its ability to scale massively and linearly across multiple hardware platforms.
Ceph can store data in a number of ways. Objects can be stored via an HTTP API gateway, known as the RADOS Gateway. Data can also be written to pools as block devices using the RBD client in Linux. Finally, data can be stored in a familiar POSIX-compliant filesystem called CephFS.
As you might expect, Ceph can be leveraged to provide data replication that would be necessary to facilitate a geo-replicated storage environment for images, documents, videos, or any other application specific data you may want to replicate, with a few caveats.
- First, CephFS is generally unsupported, because POSIX semantics require consistent metadata that is accessible with low latency from every mount point.
- Second, for the RBD Linux client to work properly, it must be configured to leverage the RADOS Gateway instead of an underlying Ceph configuration file. Writes must be consistent and performed across all nodes, and the RBD client can’t directly facilitate the geo-replication.
- Third, the backbone of this configuration is the RADOS Gateways, so redundancy here is key.
Figure: Ceph Geo-Replication Diagram
By configuring the entire storage array as the master zonegroup, and by using multiple RADOS Gateway servers and multiple “zones”, you can obtain both redundancy and geographical resiliency.
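As a rough sketch of what that configuration involves on the master site, the Python helper below assembles the `radosgw-admin` commands typically used to create a realm, a master zonegroup, and a master zone, and then commit the period. It only builds the command lines and does not run them. The function name and the realm, zonegroup, zone, and endpoint values are hypothetical placeholders, and the exact flags should be checked against the multisite documentation for your Ceph release.

```python
import shlex


def multisite_commands(realm, zonegroup, master_zone, master_endpoint):
    """Build (but do not execute) radosgw-admin commands for the master site.

    All four arguments are placeholders supplied by the caller.
    """
    return [
        # 1. Create the realm that holds the whole multisite configuration.
        ["radosgw-admin", "realm", "create", f"--rgw-realm={realm}", "--default"],
        # 2. Create the master zonegroup inside that realm.
        ["radosgw-admin", "zonegroup", "create",
         f"--rgw-zonegroup={zonegroup}", f"--endpoints={master_endpoint}",
         f"--rgw-realm={realm}", "--master", "--default"],
        # 3. Create the master zone inside the zonegroup.
        ["radosgw-admin", "zone", "create",
         f"--rgw-zonegroup={zonegroup}", f"--rgw-zone={master_zone}",
         f"--endpoints={master_endpoint}", "--master", "--default"],
        # 4. Commit the new period so the gateways pick up the change.
        ["radosgw-admin", "period", "update", "--commit"],
    ]


if __name__ == "__main__":
    # Hypothetical names and endpoint, printed for review rather than executed.
    for cmd in multisite_commands("example-realm", "example-zg", "site-a", "http://rgw-a.example:80"):
        print(shlex.join(cmd))
```

Each additional site would then pull the realm from the master and create its own non-master zone in the same zonegroup, with its own set of RADOS Gateway servers in front of it.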
Geo-replication is an important strategy for providing both redundancy and performance to applications that have global ambitions. Ceph can provide the backbone of that geo-replication strategy for important application data, allowing you to leverage Ceph’s data reliability feature set.
AMDS Cosmos Engineers are available to assist you in the architecture, design, selection, deployment, and ongoing maintenance of your Ceph cluster, or any related Linux, Windows, or storage project. With extensive experience in both vendor management and open source software, we can augment your team’s existing skill set to help you grow into new technology, including Ceph’s powerful geo-replication tools.