Content found in this wiki may not reflect official Church information. See Terms of Use for more information.

What is MarkLogic?

From TechWiki

Describing MarkLogic

MarkLogic is a REST application server, document database, and search engine. It is REST through and through. It is built specifically for hypertext documents, links, metadata, URIs, MIME types, and HTTP. It is schema-agnostic because it is automatically aware of the independent structure of each of its JSON and XML documents. It is search-centric because it can search for any combination of words, values, structures, and links within and across documents. It scales horizontally to hundreds of servers within and between data centers while maintaining ACID-compliant transactions.

MarkLogic has the following enterprise features:

What is REST?

REST is an architectural style that uses a uniform resource identifier (URI) and a web protocol (HTTP/HTTPS) to request and transfer a representation (MIME media type) of the state of a resource (document) at a point in time from a server to a client.

The term REST was coined and defined by Roy Fielding in his 2000 doctoral dissertation, Architectural Styles and the Design of Network-based Software Architectures. REST standardizes and documents the patterns used in the World Wide Web: self-documenting hypertext with data and metadata (HTML), a stateless request/response communication protocol (HTTP/HTTPS), resource locators (URL), multiple representations per URL (MIME types), and downloading code on demand to process resources (JavaScript, CSS, etc.).

REST consists of three main concepts:

  • (RE) Representation
  • (S) State
  • (T) Transfer
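All three concepts show up in a single HTTP exchange. The sketch below uses only Python's standard library to build (but not send) a GET request: the URI identifies the resource, the Accept header names the representation the client wants, and HTTP is the transfer. The host name is hypothetical, and port 8000 is assumed to host a MarkLogic REST API instance; `/v1/documents?uri=` is MarkLogic's REST endpoint for reading a document by URI.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(host: str, doc_uri: str, media_type: str) -> Request:
    """Build (not send) a REST GET for one resource in one representation."""
    # Resource: the document URI travels as the `uri` query parameter.
    url = f"http://{host}:8000/v1/documents?" + urlencode({"uri": doc_uri})
    # Representation: the client names the MIME type it wants back.
    return Request(url, method="GET", headers={"Accept": media_type})


# Hypothetical host and document URI, for illustration only.
req = build_get("marklogic.example.com", "/scriptures/ot/gen/1.json", "application/json")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Sending it with `urllib.request.urlopen(req)` would perform the transfer; the request already carries everything the server needs, so no session is required.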

Why is MarkLogic RESTful?

Representation and MarkLogic

  • A representation is a document that represents a resource. It has three requirements: MIME media type, URI, and hypertext.
  • MIME type: Each resource can be represented by one or more types of documents. The MIME media type defines the representation a client requests. A client may request the resource to be represented as JSON, XML, HTML, PNG, PDF, etc.
    • MarkLogic stores any type of document and assigns the appropriate MIME type to it. It can fully index, query, search, and process JSON, XML, and RDF documents. It meets the REST requirement of being able to transform JSON, XML, and RDF documents into any other MIME type. It knows how to execute JavaScript, XQuery, XSLT, and SPARQL documents. It can convert Microsoft Word, PowerPoint, Excel, textual PDF, DocBook, and CSS documents into XHTML while preserving their content, formatting, and structure. (Microsoft Office can also be used to create, edit, and manage content in MarkLogic.) It knows how to extract metadata and text from over 138 types of binary documents, such as raster images, vector images, videos, archive files, database files, encoded emails, presentations, spreadsheets, word-processing documents, and text formats.
    • Few NoSQL databases use MIME types to identify the media type of each document. Most NoSQL databases support only one type of data and it is usually proprietary: columnar, BSON, binary, etc. Most cannot transform from one MIME type to another.
  • URI: A resource is identified by a globally unique Uniform Resource Identifier (URI). MarkLogic identifies each document with a unique URI. A document in MarkLogic is like a row in a table in a schema in a relational database. A URI is liberating: it provides random access to any resource anywhere. It is like being able to retrieve any row in any table in any schema of a relational database without having to know which table and schema the row is stored in.
    • Navigating the URL hierarchy is fundamental to REST. MarkLogic understands the hierarchy within a URI, which is represented by slashes ("/"), such as https://www.lds.org/scriptures/ot/gen/1. MarkLogic treats the segments between the slashes as folders, so the URI of each document automatically places it in a folder hierarchy. In the example URI above, the document for Genesis chapter 1 is located in the Genesis folder, which is located in the Old Testament folder, which is located in the Scriptures folder on lds.org. MarkLogic indexes the documents in each folder and its subfolders, which makes it fast and easy to retrieve any or all documents in a folder and/or its subfolders.
    • Few NoSQL databases use the URI as the primary key for their documents or data. They also don't index the URI hierarchically to filter documents by folder and subfolder.
  • Hypertext: A document should contain data that represents the resource. The data should be human-readable and self-documenting, like JSON, XML, RDF, and HTML. It should be linked data (i.e., the "hyper" in "hypertext"). A document should contain metadata links about the resource, such as RDF. It should contain action links that define what further actions can be performed on the resource. It should contain related links to related resources, such as images, audio, video, and other documents. Each related link should define what actions can be done with the referenced resource, such as downloading it, displaying a link to it, or executing a command against it.
    • Hypertext or hypermedia documents must have all these features. Hyperlinks are what the "hyper" in hypertext and hypermedia refers to. You can't have REST without metadata links to define what the data means, action links to know how to work with the resource, and related links to connect resources. It should all be human readable and self-documenting so a developer does not have to read documentation to know how to interact with a REST web service and its documents.
    • MarkLogic meets all the requirements for hypertext representation. It is designed around MIME types, URIs, and linked data. It stores documents with their MIME types as JSON, XML, RDF, HTML, etc. These documents are human-readable and self-documenting, which MarkLogic leverages to recognize and index each document's data, data structure, metadata links, action links, and related links. This makes it easy to search, query, transform, and deliver hypertext documents. MarkLogic can also store any type of binary document and deliver it as a related resource, such as an image, video, or audio file. MarkLogic is designed to process simple links and RDF links using SPARQL and XInclude. MarkLogic can represent links in many formats: XPointer, RDF/XML, RDF/JSON, Turtle, N-Triples, N-Quads, and TriG.
    • No other NoSQL database natively indexes and fully processes all the document types required for REST hypertext: JSON, XML, RDF, HTML, CSS, and JavaScript.
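The MIME type bullets describe two separate mechanisms: a type assigned to each stored document (typically from its URI extension) and a representation chosen per request. The sketch below models both under stated assumptions: Python's `mimetypes` table stands in for MarkLogic's server-side MIME type configuration, and the Accept-header matcher is deliberately simplified (q-values are ignored and only the `*/*` wildcard is understood).

```python
from __future__ import annotations

import mimetypes


def stored_type(doc_uri: str) -> str:
    """Assign a MIME type from the URI's extension, falling back to generic binary."""
    guessed, _encoding = mimetypes.guess_type(doc_uri)
    return guessed or "application/octet-stream"


def negotiate(accept: str, available: list[str]) -> str | None:
    """Pick the first available representation the client accepts.

    Simplified on purpose: q-values are ignored (the client's order wins)
    and the only wildcard understood is */*.
    """
    for item in accept.split(","):
        wanted = item.split(";")[0].strip()  # drop parameters such as q=0.9
        if wanted == "*/*" and available:
            return available[0]
        if wanted in available:
            return wanted
    return None  # a real server would answer 406 Not Acceptable


print(stored_type("/scriptures/ot/gen/1.json"))
print(negotiate("application/xml;q=0.9, application/json", ["application/json", "text/html"]))
```

A production negotiator would honor q-values and `type/*` wildcards; the point here is only that the stored type and the delivered representation are chosen independently.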
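The URI bullets say a document's URI places it in a folder hierarchy and that MarkLogic can return everything in a folder, with or without its subfolders. The toy model below reproduces that behavior over an in-memory list of URIs, using the path of the Genesis example. The `depth` values ("1" for the folder only, "infinity" for all descendants) mirror the depth option of MarkLogic's directory queries (`cts.directoryQuery`), but the code is plain Python: MarkLogic answers such queries from its URI indexes, whereas this sketch simply scans a list.

```python
from __future__ import annotations


def folders_of(doc_uri: str) -> list[str]:
    """List every ancestor folder implied by a document URI, outermost first."""
    parts = doc_uri.strip("/").split("/")[:-1]  # drop the document name itself
    return ["/" + "/".join(parts[: i + 1]) + "/" for i in range(len(parts))]


def in_folder(uris: list[str], folder: str, depth: str = "1") -> list[str]:
    """Select URIs directly in `folder` (depth "1") or anywhere below it ("infinity")."""
    if not folder.endswith("/"):
        raise ValueError("folder URIs end with a slash")
    hits = []
    for uri in uris:
        if not uri.startswith(folder):
            continue
        rest = uri[len(folder):]
        # A remaining slash means the document sits in a subfolder.
        if depth == "infinity" or "/" not in rest:
            hits.append(uri)
    return hits


# Illustrative document URIs.
uris = [
    "/scriptures/ot/gen/1",
    "/scriptures/ot/gen/2",
    "/scriptures/ot/ex/1",
    "/scriptures/nt/matt/1",
]
print(folders_of("/scriptures/ot/gen/1"))
print(in_folder(uris, "/scriptures/ot/", depth="infinity"))
```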
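The hypertext bullets list what a self-describing representation should carry: data, metadata links, action links, and related links that state which actions they support. A minimal sketch of such a JSON document follows. The field names (`links`, `rel`, `href`, `type`, `method`), the link relations, and the `.rdf` and `/image.png` URIs are illustrative assumptions in the spirit of hypermedia formats such as HAL, not a schema required by REST or by MarkLogic.

```python
import json


def chapter_representation(uri: str, title: str, text: str) -> dict:
    """Build a self-describing JSON representation: data plus three kinds of links.

    Field names, link relations, and linked URIs are hypothetical.
    """
    return {
        "uri": uri,
        "data": {"title": title, "text": text},
        "links": [
            # Metadata link: where to find what the data means.
            {"rel": "describedby", "href": uri + ".rdf", "type": "application/rdf+xml"},
            # Action links: what the client may do next, and with which method.
            {"rel": "edit", "href": uri, "method": "PUT", "type": "application/json"},
            {"rel": "delete", "href": uri, "method": "DELETE"},
            # Related link: another resource, plus the action it supports.
            {"rel": "related", "href": uri + "/image.png", "type": "image/png", "method": "GET"},
        ],
    }


doc = chapter_representation(
    "/scriptures/ot/gen/1",
    "Genesis 1",
    "In the beginning God created the heaven and the earth.",
)
print(json.dumps(doc, indent=2))
```

Because every link names its media type and method, a client can discover what to do next from the response itself rather than from out-of-band documentation.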

State and MarkLogic

  • State in REST exists on clients and in documents, data, and state machine data stored in servers. It does not exist in the communications protocol or in session data. It often exists in caches.
  • Server: All information needed to process a request must be presented in the request and processed against documents in the database. State must exist only in the request and in database documents; it cannot be anywhere else, such as in a session cache. The documents in the server define the state of the server. A REST server should explicitly create a state machine that defines the acceptable actions that can be transacted against documents in specific contexts.
    • A REST transaction occurs at a point in time. The state of the data in the request is unchanging, but the state of the documents and state machines in the database is often changing. Since request state and database state are both required to process the request, and since shifting state creates unpredictable results, a REST transaction should run at a point in time with unchanging state. Only an ACID-compliant database can ensure consistent state because it isolates each transaction from every other transaction. The only time REST does not need an ACID-compliant database is when database documents do not change, when database state machines do not change, or when clients can live with the resulting level of unreliable and unpredictable results.
    • MarkLogic meets all the requirements for REST state. Its web services are stateless: there is no session cache. It is ACID compliant. It ensures each transaction occurs at a point in time and is isolated from all other transactions. This ensures consistent processing during a transaction -- even across billions of documents. MarkLogic is an MVCC (multiversion concurrency control) database, which provides transaction isolation without slowing the performance of reads -- even when documents being read are being modified simultaneously by other transactions. (Also, like any other ACID database, when multiple updates and deletes compete for the same documents, they will impact each other's performance because change has to be serialized.)
    • Most other NoSQL databases are not ACID compliant. They are only suitable for REST services when their documents or data do not change or when the rate of change is slow enough or dispersed enough that it creates an acceptable level of unreliability and unpredictability.
  • Client: A REST client, such as a web browser or spider, locally maintains transactional state, such as what to do in response to documents and result codes that are returned from server transactions. An application exists only in the client -- not in the server (although, a server may deliver application code to a client, such as when a web server downloads HTML, CSS, and JavaScript to a browser). Client application code decides when to execute web service calls and it ties the results together to accomplish its purpose. This allows multiple authorized applications to reuse web services for a variety of purposes.
    • The server helps the client know what web service calls are available by providing action links with each response. Action links are contextual and the context is based on the application account, user account, database documents, links within and between documents, the server state machine, etc. Through action links, the server can inform a client application what web service calls are permissible in any given context.
    • MarkLogic meets the needs of client applications through its built-in ability to process and send action links to clients based on context. MarkLogic supports RDF triples and SPARQL, which enables context to be defined across applications, users, document state, links between documents, server state machine, etc. MarkLogic processes triples very quickly, which enables context to scale to billions of documents, millions of users, etc.
    • MarkLogic also uses application and user permissions to filter which documents are returned to clients. MarkLogic does this automatically and transparently by adding security filtering constraints into every search and query. This ensures no account can access unauthorized documents. This is fast because all security permissions are built into MarkLogic's indexes, which allows document-level security to scale across billions of documents.
    • Most other NoSQL databases do not provide government-grade, document-level security and they also do not support RDF triples and SPARQL.
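The ACID and MVCC bullets say that a reader sees one consistent point in time even while writers keep working. The toy model below shows the general MVCC idea: every update creates a new version stamped with a commit timestamp instead of overwriting in place, and a reader at timestamp T sees, for each URI, the version that was current at T. It is a simplified illustration, not MarkLogic's implementation; it ignores write-write conflicts, locking, and cleanup of old versions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Version:
    content: str
    created: int                # commit timestamp that made this version visible
    deleted: int | None = None  # commit timestamp that superseded it, if any


@dataclass
class MVCCStore:
    clock: int = 0
    versions: dict[str, list[Version]] = field(default_factory=dict)

    def write(self, uri: str, content: str) -> int:
        """Commit a new version; the previous one is marked superseded, not overwritten."""
        self.clock += 1
        history = self.versions.setdefault(uri, [])
        if history and history[-1].deleted is None:
            history[-1].deleted = self.clock
        history.append(Version(content, created=self.clock))
        return self.clock

    def read(self, uri: str, at: int) -> str | None:
        """Return the version of `uri` that was visible at timestamp `at`."""
        for v in self.versions.get(uri, []):
            if v.created <= at and (v.deleted is None or v.deleted > at):
                return v.content
        return None


store = MVCCStore()
t1 = store.write("/orders/42", "pending")
snapshot = store.clock  # a long-running reader pins this point in time
t2 = store.write("/orders/42", "shipped")  # a concurrent writer commits
print(store.read("/orders/42", at=snapshot))  # pending
print(store.read("/orders/42", at=t2))        # shipped
```

The reader pinned at `snapshot` never blocks the writer and never sees a half-applied change, which is the property the bullets above rely on.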
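The bullets on server state machines and action links fit together: the server keeps a table of which actions are allowed in each document state, validates every request against it, and returns only the action links that are permitted in the current context. In the sketch below, the states, actions, roles, and `?action=` link format are all hypothetical examples. In MarkLogic such a table could be kept in documents or RDF triples, but the code itself uses no MarkLogic API.

```python
from __future__ import annotations

# Hypothetical state machine: document state -> {action: (next state, required role)}.
TRANSITIONS = {
    "draft": {"submit": ("review", "author"), "delete": (None, "author")},
    "review": {"approve": ("published", "editor"), "reject": ("draft", "editor")},
    "published": {"retract": ("draft", "editor")},
}


def action_links(doc_uri: str, state: str, roles: set[str]) -> list[dict]:
    """List only the actions this user may take on the document in its current state."""
    links = []
    for action, (_next_state, role) in TRANSITIONS.get(state, {}).items():
        if role in roles:
            links.append({"rel": action, "href": f"{doc_uri}?action={action}", "method": "POST"})
    return links


def apply_action(state: str, action: str, roles: set[str]) -> str | None:
    """Validate a requested action against the state machine and return the new state."""
    allowed = TRANSITIONS.get(state, {})
    if action not in allowed:
        raise ValueError(f"{action!r} is not allowed in state {state!r}")
    next_state, role = allowed[action]
    if role not in roles:
        raise PermissionError(f"{action!r} requires role {role!r}")
    return next_state  # None means the document is removed


print(action_links("/docs/7", "draft", {"author"}))
print(apply_action("review", "approve", {"editor"}))
```

Because the links are computed from the same table that validates requests, a client that follows only the links it receives cannot attempt an illegal transition.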
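The security bullet says permission constraints are added to every search and query automatically. The sketch below captures that idea: the search function itself ANDs a read-permission check into whatever predicate the caller supplies, so no caller can bypass it. The documents, roles, and linear scan are illustrative only; MarkLogic resolves permissions from its indexes, and its real permission model (roles paired with capabilities such as read and update) is richer than a single set of reader roles.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    uri: str
    body: str
    read_roles: frozenset[str]  # roles granted the read capability (simplified)


def secure_search(docs: list[Doc], predicate: Callable[[Doc], bool], user_roles: set[str]) -> list[str]:
    """Run the caller's query with a read-permission constraint always ANDed in."""
    return [
        d.uri
        for d in docs
        if not d.read_roles.isdisjoint(user_roles) and predicate(d)
    ]


# Hypothetical documents and roles.
docs = [
    Doc("/hr/salaries.json", "confidential payroll data", frozenset({"hr"})),
    Doc("/news/1.json", "payroll system upgrade announced", frozenset({"employee", "hr"})),
]
print(secure_search(docs, lambda d: "payroll" in d.body, {"employee"}))
print(secure_search(docs, lambda d: "payroll" in d.body, {"hr"}))
```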

Because MarkLogic can provide both the web service and database in one server, it is easy to use the state of the documents and the

Transfer and MarkLogic

  • Transfer in REST is a communication protocol that enables a client to send a hypertext request to a server and receive back a hypertext response. The transfer must be stateless and must be a request/response protocol. It must have human-readable, self-descriptive headers. The headers must contain metadata about the request, such as the requested resource URI, the MIME type of the resource, and the action to perform on the resource (such as GET, PUT, POST, PATCH, and DELETE). HTTP (Hypertext Transfer Protocol) and HTTPS (secure HTTP) fit REST closely: Fielding developed REST alongside the HTTP/1.1 standard and used it to guide that protocol's design.
  • All MarkLogic client communication is through HTTP and HTTPS REST services (except for its SQL JDBC feature); hosts within a cluster communicate with each other over MarkLogic's internal XDQP protocol. MarkLogic provides out-of-the-box REST interfaces for manipulating resources (insert, update, delete, query, search, transform, etc.) and administering MarkLogic (REST app servers, databases, indexes, clusters, etc.). MarkLogic makes it very easy to create custom REST services because everything in MarkLogic is built around REST and because it provides simple and powerful application server APIs.
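The transfer bullets describe HTTP as a stateless request/response protocol with self-descriptive headers. The sketch below formats (but does not send) the raw HTTP/1.1 text of a PUT that stores a JSON document through MarkLogic's `/v1/documents` endpoint, so the action, resource URI, and media types are visible on the wire. The host, port, and document are hypothetical, and the authentication header a real MarkLogic server would require is omitted.

```python
import json
from urllib.parse import urlencode


def format_put(host: str, doc_uri: str, doc: dict) -> bytes:
    """Format (not send) a self-describing HTTP/1.1 PUT for one JSON document."""
    body = json.dumps(doc).encode("utf-8")
    target = "/v1/documents?" + urlencode({"uri": doc_uri})
    head = [
        f"PUT {target} HTTP/1.1",          # action + resource
        f"Host: {host}",
        "Content-Type: application/json",  # MIME type of what is being sent
        "Accept: application/json",        # MIME type wanted in the response
        f"Content-Length: {len(body)}",
    ]
    # Header lines end in CRLF; a blank line separates headers from the body.
    return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body


msg = format_put("marklogic.example.com:8000", "/orders/42.json", {"status": "pending"})
print(msg.decode("utf-8"))
```

Everything the server needs (what to do, to which resource, in which format) is in this one message, which is what makes the exchange stateless.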

MarkLogic Links