The Big Data Challenge


Digital information is superabundant, increasing tenfold every five years. Within the enterprise, 85% of all data is unstructured and growing exponentially. In tandem, the era of Web 2.0 is yielding large volumes of unstructured data feeds external to the enterprise (social media, blogs, voicemail transcripts, call center notes, websites, text documents, etc.). As the amount of raw information and the number of systems increase, traditional business intelligence, index search, and natural language processing-based data analytics methods become less effective.

Traditional approaches draw understanding from unstructured data by creating explicit semantic mappings such as ontologies, dictionaries, and semantic rule sets, intended to bridge the gap between queried concepts and related concepts or content within the same semantic domain.

The challenge with traditional approaches is that semantic relationships need to be known in order to be coded and referenced. Static ontologies are time-consuming and expensive to develop, and are in constant need of maintenance (which is in turn time-consuming and expensive).

More sophisticated systems are able to step beyond attribute-matching to exploit ontologies: graphically-expressed relationships between attributes or attribute sets. These allow already understood relationships to be encoded ("my sister is my son's aunt", "the Nexus One is a Google, Android-based cellular phone").

However, the underlying semantics of some domains evolve rapidly and "local terminologies" often develop for the same concepts that subsequently need to be reconciled. In addition, traditional ontology-based approaches require explicit, manual, annotated definition of all relationships among concepts in a domain as shown in Figure 1.

Figure 1: Traditional Ontology

xPatterns takes a "relevance discovery" approach that delivers on the promise of deriving actionable intelligence from an enterprise's disparate sources of structured and unstructured data.

The xPatterns Difference


xPatterns is a contextual semantic search platform that learns through the application of domain experts. The domain expert is built as a Relevance Neural Network (RNN) that maps relationships between a set of terms (i.e., semantic concepts) and related terms (output layer), intermediated by context (i.e., documents or articles) as shown in Figure 2.

Figure 2: Relevance Neural Network (RNN)
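
As a rough illustration of the structure in Figure 2, the sketch below connects a query term to related terms through a document (context) layer. It is not the xPatterns implementation: the three-document corpus, the function names `build_network` and `related_terms`, and the use of raw term counts as starting weights are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; stands in for the document (context) layer.
DOCS = {
    "d1": "android phone google nexus",
    "d2": "google search engine",
    "d3": "nexus phone review",
}


def build_network(docs):
    """Return term -> {doc_id: weight}, initialized from raw term counts."""
    weights = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.split()).items():
            weights[term][doc_id] = float(count)
    return weights


def related_terms(weights, query_term):
    """Spread activation: query term -> documents -> related terms."""
    scores = defaultdict(float)
    for doc_id, w_in in weights.get(query_term, {}).items():
        for term, doc_weights in weights.items():
            if term != query_term and doc_id in doc_weights:
                scores[term] += w_in * doc_weights[doc_id]
    # Strongest associations first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


network = build_network(DOCS)
print(related_terms(network, "nexus"))
```

In this toy corpus, "phone" ranks first for the query "nexus" because the two terms share two documents, while "search" is not returned at all because it never shares a document with the query term.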

The network weights are initialized (or bootstrapped) with statistically optimal values based on frequency statistics. Thereafter, the weights are strengthened or weakened through training by live interaction with users as well as new data. This learning capability enables improved relevance by leveraging the wisdom of the crowds. The query flow process is shown in Figure 3:

Figure 3: xPatterns Query Flow Process

Semantic


xPatterns is based on a ground-breaking approach to automated creation and dynamic maintenance of "Domain Experts", which are akin to dynamic, non-hierarchical ontologies. A Domain Expert (or DE) captures and represents relationships between concepts within a given domain. DEs are created automatically from analyzing and processing large bodies of unstructured text information about the domain. DEs can be leveraged to determine indirect semantic relationships between queried concepts and related concepts, and to facilitate understanding of the relevance of a specific document to a specific concept. As a truly semantic platform, xPatterns:

  • Automatically creates and dynamically maintains semantic ontologies known as Domain Experts (DEs). DEs represent "IsAssociatedWith" relationships for domains, derived simply from reading and reviewing large bodies of unstructured text information about a given area of interest.
  • Determines indirect semantic relationships between queried concepts and relevant documents.
  • Leverages existing ontologies where they are available and applicable to building DEs.

DE-visualization

Multilingual System


xPatterns' capabilities are language-agnostic. Using xPatterns, text and even symbolic content can be analyzed in native form without translation (xPatterns has been used to implement many-to-many applications with input and output in multiple languages including English, Spanish, and non-Latin character set languages such as Mandarin and Farsi). The solution can ingest a broad array of data in a variety of languages, unencumbered by linguistic or character set boundaries.

Contextual


With the unbiased nature of "IsAssociatedWith" relationships in a DE, reasoning can be achieved by using contextual information. The approach generally induces a large and widely-connected graph associating concepts which can be queried by defining a list of concepts that contextualize the query such as:

  • The target document set referred to as the Content Corpus from among which xPatterns finds the most relevant documents, ranked by their corresponding relevance score.
  • The user profile, which can be initialized from existing enterprise data sources.
  • The domain or area of knowledge for a given query (e.g., Medicine, Sports, Movies, etc.) referred to as the Domain Expert.
  • The geo-location, role, and/or intent of the person (i.e., user) who makes the query (or on behalf of whom it is made).

Our unique privacy model provides the enterprise and its partners the ability to leverage customer data without ever exposing user profile attributes to a partner ecosystem. Privacy-controlled querying for relevant and personalized content for specific users empowers enterprises to provide highly relevant content and experiences.

Personalization with Personas


Additionally, with its unique ability to leverage contextual attributes, xPatterns enables delivery of highly-relevant personalized experiences via the persona module. End-user preferences or customer data attributes (observed, inferred, or explicit) can be leveraged in situations where there is no direct match for a given query, but there is a match based on indirect (i.e., semantic) relationships as mediated by xPatterns. The persona module is characterized as follows:

  • All content types are given a relevance score based on the personalized attributes of the user
  • The user profile can be initialized from existing enterprise data sources
  • Profile attributes can be dynamically updated from real-time implicit/behavioral data
  • Applications may be designed to give consumers full management of their personas
  • Persona attributes are unstructured: i.e., they need not be selected from static lists
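
The idea of scoring content against free-form persona attributes, including cases where there is no direct match, can be sketched as follows. This is only an illustration under stated assumptions: the `ASSOCIATIONS` table stands in for strengths a Domain Expert might supply, and the names `association`, `persona_score`, and `rank`, as well as the scoring rule (average of each attribute's best match), are hypothetical and not taken from xPatterns.

```python
# Hypothetical association strengths, standing in for a Domain Expert.
ASSOCIATIONS = {
    ("marathon", "running"): 0.8,
    ("marathon", "hydration"): 0.4,
    ("vegan", "nutrition"): 0.6,
}


def association(a, b):
    """Symmetric lookup; identical terms count as a direct match."""
    if a == b:
        return 1.0
    return ASSOCIATIONS.get((a, b), ASSOCIATIONS.get((b, a), 0.0))


def persona_score(persona_attrs, content_terms):
    """Average, over persona attributes, of the best match in the content."""
    if not persona_attrs or not content_terms:
        return 0.0
    best = [
        max(association(attr, term) for term in content_terms)
        for attr in persona_attrs
    ]
    return sum(best) / len(best)


def rank(persona_attrs, catalog):
    """Return (score, item) pairs, most relevant first."""
    scored = [
        (persona_score(persona_attrs, terms), item)
        for item, terms in catalog.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Persona attributes are free text, not values picked from a fixed list.
persona = ["marathon", "vegan"]
catalog = {
    "running shoes ad": ["running", "shoes"],
    "protein guide": ["nutrition", "protein"],
    "car review": ["sedan", "engine"],
}
for score, item in rank(persona, catalog):
    print(f"{score:.2f}  {item}")
```

None of the catalog items contains the words "marathon" or "vegan", yet the first two still receive a nonzero score through indirect associations, while the unrelated item scores zero.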

Furthermore, the unique xPatterns privacy model provides enterprises the ability to leverage customer data on behalf of their partner ecosystems without ever exposing the private profile attributes to any partner. Privacy-controlled querying for relevant and personalized content empowers enterprises to provide highly-relevant content and experiences.

 

Figure 4: xPatterns Persona Privacy Model

Search that Learns


At its core, xPatterns is based on Relevance Neural Networks (RNNs) that can learn. Learning happens in a variety of instances and scenarios:

  • Bootstrapping: As xPatterns analyzes and processes the initial set of documents for a given domain, the RNNs are initialized with a starting set of weights, representing the strength of relationships between concepts and documents, and between concepts and other concepts
  • Interaction with new content: As new content is introduced (as is invariably the case in almost all fields of knowledge), internal weights are modified to accommodate new terms and new relationships
  • Interaction with end-users: When usage-based learning is enabled, during runtime, as users make selections from among the result sets returned by xPatterns, the relationships between the query terms, the expanded terms (generated by the DE), and the selected result are strengthened, thus enabling the gradual refinement of the RNNs by means of crowdsourcing
  • DE-CT (Domain Expert Crowdsourcing Tool): DE-CT enables an injection of usage-based learning by reordering network weights explicitly to reflect expert opinion, or an infusion of crowd opinion (whereas usage-based learning is gradual, DE-CT is designed to affect DE status instantly)
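
The learning modes listed above can be sketched in a few lines. The following is a minimal illustration, not the xPatterns algorithm: the class `RelevanceWeights`, its method names, the per-term frequency normalization, and the update rule (move a weight a fixed fraction of the way toward 1.0 on each selection) are all assumptions chosen for clarity.

```python
from collections import defaultdict


class RelevanceWeights:
    """Toy store of (term, document) relationship strengths in [0, 1]."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.w = defaultdict(float)

    def bootstrap(self, term_doc_counts):
        """Initialize weights from frequency statistics, normalized per term."""
        totals = defaultdict(float)
        for (term, _doc), count in term_doc_counts.items():
            totals[term] += count
        for (term, doc), count in term_doc_counts.items():
            self.w[(term, doc)] = count / totals[term]

    def reinforce(self, terms, selected_doc):
        """Usage-based learning: nudge each weight toward 1.0 on a selection."""
        for term in terms:
            key = (term, selected_doc)
            self.w[key] += self.learning_rate * (1.0 - self.w[key])

    def set_expert_weight(self, term, doc, value):
        """Explicit override in the spirit of DE-CT: takes effect at once."""
        self.w[(term, doc)] = value


weights = RelevanceWeights(learning_rate=0.5)
weights.bootstrap({("nexus", "d1"): 1, ("nexus", "d3"): 3})

# A user searched for "nexus" (expanded with "phone") and selected d1.
weights.reinforce(["nexus", "phone"], "d1")

# An expert pins a relationship directly instead of waiting for usage data.
weights.set_expert_weight("android", "d1", 0.9)

print(weights.w[("nexus", "d1")])
```

The sketch only strengthens weights on selection; weakening (for example, decaying the weights of results that are shown but never selected) would follow the same pattern with the update moving toward 0.0.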

xPatterns Logical Architecture


Figure 5 (shown below) provides a high-level view of the logical architecture of xPatterns. xPatterns is accessible via a rich set of RESTful web service APIs, which provide applications full access to the platform's runtime functionalities and features. The back office consists of Domain Experts (DEs) and Content Corpuses, both of which are RNNs, as well as the xPersona data stores. The corpus management layer includes a set of tools and portals for creation and maintenance of the corpuses. Additionally, corpus management APIs provide the option of runtime management of internal data.

Figure 5: xPatterns Logical Architecture

xPatterns is a highly performant platform designed for real-time responsiveness, massive data throughput, and true scalability. xPatterns is built on top of the following components of the state-of-the-art, open source technology stack from the Apache Software Foundation:

  • Lucene/Solr
    • RNNs built as "Adaptable" extension of Lucene Text Search Core
    • High Performance; Feature Rich (faceting/filtering/geospatial); Distributed
  • Hadoop
    • Very large-scale, distributed, parallel, scalable batch processing infrastructure
    • Used by xPatterns for heavy-duty computation: indexing, analytics, training
  • Cassandra
    • NoSQL database linearly scalable to large proportions without loss of performance
    • Used by xPatterns for storing corpuses and collecting logs across the cluster
  • Tomcat
    • Scalable web server used for exposing the xPatterns RESTful APIs and Admin Portal
  • Spring AOP (Aspect-Oriented Programming), Spring DI (Dependency Injection)
    • Modularization of crosscutting concerns
    • Lightweight container for DI
  • ActiveMQ, ZooKeeper
    • Scalable, fault-tolerant coordination of offline processes

Third-party applications can be easily deployed on xPatterns through web services to immediately enrich data insight through the self-discovery of patterns and concepts in real time. xPatterns software can be delivered as Software-as-a-Service or On-Premise.

xPatterns Value Proposition


xPatterns is a fundamentally differentiated approach to semantic technology and analysis. The differentiation is based on its proprietary technology underpinnings, as well as its highly performant, highly scalable architecture. Below is a summary of the xPatterns differentiation dimensions:

  • Automated
    • Neural net-based machine learning
    • Draws knowledge from documents
    • Domain Experts (i.e., xPatterns ontologies) are built in a matter of minutes or hours
  • Knowledgeable
    • Self-constructing ontologies
    • Unbiased, non-hierarchical representation of knowledge
  • Adaptive
    • Auto-adapts through exposure to new data
    • Auto-adapts through exposure to user interaction (crowdsourcing)
    • Trainable via an injection of training (DE-CT: DE Crowdsourcing Toolset)
  • Big Data Solution
    • The more unstructured data, the better its pattern-matching capabilities
    • Incorporates/combines the best of structured and unstructured data
    • Sentiment analysis, agent analysis in social media
  • Relevance Ranking
    • Results are ranked based on relevance score
  • Contextual Relevance
    • Relevance is shaped by context, including that of the individual
    • The spatiotemporal dimension of information takes into account (i.e., gives higher relevance to) what just happened, and where it happened
  • Inference Engine
    • Predictions made not only based on extrapolative methods, but on representation of insight from influencers, and/or experts
  • Expandable Technology Platform
    • Basic foundational module: xRelevance
    • Additional components: xPersona, xInference, xSpatial, xThemes, xAgent
  • Multilingual
    • Can leverage dictionaries to enable many-to-many multilingual applications where the query is in multiple languages, and the results are likewise drawn from many languages
    • Language- and character set-agnostic
    • Fully multilingual applications based on brand new languages can be built in a matter of days
  • Performant
    • Can deal with massive amounts of data
    • Fast, scalable
    • Built on a state-of-the-art open source technology stack
  • Platform Offering
    • xPatterns is available through fully-documented RESTful web services
    • Available as SaaS, as well as an installable appliance
    • Application development is fast, requiring little consulting