Lowering the big-data barrier
xPatterns is an application framework for building enterprise-grade, intelligent big data applications. It was conceived to make big data, both structured and unstructured, more accessible to developers and enterprises in any industry.
xPatterns can be deployed either on-premises or in the cloud, fully managed by Atigeo, and requires no significant IT support to start or scale. xPatterns provides an SDK that lets data scientists easily configure plug-and-play components and experiment with best-in-class tools while reusing and integrating with existing assets. Data scientists can then deploy apps directly as web services or analytical jobs, so the transition from analysis to production is seamless.
With xPatterns, the runtime environment (e.g., Hadoop, NoSQL, search) is completely abstracted away. This shortens time to market and eliminates the need to recruit in-house technical expertise.
xPatterns technology overview
- Intelligence components
- xPatterns Relevance and domain experts (DEs)
- xPatterns Inference and cold-start prediction
- xPatterns Classification
- xPatterns Cooperative Distributed Inferencing (CDI)
- xPatterns Personas and online privacy
- xPatterns Natural Language Pre-Processing (NLP-P)
xPatterns makes it easy to combine different types of intelligence components in big data applications, including:
- Popular Python libraries such as NLTK for natural language processing, scikit-learn for machine learning and matplotlib for visualization
- Intelligence components built by our partners, such as IBM's SystemT and SystemML
- Patented innovations that help xPatterns deliver better results, using algorithms that are proprietary to Atigeo and not available elsewhere
Selecting the right mix of intelligence components to use is application-specific, and often requires iterative experimentation. Where intelligence components are open source, Atigeo tests, certifies, and supports them for production use.
xPatterns Relevance is our flagship search and document pattern discovery engine, with a set of visualization research tools.
At the core of xPatterns Relevance technology is the automatic creation and dynamic maintenance of high-quality domain experts (DEs) in near-real time. These domain experts act as semantic ontologies: they can be used to determine indirect semantic relationships between queried concepts and related concepts, and to help explain how relevant a specific document is to a specific concept.
- Each domain expert is built as a Relevance Neural Network (RNN) that maps relationships between a set of terms (i.e., semantic concepts) and related terms (output layer), intermediated by context (i.e., documents or articles)
- The network weights are initialized (or bootstrapped) with statistically optimal values based on frequency statistics
- Thereafter, weights are strengthened or weakened through training by live interaction with users, as well as with new data; relevance increases with use
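To make the term/document/term structure concrete, here is a minimal Python sketch of a three-layer network of the kind described above. It assumes that the frequency-based initialization is a term's normalized frequency in each document, and that user feedback nudges a single term-document weight toward 1 (useful) or 0 (not useful). The class name `DomainExpertSketch`, its methods and the learning `rate` are hypothetical; this is not Atigeo's Relevance Neural Network.

```python
from collections import defaultdict


class DomainExpertSketch:
    """Hypothetical term -> document -> term network (illustration only)."""

    def __init__(self, docs):
        # docs: mapping of document id -> list of terms in that document
        # term_doc[term][doc] holds the connection weight between a term and a document
        self.term_doc = defaultdict(dict)
        for doc_id, terms in docs.items():
            counts = defaultdict(int)
            for t in terms:
                counts[t] += 1
            # Bootstrap each weight from the term's normalized frequency in the document
            for t, c in counts.items():
                self.term_doc[t][doc_id] = c / len(terms)

    def related_documents(self, term):
        # Documents ranked by the strength of their connection to the term
        weights = self.term_doc.get(term, {})
        return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

    def related_terms(self, term):
        # Spread activation from the term through the document layer to other terms
        scores = defaultdict(float)
        for doc_id, w_in in self.term_doc.get(term, {}).items():
            for other, doc_weights in self.term_doc.items():
                if other != term and doc_id in doc_weights:
                    scores[other] += w_in * doc_weights[doc_id]
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    def feedback(self, term, doc_id, useful, rate=0.1):
        # Strengthen the link on positive feedback, weaken it on negative feedback
        w = self.term_doc[term].get(doc_id, 0.0)
        target = 1.0 if useful else 0.0
        self.term_doc[term][doc_id] = w + rate * (target - w)


if __name__ == "__main__":
    de = DomainExpertSketch({
        "d1": ["diabetes", "insulin", "insulin"],
        "d2": ["diabetes", "diet"],
        "d3": ["diet", "exercise"],
    })
    print(de.related_terms("diabetes"))
    de.feedback("diabetes", "d1", useful=True)
    print(de.related_documents("diabetes"))
```

Spreading activation through the document layer is what links "diabetes" to "diet" here: the score comes from the document they share, weighted by how strongly each term is connected to it.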
The figure below on the left shows a depiction of a domain expert RNN; the figure on the right is an xPatterns visualization of network relationships, showing relevant documents for a concept and relevant concepts for a document:
xPatterns Inference delivers complex predictions from a given body of evidence.
Combined with a Bayesian Model Averaging (BMA) approach that integrates user preferences embodied in a Bayesian network (BN), xPatterns Inference can deliver higher accuracy even when collective preferences are sparse.
- Inference incorporates ontological information in the task of prediction
- This information can be captured through a representation of domain experts, thereby allowing the incorporation of unstructured information, which is particularly well-suited to cold-start prediction scenarios
Cold-start prediction describes a situation where the data sample is still small and forming, and is not large enough to support predictions from traditional statistical models. In the diagram below, user A has provided a small set of cuisine preferences; the task is to infer user A's preferences for cuisines that are not listed. The algorithm takes into account the preferences of all users, plus the additional relationship weightings represented by domain experts, to infer the likelihood of user A's other cuisine preferences. This allows us to calculate with high confidence the probability that A likes Chinese food, even when the preferences collected from the population are too small a sample.
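The cold-start idea can be illustrated with a small Bayesian model averaging sketch in Python. It assumes two hypothetical candidate models: a population-wide, smoothed like rate for each cuisine, and a model that borrows user A's own ratings through domain-expert relatedness weights. Each model is weighted by how well it predicts A's known ratings (leave-one-out), and the final probability averages the two models' predictions. The data, smoothing pseudo-counts and function names are invented for illustration; the sketch does not build a Bayesian network and is not the xPatterns Inference algorithm.

```python
import math


def population_model(cuisine, population):
    # population: cuisine -> (likes, total); Laplace-smoothed population like rate
    likes, total = population.get(cuisine, (0, 0))
    return (likes + 1) / (total + 2)


def domain_expert_model(cuisine, ratings, relatedness):
    # Weighted vote of the user's own ratings (1 = like, 0 = dislike) on related cuisines;
    # relatedness holds hypothetical DE weights keyed by frozenset({a, b})
    num, den = 1.0, 2.0  # pseudo-counts: a neutral 0.5 prior when nothing is related
    for other, liked in ratings.items():
        w = relatedness.get(frozenset((cuisine, other)), 0.0)
        num += w * liked
        den += w
    return num / den


def bma_predict(target, ratings, population, relatedness, priors=(0.5, 0.5)):
    models = [
        lambda c, r: population_model(c, population),
        lambda c, r: domain_expert_model(c, r, relatedness),
    ]
    # Log posterior of each model: log prior + log likelihood of the known ratings,
    # each rating predicted from the remaining ones (leave-one-out)
    log_post = []
    for prior, model in zip(priors, models):
        ll = math.log(prior)
        for cuisine, liked in ratings.items():
            rest = {k: v for k, v in ratings.items() if k != cuisine}
            p = model(cuisine, rest)
            ll += math.log(p if liked else 1 - p)
        log_post.append(ll)
    # Normalize posteriors stably
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Model-averaged probability that the user likes the target cuisine
    prob = sum(w * model(target, ratings) for w, model in zip(weights, models))
    return prob, weights


if __name__ == "__main__":
    ratings = {"thai": 1, "vietnamese": 1, "mexican": 0}  # user A's few known preferences
    population = {"chinese": (3, 10), "thai": (4, 10), "vietnamese": (2, 10), "mexican": (6, 10)}
    relatedness = {frozenset(pair): w for pair, w in [
        (("chinese", "thai"), 0.8), (("chinese", "vietnamese"), 0.7),
        (("thai", "vietnamese"), 0.9), (("chinese", "mexican"), 0.1),
        (("thai", "mexican"), 0.1), (("vietnamese", "mexican"), 0.1),
    ]}
    prob, weights = bma_predict("chinese", ratings, population, relatedness)
    print(f"P(A likes chinese) = {prob:.3f}, model weights = {weights}")
```

With only three ratings, the domain-expert model explains A's choices better than the sparse population statistics, so it receives most of the weight and pulls the estimate for Chinese food above the population rate.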
xPatterns Classification infers type or class from complex information.
Classification integrates structured and unstructured data into classification scenarios that may be large in the volume of data, the size of the input space, and the number of possible classes to be inferred.
- Classification develops deeper understanding of unstructured data through processing natural language to decipher complex relationships
- This deeper understanding allows sentiment, time and reference to be used to distinguish between subtly different classes
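One common way to feed structured and unstructured data into a single classifier is to turn structured fields into tokens that sit alongside the words of the free text. The Python sketch below does this with a Laplace-smoothed multinomial naive Bayes model, a standard technique swapped in purely for illustration. The record layout (`fields` and `text`), the labels and the class name `NaiveBayesSketch` are hypothetical, and the sketch omits the deeper language processing (sentiment, time, reference) described above.

```python
import math
import re
from collections import Counter, defaultdict


def features(record):
    # Structured fields become "field=value" tokens; free text becomes lower-case word tokens
    tokens = [f"{k}={v}" for k, v in record["fields"].items()]
    tokens += re.findall(r"[a-z]+", record["text"].lower())
    return tokens


class NaiveBayesSketch:
    def fit(self, records, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for rec, label in zip(records, labels):
            self.token_counts[label].update(features(rec))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, record):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for tok in features(record):
                score += math.log((counts[tok] + 1) / denom)  # Laplace-smoothed likelihood
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    train = [
        ({"fields": {"dept": "billing"}, "text": "Duplicate charge on invoice"}, "billing_error"),
        ({"fields": {"dept": "billing"}, "text": "Invoice amount wrong"}, "billing_error"),
        ({"fields": {"dept": "clinical"}, "text": "Patient reports chest pain"}, "clinical"),
        ({"fields": {"dept": "clinical"}, "text": "Chest pain after exercise"}, "clinical"),
    ]
    model = NaiveBayesSketch().fit([r for r, _ in train], [y for _, y in train])
    print(model.predict({"fields": {"dept": "clinical"}, "text": "chest pain"}))
```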
xPatterns CDI is a new paradigm for inferencing and optimal control in real time. It is a distributed optimization approach with built-in synchronization that continuously optimizes all types of rules, both soft and hard. This inferencing paradigm reduces the complexity of reasoning over multiple knowledge bases from exponential to polynomial. Constraints are then built with a Pareto strategy that synchronizes the different rules so that they converge to an optimal result.
The potential applications of this inferencing model are significant. One example is optimizing the power grid, which has multiple knowledge bases and rules that the outdated algorithms governing it today do not fully take into account. The result is local ad-hoc adjustments and empirical corrections, which are sub-optimal and waste energy.
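The CDI algorithm itself is not described here, but the Pareto idea it relies on can be illustrated on its own. This Python sketch assumes that hard rules act as feasibility filters and that soft rules and costs become objectives to minimize; it then keeps only the non-dominated candidate plans. The power-grid plans, objective names and numbers are hypothetical, and nothing here is distributed or synchronized as CDI is.

```python
from typing import Callable, Dict, List, Sequence

Plan = dict


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a dominates b if it is no worse on every objective and strictly better on at least one
    # (all objectives are minimized)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans: Dict[str, Plan], objectives: List[str],
                 hard_rules: List[Callable[[Plan], bool]]) -> List[str]:
    # Hard rules are constraints: any plan that violates one is discarded outright
    feasible = {name: p for name, p in plans.items() if all(rule(p) for rule in hard_rules)}
    scores = {name: [p[o] for o in objectives] for name, p in feasible.items()}
    # Keep plans that no other feasible plan dominates
    return [name for name, s in scores.items()
            if not any(dominates(other, s) for other_name, other in scores.items()
                       if other_name != name)]


if __name__ == "__main__":
    plans = {
        "A": {"cost": 100, "waste": 5, "soft_violations": 2, "meets_demand": True},
        "B": {"cost": 90, "waste": 7, "soft_violations": 2, "meets_demand": True},
        "C": {"cost": 110, "waste": 6, "soft_violations": 3, "meets_demand": True},
        "D": {"cost": 80, "waste": 4, "soft_violations": 1, "meets_demand": False},
    }
    front = pareto_front(plans, ["cost", "waste", "soft_violations"],
                         hard_rules=[lambda p: p["meets_demand"]])
    print(front)  # ['A', 'B']: C is dominated by A, D breaks a hard rule
```

Plans A and B both survive because each is better on a different objective; choosing between them needs a further preference or synchronization step, which is where a method like CDI would come in.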
The unique xPatterns privacy model makes it possible for individual users to create, build and control their own digital "personas." These anonymous, secure profiles keep users' identities completely private while accurately reflecting their interests and behaviors in the digital landscape. In this way, it becomes possible to deliver highly relevant, personalized content and experiences to individuals without learning those individuals' actual identities; instead, only their relevance scores are visible.
- All content types are given a relevance score based on the personalized attributes of the user.
- The user profile can be initialized from existing enterprise data sources.
- Profile attributes can be dynamically updated from real-time inferred or explicit behavioral data.
- Applications can be designed to give consumers full management of their personas.
- Persona attributes are unstructured, meaning they don't have to be selected from static lists.
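A minimal sketch of persona-based relevance scoring, assuming a persona is just a random identifier plus free-form weighted interests, and that relevance is the cosine similarity between those interests and a content item's tags. The `Persona` class, the decay factor used for behavioral updates and the example tags are hypothetical; the point is that scoring uses only the persona's attributes, never the user's identity.

```python
import math
import uuid
from collections import Counter


class Persona:
    """Hypothetical anonymous persona: a random ID plus free-form weighted interests."""

    def __init__(self, seed_attributes=None):
        self.persona_id = uuid.uuid4().hex  # random; carries no link to the real identity
        # Seed attributes could come from existing enterprise data sources
        self.attributes = Counter(seed_attributes or {})

    def observe(self, tags, weight=1.0, decay=0.9):
        # Behavioral update: decay old interests, then reinforce tags the user engaged with
        for key in self.attributes:
            self.attributes[key] *= decay
        for tag in tags:
            self.attributes[tag.lower()] += weight

    def relevance(self, content_tags):
        # Cosine similarity between persona interests and the content's tags
        content = Counter(t.lower() for t in content_tags)
        dot = sum(self.attributes[t] * c for t, c in content.items())
        norm_p = math.sqrt(sum(v * v for v in self.attributes.values()))
        norm_c = math.sqrt(sum(v * v for v in content.values()))
        return dot / (norm_p * norm_c) if norm_p and norm_c else 0.0


if __name__ == "__main__":
    persona = Persona({"hiking": 2.0, "jazz": 1.0})
    items = {"trail guide": ["Hiking", "maps"], "concert": ["jazz"], "tax tips": ["finance"]}
    ranked = sorted(items, key=lambda k: persona.relevance(items[k]), reverse=True)
    print(persona.persona_id, ranked)
```

Because attributes are free-form strings rather than entries from a fixed list, a new interest such as "finance" simply appears after `observe(["finance"])` and starts contributing to scores.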
xPatterns has a set of healthcare-specific natural language processing components, built on top of existing open source projects, multiple ontologies and proprietary intelligent software. The pre-processing pipeline, which can be applied to any domain, consists of context/section recognizers, body and sentence extraction, negation tagging, normalization, lemmatization and context-specific removal of stop words. The pipeline improves the overall relevance of xPatterns results, both at corpus ingestion and index generation time and at query time.
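The pipeline steps can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: section headers are upper-case labels ending in a colon, negation applies from a cue word to the end of the sentence (a simplification of NegEx-style scoping), lemmatization is a crude suffix-stripping stand-in for a real lemmatizer, and the per-section stop lists are hypothetical.

```python
import re

# Hypothetical section-specific stop lists; a real system would tune these per domain
STOP_WORDS = {
    "default": {"the", "a", "an", "of", "and", "with", "is", "was", "to"},
    "medications": {"the", "a", "an", "of", "and", "daily", "mg"},
}
NEGATION_CUES = {"no", "not", "denies", "without"}
SECTION_RE = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")


def crude_lemma(token):
    # Stand-in for a real lemmatizer: only strips a few plural suffixes
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(note):
    section = "default"
    output = []
    for line in note.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            # Section recognition: a header such as "MEDICATIONS:" switches the context
            section = match.group(1).strip().lower()
            line = match.group(2)
        stops = STOP_WORDS.get(section, STOP_WORDS["default"])
        # Sentence extraction: split after terminal punctuation
        for sentence in re.split(r"(?<=[.!?])\s+", line.strip()):
            # Normalization: lower-case and keep alphanumeric tokens only
            tokens = re.findall(r"[a-z0-9]+", sentence.lower())
            negated = False
            for tok in tokens:
                if tok in NEGATION_CUES:
                    negated = True  # tag every later token in this sentence as negated
                    continue
                if tok in stops:
                    continue  # context-specific stop-word removal
                lemma = crude_lemma(tok)
                output.append((section, "NEG_" + lemma if negated else lemma))
    return output


if __name__ == "__main__":
    note = ("HISTORY: Patient denies chest pains. Reports headaches.\n"
            "MEDICATIONS: Metformin 500 mg daily.")
    for section, token in preprocess(note):
        print(section, token)
```

Keeping the section label next to each token is what makes the stop-word removal context-specific: "daily" and "mg" are noise in a medication list but could matter elsewhere in a note.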