Making Big Data + Analytics Simple


Description: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions.

The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

 
Author: Charlie Berger
Short URL: https://www.wesrch.com/business/pdfBU1H5H000OHYF




 
Contents:
Oracle’s Advanced Analytics
Making Big Data + Analytics Simple
Charlie Berger, MS Engineering, MBA
Sr. Director Product Management, Data Mining and Advanced Analytics
charlie.berger@oracle.com www.twitter.com/CharlieDataMine

Copyright © 2016 Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.


Predictive Analytics 101
• Data, data everywhere – explosive growth
• Growth of data exponentially greater than growth of data analysts!

Machine Learning/Data Analysis platform requirements:
• Be extremely powerful and handle large data volumes
• Be easy to learn
• Be highly automated & enable deployment
http://www.delphianalytics.net/more-data-than-analysts-the-real-big-data-problem/
http://uk.emc.com/collateral/analyst-reports/ar-the-economist-data-data-everywhere.pdf

Machine Learning/Analytics + Data Warehouse + Hadoop
• Platform Sprawl
– More Duplicated Data
– More Data Movement Latency
– More Security challenges
– More Duplicated Storage
– More Duplicated Backups
– More Duplicated Systems
– More Space and Power

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Vision
• Big Data + Machine Learning/Analytics Platform for the Era of Big Data and Cloud
– Make Big Data + ML/Analytics Model Discovery Simple
  • Any data size, on any computer infrastructure, on-premise and/or cloud
  • Any variety of data (structured, unstructured, transactional, geospatial), in any combination
– Make Big Data + ML/Analytics Model Deployment Simple
  • As a service, as a platform, as an application
  • On-premise and/or cloud


Oracle Cloud +
Advanced Analytics
Oracle Database
12c


What is Machine Learning, Data Mining & Predictive Analytics?
Automatically sifting through large amounts of data to create models that find previously hidden patterns, discover valuable new insights and make predictions
• Identify most important factor (Attribute Importance)
• Predict customer behavior (Classification)
• Predict or estimate a value (Regression)
• Find profiles of targeted people or items (Decision Trees)
• Segment a population (Clustering)
• Find fraudulent or “rare events” (Anomaly Detection)
• Determine co-occurring items in a “basket” (Associations)

Machine Learning, Predictive Analytics & Data Mining
Typical Use Cases
• Targeting the right customer with the right offer
• How is a customer likely to respond to an offer?

• Finding the most profitable growth opportunities
• Finding and preventing customer churn
• Maximizing cross-business impact
• Security and suspicious activity detection
• Understanding sentiments in customer conversations

• Reducing medical errors & improving quality of health
• Understanding influencers in social networks


Oracle’s Advanced Analytics
Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
Key Features
• Parallel, scalable data mining algorithms and R integration
• In-Database + Hadoop: don’t move the data
• Data analysts, data scientists & developers
• Drag and drop workflow, R and SQL APIs
• Extends data management into powerful advanced/predictive analytics platform
• Enables enterprise predictive analytics deployment + applications

Don’t move data; Data is LARGE
Move the algorithms instead
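
The idea of moving the algorithm to the data can be sketched with Python's built-in sqlite3 module standing in for the database. This is only a minimal illustration, not Oracle Advanced Analytics itself: SQLite is an assumed stand-in, and the `sales` table, its columns, and its five rows are made up. The sketch contrasts pulling every row to the client with letting the database aggregate and return one row per group.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of (cust_gender, amount_sold) rows; the data is made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (cust_gender TEXT, amount_sold REAL)")
    rows = [("M", 10.0), ("M", 30.0), ("F", 20.0), ("F", 40.0), ("F", 60.0)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def mean_by_moving_data(conn):
    """Pull every row to the client, then aggregate in Python."""
    fetched = conn.execute("SELECT cust_gender, amount_sold FROM sales").fetchall()
    totals = {}
    for gender, amount in fetched:
        running_sum, count = totals.get(gender, (0.0, 0))
        totals[gender] = (running_sum + amount, count + 1)
    means = {g: s / n for g, (s, n) in totals.items()}
    return means, len(fetched)  # len(fetched) = rows that left the database


def mean_in_database(conn):
    """Let the database aggregate; only one row per group is returned."""
    fetched = conn.execute(
        "SELECT cust_gender, AVG(amount_sold) FROM sales GROUP BY cust_gender"
    ).fetchall()
    return dict(fetched), len(fetched)


conn = build_demo_db()
client_means, rows_moved_client = mean_by_moving_data(conn)
db_means, rows_moved_db = mean_in_database(conn)
print(client_means == db_means, rows_moved_client, rows_moved_db)
```

Both approaches give the same per-group means, but the second returns one row per group instead of the whole table, which is the point the slide makes about large data.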

Google “Oracle Advanced Analytics”


Oracle Advanced Analytics Database Evolution
Analytical SQL in the Database

Timeline, 1998–2014 (years marked on the axis: 1998, 1999, 2002, 2004, 2005, 2008, 2011, 2014), earliest milestones first:
• 7 Data Mining “Partners”
• Oracle acquires Thinking Machine Corp’s dev. team + “Darwin” data mining software
• Oracle Data Mining 9.2i launched – 2 algorithms (NB and AR) via Java API
• Oracle Data Mining 10gR2 SQL – 7 new SQL dm algorithms and new Oracle Data Miner “Classic” wizards driven GUI
• SQL statistical functions introduced
• ODM 11g & 11gR2 adds AutoDataPrep (ADP), text mining, perf. improvements
• SQLDEV/Oracle Data Miner 3.2 “work flow” GUI launched
• Integration with “R” and introduction/addition of Oracle R Enterprise
• Product renamed “Oracle Advanced Analytics (ODM + ORE)”
• New algorithms (EM, PCA, SVD)
• Predictive Queries
• SQLDEV/Oracle Data Miner 4.0 SQL script generation and SQL Query node (R integration)
• OAA/ORE 1.3 + 1.4 adds NN, Stepwise, scalable R algorithms
• Oracle Adv. Analytics for Hadoop Connector launched with scalable BDA algorithms

Oracle’s Advanced Analytics
Fastest Way to Deliver Scalable Enterprise-wide ML/Predictive Analytics
Major Benefits
• Data remains in Database & Hadoop
• Model building and scoring occur in-database
• Use R packages with data-parallel invocations
• Leverage investment in Oracle IT
• Eliminate data duplication
• Eliminate separate analytical servers
• Deliver enterprise-wide applications
• GUI for ML/Predictive Analytics & code gen
• R interface leverages database as HPC engine

Traditional Analytics (Hours, Days or Weeks): Data Extraction, Data Prep & Transformation, Data Mining Model Building, Data Mining Model “Scoring”, Data Prep. & Transformation, Data Import
Oracle Advanced Analytics (Secs, Mins or Hours): Data Preparation, Model Building, Embedded Data Prep, Model “Scoring”
Savings

Oracle’s Advanced Analytics (Machine Learning Platform)

Multiple interfaces across platforms: SQL, R, GUI, Dashboards, Apps

Users
• Information Producers: R programmers (R Client); Data & Business Analysts (SQL Developer/Oracle Data Miner)
• Information Consumers: Business Analysts/Mgrs (OBIEE); Domain End Users (Applications)

Platform
• Hadoop: HQL, R, ORAAH parallel, distributed algorithms
• Oracle Database Enterprise Edition: Oracle Advanced Analytics - Database Option; SQL Data Mining, ML & Analytic Functions + R Integration for Scalable, Distributed, Parallel in-Database ML Execution
• Oracle Cloud, Oracle Database 12c

Oracle Advanced Analytics Database Option
Wide Range of In-Database Data Mining and Statistical Functions
• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson’s, ANOVA)
– Selected Base SAS equivalents

• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– SQL Patterns
– R to SQL transparency and push down

• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)

• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines

• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization

• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)

• Associations / Market Basket Analysis
– A Priori algorithm

• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition

• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts, etc.)

• Transactional & Spatial Data
– All OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time, distances from location, time spent in area A, B, C, etc.)

• R packages: ability to run open source
– Broad range of R CRAN packages can be run as part of database process via R to SQL transparency and/or via Embedded R mode

* included free in every Oracle Database

You Can Think of Oracle Advanced Analytics Like This…
Traditional SQL
– “Human-driven” queries
– Domain expertise
– Any “rules” must be defined and managed

SQL Queries
– SELECT
– DISTINCT
– AGGREGATE
– WHERE
– AND OR
– GROUP BY
– ORDER BY
– RANK

+

SQL Statistical Functions - SQL &
– Automated knowledge discovery, model building and deployment
– Domain expertise to assemble the “right” data to mine/analyze

Statistical SQL “Verbs”
– MEAN, STDEV
– MEDIAN
– SUMMARY
– CORRELATE
– FIT
– COMPARE
– ANOVA

FREE!
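
As a rough illustration of statistical "verbs" sitting next to ordinary SQL verbs, the sketch below registers a user-defined aggregate in SQLite through Python's sqlite3 module and calls it alongside WHERE and GROUP BY. SQLite is an assumed stand-in for the database, the `sales` table and its rows are made up, and `STDEV` is a hypothetical aggregate defined inside the sketch (named after the slide's verb), not Oracle's implementation.

```python
import math
import sqlite3


class SampleStdev:
    """User-defined aggregate: sample standard deviation (n - 1 denominator)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group; NULLs are skipped.
        if value is not None:
            self.values.append(float(value))

    def finalize(self):
        # Called once per group; fewer than two values has no sample stdev.
        n = len(self.values)
        if n < 2:
            return None
        mean = sum(self.values) / n
        return math.sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("STDEV", 1, SampleStdev)  # hypothetical name for this sketch
conn.execute("CREATE TABLE sales (cust_gender TEXT, amount_sold REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("M", 10.0), ("M", 30.0), ("F", 20.0), ("F", 40.0), ("F", 60.0)],
)

# A statistical verb (STDEV) used next to traditional verbs (WHERE, GROUP BY, ORDER BY).
summary = conn.execute(
    "SELECT cust_gender, AVG(amount_sold), STDEV(amount_sold) "
    "FROM sales WHERE amount_sold > 0 "
    "GROUP BY cust_gender ORDER BY cust_gender"
).fetchall()

for gender, mean, stdev in summary:
    print(gender, mean, round(stdev, 3))
```

The statistic is computed where the rows live, so the client receives only one summary row per group.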

In-Database Statistical Functions (SQL)
Independent Samples T-Test
• A/B offer testing
– Query compares the mean of AMOUNT_SOLD between MEN and WOMEN, grouped by CUST_INCOME_LEVEL ranges
– Returns observed t value and its related two-sided significance (