What is the Actual Performance of HANA?





I covered the topic of the actual
performance of HANA versus competing databases in the article Which is
Faster, HANA or Oracle 12C. In this article I will cover the various
database benchmarks on HANA and its competitors in more detail.


Who Performs Benchmark Testing in Databases?

The first thing to establish is
that there is no independent body for database benchmarking, nothing
comparable to Consumer Reports. This means that the vendors themselves
performed the benchmarks that I reviewed. This is obviously a major issue.

Let us enumerate the problems
with having no independent source for benchmarking as it relates to databases.

A vendor would never release a
benchmark that showed it losing to a competing vendor across the board.
The result of the benchmark would have to be positive for the vendor in some
dimension, and more positive than negative, for the results to be released. This
parallels drug testing by pharmaceutical companies, where
negative studies tend to go unpublished. “…studies about antidepressants made
the drugs appear to work much better than they really did. Of 74 antidepressant
studies registered with the FDA, 37 studies that showed positive results ended
up being published. By contrast, studies that showed iffy or negative results
mostly ended up going unpublished or had their data distorted to appear
positive, Turner found. The missing or skewed studies helped create the
impression that 94 percent of antidepressant trials had produced positive
results, according to Turner's analysis, published in the New England Journal
of Medicine. In reality, all the studies together showed just 51-percent
positive results.” For instance, a past analysis of clinical trials supporting
new drugs approved by the FDA showed that just 43 percent of more than 900
trials on 90 new drugs ended up being published. In other words, about 60
percent of the related studies remained unpublished even five years after the
FDA had approved the drugs for market. That meant physicians were prescribing
the drugs and patients were taking them without full knowledge of how well the
treatments worked. - LifeScience. We will address this topic directly, as it
appears that SAP is doing the same thing with its OLTP benchmarks for HANA.

Familiarity Bias: Because a vendor will always have more skill in its own
solution than in a competitor's, because databases can be “tuned up,” and
because of differences in the hardware selected, among a number of other
differences, even a vendor that was 100% above board would still tend to
observe better performance in its own solution than in a competing solution.

The vendors spare no expense in
hardware for these tests. The customers will often purchase hardware that is
lower in its specification than that used by the vendor.

Laboratory Environment Bias: The hardware and database are run in a “lab”
environment, with no other batch jobs pulling on its resources, which is of
course unrealistic.
Therefore the performance of the benchmark would normally not be attainable in
a production setting. I see the benchmark results as more comparable with
other benchmarks than with a production environment.

Every benchmark paper I looked
at had one clear purpose. That was to improve the sales of the product
benchmarked by the vendor that wrote the paper.

The benchmarks that are released
are then viewed through the prism of bias, that is, interpreted by people who
have an incentive to prefer a particular software vendor. One entity that has
published inaccurate information about benchmarks, right in line with its
financial bias, is the consulting firm Bluefin, which is overall one of
the least reliable providers of information on HANA.


Benchmark Tests

The following benchmarks,
performed for these databases, were reviewed.

OLTP Benchmark: This is a benchmark for transaction processing, that is, the
things that ERP systems tend to do the most, like recording journal entries,
decrementing inventory when performing a goods issue, etc.

BW-EML (Business Warehouse Enhanced Mixed Load) Benchmark: This is an
analytics benchmark.


Missing Benchmarks

For years SAP would release an
OLTP benchmark for databases. However, with HANA, SAP stopped releasing this
benchmark. Database design would predict that HANA would perform poorly in this
benchmark, and this is the most likely reason why SAP never produced it.
However, the consulting firm Bluefin has the following way of
covering this up:

“…SAP HANA platform was designed to be a data platform on which to build the
business applications of the future. One of the interesting impacts of this is
that the benchmarks of the past (e.g. Sales/Distribution) were not the right
metric by which to measure SAP HANA.” – Behind the SAP BW EML Benchmark

At no point in this article by
John Appleby does he declare that he has a quota, or leads a group with
a quota, to sell HANA. John Appleby presents himself as if he were some
disinterested third party. So that is problem number one. But the second
problem is that Appleby is speaking what amounts to gibberish in this

S4 has a Sales module.

This sales module will be
performing the same functions as the current ECC SD module. Will there be
analytics involved in the Sales module? Of course. However, there will also be
transactions or OLTP performed.

S4 Sales will record sales
orders, update sales orders, etc.

Therefore it is demonstrably
false that an OLTP benchmark is now irrelevant because “the platform was
designed to be a data platform on which to build business applications of the
future.” That sentence is just a straight up lie, and one would have to twist
oneself up into a pretzel to try to defend it. The person seems to be preparing
to run for political office.

Appleby’s interpretation of the
BW-EML benchmark contains other nonsense, like the following:

“…configuration used by published results is the stock installation…there are not
performance constructs like additional indexes or aggregates in use.”

The reason this is nonsensical is
that column-oriented databases don’t use indexes. They don’t need them. Why
Appleby is impressed by this is a head scratcher. How many times has it been
established that the primary reason for the reduction in the size of the
database footprint is the removal of indexes? If so, and if this is
widely accepted, why is it surprising to Appleby that the BW-EML benchmark for
a column-oriented database does not have indexes?

On the topic of aggregates, HANA
does use aggregates, but does not call them aggregates, so what Appleby is
saying there is incorrect, although there are fewer aggregates. Hasso Plattner
has had an obsession with eliminating aggregates for some time and he rails
against aggregates in his articles and his books, but in many cases aggregates
are beneficial. Unlike what Hasso Plattner states, not everything needs to be
constantly recalculated, and not everything needs to be recalculated every time
it is accessed. That is just a waste of processing cycles. Let us take an
example. Let's say we want to see a report of all the sales orders that a
company has processed for the past 3 months. This report was processed and
aggregated along different dimensional attributes yesterday. Under Hasso
Plattner’s logic this aggregate is worthless because it is pre-calculated.
However, let us look at that statement in detail.

Let us say that the aggregate was
calculated yesterday exactly 24 hours prior.

1 day is roughly 1/90th of a 3
month period.

If we look back 90 days in the
report, we would show, say, 100,000 sales orders. That is an average of 1111
sales orders per day (yes, weekends would be less than work days, but as an
average, 1111 sales orders).

Now let us say that the day that
drops off if we run the report anew had 1500 sales orders created (so a high
day). And let us say that the day that is added, which is yesterday plus the
hours up until the present hour, has 700 sales orders (a low day).

So instead of looking at 100,000
sales orders, we are now looking at 99,200 sales orders: 1500 - 700 = 800
fewer sales orders. That is a change of 0.8% in the number of sales orders. Is
that a real problem? Is the last 24 hours more representative than the 24 hour
period from 3 months ago? Probably not. But if it is, how much more should the
company be willing to spend to get rid of all aggregates? And are there other
investments that might be a better use of that money?
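
The arithmetic in this example can be checked with a short script. This is only a sketch of the worked example: the figures are the ones used in the text, and the function and variable names are my own illustrative choices, not anything from SAP or HANA.

```python
# Figures taken from the worked example in the text.
WINDOW_DAYS = 90
ORDERS_IN_STALE_REPORT = 100_000  # total in the aggregate computed yesterday
ORDERS_DROPPED = 1_500            # oldest day, which falls out of the window
ORDERS_ADDED = 700                # newest day, which enters the window


def staleness(total_stale: int, dropped: int, added: int) -> tuple[int, float]:
    """Return the refreshed total and its relative change versus the stale total."""
    refreshed = total_stale - dropped + added
    relative_change = (refreshed - total_stale) / total_stale
    return refreshed, relative_change


average_per_day = ORDERS_IN_STALE_REPORT / WINDOW_DAYS
refreshed_total, change = staleness(
    ORDERS_IN_STALE_REPORT, ORDERS_DROPPED, ORDERS_ADDED
)

print(f"average orders per day: {average_per_day:.0f}")
print(f"refreshed total: {refreshed_total}")
print(f"relative change: {change:.1%}")
```

The refreshed total is 99,200 and the relative change is -0.8% of the stale total. Changing the window length or the daily counts shows how quickly the stale aggregate diverges for shorter reporting periods.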

There are an unlimited number of
scenarios that could be imagined to determine the importance of the removal of
aggregates. For instance, if just two days of sales orders were reviewed, then
the company would see a much larger variation. Generally, however, the need
for instantly recalculated information is greatly overestimated in vendor
marketing documentation, and in analytics vendor documentation in particular. I
have a future article in preparation which describes the testing of a long held
belief that forecasting information must be frequently updated with the most
recent sales history to obtain the highest forecast accuracy. I have been
testing actual client data, from a client with difficult to forecast sales
history, and will show that, as with the tests I performed at previous clients,
this is actually not all that important and contributes little to forecast
accuracy.
So, while there can be scenarios
where getting the most up to date information is critical, SAP tends to take
these few scenarios and generalize them to be "normal," when in
fact they tend to be the exceptions. Hasso Plattner has a way of presenting
things that are often quite grey as black or white. And of course, all of Hasso
Plattner’s examples have the peculiar and consistent outcome of handing over
more money to SAP. I don't make more money if I exaggerate the way that
Hasso Plattner does, and therefore his proposals tend to come off as sales
fluff...at least to me.


for The Improved Analytical Performance of Column Oriented Databases

I found this quotation from IDC
to be a very good explanation of why column-oriented databases are so effective
for analytics.

"…established approach to setting up a query/reporting database (ODS, data mart,
data warehouse) has involved establishing indexes for all columns that might
have value lookup operations in the queries. Many organizations now use
columnar databases, which have the same relational characteristics as
row-oriented databases but store the data in blocks of column rather than row
data for speed of retrieval. This obviates the need for indexes and, in some
cases, for cubes and materialized views." - IDC

"…live data is to be queried and updated at the same time, the queries must be
very fast in order to avoid consuming resources on the database server and
slowing down transactions. A number of vendors have created database
technologies that optimize query performance by combining two key elements:
query-optimized columnar organization for the data and memory-optimized
database operations. In the case discussed here, however, there is an
additional challenge, which is to maintain that data in a form that also
supports a high-performance transactional database." – IDC

“…In-Memory leverages a unique “dual-format” architecture that enables tables to
be in memory simultaneously in a traditional row format and a new in-memory
column format. The Oracle SQL Optimizer automatically routes analytic queries
to the column format and OLTP queries to the row format, transparently
delivering best-of-both-worlds performance. Oracle Database 12c automatically
maintains full transactional consistency between the row and the column
formats, just as it maintains consistency between tables and indexes today.

· Access only the columns that are needed.

· Scan and filter data in a compressed format.

· Prune out any unnecessary data within each column.

· Use SIMD to apply filter predicates.” – Oracle

However, this does not mean, and
Oracle is not implying, that a column-oriented database is better for
applications outside of analytics. As far as I can determine from reading
the perspective of different database vendors on this topic, SAP is the only
database vendor that proposes that a column-oriented design is better for all
types of applications.
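
The point made in the quotations above, that an analytic query on a column layout reads only the columns it needs, can be illustrated with a toy sketch. This is not how HANA or Oracle implement storage. The table contents, the function names, and the "values touched" counter are all hypothetical, and real engines add compression, pruning, and SIMD on top.

```python
# Toy illustration only. All data and names here are hypothetical.
rows = [
    {"order_id": 1, "customer": "A", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "B", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "A", "region": "EU", "amount": 50.0},
]

# Column layout: one list per column; positions line up across the lists.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(table, region):
    """Row layout: every whole row is visited, including unused fields."""
    values_touched = 0
    total = 0.0
    for row in table:
        values_touched += len(row)
        if row["region"] == region:
            total += row["amount"]
    return total, values_touched


def total_amount_column_store(cols, region):
    """Column layout: only the 'region' and 'amount' columns are read."""
    region_col, amount_col = cols["region"], cols["amount"]
    values_touched = len(region_col) + len(amount_col)
    total = sum(a for r, a in zip(region_col, amount_col) if r == region)
    return total, values_touched


row_total, row_touched = total_amount_row_store(rows, "EU")
col_total, col_touched = total_amount_column_store(columns, "EU")
print(row_total, row_touched)
print(col_total, col_touched)
```

Both layouts return the same total, but the column layout touches half as many values in this example. The same toy model hints at the transactional trade-off: inserting one new order appends a single record to the row layout, but must append to every column list in the column layout.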


of the Results

For instance, the Oracle
benchmark paper released in 2015 tested on hardware similar
to what SAP used in its BW-EML benchmark, but leaves out the topic of how many
customers would use this hardware configuration. I don’t know myself, as I have
not recorded the hardware specification of many clients, but the hardware used
by Oracle appeared quite advanced. At one point SAP’s benchmark used a machine
with 1536 GB of RAM. I have personally never heard of this much RAM being used
on a server at any account that I have worked on. It probably exists, as there
are very advanced companies out there doing scientific computing, but it is a
small number. At one point Oracle points out that the monster machine used by
SAP beat Oracle’s BW-EML benchmark, but needed 3 times the amount of memory to
do so. This brings up the question of whether SAP’s hardware was simply
engineered to beat the Oracle benchmark. Did SAP first try the machine
with 1000 GB of RAM, then add 200 GB of RAM and test again, then
add another 200 GB and test again, and so on until it finally beat the Oracle
score? In another benchmark SAP installed 100 IBM servers in a SAP HANA cluster.
Furthermore, if no one outside of the NSA, Amazon AWS (which resells portions
of its hardware over the cloud) or a scientific computing center would be
willing to buy hardware of this size, how relevant are these benchmarks to the
majority of HANA customers?


Impact of Marketing on SAP Benchmarking

SAP needs to get marketing out of
the process of releasing benchmarking information. In the benchmark publication
SAP HANA Performance: Efficient Speed and Scale-Out for Real-Time Business
Intelligence, I don’t need to see a cover plastered with stock photograph
imagery of a man pulling a “fly” snowboarding maneuver, followed by an image of a
bunch of men rowing together, along with a marketing-written introduction that
uses a word salad of terms like NetWeaver components. This should be a
scientific paper that is not word-smithed and couched in deceptive
marketing language. SAP marketing must acknowledge that not every paper
produced by SAP needs to have its fingerprints on it. This is the type of BS
writing that I am referring to:

“…drill-down queries (276 to 483 milliseconds) demonstrate SAP HANA’s aggressive
support for ad hoc joins and, therefore, to provide unrestricted ability for
users to “slice and dice” without having to first involve the technical staff
to provide indexes to support it (as would be the case with a conventional…”
Please do not use the term “slice
and dice” in a technical paper, or the term “unrestricted,” or the colorful
“HANA’s aggressive support.” This is not scientific terminology. SAP’s
benchmarking paper needs to be completely rewritten just using the original
data. Then at the end SAP has quotations like the following:

“…have seen massive system speed improvements and increased ability to analyze
the most detailed levels of customers and products.” – Colgate Palmolive

So this is an anecdote, and it
sounds like it was written by Donald Trump (except it uses the word massive
instead of tremendous). What is an anecdote doing in a benchmarking study!?
Does SAP Marketing have any idea of what a study like this is actually supposed
to contain?



When one compares what Bill
McDermott, Hasso Plattner, SAP marketing, Bluefin, Deloitte and others say
about the game changing aspects of HANA to the technical benchmarks, there is
absolutely no correspondence. SAP invests comparatively little in benchmarking,
but its marketing spending on HANA is off the charts. This is reminiscent of
pharmaceutical companies, which spend far more on marketing
than on research, and the research is mostly just running clinical trials, which
are based upon research that is performed by universities and is publicly
funded. I call this the illusion of innovation.

Oracle has provided compelling
evidence that its 12c database outperforms SAP HANA. I say this acknowledging
the fact that there is no independent body that performs database benchmarking.
Oracle invests much more into database benchmarking, and its benchmarking
studies are more transparent and make the case far better than SAP’s. For all
of the talk of HANA’s performance, SAP produces a single benchmark to support
these claims of superiority over Oracle 12c and others. While we do
not have independent verification, sifting through the results it seems more
likely than not that Oracle 12c is not just a little bit faster, but far faster
than HANA. Second, while SAP has placed speed as the first priority in
the design of its database, Oracle’s orientation is far more holistic, placing
reliability first. Third, given 12c’s design, it will almost certainly
beat HANA easily for OLTP processing.

This article did not review the
benchmarking of other database vendors. However, I find it more likely than not
that vendors like Teradata, given the database talent that they have, also
have a solution that is superior to HANA in performance. And the list of other
database vendors that can also beat HANA is likely more than just Oracle and
Teradata.
I was not paid or otherwise
compensated by any vendor or other entity to write this article.


