Statistician's new role as a detective--testing data for fraud.

Author: Ely Kossovsky, Alex
Pages: 179(22)
  1. INTRODUCTION

    Recent statistical discoveries allow the statistician to use known digit patterns in typical data to detect fraud. Previously, the task of the statistician was to analyze and summarize data, show patterns, and make predictions, but never to decide on the authenticity of the provided data itself. Data provided to the statistician was traditionally taken as given, with no means of authenticating it. Yet tax authorities worldwide and accounting companies have always had a strong need for professional statistical advice on how to detect fake data. Data may be faked to under-report revenues in order to pay less tax, or at times to inflate revenues in order to impress investors and present the company as financially healthy. The amount of tax money saved each year for governments worldwide through forensic digital analysis based on Benford's Law is huge and should not be underestimated. The same digital techniques are also extremely valuable within companies, where they produce savings and put an end to ongoing fraud schemes run by insiders such as fraudulent treasurers and financial officers.

    The technique relies on Benford's Law, a statistical law about the consistent and predictable relative proportions of digits occurring in typical real-life data. It states that low digits such as 1, 2, and 3 are much more frequent than high ones such as 7, 8, and 9. For example, numbers whose first digit on the left is 1 are very common, occurring in about 30.1% of values, while numbers beginning with digit 9 are relatively rare, occurring in only about 4.6% of values. This forensic test is possible because cheaters inventing fake data almost always write them with all digits in about the same proportion, owing to the mistaken intuition that all digits have an equal chance of occurrence. By comparing the theoretical Benford digit distribution to the actual digit distribution within the accounting data provided by companies, the statistician can decide whether the data appear suspicious or normal. During the past 15 years most Tax Revenue Departments of governments worldwide, as well as large accounting and auditing companies, have adopted these digital forensic tests as standard procedures and run them on a regular basis. The result of this revolutionary new technique has been a great increase in tax collection revenue, as well as numerous cases of fraud that have been detected, stopped, and prevented from further exploiting or ruining financially healthy companies.

  2. THE LEADING DIGITS PHENOMENA

    Leading digits (LD) or first significant digits are the first digits of numbers appearing on the left. Such a digit is called "the leader" of the number because all other digits follow it. For 567.34 the leading digit is 5. For 0.0367 the leading digit is 3, as we discard the zeros. For the lone integer 6 the leading digit is 6. For negative numbers we simply discard the sign.
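    The rule just described (discard the sign and any zeros on the left, then take the first digit) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `leading_digit` is an assumption of this example, not something taken from the article, and it works by scanning the printed form of the number for its first nonzero digit.

```python
def leading_digit(value):
    """Return the first significant digit (1-9) of a nonzero number.

    Hypothetical helper for illustration: it scans the printed form of the
    value, so the sign, leading zeros, the decimal point, and thousands
    separators are all skipped automatically.
    """
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    raise ValueError("zero (or a non-numeric value) has no leading digit")


# The examples from the text above.
for example in (567.34, 0.0367, 6, -7, "1,653,832"):
    print(example, "->", leading_digit(example))
```

    Scanning the printed form also handles values shown in scientific notation, because the mantissa is printed before the exponent.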

    613 ------> digit 6
    0.0002867 ------> digit 2
    7 ------> digit 7
    -7 ------> digit 7
    1,653,832 ------> digit 1
    -0.456398 ------> digit 4

    The temptation here (even for the statistician!) is to believe one's own intuition that for numbers occurring in everyday typical situations, all digits should have an equal chance of occurring, and thus be uniformly distributed. But let's look at some surprising results from the closing prices and daily volume of stocks traded on The New York Stock Exchange (Bolsa) on December 23, 2011. We choose the first 31 companies at the top of the alphabetically sorted list as our small random sample:

    FIGURE 1
    PRICES AND VOLUME OF STOCKS TRADED ON THE NEW YORK STOCK EXCHANGE

    Stock Symbol  Closing Price      Volume
    A                    $30.74   1,124,700
    AA                   $38.32   5,950,900
    AAI                   $7.03     533,700
    AAP                  $34.09     430,100
    AAR                  $22.14       8,600
    AAV                  $11.01     263,800
    AB                   $60.86     335,400
    ABA                  $25.75       4,000
    ABB                  $25.12   2,627,700
    ABC                  $41.48     478,600
    ABD                  $14.03     264,200
    ABG                  $14.24     164,500
    ABH                   $9.68     992,700
    ABI                  $34.42     791,000
    ABK                   $9.94   1,688,700
    ABM                  $19.88     140,100
    ABN                  $57.62      29,500
    ABN PRE              $21.49      28,000
    ABN PRF              $23.58       5,800
    ABN PRG              $22.15      46,100
    ABR                  $15.92     254,700
    ABT                  $53.23   2,336,000
    ABV                  $85.17     406,200
    ABV C                $77.19       5,400
    ABW PRA              $25.02       1,900
    ABX                  $53.55   2,574,500
    ACC                  $26.52     147,300
    ACE                  $55.09   1,216,700
    ACE PRC              $24.92      11,300
    ACF                  $14.50     597,600
    ACQ                   $8.39     193,300

    Source: https://nyse.nyx.com/

    About half of the numbers here start with digit 1 or digit 2! Here is the exact LD distribution for this limited set of 31 companies above. It should be noted that almost all other such subsets down the long list on the NYSE website yield quite similar results:

    FIGURE 2
    LEADING DIGITS OF STOCK PRICES AND VOLUME

    Digit   Price   Volume
    1       19.4%    29.0%
    2       29.0%    25.8%
    3       12.9%     3.2%
    4        3.2%    16.1%
    5       12.9%    16.1%
    6        3.2%     0.0%
    7        6.5%     3.2%
    8        6.5%     3.2%
    9        6.5%     3.2%

    Source: UCR calculations based on New York Stock Exchange data.

    Simon Newcomb in 1881 and then Frank Benford in 1938 discovered that low digits lead much more often than high digits in everyday and scientific data, and arrived at the exact expression Probability[1st digit is d] = $\log_{10}(1 + 1/d)$ for the probability that digit d is leading. This set of proportions is known as the logarithmic distribution, and the law is known as Benford's Law. For example, P[1st digit is 1] = $\log_{10}(1 + 1/1) = \log_{10}(2) = 0.301$.

    B.L. (1st Digits) = {30.1%, 17.6%, 12.5%, 9.7%, 7.9%, 6.7%, 5.8%, 5.1%, 4.6%}.
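    As a rough illustration, the first-digit proportions can be computed directly from the logarithmic formula and set beside the leading digits of the 31 closing prices listed in Figure 1. The sketch below is not the author's procedure; the names `benford`, `prices`, and `first_digit` are assumptions made for this example only.

```python
import math
from collections import Counter

# Benford's first-digit probabilities, P(d) = log10(1 + 1/d) for d = 1..9.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# The 31 closing prices from Figure 1, in the same order as the table.
prices = [30.74, 38.32, 7.03, 34.09, 22.14, 11.01, 60.86, 25.75, 25.12, 41.48,
          14.03, 14.24, 9.68, 34.42, 9.94, 19.88, 57.62, 21.49, 23.58, 22.15,
          15.92, 53.23, 85.17, 77.19, 25.02, 53.55, 26.52, 55.09, 24.92, 14.50,
          8.39]


def first_digit(x):
    # Hypothetical helper: first nonzero digit in the printed form of |x|.
    return next(int(ch) for ch in str(abs(x)) if ch in "123456789")


counts = Counter(first_digit(p) for p in prices)
observed = {d: counts.get(d, 0) / len(prices) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:6.1%}   Benford {benford[d]:6.1%}")
```

    With only 31 values the observed proportions are noisy, so close agreement with the theoretical column should not be expected digit by digit; the point is the overall tilt toward low digits.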

    [FIGURE 3 OMITTED]

    The law also describes an exact distribution for the second order digits, but here proportions among digits are more equal. For example, the 2nd leading digit (from the left) of 603 is digit 0, of 0.0002867 it's digit 8, and of 1,653,832 it's digit 6. It is noted that for the 2nd and higher orders, digit 0 is also included, whereas for the 1st digit order it is excluded. The exact 2nd order distribution for all 10 digits according to Benford's Law is:

    B.L. (2nd Digits) = {12.0%, 11.4%, 10.9%, 10.4%, 10.0%, 9.7%, 9.3%, 9.0%, 8.8%, 8.5%}.

    613 ------> digit 1
    0.0002867 ------> digit 8
    1,653,832 ------> digit 6
    -0.456398 ------> digit 5
    603 ------> digit 0
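    The second-digit proportions listed above can be reproduced numerically. The sketch below relies on a standard result that this section has not stated at this point: the probability that the second digit is q is obtained by summing $\log_{10}(1 + 1/(10p + q))$ over all possible first digits p = 1, ..., 9. The function name `second_digit_prob` is an assumption of this example.

```python
import math


def second_digit_prob(q):
    """P(2nd digit = q) for q in 0..9, summed over first digits p = 1..9."""
    return sum(math.log10(1 + 1 / (10 * p + q)) for p in range(1, 10))


second = [second_digit_prob(q) for q in range(10)]

for q, prob in enumerate(second):
    print(f"digit {q}: {prob:.1%}")
```

    The ten probabilities sum to 1, and they decrease only gently from digit 0 to digit 9, which is why the second order looks far less skewed than the first.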

    [FIGURE 4 OMITTED]

    Digital proportion for the 2nd order is not nearly as skewed in favor of low digits as is the case for the 1st order. The 3rd order digit distribution is even more equal than the 2nd order. And finally there is almost total digital equality for the 4th and higher orders.

    The probability of any First-Two-Digit combination (called FTD), such as 34, and exemplified in numbers such as 3487, 0.0341, 340 etc. is given by the formula:

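    Probability[1st digit is p AND 2nd digit is q] = $\log_{10}(1 + 1/pq)$, where pq denotes the two-digit number formed by p followed by q (that is, $10p + q$), not the product of p and q.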

    For example, $P(10) = \log_{10}(1 + 1/10) = \log_{10}(1.1) = 0.0414$, $P(25) = \log_{10}(1 + 1/25) = 0.0170$, and $P(99) = \log_{10}(1 + 1/99) = 0.0044$.
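    The three worked values can be checked with a short script. This is a minimal sketch in which the function name `ftd_prob` is an assumption of the example; it treats the pair of digits as the two-digit number 10p + q. It also checks two consistency properties: the 90 FTD probabilities sum to 1, and summing over the second digit recovers the first-digit probability.

```python
import math


def ftd_prob(p, q):
    """P(1st digit = p and 2nd digit = q), with p in 1..9 and q in 0..9."""
    n = 10 * p + q  # the two-digit number "pq", not the product p * q
    return math.log10(1 + 1 / n)


for p, q in ((1, 0), (2, 5), (9, 9)):
    print(f"P({p}{q}) = {ftd_prob(p, q):.4f}")

# All 90 combinations 10..99 together account for every possible number.
total = sum(ftd_prob(p, q) for p in range(1, 10) for q in range(10))

# Summing over the second digit gives back the first-digit law for digit 1.
first_is_1 = sum(ftd_prob(1, q) for q in range(10))

print(f"sum over 10..99 = {total:.6f}")
print(f"P(1st digit is 1) = {first_is_1:.3f}")
```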

    When the numbers in a data set are long enough, which is typically the case (say more than 4 or 5 digits per value), the distribution of the last digit (the right-most one) should be approximately uniform, with probability 1/10 for each digit. Likewise, the last-two-digit combinations (on the right) should be approximately uniform, with probability 1/100 each, since there are 100 possibilities, namely {00, 01, 02, ..., 97, 98, 99}.
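    One possible way to examine last-two-digit uniformity is to tally the 100 combinations and measure their departure from equal counts. The sketch below uses a Pearson chi-square statistic, which is a choice made for this example and is not prescribed by the text; the function names are hypothetical, the amounts are made-up random integers used purely for illustration, and values are assumed to be whole numbers (for example, amounts expressed in cents).

```python
import random
from collections import Counter


def last_two_digit_counts(values):
    """Tally the last two digits (0..99) of integer values, ignoring the sign."""
    return Counter(abs(int(v)) % 100 for v in values)


def chi_square_uniform(counts, categories=100):
    """Pearson chi-square statistic against equal expected counts per category."""
    n = sum(counts.values())
    expected = n / categories
    return sum((counts.get(k, 0) - expected) ** 2 / expected
               for k in range(categories))


# Hypothetical data: 5,000 made-up integer amounts, for illustration only.
rng = random.Random(0)
amounts = [rng.randint(10_000, 9_999_999) for _ in range(5000)]

stat = chi_square_uniform(last_two_digit_counts(amounts))
print(f"chi-square statistic (99 degrees of freedom): {stat:.1f}")
```

    Under uniformity the statistic has 99 degrees of freedom, so values well above roughly 123 (the 5% critical value) would suggest that some endings are over-represented, as happens with rounded or invented figures.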

    As mentioned by Hill (1995) in the first section titled "The Significant-Digit Law", as well as in the 2nd section "Empirical Evidence", the validity of Benford's Law has been observed and verified in numerous domains including finance, accounting, economics, census data, physics, astronomy, chemistry, and geology, to mention just a few.

    According to Durtschi et al. (2004), Hill (1995), and others in the literature, the following short summary of applicability can be given. The specific types of data that ARE Benford include: any well-mixed data from a variety of sources; almost all accounting data such as accounts receivable, accounts payable, revenues, and expenses; election results by city/town or province (if free and fair, not manipulated); population and other census data; sizes of files in megabytes on any typical large computer; sports data; the list of all the physical constants in Physics and Chemistry (combined); house numbers in address data; data derived from multiplication processes such as exponential growth and decay; and the Fibonacci series. The specific types of data that are NOT Benford include: phone/cell numbers, lottery numbers, code and index numbers, serial numbers, purchase order ID numbers, ATM withdrawal amounts, any other data with pre-assigned values, and data with artificial built-in minimum or maximum values.

    Benford's Law is a mathematical and statistical fact about how numbers are USED and OCCUR in everyday typical situations, expressing physical quantities as well as abstract entities we wish to record. But the law is NOT a mathematical law purely about our number system itself, totally divorced from its use. It is also a physical and scientific law regarding quantitative measurements, and its scope covers all disciplines, as it is almost universally observed. Rarely do we encounter such a prevalent law or regularity spanning all disciplines in science, linking and connecting various fields of study, from physics, chemistry, astronomy, and geology, to economics, finance, accounting, and so forth.

  3. THE LOGARITHMIC AS REPEATED MULTIPLICATIONS

    Repeated multiplication processes effectively drive numbers toward the logarithmic distribution in the limit. This is so only if all intermediate values are considered and retained as part of that long sequence of numbers. Classical examples include: (I) Money invested in a bank account and locked in for 30 years at 5% interest rate, while yearly snapshots of account balance are taken each December 31. (II) Bacteria in the lab tube growing at 20% per...
