Probability of Informed Trade - a measure of information asymmetry

This page primarily contains links to PINs computed using the Venter and de Jongh (VdJ) model (Venter, J.H., de Jongh, D., 2006. Extending the EKOP model to estimate the probability of informed trading. Studies in Economics and Econometrics 30, 25-39). At the bottom of the page are links to PINs computed using the basic EKO model.

The files accessible from this page contain PINs computed similarly to those used in "How Disclosure Quality Affects the Level of Information Asymmetry", Review of Accounting Studies, Vol 12 (2-3), 2007, co-authored with Stephen A. Hillegeist (also available here). See the discussion in Brown and Hillegeist (2007) for why these PINs are much more robust than the basic EKO PINs and how they allow for the very strong positive correlation that we observe between buys and sells in the data. If the basic model did describe the data, this correlation would be negative. As discussed in our RAST paper, the basic EKO model is a special case of the general VdJ model in which the parameter psi is infinite. (In my formulation, I use the inverse of this parameter, so the EKO model corresponds to invpsi (= 1/psi) = 0.)

The data are contained in two SAS files (and/or two ASCII files), computed by quarter (yyyyq) and by year respectively, and cover the period 1993 to 2010.

The files are:

VdJ - quarterly data - SAS file (30MB)
VdJ - annual data - SAS file (7MB)
VdJ - quarterly data - ASCII file (11MB)
VdJ - annual data - ASCII file (3MB)


Each file contains data along the following lines:

Obs  permno  yyyyq  alphasas  deltasas  epsisas  musas   invpsi   minlik  numdays    maxgrd  terminat  pinsas
  1   10001  20031   1.00000   0.44419     3.41   3.56  0.89806  -13.815       59  2.77E-08  ABSGTOL    0.343
  2   10001  20032   0.47924   0.74004     8.55  11.86  1.13721  -16.818       62  1.21E-11  GTOL       0.250
  3   10001  20033   0.43950   0.67354     3.12   5.66  0.28189   -9.927       62  2.20E-09  GTOL       0.285
  4   10001  20034   1.00000   0.83057     2.25   3.64  0.60876  -10.055       61  0.00E+00  GTOL       0.447
  5   10001  20041   0.47333   0.54494     3.28   6.02  0.41500   -7.783       61  6.39E-07  GTOL       0.303
  6   10001  20042   1.00000   0.70410     1.53   2.00  0.61685  -11.717       50  2.81E-08  GTOL       0.396
  7   10001  20043   1.00000   0.59562     2.24   4.16  0.81434   -9.222       47  5.32E-13  GTOL       0.482

In the quarterly (annual) file, the key fields are permno and yyyyq (year). The other variables are as follows:

alphasas - probability of an information event
deltasas - probability of information event being bad news
epsisas - trading intensity of uninformed traders (trades per day)
musas - trading intensity of informed traders (trades per day)
invpsi - inverse of the psi parameter. invpsi = 0 implies that the data are described by the basic EKO model.
minlik - the minimum value of the log-likelihood for the data within the period
numdays - the number of days used in the estimation process. (Days with zero trades are excluded.) You may wish to exclude observations where numdays is less than 30.
maxgrd - the gradient of the objective function at the optimum.
terminat - termination condition of the SAS proc nlp procedure. If this variable is "PROBLEMS", you may wish to drop the observation.
pinsas - computed PIN, i.e. PIN = (mu * alpha) / (mu*alpha + 2 * epsi)
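
The PIN formula and the two screening suggestions above (numdays and terminat) can be sketched in a few lines of Python. This is only an illustration, not the SAS code used to build the files: the PinRow class and the helper names are hypothetical, and the two sample rows are copied from the listing above.

```python
from dataclasses import dataclass


@dataclass
class PinRow:
    """One record of the quarterly file (hypothetical container, subset of fields)."""
    permno: int
    yyyyq: int
    alpha: float   # alphasas
    eps: float     # epsisas
    mu: float      # musas
    numdays: int
    terminat: str


def pin(alpha: float, mu: float, eps: float) -> float:
    """PIN = (mu * alpha) / (mu * alpha + 2 * epsi), as defined for pinsas."""
    informed = mu * alpha
    return informed / (informed + 2.0 * eps)


def split_yyyyq(yyyyq: int) -> tuple[int, int]:
    """Split a yyyyq key such as 20031 into (year, quarter)."""
    return divmod(yyyyq, 10)


def keep(row: PinRow, min_days: int = 30) -> bool:
    """Apply the suggested screens: enough trading days, no PROBLEMS termination."""
    return row.numdays >= min_days and row.terminat.strip().upper() != "PROBLEMS"


# Two rows taken from the sample listing above.
rows = [
    PinRow(10001, 20031, 1.00000, 3.41, 3.56, 59, "ABSGTOL"),
    PinRow(10001, 20032, 0.47924, 8.55, 11.86, 62, "GTOL"),
]

for r in rows:
    if keep(r):
        year, quarter = split_yyyyq(r.yyyyq)
        print(year, quarter, round(pin(r.alpha, r.mu, r.eps), 3))
```

Because the listing shows epsisas and musas to two decimal places only, a PIN recomputed this way agrees with pinsas only up to rounding in the last digit.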

As can be seen from the data, the VdJ model generates many (approximately 30%) corner solutions with alpha = 1, i.e. every day is a private information day. In almost all cases, though, PIN is not a corner solution. These corner solutions arise because the assumption of a Poisson arrival rate is too restrictive given the observed trading data: for a Poisson distribution the variance equals the mean, whereas the actual trade data show considerably greater variation.
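
A quick way to see the variance-versus-mean point in your own buy or sell counts is the ratio of sample variance to sample mean, which is close to 1 for Poisson counts. The sketch below is a rough diagnostic only; the function name is hypothetical and the daily counts are made-up numbers, not values from the files on this page.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical daily buy counts for one stock (illustrative values only).
daily_buys = [12, 45, 8, 130, 22, 9, 60, 15, 200, 18]

print(dispersion_index(daily_buys))
```

A ratio far above 1 indicates more day-to-day variation than a single Poisson rate can produce. The EKO mixture of news and no-news days adds some dispersion of its own, so the ratio is a screening aid rather than a formal test of either model.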
The minimum values of the likelihood function are small, but not nearly as small as those that arise from the basic EKO model. In the latter case, the model typically converges only if log-likelihoods are bounded (say, somewhere in the range -500 to -40). Without such a limit, the SAS optimization procedure typically fails because of numerical overflow or underflow when the number of buys and sells exceeds 3,000 a day.
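
The overflow and underflow problem comes from evaluating Poisson probabilities such as exp(-rate) * rate^k / k! directly for k in the thousands. One common workaround, sketched below in Python, is to work entirely in logs and combine the three mixture branches with a log-sum-exp. This is an assumption-laden illustration, not the SAS code behind these files: it uses the textbook basic EKO mixture with a single uninformed rate for buys and sells (consistent with the 2 * epsi term in the PIN formula), it does not implement the VdJ extension, and all function names are hypothetical.

```python
import math


def log_poisson(k, rate):
    """Log of the Poisson pmf, using lgamma instead of k! (requires rate > 0)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def safe_log(p):
    """log(p) that returns -inf at p == 0, so corner values such as alpha = 1 work."""
    return math.log(p) if p > 0 else -math.inf


def logsumexp(terms):
    """Stable log(sum(exp(t))) for a list of log-scale terms."""
    m = max(terms)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(t - m) for t in terms))


def eko_day_loglik(buys, sells, alpha, delta, eps, mu):
    """Log-likelihood of one day's (buys, sells) under the basic EKO mixture."""
    no_news = safe_log(1 - alpha) + log_poisson(buys, eps) + log_poisson(sells, eps)
    # Bad news: informed traders sell, so the sell rate rises to eps + mu.
    bad = safe_log(alpha * delta) + log_poisson(buys, eps) + log_poisson(sells, eps + mu)
    # Good news: informed traders buy, so the buy rate rises to eps + mu.
    good = safe_log(alpha * (1 - delta)) + log_poisson(buys, eps + mu) + log_poisson(sells, eps)
    return logsumexp([no_news, bad, good])


# The direct formula fails at this scale: exp(-3000) underflows to zero.
print(math.exp(-3000.0))
# The log-space version stays finite for a day with about 3,000 buys and sells.
print(eko_day_loglik(3200, 2900, alpha=0.4, delta=0.5, eps=3000.0, mu=300.0))
```

Summing eko_day_loglik over the days in a quarter gives the quantity an optimizer would maximize; the parameter values in the last line are arbitrary and chosen only to exercise the large-count case.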

You are welcome to use these data for research purposes and I should be grateful if you would let me know if you download them and/or find them useful. If there is sufficient interest, I will try to continue to update them for later periods. Also, if you have estimated PINs yourself and come to believe that I have computed these erroneously, please let me know at:
stephenb at umd dot edu

Computing PIN from buys/sells

If you would like to make your own estimates of PIN from raw buy/sell data that you have extracted from the TAQ files, I should be pleased to supply the SAS code that I used to compute the estimates.

Basic PINs


The basic PINs (computed under the restriction that invpsi = 0, and as used in Brown, Hillegeist and Lo (2004), "Conference Calls and Information Asymmetry", Journal of Accounting & Economics, Vol 37 (2)) can be downloaded at this page.

However, as discussed in Brown and Hillegeist (2007), PINs computed using the basic Poisson model do not fit the observed data at all well.

If you do find the above data useful in your work, please let me know.

Thank you.

Alternate Sources of PIN


Estimates of the basic (EKO) PIN are also available at Soeren Hvidkjaer's website. The PINs at Professor Hvidkjaer's site are computed annually for NYSE and AMEX firms only, but use ISSM data as well as TAQ data to cover the period 1983 to 2001.

Last update: 18 August, 2016