by Tatiana Kurilo

The data set explored in this project contains information on 113,937 loans obtained via Prosper, the first American company in the field of peer-to-peer lending. Borrowers request personal loans on Prosper, and investors can fund the loan amount partially or in full.
There are 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information. The data cover the period from November 2005 to March 2014.

Default Data Structure

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...

Summary Statistics

##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

Data Preparation

The data set uses four unique key fields to identify each listing:
ListingKey - unique key for each listing, same value as the ‘key’ used in the listing object in the API.
ListingNumber - the number that uniquely identifies the listing to the public as displayed on the website.
LoanKey - unique key for each loan; it corresponds to the ListingKey and is the ‘key’ used in the API.
LoanNumber - unique numeric value associated with the loan.

Since all four variables can be used interchangeably to identify listings, three of them can be omitted. ListingKey will be used in the further analysis.
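A minimal sketch of how this interchangeability could be checked before dropping the redundant identifiers. The data frame `loans` below is a small made-up stand-in for the Prosper data (its values and the helper names `key_cols`, `n_distinct`, `n_combos`, and `loans_slim` are assumptions for illustration, not part of the original analysis).

```r
# Toy stand-in for the Prosper data; all values are made up for illustration.
loans <- data.frame(
  ListingKey         = c("A1", "B2", "B2", "C3"),
  ListingNumber      = c(101L, 102L, 102L, 103L),
  LoanKey            = c("X9", "Y8", "Y8", "Z7"),
  LoanNumber         = c(1L, 2L, 2L, 3L),
  LoanOriginalAmount = c(9425L, 10000L, 10000L, 3001L),
  stringsAsFactors   = FALSE
)

key_cols <- c("ListingKey", "ListingNumber", "LoanKey", "LoanNumber")

# Number of distinct values in each key column.
n_distinct <- sapply(loans[key_cols], function(x) length(unique(x)))

# Number of distinct combinations of all four keys taken together.
n_combos <- nrow(unique(loans[key_cols]))

# If every column has exactly as many distinct values as there are distinct
# combinations, each key determines the other three.
interchangeable <- all(n_distinct == n_combos)

# Keep ListingKey and drop the other three identifiers.
redundant  <- setdiff(key_cols, "ListingKey")
loans_slim <- loans[, !(names(loans) %in% redundant), drop = FALSE]
```

On the real data, the same check would be consistent with the structure output above, where ListingKey and LoanKey both have 113,066 levels.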

Although this data set is generally tidy, the summary statistics show that a number of ListingKey values appear in more than one row.

Number of occurrences per listing key
## 
##      1      2      3      4      6 
## 112239    790     32      4      1
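A count of this kind can be produced by tabulating the keys twice. The sketch below uses a made-up vector `listing_key` rather than the real column, so the names and values are assumptions for illustration only.

```r
# Toy vector of listing keys; values are made up for illustration.
listing_key <- c("A1", "B2", "B2", "C3", "D4", "D4", "D4")

# Inner table(): number of rows for each key.
rows_per_key <- table(listing_key)

# Outer table(): number of keys that have each row count.
occurrences <- table(rows_per_key)

# Keys that appear in more than one row.
duplicated_keys <- names(rows_per_key)[as.vector(rows_per_key) > 1]
```

The table above is internally consistent with the rest of the report: the counts sum to 113,066 distinct keys (112,239 + 790 + 32 + 4 + 1), and weighting each count by its number of rows gives 113,937 rows in total.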

Here is one example with the maximum of 6 rows per key.
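One way to pull out such an example and see which columns actually differ between the repeated rows is sketched here. The data frame is again a made-up stand-in, and `top_key`, `example_rows`, and `varying` are hypothetical names introduced for illustration.

```r
# Toy stand-in with one key repeated three times; values are made up.
loans <- data.frame(
  ListingKey       = c("A1", "B2", "B2", "B2", "C3"),
  Term             = c(36L, 60L, 60L, 60L, 36L),
  ProsperScore     = c(7, 4, 8, 10, 9),
  stringsAsFactors = FALSE
)

# Key with the largest number of rows (the first one, if there are ties).
rows_per_key <- table(loans$ListingKey)
top_key      <- names(rows_per_key)[which.max(rows_per_key)]

# All rows that share this key.
example_rows <- loans[loans$ListingKey == top_key, ]

# Columns whose values are not constant within the example rows.
is_varying <- sapply(example_rows, function(x) length(unique(x)) > 1)
varying    <- names(example_rows)[is_varying]
```

In the portion of the real example printed below, the six rows share every displayed value except ProsperScore, which is the kind of difference this check is meant to surface.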

Example

##                    ListingKey ListingCreationDate CreditGrade Term
## 13079 17A93590655669644DB4C06 2013-10-02 17:20:16               60
## 14889 17A93590655669644DB4C06 2013-10-02 17:20:16               60
## 20570 17A93590655669644DB4C06 2013-10-02 17:20:16               60
## 31451 17A93590655669644DB4C06 2013-10-02 17:20:16               60
## 42751 17A93590655669644DB4C06 2013-10-02 17:20:16               60
## 42752 17A93590655669644DB4C06 2013-10-02 17:20:16               60
##       LoanStatus ClosedDate BorrowerAPR BorrowerRate LenderYield
## 13079    Current       <NA>     0.16662       0.1435      0.1335
## 14889    Current       <NA>     0.16662       0.1435      0.1335
## 20570    Current       <NA>     0.16662       0.1435      0.1335
## 31451    Current       <NA>     0.16662       0.1435      0.1335
## 42751    Current       <NA>     0.16662       0.1435      0.1335
## 42752    Current       <NA>     0.16662       0.1435      0.1335
##       EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 13079                  0.1264        0.0524           0.074
## 14889                  0.1264        0.0524           0.074
## 20570                  0.1264        0.0524           0.074
## 31451                  0.1264        0.0524           0.074
## 42751                  0.1264        0.0524           0.074
## 42752                  0.1264        0.0524           0.074
##       ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 13079                       5                     B            4
## 14889                       5                     B            8
## 20570                       5                     B            7
## 31451                       5                     B           10
## 42751                       5                     B            5
## 42752                       5                     B            6
##       ListingCategory..numeric. BorrowerState Occupation EmploymentStatus
## 13079                         1            MD      Other         Employed
## 14889                         1            MD      Other         Employed
## 20570                         1            MD      Other         Employed
## 31451                         1            MD      Other         Employed
## 42751                         1            MD      Other         Employed
## 42752                         1            MD      Other         Employed
##       EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 13079                       26               False            False
## 14889                       26               False            False
## 20570                       26               False            False
## 31451                       26               False            False
## 42751                       26               False            False
## 42752                       26               False            False
##       GroupKey    DateCreditPulled CreditScoreRangeLower
## 13079          2013-12-23 09:38:12                   720
## 14889          2013-12-23 09:38:12                   720
## 20570          2013-12-23 09:38:12                   720
## 31451          2013-12-23 09:38:12                   720
## 42751          2013-12-23 09:38:12                   720
## 42752          2013-12-23 09:38:12                   720
##       CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
## 13079                   739              1986-12-26                 12
## 14889                   739              1986-12-26                 12
## 20570                   739              1986-12-26                 12
## 31451                   739              1986-12-26                 12
## 42751                   739              1986-12-26                 12
## 42752                   739              1986-12-26                 12
##       OpenCreditLines TotalCreditLinespast7years OpenRevolvingAccounts
## 13079              12                         20                     6
## 14889              12                         20                     6
## 20570              12                         20                     6
## 31451              12                         20                     6
## 42751              12                         20                     6
## 42752              12                         20                     6
##       OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## 13079                         348                    0              5
## 14889                         348                    0              5
## 20570                         348                    0              5
## 31451                         348                    0              5
## 42751                         348                    0              5
## 42752                         348                    0              5
##       CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## 13079                    0                0                       0
## 14889                    0                0                       0
## 20570                    0                0                       0
## 31451                    0                0                       0
## 42751                    0                0                       0
## 42752                    0                0                       0
##       PublicRecordsLast10Years PublicRecordsLast12Months
## 13079                        0                         0
## 14889                        0                         0
## 20570                        0                         0
## 31451                        0                         0
## 42751                        0                         0
## 42752                        0                         0
##       RevolvingCreditBalance BankcardUtilization AvailableBankcardCredit
## 13079                  14635                0.57                   10865
## 14889                  14635                0.57                   10865
## 20570                  14635                0.57                   10865
## 31451                  14635                0.57                   10865
## 42751                  14635                0.57                   10865
## 42752                  14635                0.57                   10865
##       TotalTrades TradesNeverDelinquent..percentage.
## 13079          17                                  1
## 14889          17                                  1
## 20570          17                                  1
## 31451          17                                  1
## 42751          17                                  1
## 42752          17                                  1
##       TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 13079                       0              0.41 $25,000-49,999
## 14889                       0              0.41 $25,000-49,999
## 20570                       0              0.41 $25,000-49,999
## 31451                       0              0.41 $25,000-49,999
## 42751                       0              0.41 $25,000-49,999
## 42752                       0              0.41 $25,000-49,999
##       IncomeVerifiable StatedMonthlyIncome TotalProsperLoans
## 13079             True                3000                NA
## 14889             True                3000                NA
## 20570             True                3000                NA
## 31451             True                3000                NA
## 42751             True                3000                NA
## 42752             True                3000                NA
##       TotalProsperPaymentsBilled OnTimeProsperPayments
## 13079                         NA                    NA
## 14889                         NA                    NA
## 20570                         NA                    NA
## 31451                         NA                    NA
## 42751                         NA                    NA
## 42752                         NA                    NA
##       ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 13079                                  NA                              NA
## 14889                                  NA                              NA
## 20570                                  NA                              NA
## 31451                                  NA                              NA
## 42751                                  NA                              NA
## 42752                                  NA                              NA
##       ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 13079                       NA                          NA
## 14889                       NA                          NA
## 20570                       NA                          NA
## 31451                       NA                          NA
## 42751                       NA                          NA
## 42752                       NA                          NA
##       ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 13079                          NA                         0
## 14889                          NA                         0
## 20570                          NA                         0
## 31451                          NA                         0
## 42751                          NA                         0
## 42752                          NA                         0
##       LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination
## 13079                            NA                          2
## 14889                            NA                          2
## 20570                            NA                          2
## 31451                            NA                          2
## 42751                            NA                          2
## 42752                            NA                          2
##       LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 13079              10000          2014-01-13                Q1 2014
## 14889              10000          2014-01-13                Q1 2014
## 20570              10000          2014-01-13                Q1 2014
## 31451              10000          2014-01-13                Q1 2014
## 42751              10000          2014-01-13                Q1 2014
## 42752              10000          2014-01-13                Q1 2014
##                     MemberKey MonthlyLoanPayment LP_CustomerPayments
## 13079 F80D3694083622957BA09F2              234.5               234.5
## 14889 F80D3694083622957BA09F2              234.5               234.5
## 20570 F80D3694083622957BA09F2              234.5               234.5
## 31451 F80D3694083622957BA09F2              234.5               234.5
## 42751 F80D3694083622957BA09F2              234.5               234.5
## 42752 F80D3694083622957BA09F2              234.5               234.5
##       LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 13079                       112.62             121.88          -8.49
## 14889                       112.62             121.88          -8.49
## 20570                       112.62             121.88          -8.49
## 31451                       112.62             121.88          -8.49
## 42751                       112.62             121.88          -8.49
## 42752                       112.62             121.88          -8.49
##       LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 13079                 0                     0                   0
## 14889                 0                     0                   0
## 20570                 0                     0                   0
## 31451                 0                     0                   0
## 42751                 0                     0                   0
## 42752                 0                     0                   0
##       LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 13079                               0             1               0
## 14889                               0             1               0
## 20570                               0             1               0
## 31451                               0             1               0
## 42751                               0             1               0
## 42752                               0             1               0
##       InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 13079                          0                           0        96
## 14889                          0                           0        96
## 20570                          0                           0        96
## 31451                          0                           0        96
## 42751                          0                           0        96
## 42752                          0                           0        96

As can be seen from the example above, the only variable that changes for this listing key is ProsperScore, a custom risk score built using historical Prosper data and applicable to loans originated after July 2009. The distribution over time of the listings that gained additional rows because of a change in ProsperScore shows that score changes were recorded only in the most recent data.

Since only ProsperScore changes, these listings are counted more than once for other variables such as LoanStatus, ListingCategory, etc. Of 113066 unique listing keys, 827 produce 871 such duplicate rows. The data contain no specific log of ProsperScore changes, and ProsperScore is to be omitted from further analysis anyway because of its high proportion of NA values, so the rows where a listing key appears for the second or a later time can be dropped to avoid double counting.
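
The de-duplication described above could look roughly like this. It is a minimal sketch on a toy data frame: the values and the names df_unique and is_repeat are assumptions for illustration, and keeping the first occurrence of each key is only one possible choice.

```r
# Toy data: listing key "B" appears twice and differs only in ProsperScore.
df <- data.frame(
  ListingKey   = c("A", "B", "B", "C"),
  ProsperScore = c(7, 4, 5, NA),
  LoanStatus   = c("Current", "Completed", "Completed", "Defaulted"),
  stringsAsFactors = FALSE
)

# duplicated() flags the second and later occurrences of each key,
# so negating it keeps only the first row per listing.
is_repeat <- duplicated(df$ListingKey)
df_unique <- df[!is_repeat, ]

sum(is_repeat)   # number of rows dropped
nrow(df_unique)  # number of unique listings kept
```

On the full data the same two counts would be expected to match the 871 duplicate rows and 113066 unique keys mentioned above.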

A high proportion of NA values is an issue for 14 columns: in 4 of them the proportion is between 25% and 50%, and in the other 10 it is over 50%, going up to 80-85%.

Columns with proportion of NA higher than 25%
##             EstimatedEffectiveYield                       EstimatedLoss 
##                           0.2572303                           0.2572303 
##                     EstimatedReturn             ProsperRating..numeric. 
##                           0.2572303                           0.2572303 
##                          ClosedDate                   TotalProsperLoans 
##                           0.5128863                           0.8061044 
##          TotalProsperPaymentsBilled               OnTimeProsperPayments 
##                           0.8061044                           0.8061044 
## ProsperPaymentsLessThanOneMonthLate     ProsperPaymentsOneMonthPlusLate 
##                           0.8061044                           0.8061044 
##            ProsperPrincipalBorrowed         ProsperPrincipalOutstanding 
##                           0.8061044                           0.8061044 
##         ScorexChangeAtTimeOfListing       LoanFirstDefaultedCycleNumber 
##                           0.8327349                           0.8500699
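
A table like the one above can be produced with a column-wise mean of is.na(). The sketch below uses a tiny made-up data frame, so the resulting shares are illustrative only; the 0.25 threshold is the one used in the text.

```r
# Toy data with missing values in two of three columns.
df <- data.frame(
  ClosedDate        = c(NA, "2010-03-16", NA, NA),
  TotalProsperLoans = c(NA, NA, NA, 1),
  Term              = c(36, 36, 60, 12),
  stringsAsFactors  = FALSE
)

# is.na() returns a logical matrix; colMeans() turns it into
# the share of missing values per column.
na_share <- colMeans(is.na(df))

# Keep only the columns where more than 25% of values are missing.
high_na <- na_share[na_share > 0.25]
high_na
```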

ClosedDate is applicable for Cancelled, Completed, Chargedoff and Defaulted loan statuses, so we can assume that about half of the loans in the data set are in progress.

EstimatedEffectiveYield, EstimatedLoss, EstimatedReturn, ProsperRating..numeric. and ProsperScore were introduced in July 2009, so they are missing for earlier data, which make up about 26% of the data set. ProsperRating..Alpha. has a comparable number of empty strings for the same reason. On the other hand, CreditGrade (the credit rating assigned at the time the listing went live) contains 84984 empty strings in the 113937 rows of original data, because it is applicable only to listings before 2009.

TotalProsperLoans, TotalProsperPaymentsBilled, OnTimeProsperPayments, ProsperPaymentsLessThanOneMonthLate, ProsperPaymentsOneMonthPlusLate, ProsperPrincipalBorrowed, ProsperPrincipalOutstanding, ScorexChangeAtTimeOfListing have null values for cases when the borrower had no prior loans with Prosper. That means that about 81% of loans are first Prosper loans for borrowers.

LoanFirstDefaultedCycleNumber is the cycle in which the loan was charged off, if ever. With 85% of its values missing, fewer than 15% of loans were ever charged off.

Univariate Plots Section

Listings by Creation Date

We can clearly see two periods in the data, before and after the relaunch the company conducted in 2009: one starting at the end of 2005 and one starting in the middle of 2009. The same pattern can be seen in loan origination dates.

Since it takes some time for listings to become loans, the distribution of loan origination dates is shifted slightly to the right compared to the listing creation dates. We can also notice that after the relaunch the number of listings/loans in 2010-2011 was about half of what it was in 2007-2009. However, the number of loans started growing in 2011 and, after some decrease to pre-relaunch levels in the beginning of 2013, by the end of the period in question reached a level about 3 times higher than before the relaunch.

As for seasonal patterns, April has the lowest number of listings created and loans originated, while January has the highest. As a result, the second quarter is the lowest in terms of new loans, and the decrease over the months of the first quarter gives it a lower total than the fourth quarter, which holds the highest value. Still, as could be seen from the previous charts of listings and loans by month and year, the last months of 2013 and the first months of 2014 have a pronounced impact on the total counts, which affects the seasonal picture as well.

Loan original amount distribution

As can be seen from the chart above, borrowers tend to ask for rounded amounts: about 73.8% of loans are whole thousands and another 15.9% are other multiples of $500. The most popular loan amounts are $4,000, $10,000, and $15,000, though 75% of all loans are at or below $12,000. The overall distribution is right-skewed, ranging from $1,000 to $35,000, with a median of $6,300 and a mean of $8,315.
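
The shares of rounded amounts can be checked with the modulo operator. This is a sketch on a few made-up amounts; the vector name amount and its values are assumptions, so the percentages it prints are not those of the data set.

```r
# Toy loan amounts in dollars.
amount <- c(4000, 10000, 15000, 2500, 7500, 3250, 1000, 6300)

# Whole thousands, and multiples of $500 that are not whole thousands.
in_thousands <- amount %% 1000 == 0
in_500s      <- amount %% 500 == 0 & !in_thousands

# The mean of a logical vector is the share of TRUE values.
round(100 * mean(in_thousands), 1)
round(100 * mean(in_500s), 1)
```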

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6300    8315   12000   35000

Average loan amounts by year
##      2005      2006      2007      2008      2009      2010      2011 
##  3576.682  4763.325  7049.545  6021.628  4354.859  4766.540  6692.021 
##      2012      2013      2014 
##  7833.842 10540.158 11926.927
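
A per-year average like the one above can be computed with tapply(). The sketch assumes a LoanYear column already derived from the origination date; the toy values are made up, and whether the original table groups by listing year or loan year is not stated.

```r
# Toy data: two loans per year.
df <- data.frame(
  LoanYear           = c(2012, 2012, 2013, 2013),
  LoanOriginalAmount = c(4000, 8000, 10000, 15000)
)

# Apply mean() to the amounts within each year.
mean_by_year <- tapply(df$LoanOriginalAmount, df$LoanYear, mean)
mean_by_year
```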

Loan amounts have also grown in the recent years, after some decrease in the middle of the period, following the total number of listings and loans.

Loan terms

The chart shows that loan terms are discrete and take a limited number of values, which can be presented more compactly as a table.

## 
##    12    36    60 
##  1614 87224 24228
## 
##        12        36        60 
##  1.427485 77.144323 21.428192

There are three loan terms used at Prosper: 12, 36 and 60 months, i.e. 1, 3 or 5 years. Most loans (77.1%) are given for 3 years, and about one fifth have 5-year terms.
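
Counts and percentages of this kind come from table() and prop.table(). A small sketch with made-up terms, which also treats the term as a factor with three levels:

```r
# Toy loan terms in months, stored as a factor with the three known levels.
term <- factor(c(36, 36, 60, 12, 36, 60, 36, 36), levels = c(12, 36, 60))

# Absolute counts per term.
counts <- table(term)
counts

# The same counts as percentages of all loans.
shares <- round(100 * prop.table(counts), 1)
shares
```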

3-year loans tend on average to be for lower amounts than 5-year loans. The number of 1-year loans is small, but their distribution is closer to that of 3-year loans, with a prevalence of lower amounts.

As for loan origination dates for different terms, the time line shows that 1-year loans were available for a period of time during 2011-2014, after which the company seems to have ceased offering this option. 5-year loans were also introduced around 2011, and the number of such loans has grown noticeably in the following years.
Since longer terms tend on average to have higher loan amounts, this leads to higher mean loan amounts in more recent years. Loan amounts should therefore be explored with regard not only to time period, but to loan term as well.

Rates

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1929  0.2506  0.4975

Borrowers’ APR is based on their interest rates plus some additional fees, so, as expected, it follows the same distribution, shifted slightly to the right.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20984 0.21898 0.28386 0.51229      25

Lenders’ yield is the borrower’s rate minus the platform’s service fees, so it also follows the same distribution, shifted slightly to the left.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1245  0.1740  0.1829  0.2406  0.4925

Number of investors per listing

The distribution of investors per listing is right-skewed, with the mode in the lowest 1% of the value range. We can switch to a logarithmic scale to take a closer look at the smaller values.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00   44.00   80.88  116.00 1189.00
## [1] "df$Investors == 1"
## 
## FALSE  TRUE 
## 75.96 24.04

As can be seen from the histogram, many loans are funded by a small number of investors, and 24% by only one. The typical number of investors is 44 in terms of the median, or about 81 in terms of the mean.
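
The share of single-investor loans and the two averages can be obtained as follows. The Investors values here are made up for illustration, so the printed numbers differ from those of the data set.

```r
# Toy numbers of investors per loan.
df <- data.frame(Investors = c(1, 1, 44, 116, 2, 300, 80, 1))

# Percentage of loans funded by exactly one investor vs. all others.
round(100 * prop.table(table(df$Investors == 1)), 2)

# A right-skewed distribution pulls the mean above the median.
centre <- c(median = median(df$Investors), mean = mean(df$Investors))
centre
```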

Monthly payments

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   130.9   217.4   271.9   370.6  2251.5

75% of loans have monthly payments below $370.57, and the mean monthly payment is $271.93.

Loans by loan status

Proportions of each loan status
## 
##                Current              Completed             Chargedoff 
##                  0.493                  0.337                  0.106 
##              Defaulted   Past Due (1-15 days)  Past Due (31-60 days) 
##                  0.044                  0.007                  0.003 
##  Past Due (61-90 days) Past Due (91-120 days) FinalPaymentInProgress 
##                  0.003                  0.003                  0.002 
##  Past Due (16-30 days)              Cancelled   Past Due (>120 days) 
##                  0.002                  0.000                  0.000

Of all loans in the data set, about 51% are in progress: 49.3% are current and about 1.8% are past due to some extent. About 34% of loans were fully paid by borrowers, about 4.4% defaulted and about 10.6% were charged off. As for dynamics, the number of charged-off loans decreased in 2010-2012, getting back to the levels of 2008 in 2014, while the number of defaulted loans noticeably decreased in 2010-2014 in comparison with the earlier period (see the chart below).
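
The grouping of statuses into "past due" and "in progress" can be checked against the rounded proportions from the table above. The vector below copies those printed values; the grouping itself (Current, FinalPaymentInProgress and all Past Due levels counted as in progress) is an assumption about how the summary figures were formed.

```r
# Rounded proportions copied from the table of loan statuses.
status_share <- c(
  "Current" = 0.493, "Completed" = 0.337, "Chargedoff" = 0.106,
  "Defaulted" = 0.044, "Past Due (1-15 days)" = 0.007,
  "Past Due (31-60 days)" = 0.003, "Past Due (61-90 days)" = 0.003,
  "Past Due (91-120 days)" = 0.003, "FinalPaymentInProgress" = 0.002,
  "Past Due (16-30 days)" = 0.002, "Cancelled" = 0.000,
  "Past Due (>120 days)" = 0.000
)

# Sum all levels whose name starts with "Past Due".
past_due <- sum(status_share[grepl("^Past Due", names(status_share))])

# Loans in progress: current, in final payment, or past due.
in_progress <- status_share[["Current"]] +
  status_share[["FinalPaymentInProgress"]] + past_due

round(100 * c(past_due = past_due, in_progress = in_progress), 1)
```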

Average loan original amount for each loan status
##              Cancelled             Chargedoff              Defaulted 
##               1700.000               6398.917               6486.799 
##              Completed FinalPaymentInProgress                Current 
##               6188.146               8344.606              10346.692 
##   Past Due (1-15 days)  Past Due (16-30 days)  Past Due (31-60 days) 
##               8491.334               8156.430               8504.055 
##  Past Due (61-90 days) Past Due (91-120 days)   Past Due (>120 days) 
##               7683.267               8003.977               8281.250

The average amount of current loans is higher than of those that are closed, corresponding to the trends mentioned above in Loan amount and Term sections.

Listing categories

As can be seen on the plot, the most frequent purpose of loans is debt consolidation. Still, it looks rather strange that, with so many listing categories available, the second most popular one is “Not Available”. This means that either borrowers aren’t willing to declare the purpose of their loans, or the list of categories was changed to a more detailed version only recently.

Borrowers by income

57.2% of listings were created by borrowers who declared an income of $50,000 or more. However, the income range variable also seems to have undergone some changes over time, since three of its levels have a meaning close to “zero income”.

## 
##  Not displayed   Not employed             $0      $1-24,999 $25,000-49,999 
##           6.85           0.71           0.55           6.40          28.25 
## $50,000-74,999 $75,000-99,999      $100,000+ 
##          27.20          14.84          15.20

Borrowers by states

Top ten states, %
## 
##   CA   TX   FL   NY   IL        GA   OH   MI   VA 
## 12.9  6.0  5.9  5.9  5.2  4.9  4.4  3.7  3.2  2.9

Borrowers by employment status

Once again, the levels of the Employment Status variable, some of which have the same meaning, may be the result of changes in the required information. In any case, most borrowers are clearly employed.

Borrowers by occupation

Top ten occupations, %
## 
##                    Other             Professional      Computer Programmer 
##                    25.14                    11.97                     3.93 
##                Executive                  Teacher Administrative Assistant 
##                     3.79                     3.30                     3.25 
##                  Analyst                                Sales - Commission 
##                     3.16                     3.12                     3.02 
##           Accountant/CPA 
##                     2.84

Borrowers by credit score

Each bar represents the number of listings by borrowers within a specific credit score range, encoded by two variables, CreditScoreRangeLower and CreditScoreRangeUpper, and spanning 20 points (for example, 660-679).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.5   720.0   880.0     591

For easier interpretation, credit score groups were added, based on the FICO score intervals published on CreditKarma.com.

Borrowers by credit score groups, %
## 
##   Too low      Poor      Fair      Good Very Good Excellent      <NA> 
##      0.12      4.87     32.88     39.29     18.03      4.29      0.52

61.61% of borrowers have credit scores in range of Good and better.
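
Mapping scores to named groups can be done with cut(). The sketch below is hedged: the breakpoints are the commonly quoted FICO boundaries and are an assumption (the exact intervals used in the analysis are not given here), and the names score_lower and score_group are made up.

```r
# Toy lower bounds of credit score ranges, including a missing value.
score_lower <- c(0, 520, 600, 660, 680, 720, 760, 820, NA)

# Assumed group boundaries; right = FALSE makes each interval [lower, upper).
breaks <- c(-Inf, 300, 580, 670, 740, 800, Inf)
labels <- c("Too low", "Poor", "Fair", "Good", "Very Good", "Excellent")
score_group <- cut(score_lower, breaks = breaks, labels = labels, right = FALSE)

# Counts per group, with missing scores shown as <NA>.
table(score_group, useNA = "ifany")
```

With different breakpoints only the breaks vector needs to change; the labels and the rest of the code stay the same.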

Debt to income ratio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8472

For 99% of borrowers who reported the information on debt-to-income ratio, the ratio is below 0.8607, which makes us wonder who the outliers are, whose debt is several times higher than their income. We can check how the borrowers with the highest ratios differ in other aspects, for example, in their loan status. Here are two plots of LoanStatus for those with a debt-to-income ratio between 1 and 5 (left plot) and over 5 (right plot).
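
The 99th percentile and the two outlier groups can be derived as sketched below. The ratios are made up, and the exact handling of the group boundaries (here, above 1 up to 5, and above 5) is an assumption.

```r
# Toy debt-to-income ratios, including a missing value.
dti <- c(0.14, 0.22, 0.32, 0.41, NA, 1.5, 10.01, 0.18)

# 99th percentile of the reported ratios.
quantile(dti, probs = 0.99, na.rm = TRUE)

# Two outlier groups; ratios of 1 or less fall outside both and become NA.
high <- cut(dti, breaks = c(1, 5, Inf), labels = c("1 to 5", "over 5"))
table(high)
```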

Univariate Analysis

What is the structure of your dataset?

The data set contains information on 113066 unique listings created at Prosper.com to obtain loans. Each listing includes the following:
- its identification numbers and time of creation;
- the information about the borrower - location, income, employment and occupation, credit score, debt to income ratio and other aspects of borrower’s financial situation, and their history with Prosper.com on the listing in question and previous loans, if any;
- the information about the loan - date of origination, amount, borrowers’ rate and APR, lenders’ yield, loan status and closed date (if closed) and the information about the most recent payment.

For all listings in the data set there were loans originated. The average loan amount is $8,315, but this number differs a lot from year to year, given the pause in 2008-2009 and the growth of 2013-2014, the latter also marked by the increasing number of 5-year loans. The most frequent purpose of loans is debt consolidation. About half of the loans are closed; the others are still in progress, and less than 5% of those are past due to some extent. In terms of listing creation time, the data span from November 9, 2005 to March 10, 2014. 24% of loans are funded by one investor. There are three loan terms: 1, 3 and 5 years. 1-year loans appear to be a temporary option, available in 2011-2013, while 5-year loans were introduced in the middle of the time period in question and became a growing category.
Most borrowers are employed, have an income range of $50,000 or higher and a credit score of Good (670+) or better. There are borrowers from all states, with the highest number in California (13%), followed by Texas, Florida and New York (6% each) and Illinois (5%). The average number of current credit lines is about 10 and the median debt-to-income ratio is 22%.

What is/are the main feature(s) of interest in your dataset?

The main feature of interest in the data set for me is the company’s progress over time. We can see that in the early period the company’s listing numbers were growing, yet there was a pause in 2009 (apparently coinciding with the Great Recession of 2008-2009). After this pause it restarted with lower listing numbers, but made great progress in the following years. From the variable dictionary and the behavior of some variables (Term and ListingCategory, for example) we may assume that some approaches and policies changed, affecting the company’s performance and maybe the types of clients it attracts (though we may have to take into account the overall improvement of the financial situation).

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

I suppose that studying how the characteristics of borrowers changed after the relaunch can also help in understanding the effect of Prosper’s policy changes.

Did you create any new variables from existing variables in the dataset?

I added ListingYear / LoanYear and ListingMonth / LoanMonth, based on ListingCreationDate / LoanOriginationDate, to make these time parameters more accessible. For borrowers’ credit scores I created a new variable that maps the values to the general ranges usually used to describe scores: “Poor”, “Fair”, “Good”, “Very Good” and “Excellent” (I also added a “Too low” level for scores below “Poor”). I also added a categorical equivalent of ListingCategory..numeric., based on the data dictionary, for better readability.
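
Year and month variables of this kind can be derived with format() once the timestamps are parsed. This is a minimal sketch: the two timestamps are illustrative, and parsing in UTC is an assumption, since the time zone of the data is not stated.

```r
# Toy listing creation timestamps stored as text.
df <- data.frame(
  ListingCreationDate = c("2005-11-09 20:44:28", "2014-03-10 12:20:53"),
  stringsAsFactors = FALSE
)

# Parse the text into date-time values (time zone assumed to be UTC).
df$ListingCreationDate <- as.POSIXct(df$ListingCreationDate, tz = "UTC")

# Extract year and month as integers for easier grouping and sorting.
df$ListingYear  <- as.integer(format(df$ListingCreationDate, "%Y"))
df$ListingMonth <- as.integer(format(df$ListingCreationDate, "%m"))
df
```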

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

I converted variables that contain date and time information from character type to date-time and modified LoanOriginationQuarter for easier sorting. I also excluded duplicated rows for a set of listing keys (see more information in the Data Preparation section).
The distribution of the Term variable showed a limited number of values, so I converted it to a factor with 3 levels: 12, 36 and 60 months.
The distribution of DebtToIncomeRatio has a number of outliers, which show more charged-off loans as the ratio grows. They also have fewer current loans, which may reflect some changes in policies regarding this index, so the outliers may require further investigation. A significant proportion of loans, about 24%, are funded by only one investor, as the distribution of investors per loan shows. No other number of investors yields a comparable percentage, so loans of this kind may be specific in some way. Overall I found applying a logarithmic scale to count axes useful for making small values in specific groups or periods more visible.

Bivariate Plots Section

Time in other variables

Number of listings/loans by month and year

On the plots above the scale is changed to logarithmic to make all months visible. Here the interruption in the data is shown in more detail: from October 2008 to April 2009 for the creation of new listings and to May 2009 for the origination of new loans. Still, the logarithmic scale may make monthly results harder to read; they are more accessible without transformations.

Listing categories by year of creation

Adding colors to the plot based on year (and changing the scale of counts to logarithmic for better visibility of smaller values), we can see that the “Not Available” category was mostly used in listings of 2005-2007 (and seems to have been excluded in 2008-2010). “Personal Loan” and “Student Use” were applicable only before 2009 and 2011, respectively. The most detailed categories, like “Vacation”, “Medical/Dental” or “Boat”, came into use only in 2011 or 2012.

Employment status by year

As expected, missing values in employment status of borrowers refer to the earliest period of data. The categories Employed and Other were implemented in 2010. As we can see from the plot below, Employed became the most frequently used status in recent years.

Occupations by year

Though some other categorical variables seem to have changed their levels over time, Occupation has observations for each year in almost all of its levels. The absence of missing values in 2009-2012 may suggest that in these years it was a required field in loan applications.

Income range by year

Here all levels, except “Not displayed”, have been in use since 2007. There are no listings with $0 income or “Not employed” status in 2014, but since these categories are not used very frequently and the 2014 data cover less than 3 months, this is to be expected.

Exploring income ranges with frequency polygons, we can notice that the most frequent range before the relaunch and in the first few years after it was $25,000-49,999, but in 2013 it changed to $50,000-74,999.

Loan status over time

The charts above show how the most frequent loan status changes from completed to current for the most recent data. We can also see that the number of loans that are past due to some extent is comparatively low. FinalPaymentInProgress describes a rather specific state of a loan’s “life”, so its number is, as expected, low as well.

Closed loans by status

If we compare the distribution of listing creation dates with the distribution of dates when the listings were closed, we can see a 3-year shift of peaks and troughs for most of the data: the pause in listing creation in 2009 corresponds to a dip in the histogram for closed loans.

Among the earliest loans, which were all 3-year loans, we can see an increase in defaulted loans in the second half of 2007 and an increase in charged-off loans from the beginning of 2008 up to the first months of 2009. We should mention that the number of defaulted loans decreased markedly in the following years, while the number of charged-off loans started to grow after 2012. For the latter, however, we should also take into account the growing number of current loans, in which the share of 5-year loans is also growing, so a comparison of proportions between charged-off and completed loans will be accurate only by 2015-2017.

Having compared the loan status frequency over time with the distributions of listing creation and loan closing, we may assume that the growing share of defaulted and charged-off loans led to the decreased number of loans after the relaunch. This might be caused by stricter criteria for borrowers, by a lack of funds for dealing with a larger number of loans, or maybe by decreased attractiveness for investors and therefore a drop in their numbers. We can also suppose that the growing number of completed loans around 2011 created an impulse for the growing number of new listings in 2012.

Listing life from creation to closing

To get a deeper understanding of how long listings last, we can compare listing creation dates to closed dates (for the listings that have a closing date).

There is a noticeable line marking the closing of 3-year loans, which make up the majority of the data set. We can also see a similar line starting from 2011 and lying 2 years lower, which may correspond to the period when 1-year loans were available. However, there are many listings that were closed much earlier than the end of their term.

## $Cancelled
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02270 0.07086 0.07850 0.07262 0.09333 0.09772 
## 
## $Chargedoff
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4187  0.8775  1.3555  1.4853  1.9465  4.6102 
## 
## $Defaulted
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -3.7877  0.7161  1.0940  1.3015  1.6893  4.2856 
## 
## $Completed
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.004109 0.795757 1.668924 1.742848 2.948581 5.530674

As can be seen from the summary above, it takes only a short time for a loan to be cancelled, and about a year and a half to reach charged-off status. 75% of completed loans reach this status in less than 3 years, though the maximum exceeds not only the 3-year term but also the 5-year one. The defaulted group has some negative values in its distribution, so the data need to be checked for possible errors.

##                     ListingKey ListingCreationDate LoanStatus ClosedDate
## 108298 DEAA359893047281162F432 2013-12-27 12:02:50  Defaulted 2010-03-16

There is only one listing where ClosedDate is earlier than ListingCreationDate, which is definitely an error. Excluding this listing doesn’t affect the distribution much; only the minimum, which was not meaningful, changes.
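
The check described above can be sketched as follows. The rows and listing keys are made up, the column name ListingLife is hypothetical, and dividing the number of days by 365 to get years is an assumption.

```r
# Toy listings: one closed normally, one with an impossible closing date,
# and one still in progress (no ClosedDate).
df <- data.frame(
  ListingKey          = c("K1", "K2", "K3"),
  ListingCreationDate = as.Date(c("2007-01-10", "2013-12-27", "2010-06-01")),
  ClosedDate          = as.Date(c("2009-07-10", "2010-03-16", NA)),
  LoanStatus          = c("Completed", "Defaulted", "Current"),
  stringsAsFactors    = FALSE
)

# Listing life in years (assumed: days between the two dates divided by 365).
days <- as.numeric(difftime(df$ClosedDate, df$ListingCreationDate, units = "days"))
df$ListingLife <- days / 365

# Rows closed before they were created; which() skips the NA from open loans.
bad <- df[which(df$ListingLife < 0), ]
bad

# Summary by status without the erroneous rows.
clean <- df[!(df$ListingKey %in% bad$ListingKey), ]
tapply(clean$ListingLife, clean$LoanStatus, summary)
```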

## $Defaulted
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2932  0.7163  1.0942  1.3025  1.6893  4.2856

Exploring the distribution of listing life (in years) over time and terms, we can see that in most cases 75% of closed loans had a listing life of about the length of their term or lower. About 25% of listings tend to have a listing life longer than their loan term. For 1-year loans the median is closer to the end of the term, while for 3-year loans the median listing life is close to 2 years.
Starting from 2011, the growing proportion of loans in progress starts to affect the range of the distributions, which include only the loans that were closed earlier than their terms required. The proportion can be seen in the table and chart below.

##                         
##                            2005   2006   2007   2008   2009   2010   2011
##   Cancelled                0.00   0.07   0.00   0.01   0.00   0.00   0.00
##   Chargedoff               0.00  16.12  25.64  23.87  11.33  13.27  16.01
##   Defaulted                0.00  23.25  13.84   9.09   3.86   3.31   3.09
##   Completed              100.00  60.57  60.52  67.03  84.81  82.94  48.88
##   FinalPaymentInProgress   0.00   0.00   0.00   0.00   0.00   0.00   0.35
##   Current                  0.00   0.00   0.00   0.00   0.00   0.34  29.04
##   Past Due (1-15 days)     0.00   0.00   0.00   0.00   0.00   0.05   1.13
##   Past Due (16-30 days)    0.00   0.00   0.00   0.00   0.00   0.00   0.34
##   Past Due (31-60 days)    0.00   0.00   0.00   0.00   0.00   0.00   0.39
##   Past Due (61-90 days)    0.00   0.00   0.00   0.00   0.00   0.00   0.37
##   Past Due (91-120 days)   0.00   0.00   0.00   0.00   0.00   0.09   0.37
##   Past Due (>120 days)     0.00   0.00   0.00   0.00   0.00   0.00   0.04
##                         
##                            2012   2013   2014
##   Cancelled                0.00   0.00   0.00
##   Chargedoff              11.58   0.88   0.00
##   Defaulted                1.83   0.12   0.00
##   Completed               28.22   6.74   0.58
##   FinalPaymentInProgress   0.28   0.27   0.14
##   Current                 53.63  89.44  99.12
##   Past Due (1-15 days)     1.56   1.04   0.11
##   Past Due (16-30 days)    0.55   0.34   0.03
##   Past Due (31-60 days)    0.78   0.48   0.01
##   Past Due (61-90 days)    0.77   0.35   0.00
##   Past Due (91-120 days)   0.76   0.32   0.00
##   Past Due (>120 days)     0.04   0.01   0.00
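
A table of this kind, with percentages of each status within a year, can be produced with `table()` and `prop.table()`. The sketch below uses made-up rows, and `ListingYear` is a hypothetical helper column holding the year of listing creation.

```r
# Toy data; statuses and years are made up for illustration.
loans <- data.frame(
  LoanStatus  = c("Completed", "Chargedoff", "Completed", "Current",
                  "Current", "Defaulted", "Completed", "Current"),
  ListingYear = c(2010, 2010, 2010, 2011, 2011, 2011, 2011, 2012),
  stringsAsFactors = FALSE
)

counts <- table(loans$LoanStatus, loans$ListingYear)

# margin = 2 divides each column by its total,
# so the percentages within each year sum to 100.
status_pct <- round(prop.table(counts, margin = 2) * 100, 2)
print(status_pct)
```

Normalising by column rather than by row is what makes the years comparable, since the number of listings differs a lot from year to year.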

As for loan amount, it seems not to have any relationship with closed loan status, except for cancelled loans, but their number is quite small. For completed, defaulted and charged-off loans the distributions are more or less the same (the outliers in completed loans may refer to the loans completed in 2013-2014, when such amounts became available, but the proportion of other closed statuses is too low for these years due to the term lengths).

Loan waiting time

Another aspect of listing life is the time between the creation of the listing and the origination of the loan.

From the chart above we can state that waiting time is usually rather short, though in more recent years it may become longer. However, there were periods where the waiting time could reach years, though this applied to only a small number of loans. The distribution of days in waiting can help us to be more precise.

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.405    4.574    8.053   11.619   12.376 1094.189
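
The waiting time in days can be derived from the two timestamps with `difftime()`. This is a sketch on made-up rows: `LoanOriginationDate` is assumed to be the name of the origination column, and `WaitingDays` and `ListingYear` are hypothetical helper columns.

```r
# Toy data; the timestamps are made up for illustration.
loans <- data.frame(
  ListingCreationDate = as.POSIXct(
    c("2012-03-01 10:00:00", "2012-03-05 08:30:00", "2008-11-20 12:00:00"),
    tz = "UTC"),
  LoanOriginationDate = as.POSIXct(
    c("2012-03-09 00:00:00", "2012-03-10 00:00:00", "2009-07-15 00:00:00"),
    tz = "UTC")
)

# Waiting time in (fractional) days between listing creation and origination.
loans$WaitingDays <- as.numeric(
  difftime(loans$LoanOriginationDate, loans$ListingCreationDate, units = "days")
)
print(summary(loans$WaitingDays))

# Median waiting time by year of listing creation.
loans$ListingYear <- as.integer(format(loans$ListingCreationDate, "%Y"))
median_wait <- tapply(loans$WaitingDays, loans$ListingYear, median)
print(median_wait)
```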

The overall distribution of days in waiting is centered at about 8 to 12 days (the median is about 8 days and the mean about 11.6). We can plot it over the years of listing creation to see if there are some changes corresponding to the spikes on the scatterplot.

As can be seen from the plot above, the median waiting time was the highest in 2009, while the longest waiting time together with a lot of outliers could be seen in 2008. This period coincides with the Great Recession, so we may assume either an increase of borrowers who might be considered not very reliable or a decrease in the number of investors due to the overall economic situation. Also the relaunch of Prosper’s website in 2009 might have affected the waiting time for the listings created before the relaunch.

We can check if the duration of waiting may be affected by some characteristics of the borrowers, for example, their employment status or credit score.

For the employment status, though the median waiting time for the “Employed” and “Self-Employed” categories is slightly lower, the largest number of outliers can be seen in the “Full-time” category, which was the most numerous before the “Employed” level was introduced.

We can also check if the ability to verify the income had any impact on the distribution of waiting days.

## [1] "Income Verifiable"
## 
##  False   True 
##   8587 104479

As for credit score ranges, again we see the largest number of outliers in the most numerous categories, but the “best” levels are also affected.

In terms of borrowers’ characteristics, I’d say that the highest number of outliers can be observed in the categories that were the most frequent in the time periods where the most listings with the longest waiting periods occurred. This may speak in favor of the other factors mentioned above: the website relaunch, the difficulty of finding investors during the economic recession, or some others.

Terms

The loans in the data set have terms of either 1 year, 3 years or 5 years, 3-year loans being the most frequent. However, they differ noticeably in average loan amounts and average borrower rates.

These differences may also affect the distribution of other variables if used as an additional dimension. It will be explored further in the Multivariate Plots Section.

Loan original amounts

On the scatterplot the popularity of round amounts is clearly visible. We can also see that after the beginning of 2013 the maximum acceptable loan amount increased from $25,000 to $35,000. The minimum amount was also increased in 2011. We can also assume that in the period before the relaunch the loans were usually lower than after the relaunch, but this is easier to see on a boxplot (see below).

Debt consolidation, being the most popular category, also has the widest range of loan amount distribution. Of comparable range are also Business category, Wedding Loans and Baby&Adoption, though the first two have lower medians. Auto and Vacation loans are characterised by comparatively lower loan amounts.

Though the employment statuses “Retired”, “Part-time” and “Not employed” have expectedly lower median loan amounts, the difference between “Employed” and “Full-time” may be caused mostly by the usage of the latter in the period when generally lower loans were accepted.

Investors

The median number of investors per loan grew in 2007-2010, but decreased in 2011, dropping to 1 investor in 2013. The change of the plot type or the scale can give a closer look into the data.

As we can see on the scatterplot, the number of investors per loan was growing from 2006 to 2008, but in the period of 2009-2011 there were almost no loans funded by fewer than 10 investors. The situation changed in 2012 and further in 2013, when a growing number of loans was funded by only 1 investor.

The following tables explore the differences between loans funded by 1 investor and loans funded by more than one.

Loans with 1 investor by listing categories (%)
## 
##      Not Available Debt Consolidation   Home Improvement 
##               0.89              76.02               5.12 
##           Business      Personal Loan        Student Use 
##               3.31               0.10               0.03 
##               Auto              Other      Baby&Adoption 
##               1.13               6.10               0.21 
##               Boat Cosmetic Procedure    Engagement Ring 
##               0.09               0.04               0.16 
##        Green Loans Household Expenses    Large Purchases 
##               0.05               1.65               1.17 
##     Medical/Dental         Motorcycle                 RV 
##               1.51               0.18               0.04 
##              Taxes           Vacation      Wedding Loans 
##               0.75               0.65               0.79

Loans with 1 investor by listing categories in 2013 (%)
## 
##      Not Available Debt Consolidation   Home Improvement 
##               0.01              76.79               5.70 
##           Business      Personal Loan        Student Use 
##               3.22               0.00               0.00 
##               Auto              Other      Baby&Adoption 
##               1.13               5.53               0.22 
##               Boat Cosmetic Procedure    Engagement Ring 
##               0.11               0.03               0.17 
##        Green Loans Household Expenses    Large Purchases 
##               0.05               1.88               1.17 
##     Medical/Dental         Motorcycle                 RV 
##               1.48               0.22               0.03 
##              Taxes           Vacation      Wedding Loans 
##               0.72               0.74               0.79

Loans with more than 1 investor by listing categories in 2013 (%)
## 
##      Not Available Debt Consolidation   Home Improvement 
##               0.02              70.05               7.19 
##           Business      Personal Loan        Student Use 
##               4.50               0.00               0.00 
##               Auto              Other      Baby&Adoption 
##               1.36               6.31               0.38 
##               Boat Cosmetic Procedure    Engagement Ring 
##               0.07               0.11               0.28 
##        Green Loans Household Expenses    Large Purchases 
##               0.08               2.36               1.21 
##     Medical/Dental         Motorcycle                 RV 
##               2.11               0.34               0.06 
##              Taxes           Vacation      Wedding Loans 
##               1.35               1.10               1.12
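
Category shares like those in the tables above can be computed by splitting the loans on the number of investors and normalising the counts. A minimal sketch on made-up rows, assuming the column names `Investors` and `ListingCategory` (the latter holding category labels rather than numeric codes):

```r
# Toy data; the rows are made up for illustration.
loans <- data.frame(
  Investors = c(1, 1, 1, 1, 25, 80, 140, 1),
  ListingCategory = c("Debt Consolidation", "Debt Consolidation", "Business",
                      "Debt Consolidation", "Home Improvement",
                      "Debt Consolidation", "Business", "Other"),
  stringsAsFactors = FALSE
)

single <- loans[loans$Investors == 1, ]
multi  <- loans[loans$Investors > 1, ]

# Percentage share of each category within a group of loans.
share <- function(x) round(prop.table(table(x)) * 100, 2)

single_pct <- share(single$ListingCategory)
multi_pct  <- share(multi$ListingCategory)
print(single_pct)
print(multi_pct)
```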

As can be seen from the tables, the loans with 1 investor are even more concentrated on Debt Consolidation. As for the correlation between loan amount and the number of investors, for loans with more than 1 investor it becomes stronger than for the whole data set.

## [1] "All loans:  0.383"    "2+ investors:  0.668"

## [1] -0.2762578

The correlation between borrower rate and the number of investors is negative, so it seems that lower rates correspond to a higher number of investors. It may feel slightly counter-intuitive, for higher rates mean higher lender yield, yet higher rates are more often associated with higher risks involved. We can check for more details in the multivariate analysis. We can also exclude the loans with one investor, for they are specific to only the most recent period, and check the correlation again.

## [1] -0.4173609

The relationship seems to be stronger for the loans where more than one investor is involved.
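
The comparison of correlations with and without single-investor loans can be sketched as follows. The rows are made up, so the coefficients will not match the outputs above; `Investors` and `LoanOriginalAmount` are assumed column names.

```r
# Toy data; the values are made up for illustration.
loans <- data.frame(
  Investors          = c(1, 1, 1, 30, 60, 120, 200, 15),
  LoanOriginalAmount = c(15000, 4000, 10000, 3000, 6000, 12000, 20000, 2000),
  BorrowerRate       = c(0.15, 0.25, 0.18, 0.30, 0.22, 0.12, 0.08, 0.33)
)

# Loans funded by more than one investor.
multi <- subset(loans, Investors > 1)

# Pearson correlation of loan amount with the number of investors.
cor_all   <- cor(loans$LoanOriginalAmount, loans$Investors)
cor_multi <- cor(multi$LoanOriginalAmount, multi$Investors)

# Pearson correlation of borrower rate with the number of investors.
rate_all   <- cor(loans$BorrowerRate, loans$Investors)
rate_multi <- cor(multi$BorrowerRate, multi$Investors)

print(round(c(cor_all = cor_all, cor_multi = cor_multi,
              rate_all = rate_all, rate_multi = rate_multi), 3))
```

In the toy data, as in the real data, the single-investor loans weaken both relationships, because their amounts and rates vary while the number of investors stays fixed at 1.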

As for terms, the distributions are also affected by the growing number of loans with single investors in 2012-2013. If these loans are excluded, we can see that the longer terms on average tend to attract slightly more investors (presumably because of the larger loan amounts requested by borrowers).

Credit Score groups

Credit score is considered one of the most important characteristics in the US banking system, being a compound indicator of a person’s financial behavior and responsibility.
We can estimate how Prosper’s preferences about the acceptable credit score changed over time. From the visualisation below we can see that in 2009 the company ceased to accept listings with a credit score worse than “Fair”. Also, after the relaunch the most frequent range changed from “Fair” to “Good”, which may mean that the company started choosing more reliable potential borrowers, especially in the first years after the relaunch.

As can be seen from the chart below, the proportion of borrowers with “excellent” credit score was the highest in 2009-2010, in comparison with other years (for the listings of 2005 the information about borrowers’ credit scores is not available).

As for borrower’s rates, the better the credit score, the lower is the interest the borrower can expect to pay.

As for reported monthly income (on the plot above), borrowers with better credit scores on average tend to have higher income (the top 1% is excluded from the plot for better representation of the most frequent values). This may be one of the reasons that they are on average approved for higher loan amounts (see the chart below).

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The bivariate data exploration has shown that the company’s performance over time has gone through a set of stages, each with its specific characteristics in the variables:
- the early period from 2005 till the company’s website relaunch in the middle of 2009, with the growing number of listings and investors, but a lot of undetailed information on borrowers in the earliest years (with a number of borrowers with low credit scores) and a relatively high proportion of loans that resulted later in defaulted and charged-off status;
- the recovery stage from 2009 till 2011 that was characterised by a lower number of new listings and lower loan amounts, the completion of 3-year loans of the first stage, the higher proportion of borrowers with very good and excellent credit scores and the exclusion of credit ratings worse than fair;
- the growth stage from 2011 with a dynamic increase of new listings and the number of investors per loan; the proportion of borrowers with good and fair credit scores also grew. In 2012-2013 a new type of investor might have been attracted, who was able to solely fund a loan. Also, after the 1-year and 5-year loans were introduced in 2011, the necessity arose of adding one more dimension to the analysis, because the loans differ by term in average loan amounts, rates and number of investors.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Exploring the borrowers’ characteristics, I found that the credit score appears to be a reasonable reflection of a person’s financial well-being and behavior, for it corresponds to higher monthly income and a higher proportion of completed loans in comparison with defaulted and charged-off loans. Therefore the borrowers with higher credit scores may expect to be approved for greater loan amounts with lower interest rates.

What was the strongest relationship you found?

The strongest relationship was found between listing creation date and loan origination date, so if you are going to apply for a loan at Prosper you can expect the origination of the loan within 14 days in case of approval (or even earlier).
The other strong relationship is between the listing creation date and the loans’ closing date, which is determined first of all by the term of the loans. The introduction of 1-year loans caused some distortion to the model line, but for 3-year loans we may expect the closing date to come within the last year and a half of the term. The first half of the second year seems to be crucial for the loan status of a 3-year loan: if it wasn’t defaulted or charged-off in this period, it is more likely to be completed.
There is also a moderate negative relationship between borrowers’ rates (from which the lenders’ yield is obtained by subtracting servicing fees) and the number of investors, especially for the loans with more than 1 investor. Lower rates are also associated with higher credit scores, which may be the reason the lenders found these listings more attractive.

Multivariate Plots Section

Closed loans by status and terms

The following two plots are made to confirm the assumptions made in the Closed Loans section above. On the first one we can see the number of defaulted loans during 2006-2007 and the dynamics of charged-off loans - rather high density in 2006-2009 and - after several years of rather scarce occurrence - the growing density in 2011-2014.

The second plot confirms the assumption about the loan terms: the upper straight line refers to 3-year loans, the lower straight line, which begins in 2011, refers to 1-year loans. A number of closed 5-year loans appear in 2012-2014 with no trend, since most of such loans still have a few years before the end of the term.

Adding a trend line to each term separately, we can see that for 1-year loans the closing date may be expected slightly earlier but still close to the end of the term, while for 3-year loans the closing date may be expected at about 1.5-2 years from the origination of the loan. However, the data on closed loans for the most recent periods distort the model because of many loans that are still in progress. So any modelling for 5-year loans is not quite reasonable.
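
A separate linear trend for each term can be sketched with base R `lm()`. This is not the plotting code behind the charts (which were drawn with ggplot2); the rows are made up, and `CreatedYear` and `ClosedYear` are hypothetical columns holding the dates as decimal years.

```r
# Toy data; the decimal years are made up for illustration.
closed <- data.frame(
  Term        = rep(c(12, 36), each = 4),
  CreatedYear = c(2011.0, 2011.5, 2012.0, 2012.5,
                  2006.0, 2007.0, 2008.0, 2009.0),
  ClosedYear  = c(2011.9, 2012.4, 2012.9, 2013.4,
                  2007.8, 2008.9, 2009.7, 2010.8)
)

# Fit one linear model per term: closing date as a function of creation date.
fits <- lapply(split(closed, closed$Term),
               function(d) lm(ClosedYear ~ CreatedYear, data = d))

# Slope of each trend line; the result is named by term ("12", "36").
slopes <- sapply(fits, function(f) unname(coef(f)["CreatedYear"]))
print(round(slopes, 3))
```

Fitting the terms separately keeps the 1-year loans from pulling down the line for the 3-year loans, which is the distortion mentioned above.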

It may be useful to truncate the period for 3-year loans to get a more accurate model, also taking into account the loan status.

Here we can see that the trend line on the plot for all 3-year loans was a compromise between a higher line of completed loans and the lower lines of defaulted and charged-off loans.

Median loan amount by credit groups over time

The borrowers with excellent credit were the only category that underwent only a slight decrease of the average loan amount in 2008-2010, while the good and lower categories experienced a more noticeable decline from 2007 to 2009, followed by the very good category in 2008.
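
The medians behind such a chart can be computed with `aggregate()`. A minimal sketch on made-up rows, where `CreditGroup` and `ListingYear` are hypothetical helper columns and `LoanOriginalAmount` is an assumed column name:

```r
# Toy data; the rows are made up for illustration.
loans <- data.frame(
  CreditGroup = c("Good", "Good", "Good", "Excellent",
                  "Excellent", "Excellent", "Good", "Excellent"),
  ListingYear = c(2007, 2007, 2009, 2007, 2009, 2009, 2009, 2007),
  LoanOriginalAmount = c(8000, 10000, 4000, 12000,
                         11000, 13000, 5000, 14000),
  stringsAsFactors = FALSE
)

# Median loan amount for every combination of credit group and year.
medians <- aggregate(LoanOriginalAmount ~ CreditGroup + ListingYear,
                     data = loans, FUN = median)
print(medians)
```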

Average borrower rates by credit groups over time

Comparing the average borrower rates for different credit groups over time, we can state that the rates also changed: they grew for all groups in 2007-2008, then jumped higher for good and fair credit groups in comparison with very good and excellent. The increase stopped in 2011, and the rates started to decrease (for the fair group - from 2010), getting back to the numbers of 2006.

Borrower rates by credit groups and terms

As we can see from the chart above, better credit ratings tend to result in lower interest rates. The longer terms result in higher rates for most credit score groups on average, though the variability of the distribution is higher for fair and good groups for 3-year loans (which may be the result of the fluctuation in rates in 2009-2011).

Credit score, monthly income and loan amount

Here we can see that the listings are clustered by loan amount depending on the monthly income and credit score of the borrowers: the higher the credit score and income, the greater the loan amount that may be approved, though the relationship is rather non-linear. A similar clusterisation can be seen if we change credit score to debt-to-income ratio.

The best combination for the loan amount of about $10,000-15,000 seems to be the debt-to-income ratio of about 0.25-0. and monthly income close to $5,000. Higher income may result in greater amount, while lower income may be bound to lower amounts, no matter the ratio.

Investors by term and year

The plot above shows the difference in the number of investors depending on the terms of loans. The 3-year and 5-year loans underwent a decrease in the median number of investors in 2011-2012 with a drop in 2013. I would expect it to be related to the company’s efforts to establish an audience of reliable borrowers in the prior two years. Another possible factor may be the overall improvement of the economic situation and the appearance of investors on Prosper who were not only willing to invest, but also capable of solely funding relatively large loan amounts. The 1-year loans didn’t show the same trends, so these investors might have been interested predominantly in longer terms. The other reason might have been the fact that 1-year loans were only accepted in the beginning of 2013, so the distribution is not describing the whole year.

Number of investors per loan by loan original amount over time

The scatterplot above gives more detailed information about the distributions in the previous section. Here we can divide the data into the same three stages as the overall company’s performance. We can see that during the recovery period the loans of lower amounts were funded mostly by a higher number of investors. The situation changed around 2012, when the greater amounts began to be funded by a lower number of investors. The change continued into 2013, when a number of investors appeared who were able to fund alone the loans that before the relaunch were usually funded by several hundred people.

Rates, investors and loan amount

In addition to the relationship between BorrowerRate and the number of investors per loan discovered earlier, the loan amounts provide additional details. The loans with lower interest rates also tend to be of larger amount. This reflects the fact that the lowest rates typically can be obtained mostly by borrowers with better credit ratings, who also tend to have higher income. This makes it easier for them to get a loan of greater amount. The higher loan amount in peer-to-peer lending may require a greater number of investors, because of collective lending and the limited funds of each lender involved, but the characteristics of borrowers of such loans make it a reliable investment. The higher amounts are more often seen in 3-year and 5-year loans, so the investment will also be long-term.
Also there is a dense group of investors who are interested in loans with comparatively higher interest rates of about 0.28-0.35, but since the loan amounts with such rates are usually lower, the number of investors involved is also relatively small.

Investors, credit score and loan amount over time

Adding another dimension of credit score to the number of investors per loan, loan amounts and time, we can assume that the characteristics of borrowers are not the only important part in getting a loan. After the relaunch and recession mostly small loan amounts were funded, no matter the credit score (if not excellent). We may assume that it was the consequence not only of the recession, but also of the statistics of defaulted and charged-off loans demonstrated by Prosper’s borrowers in the earlier years. As time moved forward, a wider range of credit scores became acceptable for the higher loan amounts.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The multivariate analysis helped to distinguish the characteristics related to different stages of Prosper’s performance over time and to see how they influence each other. Thus, in the earlier stage the growing number of listings and loan amounts was simultaneous with the growing number of investors per loan. Then the recession took place, together with the company’s website relaunch with changed requirements for new borrowers, after a relatively high proportion of loans in the first stage resulted in defaults and charge-offs. This was also the time when the number of investors overall seems to have decreased (either as a result of the poor outcome of earlier loans or of the economic crisis in general, we can’t tell). The caution in listing approval and the increase of rates were followed by a growing number of loans (and, apparently, investors). We must also mention here the overall improvement of the economic situation.

Were there any interesting or surprising interactions between features?

One of the most surprising discoveries for me was the negative relation between loan interest rates and the number of investors, but adding the loan amount variable made it possible to see this relationship in more detail, and I included it in the final section.


Final Plots and Summary

Plot One

Description One

This plot describes several aspects of Prosper’s loan history at once, as the company went through three stages in its performance during the period in question:
- the early period from 2005 till the company’s website relaunch in the middle of 2009, with the growing number of listings and investors, but with a relatively high proportion of loans that resulted later in defaulted and charged-off status;
- the recovery stage from 2009 till 2011 that was characterised by a lower number of new listings together with the growing proportion of completed loans and a decreasing share of defaulted and charged-off loans;
- the growth stage from 2011 with a dynamic increase of new listings and the introduction of 2 additional terms: 1 year and 5 years, which resulted in higher proportions of loans in progress because of the growing number of 5-year loans.
We can also see that the company managed to decrease the number of defaulted loans since the earliest stage and that loans that are past due to some extent make up only a small share of loans in progress.

The progress made through the recovery stage correlates with the temporary increase of rates for all credit ratings in 2008-2011 and the exclusion of borrowers with credit ratings below fair starting from 2009. While such measures are reasonable and may be called expected in the situation of economic recession, in peer-to-peer lending the existence and attraction of investors becomes an important factor of financial processes. That is why I found the behavior of investors the most interesting aspect discovered.

Plot Two

Description Two

This plot gives us some understanding of the other side in peer-to-peer lending - the investors. Here we can visually divide the data into the same three stages as the overall company’s performance. We can see that during the recovery period the loans of lower amounts were funded mostly by a higher number of investors. I would also assume an overall decrease of the number of investors involved. The situation started to change in 2011. Around 2012 the greater loan amounts began to be funded by a lower number of investors. Here I would also expect an overall increase in the number of investors together with the number of loans. The change continued into 2013, when a number of investors appeared who were able to fund alone the loans that before the recession and the relaunch were usually funded by several hundred investors.

Plot Three

Description Three

For this plot I used BorrowerRate on the x-axis to visualise the relationship between what the borrowers are expected to pay and what number of investors are interested in such loans, since the lenders’ yield is based on borrowers’ rate minus servicing fees.
As can be seen from the plot, there is a negative correlation between the number of investors per loan and the interest rate paid by borrowers. The loans with lower interest rates also tend to be of larger amount. This reflects the fact that the lowest rates typically can be obtained mostly by borrowers with better credit ratings, who also tend to have higher income and therefore may be approved for a higher loan amount. The higher loan amount in peer-to-peer lending may require a greater number of investors, because there is no single financial institution behind it, but the characteristics of borrowers of such loans make it a reliable investment. The higher amounts are more often seen in 3-year and 5-year loans, so the investment will also be long-term.
Also there is a dense group of investors who are interested in loans with comparatively higher interest rates of about 0.28-0.35, but since the loan amounts with such rates are usually lower, the number of investors involved is also relatively small.


Reflection

There are several challenges this data set provides, in my opinion. First of all, there is the number of variables: even limiting their number for further exploration requires a lot of effort in distinguishing the most promising ones. Another challenge comes from the rather long period of data. On the one hand, the data change over time due to different circumstances, which makes the exploration even more interesting, but it also makes, for example, the analysis of averages less meaningful without the time dimension (or less reliable without verification by the time dimension). On the other hand, the approach to some variables also changes over time, which is especially noticeable in the changing levels of the categorical variables. Some of them would require rethinking and relabelling to be used in further analysis. This especially refers to the categorical characteristics of borrowers, which may add more to the trends discovered after rearranging the levels to combine those with similar meanings into broader groups. My favorite part was working with ggplot2 for visualisations; I appreciated the versatility it gives after you understand the structure of the code used for plotting. My main struggle was to remember to fit the code into the 80-character limit, mostly because of the long names of the variables in the data set.