by Tatiana Kurilo

Summary

The visualisation created in Tableau is based on the exploratory analysis of the Prosper loans data set, conducted as part of the project for Exploratory Data Analysis in R, and is aimed at presenting the findings of this analysis in a compact and interactive way. Prosper Marketplace is America’s first peer-to-peer lending marketplace. Borrowers request personal loans on Prosper, and investors can fund these requests partially or in full. Investors can consider borrowers’ credit scores, ratings, and histories, as well as the category of the loan. Prosper handles the servicing of the loan and collects and distributes borrower payments and interest back to the loan investors.
The original data set was processed for the purposes of EDA and underwent some cleaning and additions of variables. A more detailed explanation of the data cleaning can be found in the Data Preparation section of the report [1]; the variables were mostly introduced or modified in the Univariable Plots section. The data set used for the Tableau story is included in the submission.

As the EDA has shown, the company’s performance in 2005-2014 went through several stages:
- the early period from 2005 until the company’s website relaunch in the middle of 2009, with a growing number of listings and investors, but with little detailed information on borrowers in the earliest years (including a number of borrowers with low credit scores) and a relatively high proportion of loans that later ended in defaulted or charged-off status;
- the recovery stage from 2009 until 2011, characterised by a lower number of new listings and lower loan amounts, the completion of the 3-year loans of the first stage, a higher proportion of borrowers with very good and excellent credit scores, and the exclusion of credit ratings worse than fair;
- the growth stage from 2011, with a dynamic increase in new listings and in the number of investors per loan; the proportion of borrowers with good and fair credit scores also grew. In 2012-2013 a new type of investor might have been attracted, one able to fund a loan single-handedly. New loan terms were also introduced in 2011, and the highest loan amount increased to $35,000 in recent years.

Of the three loan term options, 1-year loans appear to have been a temporary option, available in 2011-2013, while 5-year loans became a growing category. The new loan terms made it necessary to add the term dimension to other indices such as average loan amount, loan duration, and rates. On average, loan amounts tend to increase with the loan term, and so do the rates. However, the history of 3-year loans is significantly longer than that of the other two terms, and they have the highest variability in rates. Also, since most 5-year loans are still in progress, the average loan duration for closed 5-year loans is short: it includes only the loans that were closed early, before the end of the term.
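The effect on the average duration of closed 5-year loans can be illustrated with a minimal sketch. The loan records and the snapshot date below are hypothetical values chosen for illustration and are not taken from the Prosper data set. The point is only that a 5-year loan originated in 2011 or later cannot yet have run its full term, so any such loan that is already closed must have been closed early.

```python
from datetime import date

# Hypothetical snapshot date for the data (an assumption, not from the report).
SNAPSHOT = date(2014, 3, 1)

# Hypothetical 5-year loans:
# (origination date, term in months, closing date or None if still in progress)
loans = [
    (date(2011, 6, 1), 60, date(2012, 6, 1)),
    (date(2012, 1, 1), 60, None),
    (date(2012, 9, 1), 60, date(2013, 9, 1)),
    (date(2013, 5, 1), 60, None),
]


def months_between(start, end):
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Only closed loans contribute to the "loan duration for closed loans" figure.
closed_durations = [
    months_between(originated, closed)
    for originated, _term, closed in loans
    if closed is not None
]
mean_closed = sum(closed_durations) / len(closed_durations)

# The longest duration any of these loans could have reached by the snapshot.
max_observable = max(
    months_between(originated, SNAPSHOT) for originated, _term, _closed in loans
)

print(f"mean duration of closed loans: {mean_closed} months")
print(f"longest observable duration: {max_observable} months (term is 60)")
```

Because the longest observable duration stays below the 60-month term, the closed-loan average for this term is necessarily made up of early closures.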

Borrowers and investors are the two sides that make peer-to-peer lending possible. For borrowers, the credit score is considered one of the most important characteristics in the US banking system, being a compound indicator of a person’s financial behavior and responsibility. Prosper’s data show that a higher credit score corresponds to a higher monthly income and a higher proportion of completed loans relative to defaulted and charged-off loans; borrowers with higher credit scores may therefore expect to be approved for greater loan amounts at lower interest rates.
After the difficulties the company faced during its first years, Prosper ceased to accept listings from borrowers with credit scores lower than fair. In 2009-2011 the share of excellent and very good ratings was also the highest. The increase in rates and the decrease in approved loan amounts in the same period affected these two groups less than others. This period is also characterised by a higher number of investors per loan, even for loans with relatively small amounts requested.
In general, the higher the loan amount, the more investors are involved; this also correlates with lower rates. This reflects the fact that the lowest rates are typically obtained by borrowers with better credit ratings, who also tend to have higher income and may therefore be approved for higher loan amounts. A higher loan amount in peer-to-peer lending may require a greater number of investors, because there is no single financial institution behind it, but the characteristics of the borrowers of such loans make it a reliable investment. The higher amounts are more often seen in 3-year and 5-year loans, so the investment is also long-term. There is also a noticeable number of investors who are interested in comparatively higher rates, but since higher rates may also mean lower credit ratings and therefore lower approved loan amounts, the number of investors per loan is relatively small.
In 2012-2013 a new type of investor was attracted, one able to fund single-handedly and in full loan amounts that were above average, which also affected the number of loans funded through Prosper’s platform.


Design

The exploratory analysis concentrated on studying Prosper’s performance over time in terms of the number of loans, their status, rates, loan amounts and other indices. Dates are therefore among the main elements of the charts and filters: either the date of listing creation/loan origination or the date of loan closing. The EDA also brought into focus several aspects of Prosper’s performance in 2005-2014, which are presented in five sections of the Tableau story.

  1. Overview. The section’s purpose is to provide an overview of lending activity and its results throughout 2005-2014. The section includes a map to show state coverage, an area chart to show the number of listings together with their status over time, the distribution of loan amounts for the chosen period of time, and two totals: the number of loans and the total loan amount.
  2. Performance. Again, an area chart with bars is used on a timeline to demonstrate two simultaneous processes: the origination of new loans and the closing of existing ones. The terms are also shown, to add time information to the previous section. A relative bar chart is used to explore loan status in proportions and compare them from year to year.
  3. Terms. In 2011 Prosper introduced 1-year and 5-year loan terms in addition to the 3-year term it had used before. This section shows the differences observed between these terms in rates, loan amounts and loan duration for closed loans. Since these are numeric variables, boxplots are used to compare the distributions between terms. A relative bar chart is used to explore loan status in proportions and compare them between terms.
  4. Borrowers. The credit score is one of the most widely used borrower characteristics in the US banking system, and it is also used on Prosper. Line charts on a timeline were chosen to show the trends in each credit score group corresponding to the peaks and downturns of overall activity in the previous section. The relative bar charts compare the presence of each credit score group in each year and also their share in each loan status.
  5. Investors. Investors are the other side of the process that makes peer-to-peer lending possible. The section’s purpose is to show how the number of investors per loan changed following the trends shown in other sections. The data is presented on a timeline as a scatterplot, for better representation of specific points, and in aggregated form as boxplots by year. A second scatterplot is included to demonstrate the relationship between the borrowers’ rates (which are the basis for the lenders’ yield), the loan amount and the number of investors per loan. Since loan amounts and rates differ between credit score groups, a filter on these groups is also added for interactive comparison.
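
The relative bar charts used in the Performance, Terms and Borrowers sections rest on a simple normalisation: within each year (or term), the count of loans in each status is divided by the total for that group. A minimal sketch of that calculation follows. It assumes hypothetical records and simplified status labels, and it is not the actual Tableau calculation or the project data.

```python
from collections import Counter, defaultdict

# Hypothetical (year, loan status) records, for illustration only.
records = [
    (2008, "Completed"), (2008, "Chargedoff"),
    (2008, "Completed"), (2008, "Defaulted"),
    (2010, "Completed"), (2010, "Completed"),
    (2010, "Completed"), (2010, "Chargedoff"),
]


def status_shares(rows):
    """Return {year: {status: share}} where shares within a year sum to 1."""
    counts = defaultdict(Counter)
    for year, status in rows:
        counts[year][status] += 1
    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {status: n / total for status, n in counter.items()}
    return shares


shares = status_shares(records)
for year in sorted(shares):
    print(year, shares[year])
```

Normalising within each group is what makes years with very different loan volumes comparable on the same chart.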

Additions and changes on feedback

The story was originally designed as a presentation, but following the feedback received, text boxes were added to all story slides to substitute for a hypothetical speaker. The Loan Status variable was changed to merge all “Past Due …” levels into one for easier interpretation (via an additional calculated field).
The average loan amount was added to the tooltips on the map in the Overview section. A selection filter was set for the scatterplots in the Investors section to allow users to choose their own periods of interest or rate ranges. The credit score group filter in the Investors section was changed to allow multiple values.
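
The merging of the “Past Due …” levels described above was done with a Tableau calculated field, which is not reproduced here. The same logic can be sketched in Python as follows. The function name and the example status values are illustrative assumptions; the only thing taken from the text is that the merged levels begin with “Past Due”.

```python
def collapse_past_due(status: str) -> str:
    """Map every 'Past Due ...' level to a single 'Past Due' level."""
    # Assumption: all levels to be merged start with the text "Past Due".
    if status.startswith("Past Due"):
        return "Past Due"
    return status


# Illustrative status values, not an exhaustive list of the data set's levels.
statuses = ["Current", "Past Due (1-15 days)", "Past Due (>120 days)", "Completed"]
collapsed = [collapse_past_due(s) for s in statuses]
print(collapsed)
```

All other levels pass through unchanged, so the merged variable can replace the original one in any chart that uses loan status.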

Predicting the most reliable and profitable loans goes beyond the scope of this project and of the preceding EDA project. However, it could be a task for a follow-up project.


Feedback

What do you notice in the visualization?
- At the start of the period the number of loans in Texas was at the same level as in California.
- During the crisis period, only the borrowers with excellent credit scores did not show a decrease in median loan amount.

What questions do you have about the data?
- What is the average loan amount in each state?

What relationships do you notice?
- The 1-year loans look more promising for lenders than 3-year loans: their loan duration is more consistent and close to the term length, while the duration of 3-year loans is very variable, going beyond 3 years, and charge-offs and defaults have a larger share in their loan status.

What do you think is the main takeaway from this visualization?
- The company was able to overcome the crisis, but it is not clear what the most profitable loans are, taking into account the risks of not getting the money back.

Is there something you don’t understand in the graphic?
- Why is the loan duration for 5-year loans shorter than for 3-year loans in the Terms section?

Other suggestions
- Change the filter type in the Investors section to multiple choice, to allow selecting several credit score groups.
- Combine all past due loan statuses into one group.


Resources

  1. Exploratory Data Analysis in R Project: Prosper Loans