Investment Readiness Screener — Phase 3: Screening & Analysis¶

Crystal Olisa · Operations Generalist

Project context¶

This notebook is Phase 3 of a three-phase investment pipeline built to automate the first-pass screening of African SME financials for investment readiness.

Role context: As a Digital Engagement Analyst supporting private equity DealRooms for African SMEs, I reviewed submitted company financials manually to determine whether a business cleared the bar for an investment conversation. Four signals were assessed per company — revenue size, EBITDA margin, revenue growth, and debt load — producing a pass/review/decline decision. This pipeline automates that process.

Pipeline architecture¶

Phase File Responsibility
1 — Ingestion generate_profiles.py Generates 100 structured JSON company submission forms
2 — Extraction pipeline/extract_and_validate.py Runs five-gate validation, derives screening metrics, outputs master CSV
3 — Screening This notebook Applies investment thresholds, produces flags and charts

This notebook reads from data/screening_results.csv — the output of Phase 2. It does not re-run validation logic. The gate outcomes are already recorded in the CSV.


Source data and calibration¶

The synthetic dataset is calibrated against World Bank Enterprise Survey public indicator data for Nigeria, Kenya, Ghana, South Africa, and Ethiopia.

Source: Enterprise Surveys, World Bank Group. https://www.enterprisesurveys.org
Classification: Public. No registration required for indicator-level data.

WB Indicator file What it measures How it was used
WB_ES_T_PERF1.csv Real annual sales growth (%) Calibrates growth ranges per archetype. Median: 0.2%, P25: -10.2%, P75: 5.6%
WB_ES_T_FIN16.csv % of firms citing finance as major constraint Calibrates submission completeness split. Median: 33.1% → 50/33/17 Validated/Partial/Deferred

What synthetic means here: Company names, registration numbers, and specific figures are fabricated. Distributions, growth ranges, and completeness rates are anchored to published WB indicators.


Gate logic (applied in Phase 2)¶

Four gates ran in sequence before any company reached this notebook.

Gate Condition Action Rationale
Gate 1 reporting_year − year_founded < 3 Deferred — exit pipeline EBITDA from under 3 years of operations is not a reliable signal
Gate 1 Critical fields null (revenue, gross_profit, total_equity) Deferred — exit pipeline Can't screen on partial critical data
Gate 2 EBITDA reconciliation variance > 5% Flag — continue to screening Submitted EBITDA checked against derived (GP − opex)
Gate 3 All others Derive metrics, compile CSV USD conversion, EBITDA margin, YoY growth, D/E ratio

Screening thresholds¶

| Signal | Advance | Decline | Decline | |---|---|---|---| | Revenue (USD) | > $1.5M | | EBITDA margin | ≥ 15% | | Revenue growth (YoY) | ≥ 10% | | Debt-to-equity | ≤ 2.0 |

Decision logic: All 4 signals at Advance → Advance. 2–3 signals at Decline or better → Decline. Fewer than 2 → Decline. Fewer than 2 signals available → Decline.


Section 0 — Setup and load¶

What we're doing: Loading the master CSV from Phase 2 and auditing what arrived at the screening stage.

Why it matters: The screening logic only runs on companies that cleared Gates 0 and 1. Companies that didn't make it are still present in the CSV — their pipeline_status is the output for those rows, not a screening flag. Understanding the population before applying logic is the first step in any responsible analysis.

Expected population: 100 total. 17 Deferred (Gate 1). 0 Deferred (Gate 4 — all submissions had critical fields present). 83 screenable.

In [4]:
from IPython.display import display, Image
import pandas as pd
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np
import os

os.makedirs('../charts', exist_ok=True)

df = pd.read_csv('../data/screening_results.csv')

print(f'Total companies in pipeline: {len(df)}')
print()
print('Investment flag distribution:')
print(df['investment_flag'].value_counts())
print()
print('Deferred breakdown:')
print(df[df['investment_flag']=='Deferred']['deferred_reason'].value_counts())
print()
print('Submission anomalies:', df['submission_anomaly'].sum())
print('EBITDA reconciliation flags:', df['ebitda_recon_flag'].sum())
print()

# Screenable companies (reached Gate 5)
df_screen   = df[df['investment_flag'].isin(['Advance', 'Decline'])].copy()
df_deferred = df[df['investment_flag'] == 'Deferred'].copy()
df_def_mat  = df[df['deferred_reason'] == 'operating_history'].copy()
df_def_fin  = df[df['deferred_reason'] == 'financial_history'].copy()

print(f'Screened companies: {len(df_screen)}')
print(f'Deferred — operating history: {len(df_def_mat)}')
print(f'Deferred — financial history: {len(df_def_fin)}')
print()
print('Criterion score availability (screened companies):')
for col in ['rev_score', 'ebitda_score', 'growth_score', 'd2e_score']:
    n = df_screen[col].notna().sum()
    print(f'  {col}: {n}/{len(df_screen)}')
Total companies in pipeline: 100

Investment flag distribution:
investment_flag
Decline     63
Deferred    26
Advance     11
Name: count, dtype: int64

Deferred breakdown:
deferred_reason
operating_history    17
financial_history     9
Name: count, dtype: int64

Submission anomalies: 9
EBITDA reconciliation flags: 5

Screened companies: 74
Deferred — operating history: 17
Deferred — financial history: 9

Criterion score availability (screened companies):
  rev_score: 74/74
  ebitda_score: 52/74
  growth_score: 74/74
  d2e_score: 54/74

Section 1 — Apply screening thresholds¶

What we're doing: Scoring each signal for each screenable company and assigning an investment flag.

Analytical decisions documented here:

Missing signals are scored as None, not zero. A company with a missing EBITDA figure is not assumed to have failed EBITDA — the signal is simply absent. This matters for the Decline flag: if fewer than 2 signals are available, the company cannot be reliably screened and is flagged accordingly rather than auto-declined.

All 4 signals must be at Advance to receive an Advance flag. If a company clears 3 signals at Advance but has a missing 4th signal, it receives Decline. The highest confidence flag requires full visibility. A missing signal could be concealing the one failure.

Thresholds are named constants, not hardcoded values. Changing a threshold means changing one variable at the top of this cell, not hunting through logic. This is intentional — the screening criteria should be adjustable without touching the decision logic.

In [6]:
# ── Screening criteria (reference) ───────────────────────────────────────
# Scores are pre-computed in Phase 2 (extract_and_validate.py) and loaded
# from screening_results.csv. These constants are shown here for reference.

CRITERIA = {
    'revenue_usd_millions': {'advance': 1.5,  'label': 'Revenue',               'unit': '$M',  'higher': True},
    'ebitda_margin_pct':    {'advance': 15.0, 'label': 'EBITDA margin',          'unit': '%',   'higher': True},
    'revenue_growth_pct':   {'advance': 10.0, 'label': 'Revenue growth (YoY)',   'unit': '%',   'higher': True},
    'debt_to_equity':       {'advance': 2.0,  'label': 'Debt-to-equity ratio',   'unit': '',    'higher': False},
}

print('Screening criteria — all four must be met for Advance:')
for key, t in CRITERIA.items():
    direction = '≥' if t['higher'] else '≤'
    unit = t['unit']
    val = f'${t["advance"]}M' if unit == '$M' else f'{t["advance"]}{unit}'
    print(f'  {t["label"]}: {direction} {val}')
print()
print('Investment flag distribution (from Phase 2):')
print(df['investment_flag'].value_counts())
Screening criteria — all four must be met for Advance:
  Revenue: ≥ $1.5M
  EBITDA margin: ≥ 15.0%
  Revenue growth (YoY): ≥ 10.0%
  Debt-to-equity ratio: ≤ 2.0

Investment flag distribution (from Phase 2):
investment_flag
Decline     63
Deferred    26
Advance     11
Name: count, dtype: int64

Section 2 — Chart 1: Pipeline conversion funnel¶

What we're doing: Showing how 100 submitted companies convert through each stage of the pipeline to investment decision.

Why it matters: This is the first question any portfolio manager asks — what proportion of inbound is investable? The funnel shows not just the final answer but where attrition happens. Companies lost at Gate 1 represent a sourcing calibration problem (too many early-stage companies entering the pipeline). Companies lost at screening represent a financial profile problem. The two require different operational responses.

Chart design decisions: Bars are coloured by function — grey for pipeline exits, dark blue for neutral counts, green for Advance, orange for Decline, grey for Decline. The colour carries meaning: a viewer should be able to read the chart without the legend.

Saved to: ../charts/chart_1_pipeline_funnel.png

In [8]:
matplotlib.rcParams['font.family'] = 'monospace'

BG    = '#F7F4EF'
TEXT  = '#1A1A1A'
MID   = '#888780'
BORDER= '#D3D1C7'
BASE  = '#2E3E5A'
ACCENT= '#E8621A'
LIGHT = '#D3D1C7'

n_total   = len(df)
n_def_mat = len(df_def_mat)
n_def_fin = len(df_def_fin)
n_screen  = len(df_screen)
n_advance = int(df['investment_flag'].eq('Advance').sum())
n_decline = int(df['investment_flag'].eq('Decline').sum())

stages = ['Submitted', 'After\nGate 1', 'After\nGate 3', 'Screened', 'Advance', 'Decline']
values = [n_total, n_total - n_def_mat, n_total - n_def_mat - n_def_fin, n_screen, n_advance, n_decline]
colors = [LIGHT, LIGHT, LIGHT, LIGHT, ACCENT, BASE]

fig, ax = plt.subplots(figsize=(12, 5))
fig.patch.set_facecolor(BG); ax.set_facecolor(BG)
ax.xaxis.grid(False)

bars = ax.bar(stages, values, color=colors, width=0.55, zorder=3)
for bar, val in zip(bars, values):
    pct = val / n_total * 100
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.8,
            f'{val}  ({pct:.0f}%)', ha='center', va='bottom',
            fontsize=8, color=TEXT, fontfamily='monospace', fontweight='bold')

ax.set_title('Only 11% of submissions clear all four criteria',
             fontsize=11, fontweight='bold', color=TEXT, fontfamily='monospace', pad=16)
ax.set_ylabel('Number of companies', fontsize=9, color=MID, fontfamily='monospace')
ax.tick_params(colors=MID, labelsize=8)
for spine in ax.spines.values(): spine.set_edgecolor(BORDER)
ax.yaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)
ax.spines['top'].set_visible(False); ax.spines['right'].set_visible(False)
ax.set_axisbelow(True); ax.set_ylim(0, n_total * 1.22)

plt.tight_layout()
plt.savefig('../charts/chart_1_pipeline_funnel.png', dpi=150, bbox_inches='tight', facecolor=BG)
display(Image('../charts/chart_1_pipeline_funnel.png'))
print('Saved: charts/chart_1_pipeline_funnel.png')
No description has been provided for this image
Saved: charts/chart_1_pipeline_funnel.png

Section 3 — Chart 2: Signal failure breakdown¶

What we're doing: Showing which signals are failing most frequently across the screenable pool.

Why it matters: Signal failure rates are sourcing intelligence, not just screening outputs. If debt-to-equity is the most common failure signal, the pipeline is finding companies that are growing but over-leveraged — a structural pattern that points to a specific sourcing adjustment. If revenue is most common, the pipeline is pulling in companies too early in their growth cycle. Each pattern has a different operational response.

WB context: The World Bank data shows median real sales growth of 0.2% across five African countries. Low or negative growth is the norm, not the exception. The growth failure rate in this chart should be read against that benchmark — companies failing the growth threshold are not outliers; they reflect the regional baseline.

Chart design decision: Bars sorted by failure rate descending. The sort order is intentional — the chart is designed to answer 'where is the pipeline breaking down most?' not 'how are signals ordered alphabetically.'

Saved to: ../charts/chart_2_signal_failures.png

In [10]:
criteria_map = {
    'Growth < 10%':         int(df_screen[df_screen['growth_score'].notna()]['growth_score'].eq('decline').sum()),
    'Revenue < $1.5M':      int(df_screen['rev_score'].eq('decline').sum()),
    'EBITDA margin < 15%':  int(df_screen[df_screen['ebitda_score'].notna()]['ebitda_score'].eq('decline').sum()),
    'Debt-to-equity > 2.0': int(df_screen[df_screen['d2e_score'].notna()]['d2e_score'].eq('decline').sum()),
}

pairs   = sorted(zip(criteria_map.values(), criteria_map.keys()), reverse=True)
vals_s, labs_s = zip(*pairs)
pcts    = [v / len(df_screen) * 100 for v in vals_s]
bar_colors = [ACCENT, ACCENT, BASE, BASE]

fig, ax = plt.subplots(figsize=(10, 4.5))
fig.patch.set_facecolor(BG); ax.set_facecolor(BG)
ax.yaxis.grid(False)
ax.xaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)

bars = ax.barh(labs_s, pcts, color=bar_colors, height=0.45, zorder=3)
for bar, val, pct in zip(bars, vals_s, pcts):
    ax.text(pct + 0.5, bar.get_y() + bar.get_height()/2,
            f'{val} companies  ({pct:.1f}%)', ha='left', va='center',
            fontsize=8, color=TEXT, fontfamily='monospace', fontweight='bold')

ax.set_title('Growth and revenue are the two most common barriers to Advance',
             fontsize=11, fontweight='bold', color=TEXT, fontfamily='monospace', pad=16)
ax.set_xlabel('% of screened companies not meeting this criterion', fontsize=9, color=MID, fontfamily='monospace')
ax.tick_params(colors=MID, labelsize=8)
for spine in ax.spines.values(): spine.set_edgecolor(BORDER)
ax.spines['top'].set_visible(False); ax.spines['right'].set_visible(False)
ax.invert_yaxis(); ax.set_xlim(0, max(pcts) * 1.4)

plt.tight_layout()
plt.savefig('../charts/chart_2_signal_failures.png', dpi=150, bbox_inches='tight', facecolor=BG)
display(Image('../charts/chart_2_signal_failures.png'))
print('Saved: charts/chart_2_signal_failures.png')
No description has been provided for this image
Saved: charts/chart_2_signal_failures.png

Section 4 — Chart 3: EBITDA margin vs revenue (coloured by flag)¶

What we're doing: Plotting the two most analytically central signals against each other, coloured by investment flag.

Why it matters: Revenue and EBITDA margin together answer the core investability question — is this company big enough and efficient enough? The scatter shows the population shape: where Advance companies cluster, where borderline cases sit, and whether the Decline category is genuinely borderline or failing one specific signal. Threshold lines make the decision logic visible directly on the chart.

Note on missing values: Companies with missing EBITDA margin are excluded from this chart. They appear in the pipeline funnel (Chart 1) and signal failure chart (Chart 2) but cannot be plotted here. Their exclusion from this chart is noted — it does not mean they were excluded from screening.

WB calibration context: The growth ranges used to generate the synthetic archetypes mean most companies in this dataset cluster at low-to-moderate revenue with variable EBITDA. The Qualified Star archetype (revenue > $1.5M, EBITDA > 15%) represents the top of the realistic distribution for this market — not an easy bar to clear.

Saved to: ../charts/chart_3_ebitda_vs_revenue.png

In [12]:
df_plot = df_screen[df_screen['ebitda_margin_pct'].notna()].copy()

REV_THRESHOLD   = 1.5
EBITDA_THRESHOLD = 15.0

fig, ax = plt.subplots(figsize=(11, 6.5))
fig.patch.set_facecolor(BG); ax.set_facecolor(BG)
ax.xaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)
ax.yaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)

decline_grp = df_plot[df_plot['investment_flag'] == 'Decline']
advance_grp = df_plot[df_plot['investment_flag'] == 'Advance']

ax.scatter(decline_grp['revenue_usd_millions'], decline_grp['ebitda_margin_pct'],
           c=LIGHT, edgecolors=MID, linewidths=0.5, label='Decline', alpha=0.8, s=45, zorder=3)
ax.scatter(advance_grp['revenue_usd_millions'], advance_grp['ebitda_margin_pct'],
           c=ACCENT, edgecolors='none', label='Advance', alpha=0.9, s=65, zorder=4)

ax.axvline(x=REV_THRESHOLD,    color=MID, lw=0.8, ls='--', alpha=0.6, zorder=2)
ax.axhline(y=EBITDA_THRESHOLD, color=MID, lw=0.8, ls='--', alpha=0.6, zorder=2)

ymin = ax.get_ylim()[0]
ax.text(REV_THRESHOLD + 0.1, ymin + 1.5, '$1.5M threshold', fontsize=6.5, color=MID, fontfamily='monospace')
ax.text(0.05, EBITDA_THRESHOLD + 0.5, '15% threshold',    fontsize=6.5, color=MID, fontfamily='monospace')

ax.set_title('Advance companies clear both revenue and margin — the gap is structural',
             fontsize=11, fontweight='bold', color=TEXT, fontfamily='monospace', pad=16)
ax.set_xlabel('Revenue (USD millions)', fontsize=9, color=MID, fontfamily='monospace')
ax.set_ylabel('EBITDA margin (%)',       fontsize=9, color=MID, fontfamily='monospace')
ax.tick_params(colors=MID, labelsize=8)
for spine in ax.spines.values(): spine.set_edgecolor(BORDER)
ax.spines['top'].set_visible(False); ax.spines['right'].set_visible(False)
ax.set_axisbelow(True)
ax.legend(fontsize=8, framealpha=0, labelcolor=TEXT, loc='upper center', bbox_to_anchor=(0.5, -0.12), ncol=2)

plt.tight_layout()
plt.savefig('../charts/chart_3_ebitda_vs_revenue.png', dpi=150, bbox_inches='tight', facecolor=BG)
display(Image('../charts/chart_3_ebitda_vs_revenue.png'))
print('Saved: charts/chart_3_ebitda_vs_revenue.png')
No description has been provided for this image
Saved: charts/chart_3_ebitda_vs_revenue.png

Section 5 — Chart 4: Pipeline health dashboard¶

What we're doing: A two-panel summary showing pipeline status breakdown and data completeness distribution.

Why it matters: Pipeline health answers a different question from screening results. Screening tells you which companies are investable. Pipeline health tells you whether the intake process is functioning — whether submissions are arriving complete enough to assess, and where the operational bottlenecks are.

Completeness score: Assigned during Phase 1 generation. Validated submissions score 85–100. Partial submissions score 55–80. Deferred submissions score 20–50. The distribution across these bands shows how much of the pipeline is arriving in a state that can be screened vs. requiring follow-up.

WB calibration note: The 33% finance constraint median from the World Bank data was used to calibrate the partial/incomplete split. The completeness distribution in Panel 2 reflects that calibration — the concentration of submissions in the 71–85 and 86–100 bands for validated submissions, and the 51–70 band for partial submissions.

Saved to: ../charts/chart_4_pipeline_health.png

In [14]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 5.5))
fig.patch.set_facecolor(BG)

# Panel 1 — Outcome distribution
ax1.set_facecolor(BG)
outcome_labels = ['Advance', 'Decline', 'Deferred\n(maturity)', 'Deferred\n(history)']
outcome_counts = [n_advance, n_decline, len(df_def_mat), len(df_def_fin)]
outcome_colors = [ACCENT, BASE, LIGHT, LIGHT]

bars1 = ax1.bar(outcome_labels, outcome_counts, color=outcome_colors, width=0.5, zorder=3)
for bar, val in zip(bars1, outcome_counts):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.3,
             f'{val} ({val / n_total * 100:.0f}%)', ha='center', va='bottom',
             fontsize=8, color=TEXT, fontfamily='monospace', fontweight='bold')
ax1.set_title('63 companies decline — 26 deferred before screening', fontsize=10,
              fontweight='bold', color=TEXT, fontfamily='monospace', pad=12)
ax1.set_ylabel('Number of companies', fontsize=8, color=MID, fontfamily='monospace')
ax1.tick_params(colors=MID, labelsize=8)
for spine in ax1.spines.values(): spine.set_edgecolor(BORDER)
ax1.spines['top'].set_visible(False); ax1.spines['right'].set_visible(False)
ax1.yaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)
ax1.set_axisbelow(True); ax1.set_ylim(0, max(outcome_counts) * 1.3)

# Panel 2 — Completeness score distribution
ax2.set_facecolor(BG)
completeness = df_screen['completeness_score'].dropna()
bins         = [0, 30, 50, 70, 85, 100]
bin_labels   = ['0–30', '31–50', '51–70', '71–85', '86–100']
bin_colors   = [BASE, BASE, BASE, BASE, ACCENT]
counts2, _   = np.histogram(completeness, bins=bins)

bars2 = ax2.bar(bin_labels, counts2, color=bin_colors, width=0.5, zorder=3)
for bar, val in zip(bars2, counts2):
    if val > 0:
        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.2,
                 str(val), ha='center', va='bottom',
                 fontsize=9, color=TEXT, fontfamily='monospace', fontweight='bold')
ax2.set_title('Most screened submissions score above 85% completeness', fontsize=10,
              fontweight='bold', color=TEXT, fontfamily='monospace', pad=12)
ax2.set_xlabel('Completeness score', fontsize=8, color=MID, fontfamily='monospace')
ax2.set_ylabel('Number of companies', fontsize=8, color=MID, fontfamily='monospace')
ax2.tick_params(colors=MID, labelsize=8)
for spine in ax2.spines.values(): spine.set_edgecolor(BORDER)
ax2.spines['top'].set_visible(False); ax2.spines['right'].set_visible(False)
ax2.yaxis.grid(True, color=BORDER, linewidth=0.5, alpha=0.5, zorder=0)
ax2.set_axisbelow(True)

plt.tight_layout()
plt.savefig('../charts/chart_4_pipeline_health.png', dpi=150, bbox_inches='tight', facecolor=BG)
display(Image('../charts/chart_4_pipeline_health.png'))
print('Saved: charts/chart_4_pipeline_health.png')
No description has been provided for this image
Saved: charts/chart_4_pipeline_health.png

Section 6 — Sector and country breakdown¶

What we're doing: Breaking down investment flags by sector and country to identify pipeline patterns.

Why it matters: A screen that produces Advance flags concentrated in one sector is telling you something about sourcing quality, not just company quality. If one country produces a disproportionate share of Declines, the sourcing channel for that market may be pulling in companies at the wrong development stage. This is where the operational recommendation lives — not in the screen itself, but in what the screen reveals about where to focus and where to recalibrate.

Analytical note: Sector and country distributions in this dataset reflect the synthetic generation parameters, not real market data. The advance rates by sector and country should be read as illustrative of the kind of insight this pipeline would surface in a real submission pipeline, not as findings about actual market sectors.

In [16]:
print('Investment flags by sector (screened companies):')
sector_bd = pd.crosstab(df_screen['sector'], df_screen['investment_flag'])
sector_bd['total'] = sector_bd.sum(axis=1)
if 'Advance' in sector_bd.columns:
    sector_bd['advance_rate_%'] = (sector_bd['Advance'] / sector_bd['total'] * 100).round(1)
print(sector_bd.to_string())
print()

print('Investment flags by country (screened companies):')
country_bd = pd.crosstab(df_screen['country'], df_screen['investment_flag'])
country_bd['total'] = country_bd.sum(axis=1)
if 'Advance' in country_bd.columns:
    country_bd['advance_rate_%'] = (country_bd['Advance'] / country_bd['total'] * 100).round(1)
print(country_bd.to_string())
Investment flags by sector (screened companies):
investment_flag  Advance  Decline  total  advance_rate_%
sector                                                  
Agribusiness           2       10     12            16.7
Energy                 2       10     12            16.7
Fintech                0        9      9             0.0
Healthcare             1       12     13             7.7
Logistics              2        6      8            25.0
Manufacturing          2        5      7            28.6
Retail                 2       11     13            15.4

Investment flags by country (screened companies):
investment_flag  Advance  Decline  total  advance_rate_%
country                                                 
Egypt                  1        9     10            10.0
Ethiopia               0        9      9             0.0
Ghana                  1        7      8            12.5
Kenya                  1        5      6            16.7
Nigeria                1        5      6            16.7
Rwanda                 3        4      7            42.9
Senegal                0        6      6             0.0
South Africa           3        6      9            33.3
Tanzania               1        5      6            16.7
Uganda                 0        7      7             0.0

Section 7 — Full results export¶

What we're doing: Merging the investment flags back onto the full dataset and exporting the final results.

Why it matters: The final output covers all 100 companies — not just the screened ones. Companies that were deferred or incomplete need to be in the output too, because they are still part of the pipeline. A deferred company needs to be tracked for resubmission. An incomplete company needs a follow-up action. Both are operationally live.

Saved to: data/investment_flags_final.csv

In [18]:
# Final results are already in the CSV from Phase 2.
# This cell prints the summary and confirms the output path.

print('FINAL RESULTS — ALL 100 COMPANIES')
print('=' * 60)
print(df['investment_flag'].value_counts().to_string())
print()
print('Advance companies:')
advance = df[df['investment_flag'] == 'Advance'][[
    'submission_id', 'company_name', 'country', 'sector',
    'revenue_usd_millions', 'ebitda_margin_pct', 'revenue_growth_pct', 'debt_to_equity'
]].sort_values('revenue_usd_millions', ascending=False)
print(advance.to_string(index=False))
print()
print('Submission anomalies (internal — not in founder reports):')
anom = df[df['submission_anomaly'] == True][[
    'submission_id', 'company_name', 'audit_status', 'investment_flag'
]]
print(anom.to_string(index=False))
print()
print('Source: data/screening_results.csv (Phase 2 output)')
FINAL RESULTS — ALL 100 COMPANIES
============================================================
investment_flag
Decline     63
Deferred    26
Advance     11

Advance companies:
submission_id           company_name      country        sector  revenue_usd_millions  ebitda_margin_pct  revenue_growth_pct  debt_to_equity
        AF055     Waterberg Holdings       Rwanda  Agribusiness                13.059              27.44                11.5            1.05
        AF052      Thorn Tree Energy South Africa     Logistics                 4.010              20.61                13.7            1.56
        AF040    Flame Tree Ventures       Rwanda Manufacturing                 3.509              21.62                10.6            1.57
        AF026     River Basin Retail        Egypt        Retail                 3.391              25.39                19.7            1.56
        AF010 Horizon Healthcare Ltd      Nigeria     Logistics                 3.135              18.04                15.8            0.73
        AF036      Boma Retail Chain        Kenya        Energy                 2.718              19.71                13.9            1.07
        AF053         Ubuntu Agri Co South Africa Manufacturing                 2.443              27.43                15.9            1.50
        AF057    Yellow River Retail     Tanzania        Energy                 2.129              15.96                10.3            0.34
        AF050   Prism Capital Africa       Rwanda    Healthcare                 2.102              20.71                18.0            1.38
        AF031       Wangari Agri Ltd South Africa  Agribusiness                 2.021              28.74                12.6            0.92
        AF016       Harambee Capital        Ghana        Retail                 1.955              31.25                11.6            0.68

Submission anomalies (internal — not in founder reports):
submission_id            company_name        audit_status investment_flag
        AF003  Meridian Agri Holdings Management Accounts        Deferred
        AF008  Serengeti Retail Group           Unaudited         Decline
        AF018  Jua Kali Manufacturing           Unaudited         Decline
        AF019    Kilimanjaro Holdings Management Accounts         Decline
        AF020     Lakeshore Financial Management Accounts         Decline
        AF024      Pantheon Logistics Management Accounts         Decline
        AF025 Quartz Energy Solutions Management Accounts         Decline
        AF029  Ubuntu Financial Group           Unaudited         Decline
        AF032   Xcel Logistics Africa Management Accounts         Decline

Source: data/screening_results.csv (Phase 2 output)

Summary of all analytical decisions¶

Decision Rationale
Minimum 3 years operating history (Gate 1) EBITDA from fewer than 3 years of operations is not a reliable signal — it could reflect a launch spike rather than sustainable margin. Deferred, not declined.
Completeness gate before screening (Gate 1) Scoring a company on 1 signal produces a misleading flag. Decline is a different operational action from Decline — it triggers a follow-up request, not a rejection.
EBITDA reconciliation check (Gate 4) Submitted EBITDA is cross-checked against derived (gross profit − opex). A >5% variance flags the submission for reviewer check before the figure drives a decision. The company is not removed from the pipeline.
Prior year revenue as submitted field The system calculates YoY growth from the two revenue figures rather than accepting a submitted growth rate. This separates raw data from derived metric and makes the calculation auditable.
Three-tier flag (Advance/Decline/Decline) Collapse to binary pass/fail loses borderline companies that warrant a conditional conversation. Decline is a different operational action from Decline.
All 4 criteria required for Advance A missing signal could be concealing a failure. Full visibility required for the highest confidence flag.
Signal failure chart sorted by frequency The sort order carries meaning — the chart is designed to answer 'where is the pipeline breaking down most?'
Synthetic data (calibrated to WB indicators) No publicly available dataset provides African SME-level financials at this granularity without confidentiality restrictions. WB microdata requires a signed confidentiality agreement. The indicator-level data used here does not.

World Bank Enterprise Survey calibration — full citation¶

  • Indicator: Real annual sales growth (%) — WB_ES_T_PERF1
  • Indicator: % of firms identifying access to finance as a major or very severe constraint — WB_ES_T_FIN16
  • Countries: Nigeria, Kenya, Ghana, South Africa, Ethiopia
  • Period: 2006–2025 (most recent survey year per country)
  • Source: Enterprise Surveys, World Bank Group. https://www.enterprisesurveys.org
  • Access classification: Public. No registration required for indicator-level data.

The pipeline does not replace the investment conversation. It makes the right conversations possible faster.