This notebook demonstrates most of the features of the AllenSDK that help manipulate data in the Cell Types Database. The main entry point will be through the CellTypesCache
class.
CellTypesCache
is responsible for downloading Cell Types Database data to a standard directory structure on your hard drive. If you use this class, you will not have to keep track of where your data lives, other than a root directory.
from allensdk.core.cell_types_cache import CellTypesCache
# Instantiate the CellTypesCache instance. The manifest_file argument
# tells it where to store the manifest, which is a JSON file that tracks
# file paths. If you supply a relative path (like this), it will go
# into your current working directory
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')
# this saves the NWB file to 'cell_types/specimen_464212183/ephys.nwb'
data_set = ctc.get_ephys_data(464212183)
The data_set
variable is an NwbDataSet
instance, which has some methods we can use to access the injected current stimulus waveform and the voltage response waveform for all experimental sweeps. Let's pull one sweep out and plot it.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
sweep_number = 30
sweep_data = data_set.get_sweep(sweep_number)
index_range = sweep_data["index_range"]
i = sweep_data["stimulus"][0:index_range[1]+1] # in A
v = sweep_data["response"][0:index_range[1]+1] # in V
i *= 1e12 # to pA
v *= 1e3 # to mV
sampling_rate = sweep_data["sampling_rate"] # in Hz
t = np.arange(0, len(v)) * (1.0 / sampling_rate)
plt.style.use('ggplot')
fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(t, v, color='black')
axes[1].plot(t, i, color='gray')
axes[0].set_ylabel("mV")
axes[1].set_ylabel("pA")
axes[1].set_xlabel("seconds")
Cell records in the Cell Types Database come with a large amount of metadata. We have exposed the most commonly used of these are arguments to CellTypesCache.get_cells.
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.core.cell_types_cache import ReporterStatus as RS
ctc = CellTypesCache(manifest_file='cell_types/manifest.json', base_uri='http://testwarehouse:9000')
# this downloads metadata for all cells with morphology images
cells = ctc.get_cells(require_morphology = True)
print "Cells with morphology images: ", len(cells)
# cells with reconstructions
cells = ctc.get_cells(require_reconstruction = True)
print "Cells with reconstructions: ", len(cells)
# all cre positive cells
cells = ctc.get_cells(reporter_status = RS.POSITIVE)
print "Cre-positive cells: ", len(cells)
# cre negative cells with reconstructions
cells = ctc.get_cells(require_reconstruction = True,
reporter_status = RS.NEGATIVE)
print "Cre-negative cells with reconstructions: ", len(cells)
The Cell Types Database also contains 3D reconstructions of neuronal morphologies. The data are presented in the SWC format. We'll download a particular cell's reconstrution here.
The AllenSDK contains a module that makes it easier to work with the SWC files. We'll see how the data is contained in the file by looking at the first node.
# download and open an SWC file
cell_id = 480114344
morphology = ctc.get_reconstruction(cell_id)
# the compartment list has all of the nodes in the file
print morphology.compartment_list[0]
Note that the type
field refers to the type of neuronal compartment. The values can be 1 for the soma, 2 for the axon, 3 for dendrites, and 4 for apical dendrites (if present).
Morphologies now also come with marker files, which contains points of interest in the reconstruction. The marker file contains locations where dendrites have been truncated due to slicing and when axons were not reconstructed. The name
field indicates the type of marker (10 for dendrite truncation, 20 for no reconstruction).
# download and open a marker file
markers = ctc.get_reconstruction_markers(cell_id)
print len(markers)
print markers[0]
We can use this data to draw lines between each node and all its children to get a drawing of the cell. We'll do it looking at it from the front and from the side.
from allensdk.core.swc import Marker
fig, axes = plt.subplots(1, 2, sharey=True, sharex=True)
axes[0].set_aspect('equal', 'box-forced')
axes[1].set_aspect('equal', 'box-forced')
# Make a line drawing of x-y and y-z views
for n in morphology.compartment_list:
for c in morphology.children_of(n):
axes[0].plot([n['x'], c['x']], [n['y'], c['y']], color='black')
axes[1].plot([n['z'], c['z']], [n['y'], c['y']], color='black')
# cut dendrite markers
dm = [ m for m in markers if m['name'] == Marker.CUT_DENDRITE ]
axes[0].scatter([m['x'] for m in dm], [m['y'] for m in dm], color='#3333ff')
axes[1].scatter([m['z'] for m in dm], [m['y'] for m in dm], color='#3333ff')
# no reconstruction markers
nm = [ m for m in markers if m['name'] == Marker.NO_RECONSTRUCTION ]
axes[0].scatter([m['x'] for m in nm], [m['y'] for m in nm], color='#333333')
axes[1].scatter([m['z'] for m in nm], [m['y'] for m in nm], color='#333333')
axes[0].set_ylabel('y')
axes[0].set_xlabel('x')
axes[1].set_xlabel('z')
The Cell Types Database contains a set of features that have already been computed, which could serve as good starting points for analysis. We can query the database using the SDK to get these features.
import pprint
pp = pprint.PrettyPrinter(indent=2)
# download all electrophysiology features for all cells
ephys_features = ctc.get_ephys_features()
# filter down to a specific cell
specimen_id = 464212183
cell_ephys_features = [f for f in ephys_features if f['specimen_id'] == specimen_id]
pp.pprint(cell_ephys_features)
That's how to get all the ephys features for a given specimen - what if we want a particular feature for all cells?
updown = np.array([f['upstroke_downstroke_ratio_long_square'] for f in ephys_features], dtype=float)
fasttrough = np.array([f['fast_trough_v_long_square'] for f in ephys_features], dtype=float)
plt.figure()
plt.scatter(fasttrough, updown, color='#2ca25f')
plt.ylabel("upstroke-downstroke ratio")
plt.xlabel("fast trough depth (mV)")
Let's use numpy to fit a regression line to these data and plot it.
A = np.vstack([fasttrough, np.ones_like(updown)]).T
print "First 5 rows of A:"
print A[:5, :]
m, c = np.linalg.lstsq(A, updown)[0]
print "m", m, "c", c
plt.figure()
plt.scatter(fasttrough, updown, color='#2ca25f')
plt.plot(fasttrough, m * fasttrough + c, c='gray')
plt.ylabel("upstroke-downstroke ratio")
plt.xlabel("fast trough depth (mV)")
It looks like there may be roughly two clusters in the data above. Maybe they relate to whether the cells are presumably excitatory (spiny) cells or inhibitory (aspiny) cells. Let's query the API and split up the two sets to see.
cells = ctc.get_cells()
cell_index = { c['id']: c for c in cells}
dendrite_types = ['spiny', 'aspiny']
data = {}
# group fast trough depth and upstroke downstroke ratio values by cell dendrite type
for dendrite_type in dendrite_types:
type_features = [f for f in ephys_features if cell_index[f['specimen_id']]['dendrite_type'] == dendrite_type]
data[dendrite_type] = {
"fasttrough": [f['fast_trough_v_long_square'] for f in type_features],
"updown": [f['upstroke_downstroke_ratio_short_square'] for f in type_features],
}
plt.figure()
for a_type, color in zip(dendrite_types, ["#d95f02", "#7570b3"]):
plt.scatter(data[a_type]['fasttrough'], data[a_type]['updown'], color=color, label=a_type)
plt.legend(loc='best')
plt.ylabel("upstroke-downstroke ratio")
plt.xlabel("fast trough depth (mV)")
The Cell Types Database contains a set of precomputed morphological features for cells that have reconstructions. You can access morphology features by themselves, or combined with the electrophysiology features.
# download all morphology features for cells with reconstructions
morphology_features = ctc.get_morphology_features()
# or download both morphology and ephys features
# this time we'll ask the cache to return a pandas dataframe
all_features = ctc.get_all_features(dataframe=True, require_reconstruction=True)
all_features
The AllenSDK contains the code used to compute the electrophysiology features you accessed above. You can run it yourself like this.
from allensdk.ephys.feature_extractor import EphysFeatureExtractor
sweep_number = 35
sweep_data = data_set.get_sweep(sweep_number)
index_range = sweep_data["index_range"]
i = sweep_data["stimulus"][0:index_range[1]+1] # in A
v = sweep_data["response"][0:index_range[1]+1] # in V
i *= 1e12 # to pA
v *= 1e3 # to mV
sampling_rate = sweep_data["sampling_rate"] # in Hz
t = np.arange(0, len(v)) * (1.0 / sampling_rate)
fx = EphysFeatureExtractor()
stim_start = 1.0
stim_duration = 1.0
fx.process_instance("", v, i, t, stim_start, stim_duration, "")
feature_data = fx.feature_list[0].mean
print "Avg spike width: {:.2f} ms".format(feature_data['width'])
print "Avg spike threshold: {:.1f} mV".format(feature_data["threshold"])
import pprint
pp = pprint.PrettyPrinter(indent=2)
pp.pprint(feature_data["spikes"][0])
A list comprehension is an easy way to pull out the spike times.
spike_times = [s["t"] for s in feature_data["spikes"]]
print spike_times[:5]
plt.figure()
plt.plot(t, v, color='black')
min_v = v.min()
min_v -= 5.0
plt.scatter(spike_times, np.ones(len(spike_times)) * min_v, c='r')
plt.xlim(0.9, 1.2)