wbia.algo.verif package

Submodules

wbia.algo.verif.clf_helpers module

This module is a work in progress; as such, concepts are subject to change.

MAIN IDEA:
MultiTaskSamples serves as a structure to contain and manipulate a set of samples with potentially many different types of labels and features.
class wbia.algo.verif.clf_helpers.ClfProblem[source]

Bases: utool.util_dev.NiceRepr

learn_deploy_classifiers(task_keys=None, clf_key=None, data_key=None)[source]

Learns on data without any train/validation split

learn_evaluation_classifiers(task_keys=None, clf_keys=None, data_keys=None)[source]

Evaluates by learning classifiers using cross validation. Do not use this to learn production classifiers.

python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_PB_RF_TRAIN --show

CommandLine:
python -m clf_helpers learn_evaluation_classifiers

Example

>>> # ENABLE_DOCTEST
>>> from wbia.algo.verif.clf_helpers import *  # NOQA
>>> pblm = IrisProblem()
>>> pblm.setup()
>>> pblm.verbose = True
>>> pblm.eval_clf_keys = ['Logit', 'RF']
>>> pblm.eval_task_keys = ['iris']
>>> pblm.eval_data_keys = ['learn(all)']
>>> result = pblm.learn_evaluation_classifiers()
>>> res = pblm.task_combo_res['iris']['Logit']['learn(all)']
>>> res.print_report()
>>> res = pblm.task_combo_res['iris']['RF']['learn(all)']
>>> res.print_report()
>>> print(result)
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

set_pandas_options()[source]
set_pandas_options_low()[source]
set_pandas_options_normal()[source]
class wbia.algo.verif.clf_helpers.ClfResult[source]

Bases: utool.util_dev.NiceRepr

Handles evaluation statistics for a multiclass classifier trained on a specific dataset with specific labels.

augment_if_needed()[source]

Adds in dummy values for missing classes

classmethod combine_results(res_list, labels=None)[source]

Combine results from cross validation runs into a single result representing the performance of the entire dataset

compress(flags)[source]
confusions(class_name)[source]
confusions_ovr()[source]
extended_clf_report(verbose=True)[source]
get_pos_threshes(metric='fpr', value=0.0001, maximize=False, warmup=200, priors=None, min_thresh=0.5)[source]

Finds a threshold that achieves the desired value for the desired metric, while maximizing or minimizing the threshold.

For positive classification you want to minimize the threshold. Priors can be passed in to augment probabilities depending on support. By default a class prior is 1 for threshold minimization and 0 for maximization.
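
A minimal sketch of the idea for a single class, using sklearn's ROC curve (the data and target value below are illustrative, not the actual implementation):

import numpy as np
from sklearn.metrics import roc_curve

y_true_ovr = np.array([0, 0, 0, 1, 1, 1, 0, 1])   # binary membership for one class
probs = np.array([0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 0.3, 0.8])
fpr, tpr, threshes = roc_curve(y_true_ovr, probs)
ok = np.where(fpr <= 0.0001)[0]
# smallest threshold keeping fpr at or below the target, falling back to min_thresh
pos_thresh = threshes[ok].min() if len(ok) else 0.5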

get_thresholds(metric='mcc', value='maximize')[source]

get_metric = 'thresholds'
at_metric = metric = 'mcc'
at_value = value = 'maximize'

a = []
b = []
for x in np.linspace(0, 1, 1000):
    a += [cfms.get_metric_at_metric('thresholds', 'fpr', x, subindex=True)]
    b += [cfms.get_thresh_at_metric('fpr', x)]

a = np.array(a)
b = np.array(b)
d = (a - b)
logger.info((d.min(), d.max()))

hardness_analysis(samples, infr=None, method='argmax')[source]

samples = pblm.samples

# TODO MWE with sklearn data

# ClfResult.make_single(ClfResult, clf, X_df, test_idx, labels,
#                       data_key, feat_dims=None):

import sklearn.datasets
iris = sklearn.datasets.load_iris()

# TODO: make this setup simpler
pblm = ClfProblem()
task_key, clf_key, data_key = 'iris', 'RF', 'learn(all)'
X_df = pd.DataFrame(iris.data, columns=iris.feature_names)
samples = MultiTaskSamples(X_df.index)
samples.apply_indicators({'iris': {name: iris.target == idx
                                   for idx, name in enumerate(iris.target_names)}})
samples.X_dict = {'learn(all)': X_df}

pblm.samples = samples
pblm.xval_kw['type'] = 'StratifiedKFold'
clf_list, res_list = pblm._train_evaluation_clf(task_key, data_key, clf_key)

labels = pblm.samples.subtasks[task_key]
res = ClfResult.combine_results(res_list, labels)

res.get_thresholds('mcc', 'maximize')

predict_method = 'argmax'

index
ishow_roc()[source]
classmethod make_single(clf, X_df, test_idx, labels, data_key, feat_dims=None)[source]

Make a result for a single cross validation subset

missing_classes()[source]
print_report()[source]
report_auto_thresholds(threshes, verbose=True)[source]
report_thresholds(warmup=200)[source]
roc_score()[source]
roc_scores_ovr()[source]
roc_scores_ovr_hat()[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

show_roc(class_name, **kwargs)[source]
class wbia.algo.verif.clf_helpers.IrisProblem[source]

Bases: wbia.algo.verif.clf_helpers.ClfProblem

Simple demo using the abstract clf problem to work on the iris dataset.

Example:
>>> # ENABLE_DOCTEST
>>> from wbia.algo.verif.clf_helpers import *  # NOQA
>>> pblm = IrisProblem()
>>> pblm.setup()
>>> pblm.samples
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

setup()[source]
class wbia.algo.verif.clf_helpers.MultiClassLabels[source]

Bases: utool.util_dev.NiceRepr

Used by samples to encode a single set of mutually exclusive labels. These can either be binary or multiclass.

import pandas as pd
pd.options.display.max_rows = 10
# pd.options.display.max_rows = 20
pd.options.display.max_columns = 40
pd.options.display.width = 160
classmethod from_indicators(indicator, index=None, task_name=None)[source]
gen_one_vs_rest_labels()[source]

Example

>>> # ENABLE_DOCTEST
>>> from wbia.algo.verif.clf_helpers import *  # NOQA
>>> indicator = ut.odict([
>>>         ('state1', [0, 0, 0, 1]),
>>>         ('state2', [0, 0, 1, 0]),
>>>         ('state3', [1, 1, 0, 0]),
>>>     ])
>>> labels = MultiClassLabels.from_indicators(indicator, task_name='task1')
>>> sublabels = list(labels.gen_one_vs_rest_labels())
>>> sublabel = sublabels[0]
has_support()[source]
lookup_class_idx(class_name)[source]
make_histogram()[source]
one_vs_rest_task_names()[source]
print_info()[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

target_type
y_bin
y_enc
class wbia.algo.verif.clf_helpers.MultiTaskSamples(index)[source]

Bases: utool.util_dev.NiceRepr

Handles samples (i.e. feature-label pairs) with a combination of non-mutually exclusive subclassification labels

CommandLine:
python -m wbia.algo.verif.clf_helpers MultiTaskSamples

Example

>>> # ENABLE_DOCTEST
>>> from wbia.algo.verif.clf_helpers import *  # NOQA
>>> samples = MultiTaskSamples([0, 1, 2, 3])
>>> tasks_to_indicators = ut.odict([
>>>     ('task1', ut.odict([
>>>         ('state1', [0, 0, 0, 1]),
>>>         ('state2', [0, 0, 1, 0]),
>>>         ('state3', [1, 1, 0, 0]),
>>>     ])),
>>>     ('task2', ut.odict([
>>>         ('state4', [0, 0, 0, 1]),
>>>         ('state5', [1, 1, 1, 0]),
>>>     ]))
>>> ])
>>> samples.apply_indicators(tasks_to_indicators)
apply_encoded_labels(y_enc, class_names, task_name)[source]

Adds labels for a specific task. Alternative to apply_indicators

Parameters:
  • y_enc (list) – integer label indicating the class for each sample
  • class_names (list) – list of strings indicating the class-domain
  • task_name (str) – key for denoting this specific task
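A minimal usage sketch following the parameters above (the labels are made up):

samples = MultiTaskSamples([0, 1, 2, 3])
samples.apply_encoded_labels(
    y_enc=[2, 2, 1, 0],
    class_names=['state1', 'state2', 'state3'],
    task_name='task1',
)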
apply_indicators(tasks_to_indicators)[source]

Adds labels for a specific task

Parameters: tasks_to_indicators (dict) – takes the form:

{
    'my_task_name1': {
        'class1': [list of bools indicating class membership],
        ...
        'classN': [list of bools indicating class membership],
    },
    ...
    'my_task_nameN': {...},
}

class_idx_basis_1d()[source]

1d-index version of class_name_basis

class_idx_basis_2d()[source]

2d-index version of class_name_basis

class_name_basis()[source]

corresponds with indexes returned from encoded_1d

encoded_1d()[source]

Returns a unique label for each combination of samples

encoded_2d()[source]
group_ids
items()[source]
make_histogram()[source]

label histogram

print_info()[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

stratified_kfold_indices(**xval_kw)[source]

TODO: check xval label frequency

subsplit_indices(subset_idx, **xval_kw)[source]

split an existing set

supported_tasks()[source]
class wbia.algo.verif.clf_helpers.XValConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

wbia.algo.verif.deploy module

class wbia.algo.verif.deploy.Deployer(dpath='.', pblm=None)[source]

Bases: object

Transforms a OneVsOne problem into a deployable model. Registers and loads published models.

deploy(task_key=None, publish=False)[source]

Trains and saves a classifier for deployment

Notes

A deployment consists of the following information
  • The classifier itself
  • Information needed to construct the input to the classifier
    • TODO: can this be encoded as an sklearn pipeline?
  • Metadata concerning what data the classifier was trained with
  • PUBLISH TO /media/hdd/PUBLIC/models/pairclf

Example

>>> # xdoctest: +REQUIRES(module:wbia_cnn, --slow)
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> params = dict(sample_method='random')
>>> pblm = OneVsOneProblem.from_empty('PZ_MTEST', **params)
>>> pblm.setup(with_simple=False)
>>> task_key = pblm.primary_task_key
>>> self = Deployer(dpath='.', pblm=pblm)
>>> deploy_info = self.deploy()
Ignore:
pblm.evaluate_classifiers(with_simple=False)
res = pblm.task_combo_res[pblm.primary_task_key]['RF']['learn(sum,glob)']
ensure(task_key)[source]
find_latest_local()[source]
>>> self = Deployer()
>>> self.find_pretrained()
>>> self.find_latest_local()
find_latest_remote()[source]

Used to update the published dict

CommandLine:
python -m wbia.algo.verif.vsone find_latest_remote

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> self = Deployer()
>>> task_clf_names = self.find_latest_remote()
find_pretrained()[source]
fname_fmtstr = 'vsone.{species}.{task_key}.{clf_key}.{n_dims}.{hashid}'
fname_parts = ['vsone', '{species}', '{task_key}', '{clf_key}', '{n_dims}', '{hashid}']
load_published(ibs, species)[source]
meta_suffix = '.meta.json'
publish_info = {'path': '/data/public/models/pairclf', 'remote': 'cthulhu.dyn.wildme.io'}
published = {'giraffe_reticulated': {'match_state': 'vsone.giraffe_reticulated.match_state.RF.131.kqbaqnrdyxpjrzjd.ggr2.cPkl'}, 'zebra_grevys': {'match_state': 'vsone.zebra_grevys.match_state.RF.131.qwmzlhlnnsgzropq.cPkl'}, 'zebra_grevys+_canonical_': {'match_state': 'vsone.zebra_grevys+_canonical_.match_state.RF.107.cusnlyxbberandka.cPkl'}, 'zebra_mountain': {'match_state': 'vsone.zebra_mountain.match_state.RF.131.lciwhwikfycthvva.cPkl'}, 'zebra_plains': {'match_state': 'vsone.zebra_plains.match_state.RF.131.eurizlstehqjvlsu.cPkl'}}
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

wbia.algo.verif.oldvsone module

wbia.algo.verif.oldvsone.demo_single_pairwise_feature_vector()[source]
CommandLine:
python -m wbia.algo.verif.vsone demo_single_pairwise_feature_vector

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> match = demo_single_pairwise_feature_vector()
>>> print(match)

wbia.algo.verif.pairfeat module

class wbia.algo.verif.pairfeat.MatchConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

class wbia.algo.verif.pairfeat.PairFeatureConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

Config for building pairwise feature dimensions

I.e., a config to distill unordered feature correspondences into a fixed-length vector.
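
An illustrative sketch of the distilling idea (the statistic names are placeholders, not the actual config dimensions): each edge has a variable number of correspondence scores, which are summarized into a fixed set of statistics so every edge yields a vector of the same length.

import numpy as np
scores = np.array([0.9, 0.7, 0.4])   # unordered per-correspondence scores for one edge
feat = {
    'match_score_sum': scores.sum(),
    'match_score_mean': scores.mean(),
    'match_score_std': scores.std(),
    'n_matches': len(scores),
}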

class wbia.algo.verif.pairfeat.PairwiseFeatureExtractor(ibs=None, config={}, use_cache=True, verbose=1, match_config=None, pairfeat_cfg=None, global_keys=None, need_lnbnn=None, feat_dims=None)[source]

Bases: object

Parameters:
  • ibs (wbia.IBEISController) – image analysis api
  • match_config (dict) – config for building feature correspondences
  • pairfeat_cfg (dict) – config for making the pairwise feat vec
  • global_keys (list) – global keys to use
  • need_lnbnn (bool) – use LNBNN for enrichment
  • feat_dims (list) – subset of feature dimensions (from pruning) if None, then all dimensions are used
  • use_cache (bool) – turns on disk based caching (default = True)
  • verbose (int) – verbosity flag (default = 1)
CommandLine:
python -m wbia.algo.verif.pairfeat PairwiseFeatureExtractor

Example

>>> # ENABLE_DOCTEST
>>> from wbia.algo.verif.pairfeat import *  # NOQA
>>> import wbia
>>> ibs = wbia.opendb('testdb1')
>>> extr = PairwiseFeatureExtractor(ibs)
>>> edges = [(1, 2), (2, 3)]
>>> X = extr.transform(edges)
>>> featinfo = vt.AnnotPairFeatInfo(X.columns)
>>> print(featinfo.get_infostr())
transform(edges)[source]

Converts annotation edges into their corresponding features. By default this is a caching operation.

class wbia.algo.verif.pairfeat.VsOneFeatConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

keypoint params

class wbia.algo.verif.pairfeat.VsOneMatchConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

wbia.algo.verif.ranker module

TODO: rewrite the hotspotter lnbnn algo to be a generator

Wrapper around LNBNN hotspotter algorithm

class wbia.algo.verif.ranker.Ranker(ibs=None, config={})[source]

Bases: object

fit(daids, dnids=None)[source]
predict(qaids, qnids=None, prog_hook=None)[source]

wbia.algo.verif.sklearn_utils module

class wbia.algo.verif.sklearn_utils.PrefitEstimatorEnsemble(clf_list, voting='soft', weights=None)[source]

Bases: object

hacks around limitations of sklearn.ensemble.VotingClassifier

predict(X)[source]

Predict class labels for X.

Parameters: X ({array-like, sparse matrix}, shape = [n_samples, n_features]) – Training vectors, where n_samples is the number of samples and n_features is the number of features.
Returns: maj – Predicted class labels.
Return type: array-like, shape = [n_samples]
predict_proba(X)[source]

Predict class probabilities for X in 'soft' voting.
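
A self-contained sketch of the soft-voting idea over prefit classifiers (the classifiers and data below are placeholders, not the actual implementation): average the per-class probabilities and take the argmax.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X = np.random.RandomState(0).rand(20, 3)
y = (X[:, 0] > 0.5).astype(int)
clf_list = [LogisticRegression().fit(X, y), DecisionTreeClassifier().fit(X, y)]

probas = [clf.predict_proba(X) for clf in clf_list]   # each is (n_samples, n_classes)
avg = np.average(probas, axis=0)                      # uniform weights; pass weights= for weighted voting
pred_idx = avg.argmax(axis=1)                         # class index with highest averaged probability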

class wbia.algo.verif.sklearn_utils.StratifiedGroupKFold(n_splits=3, shuffle=False, random_state=None)[source]

Bases: sklearn.model_selection._split._BaseKFold

Stratified K-Folds cross-validator with Grouping

Provides train/test indices to split data in train/test sets.

This cross-validation object is a variation of GroupKFold that returns stratified folds. The folds are made by preserving the percentage of samples for each class.

Parameters: n_splits (int, default=3) – Number of folds. Must be at least 2.
split(X, y, groups=None)[source]

Generate indices to split data into training and test set.
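
A minimal usage sketch (the data is made up), assuming the splitter behaves as described above: samples sharing a group id never appear on both sides of a split, while class proportions stay roughly balanced.

import numpy as np
from wbia.algo.verif.sklearn_utils import StratifiedGroupKFold

X = np.arange(24).reshape(12, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
skf = StratifiedGroupKFold(n_splits=3)
for train_idx, test_idx in skf.split(X, y, groups):
    # no group appears in both train and test
    assert not set(groups[train_idx]) & set(groups[test_idx])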

wbia.algo.verif.sklearn_utils.classification_report2(y_true, y_pred, target_names=None, sample_weight=None, verbose=True)[source]

References

https://csem.flinders.edu.au/research/techreps/SIE07001.pdf
https://www.mathworks.com/matlabcentral/fileexchange/5648-bm-cm-?requestedDomain=www.mathworks.com
Jurman, Riccadonna, Furlanello (2012). A Comparison of MCC and CEN Error Measures in MultiClass Prediction

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.sklearn_utils import *  # NOQA
>>> y_true = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]
>>> y_pred = [1, 2, 1, 3, 1, 2, 2, 3, 2, 2, 3, 3, 2, 3, 3, 3, 1, 3]
>>> target_names = None
>>> sample_weight = None
>>> verbose = True
>>> report = classification_report2(y_true, y_pred, verbose=verbose)
Ignore:
>>> size = 100
>>> rng = np.random.RandomState(0)
>>> p_classes = np.array([.90, .05, .05][0:2])
>>> p_classes = p_classes / p_classes.sum()
>>> p_wrong   = np.array([.03, .01, .02][0:2])
>>> y_true = testdata_ytrue(p_classes, p_wrong, size, rng)
>>> rs = []
>>> for x in range(17):
>>>     p_wrong += .05
>>>     y_pred = testdata_ypred(y_true, p_wrong, rng)
>>>     report = classification_report2(y_true, y_pred, verbose='hack')
>>>     rs.append(report)
>>> import wbia.plottool as pt
>>> pt.qtensure()
>>> df = pd.DataFrame(rs).drop(['raw'], axis=1)
>>> delta = df.subtract(df['target'], axis=0)
>>> sqrd_error = np.sqrt((delta ** 2).sum(axis=0))
>>> print('Error')
>>> print(sqrd_error.sort_values())
>>> ys = df.to_dict(orient='list')
>>> pt.multi_plot(ydata_list=ys)
wbia.algo.verif.sklearn_utils.predict_from_probs(probs, method='argmax', target_names=None, **kwargs)[source]

Predictions are returned as indices into columns or target_names

Doctest:
>>> from wbia.algo.verif.sklearn_utils import *
>>> rng = np.random.RandomState(0)
>>> probs = pd.DataFrame(rng.rand(10, 3), columns=['a', 'b', 'c'])
>>> pred1 = predict_from_probs(probs, 'argmax')
>>> pred2 = predict_from_probs(probs, 'argmax', target_names=probs.columns)
>>> threshes = probs.loc[0]
>>> pred3 = predict_from_probs(probs, threshes.values, force=True,
>>>                            target_names=probs.columns)
wbia.algo.verif.sklearn_utils.predict_proba_df(clf, X_df, class_names=None)[source]

Calls sklearn classifier predict_proba but then puts results in a dataframe using the same index as X_df and incorporating all possible class_names given
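
A minimal sketch of the described behavior (assumed, not copied from the source): wrap predict_proba output in a DataFrame aligned to X_df.index and reindex the columns so that classes the classifier never saw get probability zero.

import pandas as pd
# clf, X_df, and class_names are the documented parameters of this function
probs = pd.DataFrame(clf.predict_proba(X_df), index=X_df.index, columns=clf.classes_)
if class_names is not None:
    probs = probs.reindex(columns=class_names, fill_value=0.0)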

wbia.algo.verif.sklearn_utils.predict_with_thresh(probs, threshes, target_names=None, force=False, multi=True, return_flags=False)[source]

If force is True, every sample gets a prediction even if nothing passes the thresholds; in that case argmax is used.

If more than one class passes the threshold, the highest one is taken when multi=True; otherwise nan is returned.

Doctest:
>>> from wbia.algo.verif.sklearn_utils import *
>>> probs = np.array([
>>>     [0.5, 0.5, 0.0],
>>>     [0.4, 0.5, 0.1],
>>>     [1.0, 0.0, 0.0],
>>>     [0.3, 0.3, 0.4],
>>>     [0.1, 0.3, 0.6],
>>>     [0.1, 0.6, 0.3],
>>>     [0.6, 0.1, 0.3],])
>>> threshes = [.5, .5, .5]
>>> pred_enc = predict_with_thresh(probs, threshes)
>>> a = predict_with_thresh(probs, [.5, .5, .5])
>>> b = predict_with_thresh(probs, [.5, .5, .5], force=True)
>>> assert np.isnan(a).sum() == 3
>>> assert np.isnan(b).sum() == 0
wbia.algo.verif.sklearn_utils.temp(samples)[source]
wbia.algo.verif.sklearn_utils.testdata_ypred(y_true, p_wrong, rng)[source]
wbia.algo.verif.sklearn_utils.testdata_ytrue(p_classes, p_wrong, size, rng)[source]
wbia.algo.verif.sklearn_utils.voting_ensemble(clf_list, voting='hard')[source]

Hack to construct a VotingClassifier from pretrained classifiers. TODO: contribute similar functionality to sklearn.
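
A hedged usage sketch (the classifiers and data are placeholders): combine two independently fitted classifiers into a single hard-voting ensemble.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from wbia.algo.verif.sklearn_utils import voting_ensemble

X = np.random.RandomState(0).rand(30, 4)
y = (X[:, 0] > 0.5).astype(int)
clf_a = LogisticRegression().fit(X, y)
clf_b = RandomForestClassifier(n_estimators=10).fit(X, y)
eclf = voting_ensemble([clf_a, clf_b], voting='hard')
y_pred = eclf.predict(X)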

wbia.algo.verif.verifier module

class wbia.algo.verif.verifier.BaseVerifier[source]

Bases: utool.util_dev.NiceRepr

easiness(edges, real)[source]

Gets the probability of the class each edge is labeled as. Indicates how easy it is to classify this example.
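
A conceptual sketch of the computation (verifier, edges, and the label values stand in for a fitted verifier, its query edges, and their true labels; this is not the actual method body): look up, for each edge, the probability assigned to that edge's real label.

probs = verifier.predict_proba_df(edges)   # one row per edge, one column per class
real = ['match', 'nomatch', 'match']       # hypothetical true labels, one per edge
easiness = [probs.iloc[i][lbl] for i, lbl in enumerate(real)]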

fit(edges)[source]

The vsone.OneVsOneProblem currently handles fitting a model based on edges. The actual fit call is in clf_helpers.py

predict(edges, method='argmax', encoded=False)[source]
predict_proba_df(edges)[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

class wbia.algo.verif.verifier.IntraVerifier(pblm, task_key, clf_key, data_key)[source]

Bases: wbia.algo.verif.verifier.BaseVerifier

Predicts cross-validated intra-training sample probs.

Note

Requires the original OneVsOneProblem object. This classifier is for intra-dataset evaluation and is not meant to be published for use on external datasets.

predict_proba_df(want_edges)[source]
Predicts task probabilities in one of two ways:
  1. if the edge was in the training set, its cross-validated probability is returned.
  2. if the edge was not in the training set, the average prediction over all cross-validated classifiers is used.
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

class wbia.algo.verif.verifier.Verifier(ibs=None, deploy_info=None)[source]

Bases: wbia.algo.verif.verifier.BaseVerifier

Notes

deploy_info should be a dict with the following keys:

clf: sklearn classifier
metadata: another dict with keys:
    class_names - classes that clf predicts
    task_key - str
    clf_key - str
    data_info - tuple of (feat_extract_config, feat_dims)
    # TODO: make feat_dims part of feat_extract_config, defaulted to None
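
A hypothetical example of the structure described above (all values are placeholders):

deploy_info = {
    'clf': clf,  # a fitted sklearn classifier
    'metadata': {
        'class_names': ['match', 'nomatch', 'notcomp'],
        'task_key': 'match_state',
        'clf_key': 'RF',
        'data_info': (feat_extract_config, feat_dims),
    },
}
verif = Verifier(ibs, deploy_info=deploy_info)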

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> import wbia
>>> ibs = wbia.opendb('PZ_MTEST')
>>> species = 'zebra_plains'
>>> task_key = 'match_state'
>>> verif = Deployer()._load_published(ibs, species, task_key)
predict_proba_df(edges)[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

wbia.algo.verif.vsone module

CommandLine:

# Test how well out-of-the-box vsone classifiers perform:
python -m wbia.algo.verif.vsone evaluate_classifiers --db DETECT_SEATURTLES

# Train a classifier for deployment
# Will output to the current working directory
python -m wbia.algo.verif.vsone deploy --db GZ_Master1

class wbia.algo.verif.vsone.AnnotPairSamples(ibs, aid_pairs, infr=None, apply=False)[source]

Bases: wbia.algo.verif.clf_helpers.MultiTaskSamples, ubelt.util_mixins.NiceRepr

Manages the different ways to assign samples (i.e. feat-label pairs) to 1-v-1 classification

CommandLine:
python -m wbia.algo.verif.vsone AnnotPairSamples

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty()
>>> pblm.load_samples()
>>> samples = AnnotPairSamples(pblm.ibs, pblm.raw_simple_scores, {})
>>> print(samples)
>>> samples.print_info()
>>> print(samples.sample_hashid())
>>> encode_index = samples.subtasks['match_state'].encoded_df.index
>>> indica_index = samples.subtasks['match_state'].indicator_df.index
>>> assert np.all(samples.index == encode_index)
>>> assert np.all(samples.index == indica_index)
apply_multi_task_binary_label()[source]
apply_multi_task_multi_label()[source]
apply_single_task_multi_label()[source]
compress(flags)[source]
edge_set_hashid()[source]

Faster than using ut.combine_uuids because we condense, don't bother casting back to UUIDs, and just hash directly.

group_ids

Prevents samples with the same group-id from appearing in the same cross validation fold. For us this means any pair within the same name or between the same names will have the same groupid.
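
An illustrative sketch of the grouping idea (ibs and aid_pairs stand in for a controller and the sample's annotation pairs; this is not the actual property implementation): every unordered pair of names maps to one group id, so cross-validation folds never split edges that touch the same name pair.

name_pairs = [tuple(sorted(ibs.get_annot_nids(list(pair)))) for pair in aid_pairs]
unique = {pair: idx for idx, pair in enumerate(dict.fromkeys(name_pairs))}
group_ids = [unique[pair] for pair in name_pairs]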

is_comparable()[source]
is_photobomb()[source]
is_same()[source]
print_featinfo()[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

sample_hashid()[source]
set_feats(X_dict)[source]
set_simple_scores(simple_scores)[source]
task_label_hashid(task_key)[source]
task_sample_hashid(task_key)[source]
class wbia.algo.verif.vsone.OneVsOneProblem(infr=None, verbose=None, **params)[source]

Bases: wbia.algo.verif.clf_helpers.ClfProblem

Keeps information about the one-vs-one pairwise classification problem

CommandLine:

python -m wbia.algo.verif.vsone evaluate_classifiers
python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_PB_RF_TRAIN
python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_PB_RF_TRAIN --profile
python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_MTEST --show
python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_Master1 --show
python -m wbia.algo.verif.vsone evaluate_classifiers --db GZ_Master1 --show
python -m wbia.algo.verif.vsone evaluate_classifiers --db RotanTurtles --show

python -m wbia.algo.verif.vsone evaluate_classifiers --db testdb1 --show -a default

Example

>>> # xdoctest: +REQUIRES(module:wbia_cnn, --slow)
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty('PZ_MTEST')
>>> pblm.hyper_params['xval_kw']['n_splits'] = 10
>>> assert pblm.xval_kw.n_splits == 10
>>> pblm.xval_kw.n_splits = 5
>>> assert pblm.hyper_params['xval_kw']['n_splits'] == 5
>>> pblm.load_samples()
>>> pblm.load_features()
appname = 'vsone_rf_train'
auto_decisions_at_threshold(primary_task, task_probs, task_thresh, task_keys, clf_key, data_key)[source]
build_feature_subsets()[source]

Try to identify a useful subset of features to reduce problem dimensionality

CommandLine:

python -m wbia.algo.verif.vsone build_feature_subsets --db GZ_Master1
python -m wbia.algo.verif.vsone build_feature_subsets --db PZ_PB_RF_TRAIN

python -m wbia Chap4._setup_pblm --db GZ_Master1 --eval
python -m wbia Chap4._setup_pblm --db PZ_Master1 --eval

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty('PZ_MTEST')
>>> pblm.load_samples()
>>> pblm.load_features()
>>> pblm.build_feature_subsets()
>>> pblm.samples.print_featinfo()
deploy(dpath='.', task_key=None, publish=False)[source]

Trains and saves a classifier for deployment

Parameters:
  • dpath (str) – where to save the deployable model
  • task_key (str) – task to train for (default match_state)
  • publish (bool) – if True will try to rsync the model and metadata to the publication server.

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_MTEST',
>>>                                   sample_method='random')
>>> task_key = ut.get_argval('--task', default='match_state')
>>> publish = ut.get_argflag('--publish')
>>> pblm.deploy(task_key=task_key, publish=publish)

Notes

A deployment consists of the following information
  • The classifier itself
  • Information needed to construct the input to the classifier
    • TODO: can this be encoded as an sklearn pipeline?
  • Metadata concerning what data the classifier was trained with
  • PUBLISH TO /media/hdd/PUBLIC/models/pairclf
Ignore:
pblm.evaluate_classifiers(with_simple=False)
res = pblm.task_combo_res[pblm.primary_task_key]['RF']['learn(sum,glob)']
deploy_all(dpath='.', publish=False)[source]
ensure_deploy_classifiers(dpath='.')[source]
evaluate_classifiers(with_simple=False)[source]
CommandLine:
python -m wbia.algo.verif.vsone evaluate_classifiers
python -m wbia.algo.verif.vsone evaluate_classifiers --db PZ_MTEST
python -m wbia.algo.verif.vsone evaluate_classifiers --db GZ_Master1
python -m wbia.algo.verif.vsone evaluate_classifiers --db GIRM_Master1

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_MTEST',
>>>                                   sample_method='random')
>>> #pblm.default_clf_key = 'Logit'
>>> pblm.default_clf_key = 'RF'
>>> pblm.evaluate_classifiers()
evaluate_simple_scores(task_keys=None)[source]
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty()
>>> pblm.set_pandas_options()
>>> pblm.load_samples()
>>> pblm.load_features()
>>> pblm.evaluate_simple_scores()
extra_report(task_probs, is_auto, want_samples)[source]
feature_importance(task_key=None, clf_key=None, data_key=None)[source]
CommandLine:
python -m wbia.algo.verif.vsone report_importance --show
python -m wbia.algo.verif.vsone report_importance --show --db PZ_PB_RF_TRAIN

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty('GZ_Master1')
>>> data_key = pblm.default_data_key
>>> clf_key = pblm.default_clf_key
>>> task_key = pblm.primary_task_key
>>> pblm.setup_evaluation()
>>> featinfo = pblm.feature_importance(task_key, clf_key, data_key)
>>> ut.quit_if_noshow()
>>> import wbia.plottool as pt
>>> pt.wordcloud(featinfo)
>>> ut.show_if_requested()
classmethod from_aids(ibs, aids, verbose=None, **params)[source]

Constructs a OneVsOneProblem from a subset of aids. Use pblm.load_samples to sample a set of pairs
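
A minimal usage sketch (the db name and annotation subset are placeholders):

import wbia
ibs = wbia.opendb('PZ_MTEST')
aids = ibs.get_valid_aids()[0:20]
pblm = OneVsOneProblem.from_aids(ibs, aids)
pblm.load_samples()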

classmethod from_empty(defaultdb=None, **params)[source]
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> defaultdb = 'GIRM_Master1'
>>> pblm = OneVsOneProblem.from_empty(defaultdb)
classmethod from_labeled_aidpairs(ibs, labeled_aid_pairs, class_names, task_name, **params)[source]

Build a OneVsOneProblem directly from a set of aid pairs. It is not necessary to call pblm.load_samples.

Parameters:
  • ibs (IBEISController) –
  • labeled_aid_pairs (list) – tuples of (aid1, aid2, int_label)
  • class_names (list) – list of names corresponding to integer labels
  • task_name (str) – identifier for the task (e.g. custom_match_state)
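A hedged usage sketch following the parameters above (ibs is an open controller; the aid pairs and labels are made up):

labeled_aid_pairs = [(1, 2, 1), (1, 3, 0), (2, 4, 0)]   # (aid1, aid2, int_label)
pblm = OneVsOneProblem.from_labeled_aidpairs(
    ibs, labeled_aid_pairs,
    class_names=['nomatch', 'match'],
    task_name='custom_match_state',
)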
load_features(use_cache=True, with_simple=False)[source]
CommandLine:
python -m wbia.algo.verif.vsone load_features –profile

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> #pblm = OneVsOneProblem.from_empty('GZ_Master1')
>>> pblm = OneVsOneProblem.from_empty('PZ_PB_RF_TRAIN')
>>> pblm.load_samples()
>>> pblm.load_features(with_simple=False)
load_samples()[source]
CommandLine:
python -m wbia.algo.verif.vsone load_samples –profile

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> #pblm = OneVsOneProblem.from_empty('PZ_MTEST')
>>> #pblm = OneVsOneProblem.from_empty('PZ_PB_RF_TRAIN')
>>> pblm = OneVsOneProblem.from_empty('PZ_Master1')
>>> pblm.load_samples()
>>> samples = pblm.samples
>>> samples.print_info()
load_simple_scores()[source]
make_graph_based_bootstrap_pairs()[source]

Sampling method for when you want to bootstrap VAMP after several reviews.

Samples pairs for VAMP training using manually reviewed edges and mines other (random) pairs as needed.

We first sample a base set via:
  1. take all manually reviewed positive edges (not in an inconsistent PCC)
  2. take all manually reviewed negative edges (not touching an inconsistent PCC)
  3. take all manually reviewed incomparable edges

Note: it is important to ignore any PCC currently in an inconsistent state.

We can then generate additional positive samples by sampling automatically reviewed positive edges within PCCs.

We can do the same for negatives.

make_lnbnn_training_pairs()[source]
make_randomized_training_pairs()[source]

Randomized sample that does not require LNBNN

make_training_pairs()[source]
CommandLine:
python -m wbia.algo.verif.vsone make_training_pairs –db PZ_Master1

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty('PZ_MTEST')
>>> pblm.make_training_pairs()
prune_features()[source]

References

http://blog.datadive.net/selecting-good-features-part-iii-random-forests/
http://alexperrier.github.io/jekyll/update/2015/08/27/feature-importance-random-forests-gini-accuracy.html
https://arxiv.org/abs/1407.7502
https://github.com/glouppe/phd-thesis

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_MTEST')
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_PB_RF_TRAIN')
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_Master1')
Ignore:
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty(defaultdb='GZ_Master1')
>>> pblm.setup_evaluation()
qt_review_hardcases()[source]

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty('PZ_Master1')
>>> #pblm = OneVsOneProblem.from_empty('GIRM_Master1')
>>> #pblm = OneVsOneProblem.from_empty('PZ_PB_RF_TRAIN')
>>> pblm.evaluate_classifiers()
>>> win = pblm.qt_review_hardcases()
Ignore:
>>> from wbia.scripts.postdoc import *
>>> self = VerifierExpt('RotanTurtles')
>>> self = VerifierExpt('humpbacks_fb')
>>> import wbia
>>> self._precollect()
>>> ibs = self.ibs
>>> aids = self.aids_pool
>>> pblm = vsone.OneVsOneProblem.from_aids(ibs, aids)
>>> infr = pblm.infr
>>> infr.params['algo.hardcase'] = True
>>> infr.params['autoreview.enabled'] = False
>>> infr.params['redun.enabled'] = False
>>> infr.params['ranking.enabled'] = False
>>> win = infr.qt_review_loop()
>>> pblm.eval_data_keys = [pblm.default_data_key]
>>> pblm.eval_clf_keys = [pblm.default_clf_key]
>>> pblm.evaluate_classifiers()
Ignore:
>>> # TEST to ensure we can prioritize reviewed edges without inference
>>> import networkx as nx
>>> from wbia.algo.graph import demo
>>> kwargs = dict(num_pccs=6, p_incon=.4, size_std=2)
>>> infr = demo.demodata_infr(**kwargs)
>>> infr.params['redun.pos'] = 1
>>> infr.params['redun.neg'] = 1
>>> infr.apply_nondynamic_update()
>>> edges = list(infr.edges())
>>> prob_match = ut.dzip(edges, infr.dummy_matcher.predict(edges))
>>> infr.set_edge_attrs('prob_match', prob_match)
>>> infr.params['redun.enabled'] = True
>>> infr.prioritize('prob_match', edges)
>>> order = []
>>> while True:
>>>     order.append(infr.pop())
>>> print(len(order))
report_classifier_importance2(clf, data_key=None)[source]
report_evaluation()[source]
CommandLine:
python -m wbia.algo.verif.vsone report_evaluation –db PZ_MTEST

Example

>>> # DISABLE_DOCTEST
>>> from wbia.algo.verif.vsone import *  # NOQA
>>> pblm = OneVsOneProblem.from_empty(defaultdb='PZ_MTEST',
>>>                                   sample_method='random')
>>> pblm.eval_clf_keys = ['MLP', 'Logit', 'RF']
>>> pblm.eval_data_keys = ['learn(sum,glob)']
>>> pblm.setup_evaluation(with_simple=False)
>>> pblm.report_evaluation()
report_importance(task_key, clf_key, data_key)[source]
report_simple_scores(task_key=None)[source]
rrr(verbose=True, reload_module=True)

Special class reloading function. This function is often injected as rrr of classes.

setup(with_simple=False)[source]
setup_evaluation(with_simple=False)[source]
task_evaluation_report(task_key)[source]

clf_keys = [pblm.default_clf_key]

class wbia.algo.verif.vsone.PairSampleConfig(**kwargs)[source]

Bases: wbia.dtool.base.Config

Module contents

wbia.algo.verif.IMPORT_TUPLES = [('clf_helpers', None), ('sklearn_utils', None), ('vsone', None), ('deploy', None), ('verifier', None), ('pairfeat', None)]

cd /home/joncrall/code/wbia/wbia/algo/verif
makeinit.py --modname=wbia.algo.verif

Type: Regen Command
wbia.algo.verif.reassign_submodule_attributes(verbose=1)[source]

Updates attributes in the __init__ modules with updated attributes in the submodules.

wbia.algo.verif.reload_subs(verbose=1)[source]

Reloads wbia.algo.verif and submodules

wbia.algo.verif.rrrr(verbose=1)

Reloads wbia.algo.verif and submodules