reacnetgenerator package

reacnetgenerator

ReacNetGenerator is an automatic reaction network generator for reactive molecular dynamics simulation.

Notes

Please cite: ReacNetGenerator: an automatic reaction network generator for reactive molecular dynamic simulations, Phys. Chem. Chem. Phys., 2020, 22 (2): 683–691, doi: 10.1039/C9CP05091D

class reacnetgenerator.ReacNetGenerator(**kwargs)[source]

Bases: object

Use ReacNetGenerator for trajectory analysis.

Parameters:
inputfiletype: str
The type of the input file. The following type is allowed:
inputfilename: str or list of strs

The filename(s) of the input file, which can be either relative path or absolute path. If it is a list, the files will be read in order.

atomname: tuple of strs

The atomic names in the input file, such as ('C', 'H', 'O'). They must be listed in the same order as in the input file.

runHMM: bool, optional, default: True

Whether to process the trajectory with a Hidden Markov Model (HMM). If too many species are filtered out, this option can be turned off.

miso: int, optional, default: 0

Merge isomers, using the most frequent one as the representative. 0 turns merging off. Two levels are available: 1 merges isomers with the same atoms and the same bond network but different bond levels; 2 merges isomers with the same atoms but different bond networks.

pbc: bool, optional, default: True

Use periodic boundary conditions (PBC) or not.

cell: (3,3) array_like or (3,) array_like or (9,) array_like, optional, default: None

The cell (box size) of the system. If None (default), the cell will be read from the input file. If the input file doesn’t have cell information, this parameter will be necessary.

nproc: int, optional, default: None

The number of processors used for analysis. If None (default), the program will try to use all processors.

selectatoms: str, optional, default: None

Select an element from the atomic names, such as C, and only show species with this element in the reaction network. If None (default), the network will show all elements.

split: int, optional, default: None

Split number for the time axis. For example, if set to 10, the whole trajectory will be divided into 10 parts and the reactions of each part will be shown.

a: (2,2) array_like, optional, default: [[0.999, 0.001], [0.001, 0.999]]

Transition matrix A of HMM parameters. It is recommended for users to choose their own parameters. See the paper for details.

b: (2,2) array_like, optional, default: [[0.6, 0.4], [0.4, 0.6]]

Emission matrix B of HMM parameters. It is recommended for users to choose their own parameters. See the paper for details.
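To illustrate how a two-state HMM with a transition matrix and an emission matrix can smooth a noisy 0/1 signal, here is a minimal Viterbi sketch. It is not the package's implementation: the name `viterbi_filter`, the uniform initial distribution, the example matrices, and the reading of the matrices (row = current hidden state, column = next state or observed value) are assumptions made for illustration.

```python
import math


def viterbi_filter(signal, a, b, start=(0.5, 0.5)):
    """Return the most likely hidden 0/1 state sequence for a 0/1 signal."""
    if not signal:
        return []
    log = math.log
    # Log-probability of each hidden state after the first observation.
    score = [log(start[s]) + log(b[s][signal[0]]) for s in (0, 1)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for obs in signal[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Pick the predecessor that maximizes the path probability.
            cand = [score[p] + log(a[p][s]) for p in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + log(b[s][obs]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Example matrices (assumed values, each row sums to 1).
A = [[0.999, 0.001], [0.001, 0.999]]
B = [[0.6, 0.4], [0.4, 0.6]]
noisy = [0, 0, 0, 1, 0, 0, 0, 0]
filtered = viterbi_filter(noisy, A, B)  # the isolated 1 is smoothed away
```

With these values a state switch is expensive, so short-lived species are removed while long-lived ones survive. This is why overly strict parameters can filter out real species.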

Examples

>>> from reacnetgenerator import ReacNetGenerator
>>> rng=ReacNetGenerator(inputfiletype="dump", inputfilename="dump.ch4", atomname=['C', 'H', 'O'])
>>> rng.runanddraw()
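When the input file carries no cell information, the `cell` parameter must be given in one of three shapes. The sketch below shows one way such inputs could be brought to a common 3x3 form. `normalize_cell` is a hypothetical helper, and reading a (3,) input as orthorhombic box lengths and a (9,) input as a row-major matrix is an assumption that this documentation does not confirm.

```python
def normalize_cell(cell):
    """Return a 3x3 nested list from a (3,), (9,), or (3, 3) cell."""
    values = list(cell)
    if len(values) == 3 and all(isinstance(v, (int, float)) for v in values):
        # (3,): assumed to be box lengths of an orthorhombic cell.
        return [[float(values[i]) if i == j else 0.0 for j in range(3)]
                for i in range(3)]
    if len(values) == 9:
        # (9,): assumed to be a row-major flattened 3x3 matrix.
        return [[float(values[3 * i + j]) for j in range(3)]
                for i in range(3)]
    if len(values) == 3 and all(len(row) == 3 for row in values):
        # (3, 3): already a matrix, only convert entries to float.
        return [[float(x) for x in row] for row in values]
    raise ValueError("cell must have shape (3,), (9,), or (3, 3)")


box = normalize_cell([10, 20, 30])
```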

Methods

Status(value)

ReacNetGenerator status.

draw()

Draw the reaction network, i.e. NETWORK step.

report()

Generate the analysis report, i.e. REPORT step.

run()

Process MD trajectory, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

runanddraw([run, draw, report])

Analyze the trajectory from MD simulation.

class Status(value)[source]

Bases: Enum

ReacNetGenerator status.

The ReacNetGenerator consists of several modules and algorithms to process the information from the given trajectory, including:

  • DOWNLOAD: Download trajectory from URLs

  • DETECT: Read bond information and detect molecules

  • HMM: HMM filter

  • MISO: Merge isomers

  • PATH: Identify isomers and collect reaction paths

  • MATRIX: Reaction matrix generation

  • NETWORK: Draw reaction network

  • REPORT: Generate analysis report

DETECT = 'Read bond information and detect molecules'
DOWNLOAD = 'Download trajectory'
HMM = 'HMM filter'
INIT = 'Init'
MATRIX = 'Reaction matrix generation'
MISO = 'Merge isomers'
NETWORK = 'Draw reaction network'
PATH = 'Indentify isomers and collect reaction paths'
REPORT = 'Generate analysis report'
draw()[source]

Draw the reaction network, i.e. NETWORK step.

Parameters:
None
report()[source]

Generate the analysis report, i.e. REPORT step.

Parameters:
None
run()[source]

Process MD trajectory, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

Parameters:
None
runanddraw(run=True, draw=True, report=True)[source]

Analyze the trajectory from MD simulation.

Parameters:
run: bool, optional, default: True

Process the trajectory or not, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

draw: bool, optional, default: True

Draw the reaction network or not, i.e. NETWORK step.

report: bool, optional, default: True

Generate the analysis report or not, i.e. REPORT step.
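The mapping from the three flags to the steps listed above can be sketched as follows. `steps_for` is a hypothetical helper that only restates the documented step names; it does not call the package.

```python
def steps_for(run=True, draw=True, report=True):
    """List the documented steps selected by the runanddraw flags."""
    steps = []
    if run:
        steps += ["DOWNLOAD", "DETECT", "HMM", "PATH", "MATRIX"]
    if draw:
        steps.append("NETWORK")
    if report:
        steps.append("REPORT")
    return steps


# Redraw and re-report without reprocessing the trajectory.
redo = steps_for(run=False)
```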

Submodules

reacnetgenerator.commandline module

reacnetgenerator.commandline.main_parser() ArgumentParser[source]

Returns main parser.

Returns:
argparse.ArgumentParser

reacnetgenerator cli parser

reacnetgenerator.commandline.parm2cmd(pp)[source]

reacnetgenerator.dps module

Connect molecule with Depth-First Search.

reacnetgenerator.dps.dps()
reacnetgenerator.dps.dps_reaction()

A+B->C+D

reacnetgenerator.gui module

reacnetgenerator.reacnetgen module

ReacNetGenerator: an automatic reaction network generator for reactive molecular dynamics simulation.

Please cite: ReacNetGenerator: an automatic reaction network generator for reactive molecular dynamic simulations, Phys. Chem. Chem. Phys., 2020, 22 (2): 683–691, doi: 10.1039/C9CP05091D

Jinzhe Zeng (jinzhe.zeng@rutgers.edu), Tong Zhu (tzhu@lps.ecnu.edu.cn)

Features

  • Processing of MD trajectory containing atomic coordinates or bond orders

  • Hidden Markov Model (HMM) based noise filtering

  • Isomer identification according to SMILES

  • Generation of reaction network for visualization using force-directed algorithm

  • Parallel computing

Simple example

ReacNetGenerator can process any kind of trajectory file containing atomic coordinates, e.g. a LAMMPS dump file prepared by running "dump 1 all custom 100 dump.reaxc id type x y z" in LAMMPS:

$ reacnetgenerator --type dump -i dump.reaxc -a C H O

where C, H, and O are the atomic names in the input file. The analysis report will be generated automatically.

ReacNetGenerator can also process files containing bond information, e.g. a LAMMPS bond file:

$ reacnetgenerator --type bond -i bonds.reaxc -a C H O

Run the following command for help:

$ reacnetgenerator -h

class reacnetgenerator.reacnetgen.ReacNetGenerator(**kwargs)[source]

Bases: object

Use ReacNetGenerator for trajectory analysis.

Parameters:
inputfiletype: str
The type of the input file. The following type is allowed:
inputfilename: str or list of strs

The filename(s) of the input file, which can be either relative path or absolute path. If it is a list, the files will be read in order.

atomname: tuple of strs

The atomic names in the input file, such as ('C', 'H', 'O'). They must be listed in the same order as in the input file.

runHMM: bool, optional, default: True

Whether to process the trajectory with a Hidden Markov Model (HMM). If too many species are filtered out, this option can be turned off.

miso: int, optional, default: 0

Merge isomers, using the most frequent one as the representative. 0 turns merging off. Two levels are available: 1 merges isomers with the same atoms and the same bond network but different bond levels; 2 merges isomers with the same atoms but different bond networks.

pbc: bool, optional, default: True

Use periodic boundary conditions (PBC) or not.

cell: (3,3) array_like or (3,) array_like or (9,) array_like, optional, default: None

The cell (box size) of the system. If None (default), the cell will be read from the input file. If the input file doesn’t have cell information, this parameter will be necessary.

nproc: int, optional, default: None

The number of processors used for analysis. If None (default), the program will try to use all processors.

selectatoms: str, optional, default: None

Select an element from the atomic names, such as C, and only show species with this element in the reaction network. If None (default), the network will show all elements.

split: int, optional, default: None

Split number for the time axis. For example, if set to 10, the whole trajectory will be divided into 10 parts and the reactions of each part will be shown.

a: (2,2) array_like, optional, default: [[0.999, 0.001], [0.001, 0.999]]

Transition matrix A of HMM parameters. It is recommended for users to choose their own parameters. See the paper for details.

b: (2,2) array_like, optional, default: [[0.6, 0.4], [0.4, 0.6]]

Emission matrix B of HMM parameters. It is recommended for users to choose their own parameters. See the paper for details.

Examples

>>> from reacnetgenerator import ReacNetGenerator
>>> rng=ReacNetGenerator(inputfiletype="dump", inputfilename="dump.ch4", atomname=['C', 'H', 'O'])
>>> rng.runanddraw()
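The `split` parameter divides the time axis into equal parts. A minimal sketch of such a division is shown below. `split_time_axis` is a hypothetical helper, and how the package distributes leftover steps when the trajectory length is not divisible by `split` is not stated here, so the rounding used is an assumption.

```python
def split_time_axis(n_steps, split):
    """Return `split` half-open (start, stop) ranges covering range(n_steps)."""
    if split < 1:
        raise ValueError("split must be a positive integer")
    # Integer boundaries spread any remainder across the parts.
    bounds = [n_steps * i // split for i in range(split + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


parts = split_time_axis(100, 10)
```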

Methods

Status(value)

ReacNetGenerator status.

draw()

Draw the reaction network, i.e. NETWORK step.

report()

Generate the analysis report, i.e. REPORT step.

run()

Process MD trajectory, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

runanddraw([run, draw, report])

Analyze the trajectory from MD simulation.

class Status(value)[source]

Bases: Enum

ReacNetGenerator status.

The ReacNetGenerator consists of several modules and algorithms to process the information from the given trajectory, including:

  • DOWNLOAD: Download trajectory from URLs

  • DETECT: Read bond information and detect molecules

  • HMM: HMM filter

  • MISO: Merge isomers

  • PATH: Identify isomers and collect reaction paths

  • MATRIX: Reaction matrix generation

  • NETWORK: Draw reaction network

  • REPORT: Generate analysis report

DETECT = 'Read bond information and detect molecules'
DOWNLOAD = 'Download trajectory'
HMM = 'HMM filter'
INIT = 'Init'
MATRIX = 'Reaction matrix generation'
MISO = 'Merge isomers'
NETWORK = 'Draw reaction network'
PATH = 'Indentify isomers and collect reaction paths'
REPORT = 'Generate analysis report'
draw()[source]

Draw the reaction network, i.e. NETWORK step.

Parameters:
None
report()[source]

Generate the analysis report, i.e. REPORT step.

Parameters:
None
run()[source]

Process MD trajectory, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

Parameters:
None
runanddraw(run=True, draw=True, report=True)[source]

Analyze the trajectory from MD simulation.

Parameters:
run: bool, optional, default: True

Process the trajectory or not, including DOWNLOAD, DETECT, HMM, PATH, and MATRIX steps.

draw: bool, optional, default: True

Draw the reaction network or not, i.e. NETWORK step.

report: bool, optional, default: True

Generate the analysis report or not, i.e. REPORT step.

reacnetgenerator.tools module

Useful methods to further process ReacNetGenerator results.

reacnetgenerator.tools.calculate_rate(specfile: str, reacfile: str, cell: ndarray, timestep: float) Dict[str, float][source]

Calculate the rate constant of each reaction.

The rate constants are calculated by the method developed in [1].

Parameters:
specfile: str

The species file.

reacfile: str

The reactions file.

cell: np.ndarray

The cell with the shape (3, 3). Unit: Angstrom.

timestep: float

The time step. Unit: femtosecond.

Returns:
rates: Dict[str, float]

The rate of each reaction. The dict key is the reaction SMILES. The value is in unit of [(cm^3/mol)s^(-1)].

References

[1]

J Comput Chem 40, 16, 1586-1592.
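Because the inputs are given in Angstrom and femtoseconds while the rates are reported with cm^3 and s, unit conversions are needed along the way. The sketch below shows only those conversions, not the rate method of [1]. The helper names `cell_volume_cm3` and `steps_to_seconds` are hypothetical.

```python
def cell_volume_cm3(cell):
    """Volume in cm^3 of a cell given as a 3x3 matrix in Angstrom."""
    (a, b, c), (d, e, f), (g, h, i) = cell
    # Volume of the parallelepiped is |det(cell)|.
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return abs(det) * 1e-24  # 1 Angstrom^3 = 1e-24 cm^3


def steps_to_seconds(n_steps, timestep_fs):
    """Simulated time in seconds for n_steps of timestep_fs femtoseconds."""
    return n_steps * timestep_fs * 1e-15  # 1 fs = 1e-15 s


volume = cell_volume_cm3([[10, 0, 0], [0, 10, 0], [0, 0, 10]])
duration = steps_to_seconds(1000, 0.1)
```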

reacnetgenerator.tools.read_reactions(reacfile) List[Tuple[int, Counter, str]][source]

Read reactions from the reactions file (ends with .reactionsabcd).

For accuracy, the HMM filter should be disabled.

Parameters:
reacfile: str

The reactions file.

Returns:
occs: List[Tuple[int, Counter, str]]

The number of occurrences of each reaction. The tuple is (occurrence, counter_reactants, reaction).

reacnetgenerator.tools.read_species(specfile: str) Tuple[List[int], Dict[str, ndarray]][source]

Read species from the species file (ends with .species).

For accuracy, the HMM filter should be disabled.

Parameters:
specfile: str

The species file.

Returns:
step_idx: np.ndarray

The index of the step.

n_species: Dict[str, np.ndarray]

The number of species in each step. The dict key is the species SMILES.

Examples

Plot the number of methane molecules at each step.

>>> from reacnetgenerator.tools import read_species
>>> import matplotlib.pyplot as plt
>>> step_idx, n_species = read_species('methane.species')
>>> plt.plot(step_idx, n_species['[H]C([H])([H])[H]'])
>>> plt.savefig("methane.svg")

reacnetgenerator.utils module

Provide utils for ReacNetGenerator.

class reacnetgenerator.utils.SCOUROPTIONS[source]

Bases: object

Scour (SVG optimization) options.

enable_viewboxing = True
newlines = False
remove_descriptions = True
remove_descriptive_elements = True
remove_metadata = True
remove_titles = True
shorten_ids = True
strip_comments = True
strip_ids = True
strip_xml_prolog = True
strip_xml_space_attribute = True
class reacnetgenerator.utils.SharedRNGData(rng, usedRNGKeys, returnedRNGKeys, extraNoneKeys=None)[source]

Bases: object

Share ReacNetGenerator data with a submodule.

Parameters:
rng: reacnetgenerator.ReacNetGenerator

The central ReacNetGenerator instance.

usedRNGKeys: list of strs

Keys that need to be passed from the ReacNetGenerator class to the submodule.

returnedRNGKeys: list of strs

Keys that need to be passed from the submodule to the ReacNetGenerator class.

extraNoneKeys: list of strs, optional, default: None

Set keys to None, which will be used in the submodule.

Methods

returnkeys()

Return keys to the ReacNetGenerator class.

returnkeys()[source]

Return keys to the ReacNetGenerator class.

class reacnetgenerator.utils.WriteBuffer(f, linenumber=1200, sep=None)[source]

Bases: object

Store a buffer for writing files.

It is expensive to write to a file, so we need to make a buffer.

Parameters:
f: fileObject

The file object to write.

linenumber: int, default: 1200

The number of contents to store in the buffer. The buffer will be flushed if it exceeds the set number.

sep: str or bytes, default: None

The separator for contents. If None (default), there will be no separator.

Methods

append(text)

Append a text.

check()

Check if the number of stored contents exceeds the limit.

extend(text)

Extend texts.

flush()

Flush the buffer.

append(text)[source]

Append a text.

Parameters:
text: str

The text to be appended.

check()[source]

Check if the number of stored contents exceeds the limit.

If so, the buffer will be flushed.

extend(text)[source]

Extend texts.

flush()[source]

Flush the buffer.
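The buffering idea described for this class can be sketched as follows. `SketchWriteBuffer` is a hypothetical stand-in, not the package's class: it handles text only, and writing a trailing separator after each flushed chunk is an assumption.

```python
import io


class SketchWriteBuffer:
    """Collect strings and write them to a file object in batches."""

    def __init__(self, f, linenumber=1200, sep=None):
        self.f = f
        self.linenumber = linenumber
        self.sep = sep
        self.buff = []

    def append(self, text):
        self.buff.append(text)
        self.check()

    def extend(self, texts):
        self.buff.extend(texts)
        self.check()

    def check(self):
        # Flush only once the stored items exceed the limit.
        if len(self.buff) > self.linenumber:
            self.flush()

    def flush(self):
        if not self.buff:
            return
        if self.sep is None:
            self.f.write("".join(self.buff))
        else:
            # Assumed: a separator also follows the last item of a chunk.
            self.f.write(self.sep.join(self.buff) + self.sep)
        self.buff.clear()


out = io.StringIO()
wb = SketchWriteBuffer(out, linenumber=2, sep="\n")
for line in ("a", "b", "c"):
    wb.append(line)  # the third append triggers a flush
wb.append("d")
wb.flush()  # always flush once more before closing the file
```

The final explicit `flush()` matters: items below the limit stay in memory until it is called.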

reacnetgenerator.utils.appendIfNotNone(f, wbytes)[source]
reacnetgenerator.utils.bytestolist(x)[source]

Convert a compressed line to an object.

Parameters:
x: bytes

The compressed line.

Returns:
object

The decompressed object.

reacnetgenerator.utils.checksha256(filename, sha256_check)[source]

Check whether the SHA-256 of a file is correct.

Parameters:
filename: str

The filename.

sha256_check: str or list of strs

The sha256 to be checked.

Returns:
bool

Indicate whether sha256 is correct.
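A checksum test of this kind can be written with the standard library alone. The sketch below uses `hashlib`; the helper names are hypothetical, and treating a list of digests as "any one may match" is an assumption about the intended behaviour.

```python
import hashlib


def sha256_of(filename, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(filename, "rb") as f:
        # Reading in chunks keeps memory use flat for large trajectories.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(filename, expected):
    """Check a file against one digest or a list of acceptable digests."""
    if isinstance(expected, str):
        expected = [expected]
    return sha256_of(filename) in expected
```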

reacnetgenerator.utils.compress(x, isbytes=False)[source]

Compress the line.

This function reduces IO overhead to speed up the program. It uses lz4 to compress and base64 to encode, since lz4 performs better than the alternatives.

Parameters:
x: str or bytes

The line to compress.

isbytes: bool, optional, default: False

Whether x is bytes. If not, x will be converted to bytes first.

Returns:
bytes

The compressed line, with a line break at the end.

reacnetgenerator.utils.decompress(x, isbytes=False)[source]

Decompress the line.

Parameters:
x: bytes

The line to decompress.

isbytes: bool, optional, default: False

Whether the decompressed content is bytes. If not, the line will be decoded.

Returns:
str or bytes

The decompressed line.
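The compress-then-encode round trip can be illustrated with the standard library. Note the substitution: this sketch uses `zlib` in place of lz4 so that it runs without extra dependencies, and the function names are hypothetical. Base64 output never contains a newline, which is what makes one record per line safe.

```python
import base64
import zlib


def compress_line(x, isbytes=False):
    """Compress and base64-encode one record, ending with a newline."""
    data = x if isbytes else x.encode()
    return base64.b64encode(zlib.compress(data)) + b"\n"


def decompress_line(x, isbytes=False):
    """Invert compress_line; return bytes or a decoded string."""
    data = zlib.decompress(base64.b64decode(x.strip()))
    return data if isbytes else data.decode()


line = compress_line("1 2 3\n4 5 6")  # the payload itself contains a newline
restored = decompress_line(line)
```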

async reacnetgenerator.utils.download_file(urls, pathfilename, sha256)[source]

Download a file from remote URLs if it does not already exist.

Parameters:
urls: str or list of strs

The URL(s) from which the file can be downloaded.

pathfilename: str

The downloading path of the file.

sha256: str

SHA-256 of the file. If it is not None and matches the existing file, the download will be skipped.

Returns:
pathfilename: str

The downloading path of the file.

reacnetgenerator.utils.download_multifiles(urls)[source]

Download multiple files from dicts.

Parameters:
urls: list of dicts
Information about the files to download. Each dict should contain the following keys:
  • url: str or list of strs

    The URL(s) from which the file can be downloaded.

  • pathfilename: str

    The downloading path of the file.

  • sha256: str, optional, default: None

    SHA-256 of the file. If it is not None and matches the existing file, the download will be skipped.

async reacnetgenerator.utils.gather_download_files(urls)[source]

See download_multifiles function for details.

reacnetgenerator.utils.listtobytes(x)[source]

Convert an object to a compressed line.

Parameters:
x: object

The object to convert, such as numpy.ndarray.

Returns:
bytes

The compressed line.

reacnetgenerator.utils.listtostirng(l, sep)[source]

Convert a list to string, that is easier to store.

Parameters:
l: list of strs or lists

The list to convert, which can contain any number of dimensions.

sep: list of strs

The separators for each dimension.

Returns:
str

The converted string.
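A nested list can be flattened to a string by joining each dimension with its own separator. The recursive sketch below is a hypothetical helper, not the package's function, and it assumes that the first separator belongs to the outermost dimension.

```python
def list_to_string(items, sep):
    """Join a nested list of strings, one separator per nesting level."""
    if isinstance(items, str):
        return items  # a leaf: nothing left to join
    # sep[0] joins this level; deeper levels use the remaining separators.
    return sep[0].join(list_to_string(item, sep[1:]) for item in items)


text = list_to_string([["a", "b"], ["c", "d"]], [";", ","])
```

Splitting the result with the same separators in the same order recovers the nested list, provided no element contains a separator character.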

reacnetgenerator.utils.multiopen(pool, func, l, semaphore=None, nlines=None, unordered=True, return_num=False, start=0, extra=None, interval=None, bar=True, desc=None, unit='it', total=None)[source]

Return an iterable object for processing a file with multiple processors.

Parameters:
pool: multiprocessing.Pool

The pool for multiprocessing.

func: function

The function to process lines.

l: File object

The file object.

semaphore: multiprocessing.Semaphore, optional, default: None

The semaphore to acquire. If None (default), the object will be passed without control.

nlines: int, optional, default: None

The number of lines to pass to the function each time. If None (default), only one line will be passed to the function.

unordered: bool, optional, default: True

Whether the process can be unordered.

return_num: bool, optional, default: False

If True, adds a counter to an iterable.

start: int, optional, default: 0

The start number of the counter.

extra: object, optional, default: None

The extra object passed to the item.

interval: obj, optional, default: None

The interval of items that will be passed to the function. For example, if set to 10, one item out of every 10 will be passed and the others will be dropped.

bar: bool, optional, default: True

If True, show a tqdm bar for the iteration.

desc: str, optional, default: None

The description of the iteration shown in the bar.

unit: str, optional, default: it

The unit of the iteration shown in the bar.

total: int, optional, default: None

The total number of the iteration shown in the bar.

Returns:
object

An object that can be iterated.
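Two of the parameters above, `nlines` and `interval`, describe how the input is grouped and thinned before it reaches the worker function. The sketch below shows both ideas with `itertools` only. The helper names are hypothetical, and keeping the first item of each interval is an assumption; the multiprocessing part is deliberately left out.

```python
from itertools import islice


def every_nth(items, interval=None):
    """Keep one item per `interval` items; keep everything if None."""
    if interval is None:
        return iter(items)
    return islice(items, 0, None, interval)


def chunks(items, nlines=None):
    """Yield single items, or lists of up to `nlines` items."""
    it = iter(items)
    if nlines is None:
        yield from it
        return
    while True:
        chunk = list(islice(it, nlines))
        if not chunk:
            return
        yield chunk


kept = list(every_nth(range(10), 3))
grouped = list(chunks(range(5), 2))
```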

reacnetgenerator.utils.must_be_list(obj)[source]

Convert an object to a list if the object is not a list.

Parameters:
obj: Object

The object to convert.

Returns:
obj: list

If the input object is not a list, returns a list that only contains that object. Otherwise, returns that object.
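The described behaviour fits in one line. The sketch below uses the hypothetical name `as_list` and follows the description literally, so a tuple counts as "not a list" and is wrapped rather than converted.

```python
def as_list(obj):
    """Return obj unchanged if it is a list, else wrap it in a list."""
    return obj if isinstance(obj, list) else [obj]


single = as_list("dump.reaxc")
several = as_list(["a.dump", "b.dump"])
```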

reacnetgenerator.utils.produce(semaphore, plist, parameter)[source]

Item producer with a semaphore.

Prevent large memory usage due to slow IO.

Parameters:
semaphore: multiprocessing.Semaphore

The semaphore to acquire.

plist: list of objects

The list of items to be passed.

parameter: object

The parameter yielded with each item.
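A semaphore-guarded producer limits how many items are in flight at once: the producer acquires before handing out an item, and the consumer releases when it is done. The sketch below is an illustration under assumptions, not the package's code. It uses `threading.Semaphore` in place of `multiprocessing.Semaphore` so that it runs in a single process, and the `(item, parameter)` tuple shape is hypothetical.

```python
import threading


def produce_sketch(semaphore, plist, parameter):
    """Yield (item, parameter) pairs, acquiring the semaphore for each."""
    for item in plist:
        # Blocks here while too many items are still being processed.
        semaphore.acquire()
        yield item, parameter


sem = threading.Semaphore(2)  # at most two unprocessed items at a time
results = []
for item, extra in produce_sketch(sem, ["l1", "l2", "l3"], "ctx"):
    results.append((item.upper(), extra))
    sem.release()  # the consumer signals that the item is finished
```

If the consumer never releases, the producer stalls once the semaphore is exhausted. That stall is the intended back-pressure when reading is faster than processing.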

reacnetgenerator.utils.run_mp(nproc, **arg)[source]

Process a file with multiple processors.

Parameters:
nproc: int

The number of processors to be used.

Other parameters can be found in the `multiopen` function.

reacnetgenerator.utils_np module

reacnetgenerator.utils_np.check_zero_signal()

Benchmark for 1,000,000 loops (1000/2000): Cython: 1.45 s Python: 3.67 s

reacnetgenerator.utils_np.idx_to_signal()

Benchmark for 1,000,000 loops (step=250000): Cython: 18.61 s Python: 31.30 s