mysql_lib API documentation

class mysql_lib.mysql_source[source]

Bases: object

connect_db_buffered()[source]

The method creates a new connection to the MySQL database. The connection is made using the dictionary-type cursor factory, which is buffered.
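
For illustration, a buffered dictionary cursor can be obtained with pymysql as follows; the connection values are illustrative and not the library’s actual configuration keys:

    import pymysql

    # a minimal sketch: DictCursor buffers the result set client side and
    # returns each row as a dictionary
    conn = pymysql.connect(
        host="localhost",
        user="replica",
        password="secret",
        database="sakila",
        charset="utf8",
        cursorclass=pymysql.cursors.DictCursor,
    )
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION() AS version")
    print(cursor.fetchone()["version"])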

connect_db_unbuffered()[source]

The method creates a new connection to the MySQL database. The connection is made using the unbuffered cursor factory.
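
Under the same assumptions, the unbuffered variant only changes the cursor class; pymysql’s SSDictCursor streams rows from the server instead of buffering them:

    import pymysql

    # unbuffered sketch: rows are fetched from the server as they are read,
    # keeping memory usage low for large tables (connection values illustrative)
    conn = pymysql.connect(
        host="localhost",
        user="replica",
        password="secret",
        database="sakila",
        cursorclass=pymysql.cursors.SSDictCursor,
    )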

copy_data(schema, table)[source]

The method copies the data between the origin and destination tables. The method locks the table in read-only mode and gets the log coordinates, which are returned to the calling method.

Parameters:
  • schema – the origin’s schema
  • table – the table name
Returns:

the log coordinates for the given table

Return type:

dictionary
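
The sequence can be sketched as follows; lock_table and get_master_coordinates are the methods documented below, while the copy step is only hinted at:

    def copy_data_sketch(source, schema, table):
        # a hedged sketch of the copy_data sequence; "source" is assumed to be
        # a connected mysql_source instance
        source.lock_table(schema, table)                 # read lock on the origin table
        master_status = source.get_master_coordinates()  # binlog file and position
        # ... the rows are copied to the destination table here ...
        return master_status                             # returned to the calling method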

create_destination_schemas()[source]

Creates the loading schemas in the destination database for the schemas with tables listed in the dictionary self.schema_tables. The method builds a dictionary which associates each destination schema with its loading schema. The loading schema is named after the destination schema, with an underscore prefix and a _tmp suffix. As PostgreSQL allows, by default, up to 64 characters for an identifier, the original schema name is truncated to 59 characters in order to fit within the maximum identifier length. The mappings are stored in the class dictionary schema_loading.
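
A minimal sketch of the naming rule; the helper name and the exact structure of schema_loading are assumptions:

    def build_loading_schema(destination_schema):
        # truncate to 59 characters so "_" + name + "_tmp" fits within
        # PostgreSQL's default 64-character identifier limit
        return "_{}_tmp".format(destination_schema[:59])

    # illustrative mapping of destination schema to loading schema
    schema_loading = {"sakila": build_loading_schema("sakila")}  # -> "_sakila_tmp"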

create_destination_tables()[source]

The method creates the destination tables in the loading schema. The table names are looped over using the values stored in the class dictionary schema_tables.

disconnect_db_buffered()[source]

The method disconnects any connection using the dictionary-type cursor from the MySQL database.

disconnect_db_unbuffered()[source]

The method disconnects any unbuffered connection from the MySQL database.

drop_loading_schemas()[source]

The method drops the loading schemas from the destination database. The drop is performed on the schemas generated in create_destination_schemas. The method assumes the class dictionary schema_loading is correctly set.

generate_select_statements(schema, table)[source]

The method generates the select statements for the CSV output and for the insert fallback for the given schema and table. The method assumes there is an active buffered database connection.

Parameters:
  • schema – the origin’s schema
  • table – the table name
Returns:

the select list statements for the copy to CSV and the fallback to inserts

Return type:

dictionary
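
As an illustration, the returned mapping might be consumed like this; source is assumed to be a connected mysql_source instance and the key names are assumptions:

    # hypothetical usage of the returned dictionary
    select_columns = source.generate_select_statements("sakila", "film")
    csv_list = select_columns["select_csv"]        # select list for the CSV copy
    insert_list = select_columns["select_insert"]  # select list for the insert fallback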

get_foreign_keys_metadata()[source]

The method collects the foreign key metadata for the detach replica process. Currently it does not collect the ON UPDATE/ON DELETE clauses.

get_master_coordinates()[source]

The method gets the master’s coordinates and returns them stored in a dictionary. The method assumes there is an active database connection.

Returns:

the master’s log coordinates

Return type:

dictionary
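
For illustration, the returned structure might look like this; the field names follow MySQL’s SHOW MASTER STATUS output and the exact shape is an assumption:

    # hypothetical shape of the returned coordinates (illustrative values)
    master_status = source.get_master_coordinates()
    # e.g. [{"File": "mysql-bin.000012", "Position": 4123}]
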
get_table_list()[source]

The method pulls the table list from the information_schema. The list is stored in a dictionary whose key is the table’s schema.
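
The stored structure could look like this, with illustrative schema and table names:

    # hypothetical shape of the table list keyed by the table's schema
    schema_tables = {
        "sakila": ["actor", "film", "rental"],
        "world": ["city", "country"],
    }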

get_table_metadata(table, schema)[source]

The method builds the table’s metadata by querying the information_schema. The data is returned as a dictionary.

Parameters:
  • table – The table name
  • schema – The table’s schema
Returns:

table’s metadata as a cursor dictionary

Return type:

dictionary

get_table_type_map()[source]

The method builds a dictionary with one key per replicated schema. Each key maps to a dictionary with the schema’s tables as keys and the column/type mappings as values. The dictionary is used in the read_replica method to determine whether a field requires hexadecimal conversion.
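
A sketch of the structure with illustrative values, followed by the kind of lookup the replica process performs; the list of types requiring conversion is an assumption:

    # hypothetical shape of the type map: schema -> table -> column -> MySQL type
    table_type_map = {
        "sakila": {
            "film": {
                "film_id": "smallint",
                "title": "varchar",
                "rating": "enum",
            },
        },
    }

    # e.g. deciding whether a column needs hexadecimal conversion
    hex_types = ("blob", "tinyblob", "mediumblob", "longblob", "binary", "varbinary")
    needs_hex = table_type_map["sakila"]["film"]["title"] in hex_types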

init_replica()[source]

The method performs a full init replica for the given sources.

insert_table_data(ins_arg)[source]

This method is a fallback procedure used when copy_table_data fails. The ins_arg parameter is a list with the information required to run the select for building the insert statements, along with the slice’s start and stop. The process is performed in memory and can take a very long time to complete. A sketch of the argument layout follows the parameter list.

Parameters:
  • ins_arg – the list with the insert arguments (slice_insert, schema, table, select_stat, column_list, copy_limit)
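
A hedged sketch of the argument layout listed above; all values and types are illustrative assumptions:

    # hypothetical ins_arg following the layout
    # (slice_insert, schema, table, select_stat, column_list, copy_limit)
    ins_arg = [
        [0, 1],                        # slice_insert: the slices to replay
        "sakila",                      # schema: the origin schema
        "film",                        # table: the table name
        "film_id,title,rating",        # select_stat: select list for the inserts
        '"film_id","title","rating"',  # column_list: destination column list
        10000,                         # copy_limit: rows per slice
    ]
    source.insert_table_data(ins_arg)  # source: a connected mysql_source instance
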
lock_table(schema, table)[source]

The method flushes the given table with a read lock. The method assumes there is an active database connection. The corresponding statement is sketched after the parameter list.

Parameters:
  • schema – the origin’s schema
  • table – the table name
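
The lock corresponds to a MySQL statement of this shape (a sketch with illustrative names):

    # FLUSH TABLES ... WITH READ LOCK flushes the table and acquires a read lock
    sql_lock = "FLUSH TABLES `{}`.`{}` WITH READ LOCK;".format("sakila", "film")
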
print_progress(iteration, total, schema, table)[source]

Print the copy progress in slices and the estimated total slices. In order to reduce noise when the log level is info, only the tables copied in multiple slices get the progress print.

Parameters:
  • iteration – The slice number currently processed
  • total – The estimated total slices
  • schema – The schema name
  • table – The table name

read_replica()[source]

The method gets the batch data from PostgreSQL. If the batch data is not empty, the method read_replica_stream is executed to get the rows from the MySQL replica stored into the PostgreSQL database. When the method exits, the replica_data list is decomposed into the master_data (log name, position and last event’s timestamp). If the flag close_batch is set, the master status is saved in PostgreSQL and the batch id returned by the method is saved in the class variable id_batch. This variable is used to determine whether the previous batch should be closed or not. If the variable is not empty, the previous batch is closed with a simple update of the processed flag.
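
In outline the flow reads as follows; the pg_engine helper names are inferred from the description above and should be treated as assumptions:

    def read_replica_outline(pg_engine, read_replica_stream, id_batch):
        # hedged outline, not the actual implementation
        batch_data = pg_engine.get_batch_data()            # batch data from PostgreSQL
        if batch_data:
            close_batch, master_data = read_replica_stream(batch_data)
            if close_batch:
                if id_batch:
                    # close the previous batch with a simple processed-flag update
                    pg_engine.set_batch_processed(id_batch)
                id_batch = pg_engine.save_master_status(master_data)
        return id_batch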

read_replica_stream(batch_data)[source]

Stream the replica using the batch data. This method evaluates the different events streamed from MySQL and manages them accordingly. The BinLogStreamReader class is called with the only_events parameter, which restricts the event types received by the streamer. The events managed are the following: RotateEvent, which happens when MySQL restarts or the binary log file changes; QueryEvent, which happens when a new row image comes in (a BEGIN statement) or a DDL is executed, where the BEGIN is always skipped and the DDL is parsed using the sql_token class; and the [Write,Update,Delete]RowEvents, which are the row images pulled from the MySQL replica.
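
A minimal sketch of the streamer setup described above, using the python-mysql-replication package; the connection settings, server id and coordinates are illustrative:

    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.event import QueryEvent, RotateEvent
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    my_stream = BinLogStreamReader(
        connection_settings={"host": "localhost", "port": 3306,
                             "user": "replica", "passwd": "secret"},
        server_id=100,
        only_events=[RotateEvent, QueryEvent, WriteRowsEvent,
                     UpdateRowsEvent, DeleteRowsEvent],
        log_file="mysql-bin.000012",   # resume from the stored coordinates
        log_pos=4123,
        resume_stream=True,
    )
    for binlogevent in my_stream:
        binlogevent.dump()   # inspect each event; the library replays rows instead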

The RotateEvent and the QueryEvent cause the batch to be closed.

The for loop reads the row events, builds the dictionary carrying information such as the destination schema and the binlog coordinates, and stores it into the group_insert list. When the number of events exceeds replica_batch_size, the group_insert is written into PostgreSQL. The batch is not closed in that case, and the method exits only when there are no more rows available in the stream. Therefore replica_batch_size is just the maximum size of a single insert and of the batch replayed on PostgreSQL. The binlog switch or a captured DDL determines whether a batch is closed and processed.
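
The size check can be sketched like this; write_batch is a hypothetical stand-in for the PostgreSQL write:

    def flush_if_full(pg_engine, group_insert, replica_batch_size):
        # write the buffered row images once the maximum insert size is reached;
        # the batch itself stays open and the loop keeps reading the stream
        if len(group_insert) >= replica_batch_size:
            pg_engine.write_batch(group_insert)   # hypothetical writer
            group_insert = []
        return group_insert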

The update row event stores the row image before the update in a separate key, event_before. This is required to allow updates where the primary key itself is updated.
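
For illustration, an update row image carrying the old key might look like this; the layout is hypothetical:

    # event_before keeps the pre-update values so the row can still be matched
    # when the primary key itself changes
    row_image = {
        "action": "update",
        "event_after": {"film_id": 12, "title": "NEW TITLE"},
        "event_before": {"film_id": 11},
    }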

Each row event is scanned for data types requiring conversion to hex string.

Parameters:
  • batch_data – The list with the master’s batch data
Returns:

the batch’s data composed of the binlog name, the binlog position and the last event timestamp read from the MySQL replica stream

Return type:

dictionary

refresh_schema()[source]

The method performs a sync for an entire schema within a source. The method works in a way similar to init_replica. The swap happens in a single transaction.

set_copy_max_memory()[source]

The method sets the class variable self.copy_max_memory using the value stored in the source setting.

sync_tables()[source]

The method performs a sync for specific tables. The method works in a way similar to init_replica, except for the way the relations are swapped. The tables are loaded into a temporary schema and the log coordinates are stored with the table in the replica catalogue. When the load is complete, the method drops the existing tables and changes the schema of the loaded tables to the destination schema. The swap happens in a single transaction.
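
Conceptually the swap corresponds to SQL of this shape, run in a single transaction; this is a sketch with illustrative names, not the library’s exact statements:

    # drop the existing table, then move the loaded one into the destination schema
    swap_sql = """
        BEGIN;
        DROP TABLE IF EXISTS "destschema"."film";
        ALTER TABLE "_destschema_tmp"."film" SET SCHEMA "destschema";
        COMMIT;
    """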